Skip to content

Fix streaming skip accounting and next_extraction task leak#266

Merged
gvanrossum merged 1 commit intomicrosoft:mainfrom
KRRT7:fix/streaming-skip-accounting-and-task-leak
May 3, 2026
Merged

Fix streaming skip accounting and next_extraction task leak#266
gvanrossum merged 1 commit intomicrosoft:mainfrom
KRRT7:fix/streaming-skip-accounting-and-task-leak

Conversation

@KRRT7
Copy link
Copy Markdown
Contributor

@KRRT7 KRRT7 commented May 3, 2026

Summary

Follow-up to #265. Fixes two issues identified during review:

  • Skip accounting gap. ingest_email.py only reported generator-level skips (counters["skipped"]). Batch-level skips from _filter_ingested (result.messages_skipped) were never surfaced in the final summary. The two populations are disjoint — a source caught by the generator never reaches the batch layer. The summary now reports total_skipped = counters["skipped"] + counters["batch_skipped"]. Survives ^C since on_batch_committed fires per committed batch.

  • next_extraction task leak. In _submit_batch, if _drain_commit() raises after next_extraction = asyncio.create_task(...) but before it's awaited, the task leaked. Promoted next_extraction to a nonlocal (pending_extraction) so the except BaseException block can cancel it alongside pending_commit.

Changes

  • tools/ingest_email.py: Add batch_skipped counter, track result.messages_skipped in on_batch_committed, report combined total in summary
  • src/typeagent/knowpro/conversation_base.py: Track pending_extraction, cancel in except block
  • tests/test_add_messages_streaming.py: 4 new tests covering both cancellation paths and edge cases

Test plan

  • make format check test passes (701 tests)
  • Coverage for conversation_base.py: 94% → 96%
  • New test: pending_extraction cancelled when prior commit raises during _drain_commit
  • New test: pending_commit cancelled when message iterator raises
  • New test: empty iterator returns zeros
  • New test: messages with empty text_chunks skip extraction entirely

- Surface batch-level skips (from _filter_ingested) in ingest_email.py
  summary. Previously only generator-level skips were reported; the two
  populations are disjoint so total_skipped is now their sum. Survives
  ^C since on_batch_committed fires per committed batch.

- Promote next_extraction to a nonlocal (pending_extraction) in
  add_messages_streaming so the except BaseException block can cancel it
  if _drain_commit raises while extraction is still in-flight.

- Add 4 tests covering both cancellation paths and edge cases (empty
  chunks, empty iterator). Coverage for conversation_base.py: 94% → 96%.
@gvanrossum gvanrossum merged commit 7d66982 into microsoft:main May 3, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants