Fix streaming skip accounting and next_extraction task leak #266
Merged
gvanrossum merged 1 commit into microsoft:main on May 3, 2026
- Surface batch-level skips (from _filter_ingested) in ingest_email.py summary. Previously only generator-level skips were reported; the two populations are disjoint so total_skipped is now their sum. Survives ^C since on_batch_committed fires per committed batch.
- Promote next_extraction to a nonlocal (pending_extraction) in add_messages_streaming so the except BaseException block can cancel it if _drain_commit raises while extraction is still in-flight.
- Add 4 tests covering both cancellation paths and edge cases (empty chunks, empty iterator). Coverage for conversation_base.py: 94% → 96%.
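The skip accounting in the first item can be sketched roughly as follows. This is a minimal illustration, not the code from tools/ingest_email.py: `BatchResult`, `new_counters`, and `total_skipped` are hypothetical names, and only `counters["skipped"]`, `counters["batch_skipped"]`, `result.messages_skipped`, and `on_batch_committed` come from the PR text.

```python
from dataclasses import dataclass


@dataclass
class BatchResult:
    """Hypothetical stand-in for the per-batch result object."""
    messages_added: int
    messages_skipped: int  # sources dropped at the batch layer


def new_counters() -> dict[str, int]:
    # "skipped" is bumped by the generator; "batch_skipped" by the callback.
    return {"skipped": 0, "batch_skipped": 0}


def on_batch_committed(counters: dict[str, int], result: BatchResult) -> None:
    # Runs once per committed batch, so the tally stays current
    # even if the run is interrupted before the final summary.
    counters["batch_skipped"] += result.messages_skipped


def total_skipped(counters: dict[str, int]) -> int:
    # The two populations are disjoint, so a plain sum does not double count.
    return counters["skipped"] + counters["batch_skipped"]
```

Because the callback updates the counter as each batch commits, a summary printed from a KeyboardInterrupt handler would still reflect every batch committed up to that point.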
gvanrossum
approved these changes
May 3, 2026
Summary
Follow-up to #265. Fixes two issues identified during review:
1. Skip accounting gap. ingest_email.py only reported generator-level skips (counters["skipped"]). Batch-level skips from _filter_ingested (result.messages_skipped) were never surfaced in the final summary. The two populations are disjoint: a source caught by the generator never reaches the batch layer. The summary now reports total_skipped = counters["skipped"] + counters["batch_skipped"]. Survives ^C since on_batch_committed fires per committed batch.
2. next_extraction task leak. In _submit_batch, if _drain_commit() raises after next_extraction = asyncio.create_task(...) but before it's awaited, the task leaked. Promoted next_extraction to a nonlocal (pending_extraction) so the except BaseException block can cancel it alongside pending_commit.
Changes
- tools/ingest_email.py: Add batch_skipped counter, track result.messages_skipped in on_batch_committed, report combined total in summary
- src/typeagent/knowpro/conversation_base.py: Track pending_extraction, cancel in except block
- tests/test_add_messages_streaming.py: 4 new tests covering both cancellation paths and edge cases
Test plan
- make format check test passes (701 tests)
- conversation_base.py coverage: 94% → 96%
- pending_extraction cancelled when prior commit raises during _drain_commit
- pending_commit cancelled when message iterator raises
- Empty text_chunks skip extraction entirely
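The first cancellation path in the test plan can be illustrated with a small standalone sketch. It assumes a pipeline in which extraction of the next batch overlaps the commit of the previous one; `add_messages_streaming_sketch`, `extract`, and `commit` are hypothetical names, and the real internals of `_submit_batch` and `_drain_commit` in conversation_base.py are not shown in this PR text and may differ.

```python
import asyncio
from typing import Optional


async def add_messages_streaming_sketch(batches, extract, commit):
    """Overlap extraction of batch N+1 with the commit of batch N."""
    pending_commit: Optional[asyncio.Task] = None
    pending_extraction: Optional[asyncio.Task] = None

    async def drain_commit():
        nonlocal pending_commit
        if pending_commit is not None:
            task, pending_commit = pending_commit, None
            await task  # re-raises if the previous commit failed

    async def submit_batch(batch):
        nonlocal pending_commit, pending_extraction
        # Stored in the enclosing scope so the outer handler can see it.
        pending_extraction = asyncio.create_task(extract(batch))
        await drain_commit()  # may raise while extraction is in flight
        extracted = await pending_extraction
        pending_extraction = None
        pending_commit = asyncio.create_task(commit(extracted))

    try:
        for batch in batches:
            await submit_batch(batch)
        await drain_commit()
    except BaseException:
        # Cancel whatever is still running so no task outlives the call.
        for task in (pending_extraction, pending_commit):
            if task is not None and not task.done():
                task.cancel()
        raise
```

If the extraction task were only a local variable inside `submit_batch`, the outer `except BaseException` block would have no handle on it, which is the leak described in the summary.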