Context
Follow-up from #561. The structural interrupt-scheduling fix landed: voiceProceed({ interruptions }) now schedules AGENT pre-step + fires the barge-in via inline TTS, honoring delayRange. The user_interrupt event fires with outcome === "fired_after_speech", the cut-off-boundary transcriptTruncated label fires correctly, and the unit test (proceed-interrupt.test.ts) locks it in.
But the bundled Pipecat stub bot in python/examples/voice/_bot/bot.py generates TTS in a burst (~50 ms of wall time for a several-second reply) and streams faster than realtime. By the time adapter.interrupt() runs (even with delayRange: [0.5, 2.0]), the bot has already sent all frames. The random_interruptions demo recording ends up 10 s with 4 segments — the structural fix is correct but the bot can't demonstrate real audio truncation.
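The timing problem can be shown with a small back-of-the-envelope model. This is only a sketch: `PacingModel`, `audioSentBeforeInterrupt`, and `cancellableAudioSec` are hypothetical names, the 4 s reply length is an assumed stand-in for "several-second reply", and sending is assumed to be linear over the send window.

```typescript
// Hypothetical model of how much reply audio has already left the bot
// when an interrupt lands. None of these names exist in the codebase.

interface PacingModel {
  /** Seconds of audio in the agent reply. */
  replyAudioSec: number;
  /** Wall-clock seconds the bot takes to send the whole reply. */
  sendWallSec: number;
}

/** Seconds of reply audio already sent when the interrupt arrives. */
function audioSentBeforeInterrupt(model: PacingModel, interruptAtSec: number): number {
  if (model.sendWallSec <= 0) return model.replyAudioSec;
  // Assumes frames are sent at a constant rate over the send window.
  const fractionSent = Math.min(1, Math.max(0, interruptAtSec / model.sendWallSec));
  return fractionSent * model.replyAudioSec;
}

/** Seconds of reply audio an interrupt could still prevent from being sent. */
function cancellableAudioSec(model: PacingModel, interruptAtSec: number): number {
  return model.replyAudioSec - audioSentBeforeInterrupt(model, interruptAtSec);
}

// Burst pacing: an assumed 4 s reply sent in ~50 ms of wall time.
const burst: PacingModel = { replyAudioSec: 4, sendWallSec: 0.05 };
// Realtime pacing: the same reply sent over 4 s of wall time.
const realtime: PacingModel = { replyAudioSec: 4, sendWallSec: 4 };

// Earliest possible interrupt with delayRange: [0.5, 2.0].
console.log(cancellableAudioSec(burst, 0.5)); // 0
console.log(cancellableAudioSec(realtime, 0.5)); // 3.5
```

Under these assumptions, even the earliest interrupt in the delay range finds nothing left to cancel with burst pacing, while realtime pacing leaves most of the reply cancellable.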
Additionally, the real judgeAgent fires inside voiceProceed and concludes success=false after one truncated agent turn (the criteria cannot be satisfied at that point), collapsing the demo to 4 segments regardless.
Options for the follow-up
1. Re-target random_interruptions to a realtime-streaming transport (OpenAI Realtime or Gemini Live). They support real server-side cancel that prevents late-frame delivery. Loses TS-Python parity for this specific demo (Python uses Pipecat).
2. Modify the bundled Pipecat bot to stream TTS at realtime pace. Requires Python edits and changes the reference implementation.
3. Suppress judgeAgent during voiceProceed: let proceed exhaust all turns before the judge fires. Helps both demos.
Recommend (3) first (cheap, helps multiple demos), then (1) for the audio-cut demonstration if still needed.
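A rough illustration of what deferring the judge would change. This is a sketch only, not the real voiceProceed: `proceedSketch`, `deferJudge`, the `Judge` type, and the stub judge are all hypothetical, and the loop is synchronous for brevity where the real code is presumably async.

```typescript
// Hypothetical sketch of judge deferral; not the real voiceProceed implementation.

type Verdict = { success: boolean; reason: string };
/** Returns null while the judge has no verdict yet. */
type Judge = (transcript: string[]) => Verdict | null;

interface ProceedOptions {
  turns: number;
  /** Hypothetical flag: consult the judge only after all turns have run. */
  deferJudge: boolean;
}

function proceedSketch(
  runTurn: (index: number) => string,
  judge: Judge,
  options: ProceedOptions,
): { transcript: string[]; verdict: Verdict | null } {
  const transcript: string[] = [];
  for (let i = 0; i < options.turns; i++) {
    transcript.push(runTurn(i));
    if (!options.deferJudge) {
      const verdict = judge(transcript);
      // An early verdict ends the conversation and shortens the recording.
      if (verdict !== null) return { transcript, verdict };
    }
  }
  // With deferral, the judge sees the whole conversation exactly once.
  return { transcript, verdict: judge(transcript) };
}

// Stub judge that fails as soon as it sees a truncated turn.
const strictJudge: Judge = (transcript) =>
  transcript.some((line) => line.endsWith("[truncated]"))
    ? { success: false, reason: "criteria unmet after truncated turn" }
    : null;

// Stub turn runner: the first agent turn is cut off, later turns complete.
const runTurn = (i: number): string =>
  i === 0 ? "agent reply [truncated]" : `agent reply ${i}`;

const eager = proceedSketch(runTurn, strictJudge, { turns: 5, deferJudge: false });
const deferred = proceedSketch(runTurn, strictJudge, { turns: 5, deferJudge: true });

console.log(eager.transcript.length); // 1
console.log(deferred.transcript.length); // 5
```

The stub judge still fails at the end of the deferred run. The sketch only shows that deferral lets every turn reach the recording; whether the real judge would then pass depends on its criteria.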
Current state at #561 merge
- random-interruptions.test.ts assertions encode what the Pipecat bot CAN prove (interrupt fires + canned-phrase strategy + fired_after_speech outcome + truncation label + recovery + multi-turn). The "median-shorter" assertion (added then dropped) is intentionally omitted; see commit 0b9dd1e.
- The recording (javascript/recordings/random_interruptions/full.wav, 10 s, 4 segments) is honest about the bot's limit but thin as a demo.
- gemini_live_interruption already demonstrates real mid-stream audio cut-off on a realtime transport.
Acceptance
A random_interruptions recording that shows: 30 s+ duration, 5+ segments, agent says >= 1.5 s of substantive audio before each barge-in, agent recovers with non-empty audio after barge-ins, multi-turn conversation.
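The measurable parts of these criteria could be checked mechanically, along the lines of the sketch below. The `Segment` shape, its field names, and the "at least two turns per speaker" reading of multi-turn are assumptions, not the project's recording metadata format. Segment timings in the examples are invented, and "substantive" audio is not something a duration check can verify.

```typescript
// Hypothetical acceptance checker; segment shape and field names are assumptions.

interface Segment {
  speaker: "user" | "agent";
  startSec: number;
  endSec: number;
  /** True when a user barge-in cut this agent segment short. */
  interrupted?: boolean;
}

interface AcceptanceReport {
  durationOk: boolean;
  segmentCountOk: boolean;
  speechBeforeBargeInOk: boolean;
  recoveryOk: boolean;
  multiTurnOk: boolean;
}

const lengthSec = (s: Segment): number => s.endSec - s.startSec;

function checkAcceptance(segments: Segment[]): AcceptanceReport {
  const totalSec = segments.reduce((max, s) => Math.max(max, s.endSec), 0);
  const bargeIns = segments
    .map((segment, index) => ({ segment, index }))
    .filter(({ segment }) => segment.speaker === "agent" && segment.interrupted === true);
  const agentTurns = segments.filter((s) => s.speaker === "agent").length;
  const userTurns = segments.length - agentTurns;

  return {
    durationOk: totalSec >= 30,
    segmentCountOk: segments.length >= 5,
    // Agent must have spoken for at least 1.5 s before each barge-in.
    speechBeforeBargeInOk:
      bargeIns.length > 0 && bargeIns.every(({ segment }) => lengthSec(segment) >= 1.5),
    // Some later agent segment with non-empty audio must follow each barge-in.
    recoveryOk:
      bargeIns.length > 0 &&
      bargeIns.every(({ index }) =>
        segments.slice(index + 1).some((s) => s.speaker === "agent" && lengthSec(s) > 0),
      ),
    // Assumed reading of "multi-turn": at least two turns per speaker.
    multiTurnOk: agentTurns >= 2 && userTurns >= 2,
  };
}

function passes(report: AcceptanceReport): boolean {
  return (
    report.durationOk && report.segmentCountOk && report.speechBeforeBargeInOk &&
    report.recoveryOk && report.multiTurnOk
  );
}

// Shaped like the current 10 s, 4-segment recording; timings are invented.
const thinRecording: Segment[] = [
  { speaker: "user", startSec: 0, endSec: 2 },
  { speaker: "agent", startSec: 2, endSec: 5, interrupted: true },
  { speaker: "user", startSec: 5, endSec: 6.5 },
  { speaker: "agent", startSec: 6.5, endSec: 10 },
];

// Invented example of a recording that would meet the thresholds.
const fullRecording: Segment[] = [
  { speaker: "user", startSec: 0, endSec: 3 },
  { speaker: "agent", startSec: 3, endSec: 8, interrupted: true },
  { speaker: "user", startSec: 8, endSec: 10 },
  { speaker: "agent", startSec: 10, endSec: 18 },
  { speaker: "user", startSec: 18, endSec: 21 },
  { speaker: "agent", startSec: 21, endSec: 25, interrupted: true },
  { speaker: "user", startSec: 25, endSec: 27 },
  { speaker: "agent", startSec: 27, endSec: 33 },
];

console.log(passes(checkAcceptance(thinRecording))); // false
console.log(passes(checkAcceptance(fullRecording))); // true
```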