Problem: TTS waits for the full LLM reply before speaking, which adds noticeable latency. JARVIS should start talking as soon as the first sentence is ready, while the rest still streams, the way it feels in the films.
Where:
- Backend TTS: jarvis/plugins/voice_tools_optional.py (Piper / Riva / Edge tiers)
- WS streaming: jarvis/server/ws.py
- Frontend playback: web/src/lib/voice.ts
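Approach: Chunk the assistant stream on sentence boundaries (. ! ? or a newline). Synthesize and enqueue audio for each sentence, so playback can start as soon as sentence #1 is synthesized instead of after the whole reply. Keep an ordered queue so clips play in the order the sentences were generated.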
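The chunking step could look roughly like the sketch below. It is only an illustration: `iter_sentences` and the boundary regex are hypothetical names, not existing code in jarvis/, and it assumes the LLM reply arrives as an iterable of text fragments. A terminator only counts as a boundary once whitespace follows it, so a fragment that ends in "3." is not split before "14" arrives.

```python
import re
from typing import Iterable, Iterator

# Hypothetical boundary rule: ., ! or ? followed by whitespace, or any newline run.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n+")


def iter_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text fragments."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            match = _BOUNDARY.search(buffer)
            if match is None:
                break  # no complete sentence yet; wait for more text
            sentence = buffer[:match.start()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends, even without a terminator.
    tail = buffer.strip()
    if tail:
        yield tail
```

This naive rule also splits after abbreviations such as "Dr. Smith"; whether that matters for speech output would need checking against real replies.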
Acceptance:
- First audio plays before generation finishes
- Sentences play in order, no overlap
- "cancel" still stops the whole queue
Difficulty: medium.