Streaming TTS - speak as tokens arrive, not after the full reply #16

@Dix01


Problem: TTS waits for the full LLM reply before speaking, which adds noticeable latency. JARVIS should start talking as soon as the first sentence is ready, while the rest is still streaming - the way it feels in the films.

Where:

  • Backend TTS: jarvis/plugins/voice_tools_optional.py (Piper / Riva / Edge tiers)
  • WS streaming: jarvis/server/ws.py
  • Frontend playback: web/src/lib/voice.ts

Approach: Chunk the assistant stream on sentence boundaries (. ! ? or newline). Synthesize and enqueue audio per sentence so playback starts as soon as sentence #1 is ready. Keep the queue ordered so sentences play in the order they were generated.
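
A minimal sketch of the chunking step, assuming tokens arrive as plain strings. The name `iter_sentences` is hypothetical and not taken from the JARVIS codebase. It only splits once punctuation is followed by whitespace (or on a newline), so a number like "3.14" that spans tokens is not cut in half; abbreviations such as "Dr. Smith" would still be split and need extra handling.

```python
import re
from typing import Iterable, Iterator

# A boundary is whitespace that follows ., ! or ?, or any run of newlines.
# Requiring whitespace after the punctuation means "3." + "14" is not split.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n+")


def iter_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream.

    Hypothetical helper for illustration; not part of the project's API.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        # One token can complete several sentences, so keep splitting.
        while True:
            match = _BOUNDARY.search(buffer)
            if match is None:
                break
            sentence = buffer[:match.start()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends without a boundary.
    tail = buffer.strip()
    if tail:
        yield tail
```

For example, `list(iter_sentences(["Hel", "lo the", "re. How", " are you", "? Fine", "\nBye"]))` gives `["Hello there.", "How are you?", "Fine", "Bye"]`. Each yielded sentence could then be handed to whichever TTS tier is active.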

Acceptance:

  • First audio plays before generation finishes
  • Sentences play in order, no overlap
  • "cancel" still stops the whole queue
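
One way to meet the ordering and cancel criteria is sketched below with `asyncio`. Everything here is an assumption for illustration: `SpeechQueue`, `synthesize`, and `play` are hypothetical names, and the real Piper / Riva / Edge calls and the WebSocket transport are not shown. Synthesis for each sentence starts immediately, but playback awaits the tasks in FIFO order, so a slow first sentence cannot be overtaken by a fast second one.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional

Synthesize = Callable[[str], Awaitable[bytes]]  # sentence -> audio bytes
Play = Callable[[bytes], Awaitable[None]]       # plays one audio chunk


class SpeechQueue:
    """Hypothetical ordered TTS queue: synthesize ahead, play in order."""

    def __init__(self, synthesize: Synthesize, play: Play) -> None:
        self._synthesize = synthesize
        self._play = play
        # Holds synthesis tasks in sentence order; None marks end of stream.
        self._pending: "asyncio.Queue[Optional[asyncio.Future]]" = asyncio.Queue()
        self._cancelled = False

    async def feed(self, sentences: AsyncIterator[str]) -> None:
        """Start synthesis for each sentence as soon as it arrives."""
        async for sentence in sentences:
            if self._cancelled:
                break
            task = asyncio.ensure_future(self._synthesize(sentence))
            self._pending.put_nowait(task)
        self._pending.put_nowait(None)

    async def play_all(self) -> None:
        """Play audio strictly in queue order, one sentence at a time."""
        while True:
            task = await self._pending.get()
            if task is None or self._cancelled:
                return
            audio = await task
            if self._cancelled:
                return
            await self._play(audio)

    def cancel(self) -> None:
        """Drop every queued sentence and wake the playback loop."""
        self._cancelled = True
        while not self._pending.empty():
            task = self._pending.get_nowait()
            if task is not None:
                task.cancel()
        self._pending.put_nowait(None)
```

Running `feed` and `play_all` concurrently (for example with `asyncio.gather`) lets the first sentence play while later ones are still being generated. Note that `cancel()` in this sketch only clears what is queued; stopping a sentence that is already playing would have to happen inside the `play` callback or the frontend player.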

Difficulty: medium.


Labels: enhancement (New feature or request)
