Add voice output (text-to-speech) to HumanCLI for Go2 #1273

@spomichter

Issue

Add voice output capability to HumanCLI so the agent can speak responses instead of just displaying text. Validate that it works on the Unitree Go2 platform.

Requirements

  • Implement text-to-speech using Python audio libraries (e.g., pyttsx3, gTTS, espeak, or similar)
  • Add it directly to the HumanCLI module (dimos/agents/cli/human.py)
  • Support audio output on Go2 (speakers/audio device)
  • Handle audio device selection/configuration
  • Test on actual Go2 hardware
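
One possible shape for the synthesis and device-selection requirements, as a minimal sketch only: it assumes the `espeak` and `aplay` command-line tools are installed on the target, and that the Go2 speaker is reachable as an ALSA device. The function names `build_speak_commands` and `speak`, the default speaking rate, and the example device string are hypothetical and not part of HumanCLI.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def build_speak_commands(
    text: str, device: Optional[str] = None, rate_wpm: int = 160
) -> Tuple[List[str], List[str]]:
    """Return (espeak_cmd, aplay_cmd) argv lists for speaking `text`."""
    # --stdout makes espeak write WAV data to stdout instead of playing it.
    # The text is a single argv element, so no shell quoting is involved.
    espeak_cmd = ["espeak", "--stdout", "-s", str(rate_wpm), text]
    # -q silences aplay's status output; with no file argument it reads stdin.
    aplay_cmd = ["aplay", "-q"]
    if device:
        # -D selects the ALSA playback device, e.g. "plughw:1,0" (example value).
        aplay_cmd += ["-D", device]
    return espeak_cmd, aplay_cmd


def speak(text: str, device: Optional[str] = None) -> bool:
    """Speak `text`; return False if the tools are missing or playback fails."""
    if not text.strip():
        return False
    if shutil.which("espeak") is None or shutil.which("aplay") is None:
        return False
    espeak_cmd, aplay_cmd = build_speak_commands(text, device)
    synth = subprocess.Popen(espeak_cmd, stdout=subprocess.PIPE)
    player = subprocess.Popen(aplay_cmd, stdin=synth.stdout)
    synth.stdout.close()  # so espeak is signalled if aplay exits early
    player_rc = player.wait()
    synth_rc = synth.wait()
    return player_rc == 0 and synth_rc == 0
```

Calling `speak("Hello from Go2", device="plughw:1,0")` would pipe the synthesized audio to that device; leaving `device` unset falls back to the ALSA default. Returning `False` instead of raising keeps the text path usable when no audio stack is present.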

Implementation Considerations

  • Use a lightweight TTS engine that runs on-device or calls an external API
  • Keep latency low enough for real-time responses
  • Voice should be clear and understandable in the robot's environment
  • Provide a toggle for enabling/disabling voice output
  • Provide an option to stream long responses instead of waiting for full synthesis
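
The streaming and latency points above could be approached by splitting a response into sentences and speaking them from a background worker, so the first sentence starts playing while the rest is still queued. This is a sketch under assumptions: `split_sentences` and `SpeechQueue` are hypothetical names, the sentence splitter is deliberately naive (it will also split after abbreviations such as "e.g."), and `speak_fn` stands in for whichever TTS backend is chosen.

```python
import queue
import re
import sys
import threading
from typing import Callable, List, Optional


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


class SpeechQueue:
    """Speaks queued sentences in order on a single background thread."""

    def __init__(self, speak_fn: Callable[[str], object]) -> None:
        self._speak_fn = speak_fn
        self._q: "queue.Queue[Optional[str]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            chunk = self._q.get()
            try:
                if chunk is None:  # sentinel: stop the worker
                    return
                self._speak_fn(chunk)
            except Exception as exc:  # keep the worker alive on backend errors
                print(f"TTS error: {exc}", file=sys.stderr)
            finally:
                self._q.task_done()

    def say(self, text: str) -> None:
        """Queue `text` sentence by sentence; returns immediately."""
        for sentence in split_sentences(text):
            self._q.put(sentence)

    def wait(self) -> None:
        """Block until everything queued so far has been spoken."""
        self._q.join()

    def close(self) -> None:
        """Stop the worker after the remaining sentences are spoken."""
        self._q.put(None)
        self._worker.join()
```

Because `say` returns as soon as the sentences are queued, the CLI can keep printing text and accepting input while audio plays.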

Acceptance Criteria

  • Voice output works on Go2
  • Agent responses are spoken aloud
  • Audio quality is acceptable for human understanding
  • User can toggle voice on/off
  • Works alongside text output (not exclusive)
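
The toggle and "alongside text" criteria could be checked against a small wrapper like the one below. It is only an illustration: the class name `VoiceToggle` and the `/voice on` and `/voice off` commands are hypothetical, and nothing here reflects the current HumanCLI input handling.

```python
from typing import Callable


class VoiceToggle:
    """Always prints responses; additionally speaks them when enabled."""

    def __init__(
        self,
        speak_fn: Callable[[str], object],
        enabled: bool = False,
        print_fn: Callable[[str], object] = print,
    ) -> None:
        self._speak_fn = speak_fn
        self._print_fn = print_fn
        self.enabled = enabled

    def handle_input(self, line: str) -> bool:
        """Return True if `line` was a voice command and was consumed."""
        cmd = line.strip().lower()
        if cmd == "/voice on":
            self.enabled = True
            return True
        if cmd == "/voice off":
            self.enabled = False
            return True
        return False

    def show_response(self, text: str) -> None:
        self._print_fn(text)  # text output is never suppressed
        if self.enabled:
            self._speak_fn(text)
```

With this shape, text output is unconditional and voice is purely additive, which matches the last criterion.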
