Open
Description
Issue
Add voice output capability to HumanCLI so the agent can speak responses instead of just displaying text. Validate that it works on the Unitree Go2 platform.
Requirements
- Implement text-to-speech using Python audio libraries (e.g., pyttsx3, gTTS, espeak, or similar)
- Add directly to HumanCLI module (dimos/agents/cli/human.py)
- Support audio output on Go2 (speakers/audio device)
- Handle audio device selection/configuration
- Test on actual Go2 hardware
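
One possible shape for the synthesis and device-selection requirements, sketched under assumptions: it shells out to `espeak-ng` and pipes the WAV output into ALSA's `aplay`, which is only one of the options listed above. The helper names `build_tts_commands` and `speak`, the default voice and rate, and the example device string `plughw:1,0` are all hypothetical; the actual audio device on the Go2 has to be discovered on the hardware (for example with `aplay -L`).

```python
import subprocess
from typing import List, Optional, Tuple


def build_tts_commands(
    text: str,
    device: Optional[str] = None,
    voice: str = "en",
    rate: int = 160,
) -> Tuple[List[str], List[str]]:
    """Return (synth_cmd, play_cmd) argv lists for an espeak-ng | aplay pipeline.

    `device` is an ALSA device name; None means the system default device.
    """
    # espeak-ng writes a WAV stream to stdout when given --stdout.
    synth_cmd = ["espeak-ng", "-v", voice, "-s", str(rate), "--stdout", text]
    # aplay reads the WAV stream from stdin when no file is given.
    play_cmd = ["aplay", "-q"]
    if device is not None:
        play_cmd += ["-D", device]
    return synth_cmd, play_cmd


def speak(text: str, device: Optional[str] = None) -> None:
    """Synthesize `text` and play it on the selected audio device (blocking)."""
    synth_cmd, play_cmd = build_tts_commands(text, device)
    synth = subprocess.Popen(synth_cmd, stdout=subprocess.PIPE)
    try:
        subprocess.run(play_cmd, stdin=synth.stdout, check=True)
    finally:
        synth.stdout.close()
        synth.wait()
```

Keeping command construction separate from playback means the device-selection logic can be unit-tested on a development machine without speakers, while `speak("Hello", device="plughw:1,0")` is the part that needs validation on the robot.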
Implementation Considerations
- Use a lightweight TTS engine that runs on-device or can call an external API
- Low latency for real-time responses
- Voice should be clear and understandable in the robot's environment
- Toggle for enabling/disabling voice output
- Option to stream long responses instead of waiting for full synthesis
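
The streaming option could be approached by speaking each sentence as soon as it is complete rather than waiting for the whole response. The sketch below is illustrative only: `iter_speakable_chunks` is a hypothetical helper, and it assumes the agent's response arrives as an iterable of text fragments, which the issue does not specify. The sentence splitter is deliberately naive and will also break after abbreviations such as "e.g.".

```python
import re
from typing import Iterable, Iterator

# A sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_speakable_chunks(fragments: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a fragment stream."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can be handed to the TTS backend immediately, so the first audio starts after the first sentence instead of after full synthesis of a long response.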
Acceptance Criteria
- Voice output works on Go2
- Agent responses are spoken aloud
- Audio quality is acceptable for human understanding
- User can toggle voice on/off
- Works alongside text output (not exclusive)
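
The toggle and "alongside text output" criteria can be exercised without hardware if the TTS backend is injected. The following is a minimal sketch, not the HumanCLI implementation: the `VoiceOutput` class and its `emit`, `toggle`, and `wait` methods are hypothetical names, and `speak_fn` stands in for whichever TTS call is eventually chosen. Speech runs on a background thread so printing is never blocked by audio playback.

```python
import queue
import threading
from typing import Callable


class VoiceOutput:
    """Print every response, and also speak it when voice is enabled."""

    def __init__(self, speak_fn: Callable[[str], None], enabled: bool = True):
        self._speak_fn = speak_fn  # hypothetical TTS backend, injected
        self.enabled = enabled
        self._queue: "queue.Queue[str]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def toggle(self) -> bool:
        """Flip voice output on or off and return the new state."""
        self.enabled = not self.enabled
        return self.enabled

    def emit(self, text: str) -> None:
        """Always show the text; queue it for speech only when enabled."""
        print(text)
        if self.enabled:
            self._queue.put(text)

    def wait(self) -> None:
        """Block until everything queued so far has been spoken."""
        self._queue.join()

    def _run(self) -> None:
        while True:
            text = self._queue.get()
            try:
                self._speak_fn(text)
            except Exception as exc:
                # A TTS failure should not take down the text interface.
                print(f"[voice] TTS failed: {exc}")
            finally:
                self._queue.task_done()
```

With a fake backend such as `spoken.append`, a test can check that text is printed in both states and that nothing is queued for speech after `toggle()` turns voice off.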
Related
- Pair with: Voice input (STT) issue (Add voice input (speech-to-text) to HumanCLI for Go2 #1272)