Low-latency voice control for Home Assistant using ESP32 devices and LLM streaming.
A voice gateway that enables true realtime, conversational voice control for Home Assistant. Instead of the traditional STT → Conversation → TTS pipeline, this streams audio directly between your ESP32 device and an LLM (Gemini, OpenAI, etc.), providing:
- Low latency - Instant responses, feels like talking to a person
- Full-duplex audio - Interrupt the assistant anytime (barge-in)
- Smart home control - LLM can call Home Assistant services as tools
- ESP32 Voice devices - Works with modified streaming firmware
- Automatic discovery - No manual tool configuration needed
Example conversation:
You: "Turn on the living room lights"
Assistant: [responds and turns on lights simultaneously]
You: "Actually, make them dimmer" ← Can interrupt mid-response!
Assistant: [adjusts brightness while responding]
- Natural conversations with instant responses and barge-in support
- Auto-discovers all Home Assistant devices and services
- Test without hardware using laptop mic/speakers
- Works with ESP32 devices running modified streaming firmware
- Pluggable backends (Gemini, OpenAI, Anthropic, local models)
- Event-driven architecture for realtime audio + tool calling
- Production-ready with retry logic, timeouts, and error handling
- Comprehensive test suite (100+ tests)
- Domain allow-lists and entity deny-lists
- Service filtering - control exactly what LLM can do
- Audit logging - track all actions
- Configurable timeouts - safety controls
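The allow-list and deny-list controls above can be pictured as a single check applied before any tool call. The following is a minimal sketch, not the gateway's actual code: the function names are hypothetical, the sample values mirror the `HA_AUTODISCOVERY_DOMAINS`, `HA_AUTODISCOVERY_DENIED`, and `HA_ALLOW_LIST` settings from the configuration section, and it assumes that deny patterns are shell-style globs and that an empty list means "no restriction".

```python
from fnmatch import fnmatchcase


def parse_csv(value):
    """Split a comma-separated setting into a list, dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_allowed(entity_id, service, domains, denied, allow_list):
    """Return True if `service` may be called on `entity_id`.

    Assumed semantics (not confirmed by the gateway source):
    an empty `domains` or `allow_list` means "no restriction".
    """
    domain = entity_id.split(".", 1)[0]
    if domains and domain not in domains:
        return False  # domain is not on the allow-list
    if any(fnmatchcase(entity_id, pattern) for pattern in denied):
        return False  # entity matches a deny pattern
    if allow_list and service not in allow_list:
        return False  # service is not explicitly allowed
    return True


domains = parse_csv("light,switch,climate")
denied = parse_csv("lock.*,alarm_control_panel.*")
allow_list = parse_csv("light.turn_on,light.turn_off,switch.toggle")

print(is_allowed("light.living_room", "light.turn_on", domains, denied, allow_list))  # True
print(is_allowed("lock.front_door", "lock.unlock", domains, denied, allow_list))      # False
```

Checking the deny-list after the domain list means a denied entity stays blocked even if its domain is later added to the allowed domains.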
[Your ESP32 Device]
        │
        │ WebSocket (ws://gateway:8080/voice-stream)
        │ Raw PCM Audio + JSON control
        ↓
[This Gateway]
        ├─> [Gemini/OpenAI/etc.]   ← Streaming LLM
        └─> [Home Assistant API]   ← Tool execution (lights, switches, etc.)
The gateway:
- Accepts WebSocket connections from ESP32 devices
- Streams audio to/from LLM backend (Gemini, OpenAI, etc.)
- Manages conversation state (listening → thinking → speaking → done)
- Executes Home Assistant tool calls
- Handles interruptions and full-duplex conversations
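To make "executes Home Assistant tool calls" concrete, here is a minimal sketch of how one tool call could map onto Home Assistant's REST API (`POST /api/services/<domain>/<service>` with a bearer token). The shape of `tool_call` is a hypothetical illustration, and whether the gateway itself uses the REST API or another Home Assistant interface is an assumption; the request is built but not sent.

```python
import json
import urllib.request


def build_service_request(ha_url, token, domain, service, data):
    """Build (but do not send) a service call for Home Assistant's REST API."""
    url = f"{ha_url.rstrip('/')}/api/services/{domain}/{service}"
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical tool call, roughly as an LLM backend might emit it.
tool_call = {
    "name": "light.turn_on",
    "args": {"entity_id": "light.living_room", "brightness_pct": 40},
}

domain, service = tool_call["name"].split(".", 1)
req = build_service_request(
    "http://homeassistant:8123", "your_long_lived_token", domain, service, tool_call["args"]
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```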
- Home Assistant with Long-Lived Access Token
- ESP32 Voice device with modified streaming firmware
  - Or use laptop mic/speakers for testing
- LLM API Key:
  - Gemini API Key (free tier available)
  - OpenAI API Key (coming soon)
- Docker (recommended) or Go 1.25+
- You have ESP32 devices with modified streaming firmware
- You want ultra-low latency conversations (<500ms)
- You want full-duplex, interruptible conversations (barge-in)
- You want LLM to directly control HA without going through HA's pipeline
# 1. Create docker-compose.yml
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  gateway:
    image: ghcr.io/rw4lll/ha-realtime-voice-gateway:latest
    container_name: ha-voice-gateway
    restart: unless-stopped
    environment:
      # Home Assistant
      - HA_URL=http://homeassistant:8123
      - HA_TOKEN=your_long_lived_token_here
      # LLM Backend (Gemini example)
      - BACKEND_TYPE=gemini
      - GEMINI_API_KEY=your_gemini_api_key_here
      # WebSocket Server (optional customization)
      - WEBSOCKET_ADDR=0.0.0.0:8080
      - WEBSOCKET_PATH=/voice-stream
    ports:
      - "8080:8080"
    networks:
      - homeassistant
# Assumption: Home Assistant runs on an existing Docker network named
# "homeassistant"; adjust or remove this block to match your setup.
networks:
  homeassistant:
    external: true
EOF
# 2. Edit credentials
nano docker-compose.yml
# 3. Start gateway
docker-compose up -d
# 4. Check logs
docker-compose logs -f

To build from source instead:
# 1. Clone repository
git clone https://github.com/rw4lll/ha-realtime-voice-gateway.git
cd ha-realtime-voice-gateway
# 2. Create .env file
cat > .env << 'EOF'
# Home Assistant
HA_URL=http://homeassistant:8123
HA_TOKEN=your_long_lived_token_here
# Backend (Gemini)
BACKEND_TYPE=gemini
GEMINI_API_KEY=your_api_key_here
# WebSocket Server (optional customization)
WEBSOCKET_ADDR=0.0.0.0:8080
WEBSOCKET_PATH=/voice-stream
EOF
# 3. Build and run
go build -o gateway ./cmd/gateway
./gateway

You should see:
INFO Gateway ready websocket_addr=0.0.0.0:8080
INFO WebSocket server listening. Waiting for device connections...
Follow the instructions in the ESP32 Streaming Firmware Guide.
In the device settings, set the gateway URL:
ws://YOUR_GATEWAY_IP:8080/voice-stream
Example: ws://192.168.1.100:8080/voice-stream
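Note that `0.0.0.0` in `WEBSOCKET_ADDR` only tells the gateway to listen on all interfaces; the device needs the gateway host's real LAN address. A minimal sketch of deriving the device URL from the gateway settings, where `device_url` is a hypothetical helper and the IP address is the example value from above:

```python
def device_url(websocket_addr, websocket_path, gateway_ip):
    """Combine the gateway's listen settings with its LAN IP into a device URL."""
    # Keep only the port from "host:port"; the listen host (e.g. 0.0.0.0)
    # is not an address a device can connect to.
    _, _, port = websocket_addr.rpartition(":")
    if not websocket_path.startswith("/"):
        websocket_path = "/" + websocket_path
    return f"ws://{gateway_ip}:{port}{websocket_path}"


print(device_url("0.0.0.0:8080", "/voice-stream", "192.168.1.100"))
# ws://192.168.1.100:8080/voice-stream
```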
Press the button 4 times to switch from standard HA Voice Preview mode to direct streaming mode.
The device will:
- Send your voice to the gateway via WebSocket
- Receive JSON state updates (listening, thinking, speaking, done)
- Play AI responses in real time
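The device-side behaviour described above can be sketched as a small frame handler. This is an illustration under stated assumptions, not the firmware's implementation: it assumes binary WebSocket frames carry raw PCM audio, text frames carry JSON of the hypothetical form `{"state": "thinking"}`, and a return to `listening` (for example after a barge-in) discards unplayed audio. The WebSocket Protocol document is the authoritative message format.

```python
import json

STATES = ("listening", "thinking", "speaking", "done")


class DeviceSession:
    """Tracks the last reported state and collects audio for playback."""

    def __init__(self):
        self.state = None
        self.audio = bytearray()

    def handle_frame(self, frame):
        # Binary frames are assumed to carry raw PCM audio.
        if isinstance(frame, (bytes, bytearray)):
            self.audio.extend(frame)
            return
        # Text frames are assumed to be JSON such as {"state": "thinking"}.
        state = json.loads(frame).get("state")
        if state not in STATES:
            raise ValueError(f"unknown state: {state!r}")
        if state == "listening":
            # New turn (or barge-in): drop audio that has not been played yet.
            self.audio.clear()
        self.state = state


session = DeviceSession()
frames = ['{"state": "listening"}', '{"state": "speaking"}', b"\x00\x01" * 4, '{"state": "done"}']
for frame in frames:
    session.handle_frame(frame)
print(session.state, len(session.audio))  # done 8
```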
# Home Assistant
HA_URL=http://homeassistant:8123 # Your HA URL
HA_TOKEN=your_long_lived_token # Settings → Profile → Long-Lived Access Tokens
# Backend
BACKEND_TYPE=gemini # gemini | openai | mock
GEMINI_API_KEY=your_key # Get from https://aistudio.google.com/apikey
# WebSocket Server (always enabled)
WEBSOCKET_ADDR=0.0.0.0:8080 # Listen on all interfaces
WEBSOCKET_PATH=/voice-stream # WebSocket endpoint

# Limit which domains the LLM can control
HA_AUTODISCOVERY_DOMAINS=light,switch,climate
# Deny specific entities
HA_AUTODISCOVERY_DENIED=lock.*,alarm_control_panel.*
# Restrict to specific services
HA_ALLOW_LIST=light.turn_on,light.turn_off,switch.toggle

# Session timeouts
SESSION_SAFETY_TIMEOUT=5m # Max conversation duration
SESSION_SILENCE_TIMEOUT=3s # End after 3s of silence
SESSION_AUDIO_BUFFER_MS=500 # Audio buffering before playback

See docs/CONFIGURATION.md for all options.
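As a rough guide to what `SESSION_AUDIO_BUFFER_MS` means in memory terms, the sketch below converts a buffer duration into bytes of raw PCM. The defaults (16 kHz, 16-bit, mono) are assumptions chosen for illustration; the Audio Formats Reference lists the formats the gateway actually uses.

```python
def pcm_bytes(duration_ms, sample_rate_hz=16000, bytes_per_sample=2, channels=1):
    """Bytes of raw PCM needed to hold `duration_ms` of audio."""
    return sample_rate_hz * bytes_per_sample * channels * duration_ms // 1000


print(pcm_bytes(500))                        # 16000
print(pcm_bytes(500, sample_rate_hz=24000))  # 24000
```

Under these assumptions a 500 ms buffer holds about 16 KB at 16 kHz, so raising the value trades a little memory and added playback delay for more tolerance of network jitter.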
# Check logs
docker-compose logs -f gateway
# Common issues:
# - Invalid HA_TOKEN → Check Settings → Profile → Long-Lived Access Tokens
# - Can't reach Home Assistant → Use http://homeassistant:8123 in Docker network
# - Invalid GEMINI_API_KEY → Get new key from https://aistudio.google.com/apikey

# Verify gateway is listening
curl http://YOUR_GATEWAY_IP:8080/voice-stream
# Should return "Upgrade Required" (WebSocket endpoint)
# Check device firmware:
# - Is streaming mode enabled? (4-click button)
# - Is gateway URL correct? ws://IP:8080/voice-stream
# - Is device on same network?

# Check autodiscovery
docker-compose logs gateway | grep "tools_discovered"
# Should show: "Autodiscovery completed tools_discovered=45"
# Verify HA connection
docker-compose logs gateway | grep "Home Assistant"
# Should show: "Successfully connected to Home Assistant"

See docs/TROUBLESHOOTING.md for more help.
You can test the gateway using your laptop's mic/speakers:
cd test/
pip install -r requirements.txt
python3 audio_bridge.py

Speak into your laptop mic - you'll hear the LLM respond!
See test/README.md for complete testing guide.
- Configuration Guide - All settings explained
- Audio Resampling Config - Configure 24kHz→16kHz conversion
- Troubleshooting - Common issues and fixes
- Security Guide - Securing your gateway
- Architecture - How it works internally
- Audio Resampling - Technical deep dive on sample rate conversion
- Audio Formats Reference - Format specs and calculations
- WebSocket Protocol - Device communication spec
- Testing Resampling - How to test audio resampling
- Test Script Guide - Using audio_bridge.py
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Home Assistant Forum: Community Thread
| Backend | Status | Latency | Free Tier | Notes |
|---|---|---|---|---|
| Gemini Live | Production | ~200ms | Yes | Recommended - Free tier available! |
| OpenAI Realtime | Coming soon | ~250ms | No | Higher quality, paid only |
| Mock | Testing | N/A | Yes | For development/testing |
Apache License 2.0 - See LICENSE for details.
- ESP32 Firmware: Modified streaming firmware
- Home Assistant: Smart home platform
- Gemini API: LLM backend
- Wyoming Protocol: Original inspiration (now replaced with WebSocket)
Status: Ready for Use
Works with ESP32 devices running modified streaming firmware. Supports Gemini backend (free tier available). OpenAI support coming soon.
Get Started: Jump to Quick Start