A voice assistant using whisper.cpp for speech recognition, Ollama for text processing, and espeak for text-to-speech responses.
- Speech Recognition: Uses whisper.cpp for fast, local speech-to-text
- Processing: Uses Ollama to run local LLMs with speech-optimized output
- Text-to-Speech: Uses espeak for speech synthesis with natural-sounding responses
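At a high level, each interaction is a capture, transcribe, generate, speak loop. Below is a conceptual C++ sketch of that wiring, with stub bodies and hypothetical function names (the concrete commands for each stage appear in later sections); it is an illustration, not the project's actual source:

// Conceptual sketch: one listen/think/speak interaction.
// Stub bodies and hypothetical names, not the project's actual code.
#include <iostream>
#include <string>

std::string record_audio() { return "input.wav"; }                    // ALSA capture
std::string transcribe(const std::string& wav) { return "hello"; }    // whisper.cpp
std::string ask_llm(const std::string& text) { return "Hi there!"; }  // Ollama HTTP API
void speak(const std::string& reply) { std::cout << reply << "\n"; }  // espeak

int main() {
    // One interaction: listen, transcribe, think, speak.
    speak(ask_llm(transcribe(record_audio())));
}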
- CMake (build system)
- C++17 compatible compiler (g++ or clang++)
- libcurl (for HTTP requests to Ollama)
- nlohmann/json (for JSON parsing)
- ALSA utilities (for audio recording)
- espeak (for text-to-speech)
- whisper.cpp (included as a submodule)
1. Make the setup script executable:
   chmod +x setup.sh
2. Run the setup script:
   ./setup.sh
The setup script will:
- Install required system dependencies
- Clone and build whisper.cpp
- Download the Whisper model
- Create a default configuration file
- Build the voice assistant
3. Install and start Ollama:
   chmod +x install_ollama.sh
   ./install_ollama.sh
   This script will:
   - Install Ollama if it's not already installed
   - Start the Ollama server if it's not already running
   - Pull the model specified in your config.json
4. Make sure Ollama is running:
   ollama serve
5. Pull the model you want to use (if not already available):
   ollama pull llama3
6. Run the voice assistant:
   ./build/voice_assistant
Speak when prompted and the assistant will respond
The assistant can be configured by editing the config.json file, which includes settings for:
- Whisper model and parameters
- Ollama model and system prompt
- Text-to-speech engine and voice
- Audio recording settings
Example configuration:
{
  "whisper": {
    "model": "base.en",
    "executable": "./whisper.cpp/build/bin/whisper-cli",
    "params": "-l en --no-timestamps"
  },
  "ollama": {
    "model": "llama3",
    "system_prompt": "You are a helpful voice assistant. Provide concise responses.",
    "host": "http://localhost:11434"
  },
  "tts": {
    "engine": "espeak",
    "voice": "en",
    "speed": 150,
    "output_device": "default"
  },
  "audio": {
    "device": "default",
    "sample_rate": 16000,
    "duration": 5
  }
}
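Since the project already depends on nlohmann/json, reading this file is straightforward. A minimal sketch, assuming the field names above (error handling omitted):

// Sketch: loading the example config with nlohmann/json.
// Field names match the example above; no validation or defaults.
#include <fstream>
#include <iostream>
#include <nlohmann/json.hpp>

int main() {
    std::ifstream in("config.json");
    nlohmann::json cfg = nlohmann::json::parse(in);

    std::string whisper_model = cfg["whisper"]["model"];      // "base.en"
    std::string ollama_model  = cfg["ollama"]["model"];       // "llama3"
    std::string ollama_host   = cfg["ollama"]["host"];        // "http://localhost:11434"
    int tts_speed             = cfg["tts"]["speed"];          // 150
    int sample_rate           = cfg["audio"]["sample_rate"];  // 16000

    std::cout << whisper_model << " + " << ollama_model
              << " at " << ollama_host
              << " (" << sample_rate << " Hz, speed " << tts_speed << ")\n";
}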
You can pass command-line arguments:
./voice_assistant --config custom_config.json --continuous
Options:
--config : Specify a custom config file path
--continuous : Run in continuous mode (keep listening for commands)
--streaming-mode : Use real-time audio streaming instead of file-based recording
--input-device : Specify audio input device (e.g., webcam microphone)
--output-device : Specify audio output device (e.g., speakers)
--list-devices : List all available audio input and output devices
--debug : Run in debug mode with extra diagnostics
--setup : Run interactive setup to configure model, personality, and voice
--log, --enable-logging : Enable conversation logging to a file
--log-file : Specify custom log file path (default: timestamp-based filename)
--help : Show help message
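For illustration, a minimal hand-rolled parser for a few of these flags might look like the following sketch (hypothetical; the assistant's real argument handling may differ):

// Sketch: handling --config, --continuous, and --streaming-mode.
// Illustrative only; not the assistant's actual parser.
#include <cstring>
#include <iostream>
#include <string>

struct Options {
    std::string config_path = "config.json";
    bool continuous = false;
    bool streaming  = false;
};

Options parse_args(int argc, char** argv) {
    Options opt;
    for (int i = 1; i < argc; ++i) {
        if (!std::strcmp(argv[i], "--config") && i + 1 < argc)
            opt.config_path = argv[++i];          // flag takes a value
        else if (!std::strcmp(argv[i], "--continuous"))
            opt.continuous = true;                // boolean flags
        else if (!std::strcmp(argv[i], "--streaming-mode"))
            opt.streaming = true;
    }
    return opt;
}

int main(int argc, char** argv) {
    Options o = parse_args(argc, argv);
    std::cout << "config: " << o.config_path
              << ", continuous: " << o.continuous
              << ", streaming: " << o.streaming << "\n";
}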
Run the interactive setup to configure your assistant:
./build/voice_assistant --setup
This will allow you to choose:
1. Model - Select from various Ollama models:
   - llama3
   - gemma3:1b
   - gemma3:4b
   - gemma3:12b
2. Personality - Choose your assistant's role:
   - Tech Co-Worker: Technical expert for software and IT help
   - Personal Friend: Casual, supportive conversational companion
   - Tutor: Patient explainer of complex concepts
   - Life Coach: Motivational guide focused on personal growth
3. Voice - Select voice characteristics:
   - English (US) - Male
   - English (US) - Female
   - English (UK) - Male
   - English (UK) - Female
To install all available models at once, run:
./install_models.sh
The voice assistant supports two different audio processing modes:
In file-based mode, the assistant:
- Records audio to a temporary file for a fixed duration (default: 5 seconds)
- Processes the complete file with Whisper after recording finishes
- Provides the most accurate transcription quality
- Works best for slower, more deliberate interactions
To use file-based mode:
./build/voice_assistant
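A rough sketch of this record-then-transcribe handoff, using arecord and whisper-cli's text output; the paths and flags follow the example configuration, and this is an assumption-based illustration rather than the project's code:

// Sketch: file-based mode as two external commands plus a file read.
// -otxt makes whisper-cli write a .txt transcript next to the -of prefix.
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>

int main() {
    // Record a fixed-duration 16 kHz mono clip, then process the complete file.
    std::system("arecord -f S16_LE -r 16000 -c 1 -d 5 /tmp/clip.wav");
    std::system("./whisper.cpp/build/bin/whisper-cli -l en --no-timestamps "
                "-m ./whisper.cpp/models/ggml-base.en.bin "
                "-f /tmp/clip.wav -otxt -of /tmp/clip");

    std::ifstream txt("/tmp/clip.txt");
    std::stringstream ss;
    ss << txt.rdbuf();
    std::cout << "Transcript: " << ss.str() << "\n";
}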
In streaming mode, the assistant:
- Captures audio continuously in real-time
- Uses Voice Activity Detection (VAD) to detect when you're speaking
- Processes speech immediately when you stop talking
- Provides a more natural, conversational experience
- Automatically adjusts to different speech patterns
- Has no fixed time limit for how long you can speak
To use streaming mode:
./build/voice_assistant --streaming-mode
Streaming mode can be configured in config.json with these parameters:
"streaming": {
"enabled": true,
"vad_threshold": 0.6,
"vad_freq_threshold": 100.0,
"min_speech_ms": 300,
"max_silence_ms": 1000,
"padding_ms": 500
}
These parameters control the VAD (Voice Activity Detection):
vad_threshold : Energy threshold for detecting speech (0.0-1.0)
vad_freq_threshold : Frequency threshold for speech vs. background noise
min_speech_ms : Minimum duration in ms to be considered speech
max_silence_ms : How long to wait after speech ends before processing
padding_ms : Extra audio to capture before and after speech
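To make the energy threshold concrete, here is a simplified, assumption-based sketch of an RMS-energy speech check (the assistant's actual VAD, including the vad_freq_threshold frequency check, is more involved):

// Sketch: an RMS-energy speech check in the spirit of vad_threshold.
// Frame size and the check itself are illustrative assumptions.
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// True if a frame of 16-bit samples exceeds the energy threshold (0.0-1.0).
bool frame_is_speech(const std::vector<int16_t>& frame, double vad_threshold) {
    double sum = 0.0;
    for (int16_t s : frame) {
        double x = s / 32768.0;               // normalize to [-1, 1]
        sum += x * x;
    }
    double rms = std::sqrt(sum / frame.size());
    return rms > vad_threshold;
}

int main() {
    std::vector<int16_t> silence(320, 0);     // 20 ms of silence at 16 kHz
    std::vector<int16_t> loud(320, 20000);    // 20 ms of a loud signal
    std::cout << frame_is_speech(silence, 0.6) << " "   // 0
              << frame_is_speech(loud, 0.6)   << "\n";  // 1
}

min_speech_ms, max_silence_ms, and padding_ms then act on runs of such frames: a segment counts as speech only after enough consecutive speech frames, and it is cut (with padding on both sides) once silence persists long enough.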
All responses are automatically processed to be more voice-friendly:
- Removes markdown formatting, code blocks, and other text styling
- Converts URLs and technical notation to speech-friendly formats
- Expands common abbreviations (e.g., "e.g." becomes "for example")
- Improves sentence flow by adding natural pauses
- Makes numbers and special characters more speech-friendly
- Converts bullet points to a more natural spoken format
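A few of these cleanups are easy to picture with std::regex; the sketch below is illustrative only and far less complete than the assistant's real rules:

// Sketch: a handful of voice-friendly text cleanups (illustrative subset).
#include <iostream>
#include <regex>
#include <string>

std::string make_speakable(std::string text) {
    text = std::regex_replace(text, std::regex("```[\\s\\S]*?```"), " ");   // drop code blocks
    text = std::regex_replace(text, std::regex("[*_`#]+"), "");             // strip markdown styling
    text = std::regex_replace(text, std::regex("e\\.g\\."), "for example"); // expand abbreviation
    text = std::regex_replace(text, std::regex("^\\s*-\\s*"), "Next, ");    // spoken bullet lead-in
    return text;
}

int main() {
    // Prints: Hello, for example code
    std::cout << make_speakable("**Hello**, e.g. `code`") << "\n";
}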
Each personality is also instructed to provide responses that are:
- Conversational and natural-sounding
- Free of complex formatting or visual elements
- Structured with complete sentences and natural pauses
- Similar to how a person would speak in a real-life conversation
The assistant has knowledge of its own components and configuration:
- You can ask questions like:
  - "What whisper model are you using?"
  - "What LLM model are you running on?"
  - "What voice are you using to speak?"
  - "What are your system specifications?"
- The assistant is automatically provided with:
  - Whisper.cpp version information
  - Ollama version information
  - Current model selections
  - System specifications
  - Build information
This information is injected into the system prompt, allowing the assistant to answer questions about its own configuration accurately.
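Concretely, this amounts to prepending the gathered facts to the system prompt in the request sent to Ollama. A minimal sketch using libcurl and nlohmann/json against Ollama's /api/generate endpoint; the self-knowledge strings here are placeholders for what the assistant gathers at startup (build with -lcurl):

// Sketch: system-prompt injection plus a non-streaming Ollama request.
#include <curl/curl.h>
#include <iostream>
#include <nlohmann/json.hpp>
#include <string>

static size_t collect(char* ptr, size_t size, size_t nmemb, void* userdata) {
    static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main() {
    nlohmann::json req = {
        {"model",  "llama3"},
        {"system", "You are a helpful voice assistant. Provide concise responses. "
                   "Your STT engine is whisper.cpp (model base.en); your LLM is "
                   "llama3; your voice is espeak 'en'."},  // injected self-knowledge
        {"prompt", "What LLM model are you running on?"},
        {"stream", false}
    };
    std::string body = req.dump(), response;

    CURL* curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:11434/api/generate");
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);

    std::cout << nlohmann::json::parse(response)["response"] << "\n";
}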
To use a webcam microphone:
1. List available devices:
   ./build/voice_assistant --list-devices
2. Look for your webcam in the list of input devices. You'll see something like:
   Available audio input devices:
   Name: alsa_input.usb-046d_HD_Pro_Webcam_C920_XXXXXXXX-00.analog-stereo
   Description: HD Pro Webcam C920 Analog Stereo
3. Run the assistant with the selected input device:
   ./build/voice_assistant --input-device alsa_input.usb-046d_HD_Pro_Webcam_C920_XXXXXXXX-00.analog-stereo
4. Alternatively, set it permanently in the config.json file:
   "audio": {
     "device": "alsa_input.usb-046d_HD_Pro_Webcam_C920_XXXXXXXX-00.analog-stereo",
     "sample_rate": 16000,
     "duration": 5
   }
If you encounter issues with the voice assistant not detecting your speech, run in debug mode:
./build/voice_assistant --debug
Debug mode will:
- Run diagnostics to check your audio setup and whisper.cpp installation
- Increase recording duration to give you more time to speak
- Provide verbose output during recording and transcription
- Play back your recorded audio for verification
- Save a copy of any failed recordings for later inspection
- "No speech detected" error: This typically means there was a problem with audio recording or the speech recognition failed. Try the following:
  - Run with --debug to run diagnostics
  - Check if your microphone is working properly:
    arecord -d 5 test.wav && aplay test.wav
  - Try using a different input device:
    ./build/voice_assistant --list-devices
    ./build/voice_assistant --input-device DEVICE_NAME
  - Increase audio recording duration in config.json (for file-based mode):
    "audio": { "duration": 10, ... }
  - Try a different whisper model:
    "whisper": { "model": "tiny.en", ... }
  - Try switching to streaming mode, which can be more responsive:
    ./build/voice_assistant --streaming-mode
  - Adjust VAD parameters for streaming mode if speech isn't being detected properly:
    "streaming": {
      "vad_threshold": 0.4,   # Lower threshold for more sensitivity
      "min_speech_ms": 200,   # Detect shorter speech segments
      ...
    }
- Audio recording issues: Make sure your microphone is properly connected and selected. You can specify a different device in the configuration.
- Whisper.cpp errors: Make sure the models are downloaded correctly by running ./whisper.cpp/models/download-ggml-model.sh base.en.
- Ollama errors: Ensure Ollama is running with ollama serve and that you've pulled the model you want to use.
- Build errors: If you encounter build errors, make sure all dependencies are installed correctly.
If you prefer to build manually:
mkdir -p build
cd build
cmake ..
make
To run the tests:
mkdir -p build_tests
cd build_tests
cmake .. -DBUILD_TESTING=ON
make
make run_tests
The tests check each component of the voice assistant:
- Config loading and saving
- Whisper STT functionality
- Ollama client API interaction
- TTS engine operation
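As an illustration of the first item, a config round-trip test could be as small as the following sketch; this is assumption-based, and the real tests live in the repository and may use a test framework:

// Sketch: a config save/load round-trip check with plain asserts.
#include <cassert>
#include <fstream>
#include <nlohmann/json.hpp>

int main() {
    nlohmann::json cfg = {{"whisper", {{"model", "base.en"}}},
                          {"audio",   {{"sample_rate", 16000}}}};
    std::ofstream("test_config.json") << cfg.dump(2);   // save

    std::ifstream in("test_config.json");               // load
    nlohmann::json loaded = nlohmann::json::parse(in);

    assert(loaded["whisper"]["model"] == "base.en");
    assert(loaded["audio"]["sample_rate"] == 16000);
    return 0;
}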