Local voice agents on macOS with Pipecat


Pipecat is an open-source, vendor-neutral framework for building real-time voice (and video) AI applications.

This repository contains an example of a voice agent running entirely on local models on macOS. On an M-series Mac, you can achieve voice-to-voice latency under 800 ms with relatively strong models.

The server/bot.py file uses these models:

  • Silero VAD
  • smart-turn v2
  • MLX Whisper
  • Gemma3n 4B
  • Kokoro TTS

But you can swap any of them out for other models, or completely reconfigure the pipeline. It's easy to add tool calling or MCP server integrations, run parallel pipelines for async inference alongside the voice conversation, add custom processing steps, change how interruption handling works, and so on.
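For orientation, here's a rough sketch of how a pipeline like the one in server/bot.py is composed. The import paths, class names, and constructor arguments below are assumptions that vary across Pipecat versions; treat server/bot.py as the authoritative reference.

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.llm import OpenAILLMService  # module path varies by version


async def run_bot(transport, stt, tts):
    """`transport`, `stt`, and `tts` are the WebRTC transport, MLX Whisper
    service, and MLX-Audio TTS service constructed as in server/bot.py."""
    # Point the OpenAI-compatible client at the local LLM server
    # (http://localhost:1234/v1 is LM Studio's default).
    llm = OpenAILLMService(
        base_url="http://localhost:1234/v1",
        api_key="not-needed",  # local servers ignore the key, but it must be set
        model="google/gemma-3n-e4b",  # assumption: use whatever id your server exposes
    )

    context = OpenAILLMContext(
        [{"role": "system", "content": "You are a helpful voice assistant."}]
    )
    aggregator = llm.create_context_aggregator(context)

    # Frames flow top to bottom: audio in -> transcript -> LLM -> speech out.
    pipeline = Pipeline([
        transport.input(),            # mic audio from the browser (VAD runs here)
        stt,                          # MLX Whisper speech-to-text
        aggregator.user(),            # add the user transcript to the LLM context
        llm,                          # local Gemma3n 4B via chat completions
        tts,                          # Kokoro speech via the MLX-Audio service
        transport.output(),           # audio back to the browser
        aggregator.assistant(),       # record the assistant's reply
    ])

    await PipelineRunner().run(PipelineTask(pipeline))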

The bot and web client here communicate over a low-latency, local, serverless WebRTC connection. For more information on serverless WebRTC, see the Pipecat SmallWebRTCTransport docs and this article. You could switch to a different Pipecat transport (for example, a WebSocket-based transport), but WebRTC is the best choice for real-time audio.
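One reason transport swaps are cheap is that transports share a common parameter object, so the pipeline itself doesn't change. A sketch of the transport configuration, with field names that are assumptions based on recent Pipecat releases (server/bot.py has the working version):

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.base_transport import TransportParams

# The same TransportParams shape is used whether the transport is
# SmallWebRTCTransport or a WebSocket-based alternative; only the
# connection negotiation (handled in server/bot.py) differs.
params = TransportParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    vad_analyzer=SileroVADAnalyzer(),
)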

For a deep dive into voice AI, including network transport, optimizing for latency, and notes on designing tool calling and complex workflows, see the Voice AI & Voice Agents Illustrated Guide.

Models and dependencies

Silero VAD and MLX Whisper run inside the Pipecat process. When the agent code starts, it will need to download model weights that aren't already cached, so first startup can take some time.

The LLM service in this bot uses the OpenAI-compatible chat completions HTTP API, so you will need to run a local OpenAI-compatible LLM server.

One easy, high-performance way to run a local LLM server on macOS is LM Studio. From inside the LM Studio graphical interface, go to the "Developer" tab on the far left to start an HTTP server.
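Once the server is running, you can sanity-check it with the standard openai Python client. The port below is LM Studio's default and the model id is an assumption; adjust both to match your setup.

from openai import OpenAI

# Local OpenAI-compatible servers ignore the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="google/gemma-3n-e4b",  # use whatever model id your server reports
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(response.choices[0].message.content)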

Run the voice agent

The core voice agent code lives in a single file: server/bot.py. There's one custom service here that's not included in Pipecat core: we implemented a local MLX-Audio frame processor on top of the excellent mlx-audio library.
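As a rough illustration of what that frame processor involves, here is a sketch of a Pipecat TTS service wrapping an mlx-audio model. The class name is hypothetical, the mlx-audio generation call in particular is assumed rather than quoted, and Pipecat module paths vary by version; see server/bot.py for the real implementation.

import numpy as np

from pipecat.frames.frames import TTSAudioRawFrame, TTSStartedFrame, TTSStoppedFrame
from pipecat.services.tts_service import TTSService  # module path varies by version


class LocalMLXAudioTTSService(TTSService):
    """Hypothetical sketch of a TTS service built on an mlx-audio model."""

    def __init__(self, *, model, sample_rate: int = 24000, **kwargs):
        super().__init__(sample_rate=sample_rate, **kwargs)
        self._model = model  # an mlx-audio TTS model loaded elsewhere

    async def run_tts(self, text: str):
        yield TTSStartedFrame()
        # mlx-audio's generation API is assumed here; consult the mlx-audio docs
        # for the actual call signature and output format.
        for chunk in self._model.generate(text=text):
            # Convert float audio to 16-bit PCM bytes for the output transport.
            pcm = (np.asarray(chunk.audio) * 32767).astype(np.int16).tobytes()
            yield TTSAudioRawFrame(
                audio=pcm, sample_rate=self.sample_rate, num_channels=1
            )
        yield TTSStoppedFrame()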

Note that the first time you start the bot, it will take some time to download and initialize the local models. It can be 30 seconds or more before the bot is fully ready. Subsequent startups will be much faster.

It's not a bad idea to run a quick mlx-audio.generate process from the command line before you run the bot for the first time, so you're not waiting on a relatively big Hugging Face model download for the voice model.

mlx-audio.generate --model "Marvis-AI/marvis-tts-250m-v0.1" --text "Hello, I'm Pipecat!" --output "output.wav"
# or
mlx-audio.generate --model "mlx-community/Kokoro-82M-bf16" --text "Hello, I'm Pipecat!" --output "output.wav"
Then run the bot from the server directory:

cd server/

If you're using uv

uv run bot.py

If you're using pip

python3.12 -m venv venv
source venv/bin/activate

pip install -r requirements.txt

python bot.py

Once you've run the bot and all the models are cached, you can set the HF_HUB_OFFLINE environment variable to prevent the Hugging Face libraries from going to the network to check for model updates. This makes initial bot startup and the first conversation turn much faster.

HF_HUB_OFFLINE=1 uv run bot.py
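If you'd rather warm the cache explicitly, huggingface_hub's snapshot_download can pre-fetch weights. The Kokoro repo id comes from this README; the Whisper id is only an example, so substitute whichever checkpoints your bot.py actually loads.

from huggingface_hub import snapshot_download

# Download everything up front so HF_HUB_OFFLINE=1 startups never hit the network.
for repo_id in [
    "mlx-community/Kokoro-82M-bf16",         # TTS voice model (from this README)
    "mlx-community/whisper-large-v3-turbo",  # example MLX Whisper checkpoint
]:
    snapshot_download(repo_id)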

Start the web client

The web client is a React app. You can connect to your local macOS agent using any client that can negotiate a serverless WebRTC connection. The client in this repo is based on voice-ui-kit and just uses that library's standard debug console template.

cd client/

npm i

npm run dev

# Open the URL shown in the terminal in your web browser
