144 changes: 144 additions & 0 deletions vonage-ac-chatbot/README.md
@@ -0,0 +1,144 @@
# Vonage Chatbot (Pipecat)

A real-time voice chatbot built using **Pipecat AI** with **Vonage Audio Connector** over **WebSocket**.
This project streams caller audio to **OpenAI STT**, processes the conversation using an LLM, converts the AI's response to speech via **OpenAI TTS**, and streams it back to the caller in real time. The server exposes a WebSocket endpoint (via **VonageAudioConnectorTransport**) that the Vonage **/connect API** connects to, bridging a live session into the **OpenAI STT → LLM → TTS** pipeline.


## Table of Contents

- [How It Works](#how-it-works)
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Setup](#setup)
- [Environment Configuration](#environment-configuration)
- [Run your Server Application](#run-your-server-application)
- [Testing the Chatbot](#testing-the-chatbot)

## How It Works

1. **Vonage connects to your Pipecat WebSocket server -** The /connect API creates a virtual participant and starts streaming audio frames.
2. **Parse the WebSocket messages -** Your Pipecat server reads incoming audio packets from Vonage and sets up a transport.
3. **Start the Pipecat pipeline -** OpenAI STT transcribes the caller's speech.
4. **The LLM generates responses -** The LLM produces a reply, and OpenAI TTS converts the text to audio.
5. **Return speech back to Vonage -** Pipecat sends audio frames back through the WebSocket, and Vonage injects them into the session in real time.
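
The flow above can be pictured with a toy sketch. Everything in it is a stand-in: `AudioFrame`, `speech_to_text`, `generate_reply`, and `text_to_speech` are hypothetical names used only for illustration, and none of them are Pipecat, Vonage, or OpenAI APIs. It shows only the order in which one inbound frame passes through the three stages.

```python
from dataclasses import dataclass


@dataclass
class AudioFrame:
    """Stand-in for one chunk of audio; real frames carry PCM bytes."""
    payload: str


def speech_to_text(frame: AudioFrame) -> str:
    # Placeholder for the STT stage: pretend the payload is already a transcript.
    return frame.payload


def generate_reply(transcript: str) -> str:
    # Placeholder for the LLM stage.
    return f"You said: {transcript}"


def text_to_speech(text: str) -> AudioFrame:
    # Placeholder for the TTS stage: wrap the reply text in an outbound frame.
    return AudioFrame(payload=text)


def handle_frame(frame: AudioFrame) -> AudioFrame:
    """Run one inbound frame through STT -> LLM -> TTS and return the outbound frame."""
    return text_to_speech(generate_reply(speech_to_text(frame)))


if __name__ == "__main__":
    print(handle_frame(AudioFrame("hello")).payload)
```

In the real server these stages run as streaming services inside the Pipecat pipeline rather than as plain function calls.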

## Features

- **Real-time, bidirectional audio** using WebSockets via Vonage Audio Connector
- **OpenAI-powered pipeline:** STT → LLM → TTS
- **Silero VAD** for accurate speech-pause detection
- **Docker support** for simple deployment and isolation

## Prerequisites

- A **Vonage (OpenTok) account**
- An **OpenAI API Key**
- Python **3.10+**
- `uv` package manager
- **ngrok** (or any WS tunnel) for local testing
- Docker (optional)

## Setup

1. **Set up a virtual environment and install dependencies**:

```sh
uv sync
```

2. **Create your .env file**:

```sh
cp env.example .env
```
Update `.env` with your credentials and session ID as described in [Environment Configuration](#environment-configuration).

## Environment Configuration

1. **Create an OpenTok/Vonage Session and Publish a Stream**
A Session ID is required for the Audio Connector.
Note: You can use either the OpenTok or the Vonage platform to create the session. Open the Playground (or your own app) to create a session and publish a stream.
Copy the Session ID and set it in your `.env` file:
```sh
VONAGE_SESSION_ID=<paste-your-session-id-here>
```
Always use **credentials from the same project** that created the `sessionId`.

2. **Set the Keys in `.env`**
If the session was created using OpenTok (API key + secret), set the following in your `.env`:

```sh
# OpenAI Key (no quotes)
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx

# OpenTok credentials
VONAGE_API_KEY=YOUR_API_KEY
VONAGE_API_SECRET=YOUR_API_SECRET

# Session ID created in step 1
VONAGE_SESSION_ID=1_MX4....

# Leave blank; this is auto-filled after the `/connect` API call
VONAGE_CONNECTION_ID=

```
If the session was created using the Vonage platform (App ID + Private Key), set the following in your `.env`:

```sh
# Vonage Platform API credentials
VONAGE_APPLICATION_ID=YOUR_APPLICATION_ID
VONAGE_PRIVATE_KEY=YOUR_PRIVATE_KEY_PATH

# Session ID created in step 1
VONAGE_SESSION_ID=1_MX4....

# Leave blank; this is auto-filled after the `/connect` API call
VONAGE_CONNECTION_ID=

```

3. **Install ngrok**:

Follow the instructions on the [ngrok website](https://ngrok.com/download) to download and install ngrok. You’ll use this to securely expose your local WebSocket server for testing.

4. **Start ngrok to expose the local WebSocket server**:

In a **separate terminal**, start ngrok to tunnel the local server:

```sh
ngrok http 8005
```

You will see something like:

```sh
Forwarding https://a5db22f57efa.ngrok-free.app -> http://localhost:8005
```

To form the **WSS** URL, replace `https://` with `wss://`.

For the forwarding URL above, the `websocket` block looks like this:

```json
"websocket": {
  "uri": "wss://a5db22f57efa.ngrok-free.app",
  "audioRate": 16000,
  "bidirectional": true
}
```
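
If you prefer to derive the WSS URL in code instead of editing it by hand, a minimal sketch could look like the following. The helper names `to_wss` and `websocket_block` are hypothetical and are not part of this project; the field names and values mirror the `websocket` object shown above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_wss(forwarding_url: str) -> str:
    """Swap an http(s) scheme for the matching WebSocket scheme."""
    parts = urlsplit(forwarding_url)
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


def websocket_block(forwarding_url: str, audio_rate: int = 16000) -> dict:
    """Build the "websocket" object from an ngrok forwarding URL."""
    return {
        "uri": to_wss(forwarding_url),
        "audioRate": audio_rate,
        "bidirectional": True,
    }
```

For example, `to_wss("https://a5db22f57efa.ngrok-free.app")` returns `wss://a5db22f57efa.ngrok-free.app`.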

## Run your Server Application

You can run the server application using the command below:

```sh
uv run server.py
```
The server will start on port 8005 and wait for incoming Audio Connector connections.
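
To confirm that something is listening on port 8005 before starting the client, a quick check along these lines may help. It is a sketch that is not part of this project: `is_port_open` is a hypothetical helper, and it only verifies that a TCP connection succeeds, not that the WebSocket handshake works.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8005 is the port named in this README.
    print("server reachable:", is_port_open("127.0.0.1", 8005))
```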

## Testing the Chatbot

1. Follow the instructions in: `examples/vonage-ac-chatbot/client/README.md`.
2. Run the client program (`connect_and_stream.py`) to invoke the **/connect API**.
3. Once the connection is established, begin speaking in the Vonage Video session. Your audio is forwarded through the Audio Connector to the Pipecat pipeline, processed by OpenAI STT → LLM → TTS, and the synthesized response is sent back into the session.
4. You will hear the AI’s voice reply in real time, played back as audio from the virtual participant created by the `/connect` API.
121 changes: 121 additions & 0 deletions vonage-ac-chatbot/client/README.md
@@ -0,0 +1,121 @@
# Python Client for Server Testing

This Python client lets you test the **Vonage Pipecat WebSocket server** by calling the Vonage **/connect** API. It creates a virtual Audio Connector participant inside your Vonage Video session, streams audio from the session to your Pipecat pipeline, and plays back the generated response in real time.

## Setup Instructions

1. **Open the client directory in a separate terminal**
You do **not** need to clone the repository again.
If you already cloned it for the server setup, simply open a new terminal and navigate to:
```sh
cd vonage-pipecat/examples/vonage-ac-chatbot/client
```

2. **Install dependencies**:
```sh
uv sync
```

3. **Create .env**:

```sh
cp env.example .env
```
This `.env` stores all configuration required by the client.

4. **Use the existing Vonage/OpenTok Session from the server setup**
During the server setup, you already:
   1. Created a Vonage/OpenTok Video Session
   2. Published a stream
   3. Verified audio is flowing inside the session

   The client does **not** need a new session.
   The `/connect` API will attach to this existing session.
   Copy the Session ID you used earlier into your `.env` file:
```sh
VONAGE_SESSION_ID=<paste-your-session-id-here>
```

If you are using the OpenTok platform, set `OPENTOK_API_URL` in your `.env`:
```sh
OPENTOK_API_URL=https://api.opentok.com
```
If you are using the Vonage platform, set `VONAGE_API_URL` in your `.env`:
```sh
VONAGE_API_URL=api.vonage.com
```

**Note:** Ensure you use the **credentials** from the **same project** that created this session.

5. **Configure credentials and WebSocket settings in `.env`**
If you created the session on the OpenTok platform, set the following in your `.env`:
```sh
# OpenTok credentials
VONAGE_API_KEY=YOUR_API_KEY
VONAGE_API_SECRET=YOUR_API_SECRET

# WebSocket URL of your Pipecat server (ngrok or production)
WS_URI=wss://<your-ngrok-domain>

# Session ID from step 4
VONAGE_SESSION_ID=1_MX4....

# API base
OPENTOK_API_URL=https://api.opentok.com

# Leave blank; this is auto-filled after the `/connect` API call
VONAGE_CONNECTION_ID=

# Keep the remaining values unchanged.
```
If you created the session on the Vonage platform, set the following in your `.env`:

```sh
# Vonage SDK credentials
VONAGE_APPLICATION_ID=YOUR_APPLICATION_ID
VONAGE_PRIVATE_KEY=YOUR_PRIVATE_KEY_PATH

# WebSocket URL of your Pipecat server (ngrok or production)
WS_URI=wss://<your-ngrok-domain>

# Session ID from step 4
VONAGE_SESSION_ID=1_MX4....

# API base
VONAGE_API_URL=api.vonage.com

# Leave blank; this is auto-filled after the `/connect` API call
VONAGE_CONNECTION_ID=

# Keep the remaining values unchanged.
```

6. **Ensure your Pipecat WebSocket server is running**:
Before running the client, make sure the WebSocket server is running. The client cannot connect unless the WebSocket endpoint is reachable.

7. **Run the Client**:
The client triggers the `/connect` API → Vonage creates an Audio Connector → audio begins flowing.
If using the OpenTok API key + secret, run:
```sh
uv run connect_and_stream.py
```

If using Vonage Application ID + Private Key, run:
```sh
uv run connect_and_stream_vonage.py
```
When successful:
1) `VONAGE_CONNECTION_ID` is automatically added to `.env`.
2) The caller's audio is streamed into Pipecat.
3) The AI-generated TTS response is injected back into the session.
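
Which script to run in step 7 depends on which credential pair is set in `.env`. The sketch below illustrates that rule; `choose_client_script` is a hypothetical helper, not part of the client, and it assumes the variable names listed in step 5.

```python
from typing import Mapping, Optional


def choose_client_script(env: Mapping[str, Optional[str]]) -> str:
    """Pick the client script that matches the credentials present in env."""
    # OpenTok projects authenticate with an API key and secret.
    if env.get("VONAGE_API_KEY") and env.get("VONAGE_API_SECRET"):
        return "connect_and_stream.py"
    # Vonage platform projects authenticate with an application ID and private key.
    if env.get("VONAGE_APPLICATION_ID") and env.get("VONAGE_PRIVATE_KEY"):
        return "connect_and_stream_vonage.py"
    raise ValueError(
        "Set VONAGE_API_KEY and VONAGE_API_SECRET, "
        "or VONAGE_APPLICATION_ID and VONAGE_PRIVATE_KEY"
    )
```

Calling `choose_client_script(os.environ)` after loading `.env` would return the script name to pass to `uv run`. If both pairs are set, this sketch prefers the OpenTok script, which is an arbitrary choice.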

**Overriding `.env` Values (Optional)**
The script reads everything from `.env` via `os.getenv()`.
You can override any value with command-line flags, for example:

```sh
# Example
uv run connect_and_stream.py --ws-uri wss://my-ngrok/ws --audio-rate 16000

# OR
uv run connect_and_stream_vonage.py --ws-uri wss://my-ngrok/ws --audio-rate 16000
```
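
One common way to get this behavior is to use the environment value as the default for each flag. The following is a minimal sketch of that pattern, not the client's actual code: `parse_settings` is a hypothetical function, and `AUDIO_RATE` is an assumed variable name (only `WS_URI` is named in this README).

```python
import argparse
import os
from typing import Optional, Sequence


def parse_settings(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Read defaults from the environment and let CLI flags override them."""
    parser = argparse.ArgumentParser(description="Audio Connector client settings")
    # WS_URI is the variable named in this README; AUDIO_RATE is hypothetical.
    parser.add_argument("--ws-uri", default=os.getenv("WS_URI"))
    parser.add_argument(
        "--audio-rate", type=int, default=int(os.getenv("AUDIO_RATE", "16000"))
    )
    return parser.parse_args(argv)
```

With `WS_URI` set in the environment, `parse_settings([])` keeps that value, while `parse_settings(["--ws-uri", "wss://my-ngrok/ws"])` replaces it for that run only.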