@varunps2003 varunps2003 commented Nov 21, 2025

Summary

This PR introduces the Vonage Audio Connector integration: a custom serializer, the VonageAudioConnectorTransport and VonageAudioConnectorOutputTransport classes, and a foundational example.

Changes

  • Added foundational example: examples/foundational/50-vonage-audio-connector-openai.py
  • Added VonageFrameSerializer under src/pipecat/serializers/vonage.py
  • Added VonageAudioConnectorTransport and VonageAudioConnectorOutputTransport under src/pipecat/transports/vonage/audio_connector.py
  • Added new package folder src/pipecat/transports/vonage/ with __init__.py
  • Updated env.example
  • Updated pyproject.toml and uv.lock

Why This Is Needed

This integration enables Pipecat to work with the Vonage Voice API Audio Connector, supporting real-time STT → LLM → TTS pipelines, and expands the ecosystem of community-maintained integrations.

Testing

  • Basic end-to-end pipeline validated (audio in → STT → LLM → TTS → audio out)
  • Serializer and transport tested for encoding/decoding correctness
  • Verified pacing behavior (sleep-per-chunk timing) matches Vonage Audio Connector requirements
  • Confirmed WAV-header wrapping when enabled
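For readers who want to reproduce the chunking and WAV-wrapping checks, here is a minimal sketch using only the Python standard library. It is not the code from this PR. The 16 kHz mono, 16-bit linear PCM format and the helper names `chunk_pcm` and `wrap_wav` are assumptions made for illustration.

```python
import io
import wave

SAMPLE_RATE = 16000  # assumption: 16 kHz mono audio
SAMPLE_WIDTH = 2     # assumption: 16-bit linear PCM (2 bytes per sample)
CHUNK_MS = 20        # chunk duration mentioned in this thread


def chunk_pcm(pcm: bytes, sample_rate: int = SAMPLE_RATE, chunk_ms: int = CHUNK_MS) -> list[bytes]:
    """Split raw PCM into fixed-duration chunks (the last one may be shorter)."""
    chunk_bytes = sample_rate * SAMPLE_WIDTH * chunk_ms // 1000
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


def wrap_wav(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Prefix raw mono PCM with a standard WAV (RIFF) header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(SAMPLE_WIDTH)
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buf.getvalue()
```

Under these assumptions, one second of audio is 32,000 bytes and splits into 50 chunks of 640 bytes each.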

@varunps2003
Author

Hi @jamsea
I’ve created the PR for the Vonage Audio Connector integration (serializer, transport, foundational example).
Please take a look whenever you get a chance — happy to make any changes needed. Thanks!

@varunps2003
Author

I’ve just pushed a follow-up commit to switch the foundational example from the dev OpenTok API URL to the production https://api.opentok.com.

@varunps2003
Author

Hi @markbackman and @filipi87, could you please find some time to review this PR?

@markbackman
Contributor

markbackman commented Dec 5, 2025

Sorry for the delay. We're backlogged on PR reviews. I took a quick look at this and think it's a good plan to split it up. First, can you create a PR for only the VonageFrameSerializer? Along with this, it would be helpful to submit an example for pipecat-examples showing how to dial-in and dial-out. This would be similar to the examples that exist for Twilio, Telnyx, Plivo, and Exotel.

That's a big enough change to add and test that I think we should start there. It will also help developers get started right away as they can easily test and run the example. WDYT?


The VonageFrameSerializer should be written to work with the FastAPIWebsocketTransport. Is there a reason to add a new websocket transport to work specifically with the VonageFrameSerializer?

@varunps2003
Author

varunps2003 commented Dec 11, 2025

Hi @markbackman, thank you so much for your initial review comments. Here are the reasons to keep the transport and foundational example together with the VonageFrameSerializer:

  1. Regarding splitting the PR — in this case the VonageFrameSerializer cannot be meaningfully reviewed or tested on its own. It requires the accompanying Vonage-specific WebSocket transport and the foundational example. All three pieces form a single atomic unit:
    a) The serializer and transport are tightly coupled because the Vonage Audio Connector expects specific binary framing, sequencing, and pacing.
    b) Without the transport, the serializer cannot be executed.
    c) Without the example, there’s no runnable validation for reviewers.
  2. If you check out this branch, everything works end-to-end with the current serializer + transport + example. Splitting them would make the serializer untestable in isolation and make the PR harder to validate.
  3. On the dial-in/dial-out point — Vonage’s workflow differs from Twilio/Telnyx/Plivo/Exotel, so the foundational example here is the correct equivalent for Vonage. It demonstrates the Audio Connector flow as the intended usage pattern.
  4. Regarding FastAPIWebsocketTransport: the Vonage Audio Connector requires low-level binary frame control (opcodes, sequence numbers, 20 ms chunk pacing), which the existing transport doesn’t expose. The custom transport keeps this logic isolated without modifying core transports.
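The 20 ms chunk pacing mentioned in point 4 can be sketched roughly as follows. This is not the transport from this PR. The `send` callable is a hypothetical stand-in for whatever writes bytes to the WebSocket, and the deadline-based scheduling is one possible way to avoid drift, not necessarily the approach the custom transport takes.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterable


async def send_paced(
    chunks: Iterable[bytes],
    send: Callable[[bytes], Awaitable[None]],  # hypothetical WebSocket writer
    chunk_ms: int = 20,
) -> int:
    """Send each chunk, then wait until that chunk's playback time has elapsed.

    Deadlines are accumulated from a monotonic clock so that a slow send on
    one chunk shortens the following sleep instead of adding permanent drift.
    """
    interval = chunk_ms / 1000
    next_deadline = time.monotonic()
    sent = 0
    for chunk in chunks:
        await send(chunk)
        sent += 1
        next_deadline += interval
        delay = next_deadline - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
    return sent
```

A plain `asyncio.sleep(0.02)` after every chunk would also pace the stream, but any time spent inside `send` would then accumulate as extra latency.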

Happy to iterate further, but keeping these three components together ensures the reviewer can run and validate the integration immediately.

Additionally, today I created two PRs in the pipecat-examples repository:

  1. Add vonage-ac-chatbot example pipecat-examples#129
  2. Add vonage-ac-s2s example pipecat-examples#130

Both examples require the vonage-audio-connector dependency, which this PR adds to pyproject.toml in the main Pipecat repository.

…foundational example)

- Added foundational example: 49-vonage-audio-connector-openai.py
- Added VonageFrameSerializer under src/pipecat/serializers/vonage.py
- Added AudioConnectorTransport under src/pipecat/transports/vonage/audio_connector.py
- Added new package folder src/pipecat/transports/vonage with __init__.py
- Updated env.example
- Updated pyproject.toml and uv.lock
@varunps2003 force-pushed the feature/vonage-audio-connector branch from d21b127 to bca3f0e on December 16, 2025 at 14:26
@varunps2003
Author

Hi @markbackman and @filipi87

I’ve rebased the feature branch onto the latest main to resolve conflicts and verify the changes against the current Pipecat codebase.
I also renumbered the foundational example from 49-* to 50-*, since 49 was already in use.

To try it out, install the optional dependencies and run it the same way as other foundational examples:

uv run examples/foundational/50-vonage-audio-connector-openai.py

Please ensure the required OpenAI and Vonage environment variables are set (via .env).
If running locally, you can use:

ngrok http 8005

to obtain the wss URL and set it in the Vonage-related environment variables.

Thanks for taking a look!

@markbackman
Contributor

markbackman commented Dec 19, 2025

Sorry for the delay on this review. It's been a busy week!

I kept thinking about your proposal and really wanted to avoid adding a new transport. Instead, I spent a little bit of time looking at how to implement this within the existing FastAPIWebsocketTransport constraints. Check out this PR:
#3265

It adds a new mode for handling text and binary messages to the FastAPIWebsocketTransport. It also adds a new VonageFrameSerializer.
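To illustrate what handling both text and binary messages on one WebSocket can look like, here is a small sketch. It is not the implementation in #3265. The `ControlEvent` and `AudioChunk` types, the `decode_message` helper, and the assumption that text messages are JSON objects with an `event` field are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ControlEvent:
    """A JSON text message, e.g. a connection or status notification."""
    name: str
    payload: dict


@dataclass
class AudioChunk:
    """A binary message carrying raw audio bytes."""
    pcm: bytes


def decode_message(message: Union[str, bytes]) -> Optional[Union[ControlEvent, AudioChunk]]:
    """Route a WebSocket message by type: bytes become audio, text becomes an event."""
    if isinstance(message, (bytes, bytearray)):
        return AudioChunk(pcm=bytes(message)) if message else None
    try:
        data = json.loads(message)
    except json.JSONDecodeError:
        return None  # ignore text that is not valid JSON
    if not isinstance(data, dict):
        return None
    # Assumption: the event name lives under an "event" key.
    return ControlEvent(name=str(data.get("event", "")), payload=data)
```

Keeping the routing in one function like this means the transport only needs to pass along whichever message type it received.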

I'd propose this: let's work on PR #3265 and get the core of this work implemented. I see you have more features for the serializer in your PR. Once 3265 is merged, you can follow up with a PR to add auto hangup and any other desired features to the serializer. Does that make sense?

Also, we don't need the foundational example. We do need a pipecat-example for this. In building this out myself, I wrote the inbound example:
pipecat-ai/pipecat-examples#133

I'd love feedback on it. Also, we'll need an outbound example, which I'm happy to have you contribute.

How does this all sound?

@varunps2003
Author

Hi @markbackman, thanks a lot for taking the time to review this and for putting together PR #3265. I really appreciate the effort and the direction you’re proposing.

Before getting into next steps, I just want to briefly clarify one point to avoid any lingering confusion. Although we reference Vonage Video APIs, in this integration we never stream video over WebSockets. The setup uses the Vonage Video Audio Connector, which streams audio only from an existing WebRTC session to a server-side WebSocket endpoint. The /connect API, which our client application calls in the examples PR, simply instructs the Vonage platform to forward the audio of one or more participants from an existing WebRTC session to the WebSocket server where the Pipecat pipeline runs. This is a documented and supported Vonage pattern: https://developer.vonage.com/en/video/guides/audio-connector.
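As a rough illustration of the /connect step, the sketch below only assembles the request URL and JSON body; it does not perform authentication or the HTTP call. The URL path and the field names (`sessionId`, `token`, `websocket.uri`, `websocket.streams`) reflect one reading of the linked guide and should be treated as assumptions to verify against it. The `build_connect_request` helper itself is hypothetical.

```python
from typing import Optional, Sequence


def build_connect_request(
    api_key: str,
    session_id: str,
    token: str,
    ws_uri: str,
    stream_ids: Optional[Sequence[str]] = None,
    base_url: str = "https://api.opentok.com",
) -> tuple[str, dict]:
    """Return (url, json_body) for an Audio Connector /connect call.

    Assumption: path and field names follow the Vonage Audio Connector guide;
    check them against the current documentation before use.
    """
    if not ws_uri.startswith("wss://"):
        raise ValueError("the WebSocket endpoint is expected to be a wss:// URL")
    url = f"{base_url}/v2/project/{api_key}/connect"
    websocket: dict = {"uri": ws_uri}
    if stream_ids:
        # Restrict forwarding to selected participants' streams.
        websocket["streams"] = list(stream_ids)
    body = {"sessionId": session_id, "token": token, "websocket": websocket}
    return url, body
```

Omitting `stream_ids` leaves the `streams` field out of the body, so the platform's default stream selection applies.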

So the flow is strictly server-to-server, and it always carries audio packets only. In this setup, a FrameSerializer (and transport) is required to correctly handle the audio framing and our timing requirements.

I’d appreciate it if you could review this again with this context in mind, and I’m happy to adjust or iterate further if needed.


3 participants