feat: add Vonage Audio Connector integration (serializer, transport, foundational example) #3111
|
Hi @jamsea |
|
I’ve just pushed a follow-up commit to switch the foundational example from the dev OpenTok API URL to the production https://api.opentok.com. |
|
Hi @markbackman and @filipi87, could you please find some time to review this PR? |
|
Sorry for the delay. We're backlogged on PR reviews. I took a quick look at this and think it's a good plan to split it up. First, can you create a PR for only the That's a big enough change to add and test that I think we should start there. It will also help developers get started right away as they can easily test and run the example. WDYT? The |
Hi @markbackman thank you so much for your initial review comments. Please find the reasons to keep the transport + foundational example along with VonageFrameSerializer:
Happy to iterate further, but keeping these three components together ensures the reviewer can run and validate the integration immediately. Additionally, today I created two PRs in the pipecat-examples repository:
|
…foundational example)
- Added foundational example: 49-vonage-audio-connector-openai.py
- Added VonageFrameSerializer under src/pipecat/serializers/vonage.py
- Added AudioConnectorTransport under src/pipecat/transports/vonage/audio_connector.py
- Added new package folder src/pipecat/transports/vonage with __init__.py
- Updated env.example
- Updated pyproject.toml and uv.lock
d21b127 to
bca3f0e
|
Hi @markbackman and @filipi87 I’ve rebased the feature branch onto the latest To try it out, install the optional dependencies and run it the same way as other foundational examples:
Please ensure the required OpenAI and Vonage environment variables are set (via
to obtain the wss URL and set it in the Vonage-related environment variables. Thanks for taking a look! |
|
Sorry for the delay on this review. It's been a busy week! I kept thinking about your proposal and really wanted to avoid adding a new transport. Instead, I spent a little bit of time looking at how to implement this within the existing It adds a new mode for handling text and binary messages to the FastAPIWebsocketTransport. It also adds a new VonageFrameSerializer. I'd propose this: let's work on PR #3265 and get the core of this work implemented. I see you have more features for the serializer in your PR. Once 3265 is merged, you can follow up with a PR to add auto hangup and any other desired features to the serializer. Does that make sense? Also, we don't need the foundational example. We do need a pipecat-example for this. In building this out myself, I wrote the inbound example: I'd love feedback on it. Also, we'll need an outbound example, which I'm happy to have you contribute. How does this all sound? |
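To make the text-versus-binary handling mentioned above concrete, here is a minimal, self-contained sketch of how a serializer might route incoming WebSocket messages. It is not the code from PR #3265 and does not use Pipecat's classes: `AudioChunk`, `ControlEvent`, and `deserialize` are hypothetical names, and the `"event"` key in text messages is an assumption about the message shape.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class AudioChunk:
    """Hypothetical container for raw PCM bytes received as a binary message."""
    pcm: bytes


@dataclass
class ControlEvent:
    """Hypothetical container for a JSON event received as a text message."""
    name: str
    payload: dict


def deserialize(message: Union[str, bytes]) -> Optional[Union[AudioChunk, ControlEvent]]:
    """Route one WebSocket message by type: bytes are audio, str is JSON."""
    if isinstance(message, (bytes, bytearray)):
        # Empty binary messages carry no audio, so they are dropped.
        return AudioChunk(pcm=bytes(message)) if message else None
    try:
        payload = json.loads(message)
    except json.JSONDecodeError:
        # Ignore malformed text instead of raising into the pipeline.
        return None
    if not isinstance(payload, dict):
        return None
    # Assumption: control messages carry their type under an "event" key.
    return ControlEvent(name=str(payload.get("event", "unknown")), payload=payload)
```

Keeping the routing in one function means the transport only needs to pass each message through unchanged, whether it arrives as text or as bytes.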
Hi @markbackman, thanks a lot for taking the time to review this and for putting together PR #3265. I really appreciate the effort and the direction you’re proposing. Before getting into next steps, I want to briefly clarify one point to avoid any lingering confusion. Although we reference Vonage Video APIs, this integration never streams video over WebSockets. The setup uses the Vonage Video Audio Connector, which streams audio only, from an existing WebRTC session to a server-side WebSocket endpoint. The /connect API, which our client application calls in the examples PR, simply instructs the Vonage platform to forward the audio of one or more participants from an existing WebRTC session to the WebSocket server where the Pipecat pipeline runs. This is a documented and supported Vonage pattern; see https://developer.vonage.com/en/video/guides/audio-connector for details. So the flow is strictly server-to-server and always carries audio packets only. In this setup, a FrameSerializer (and transport) is required to correctly handle the audio framing and our timing requirements. I’d appreciate it if you could review this again with this context in mind, and I’m happy to adjust or iterate further if needed. |
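As an illustration of the audio framing concern raised above, the sketch below splits outgoing PCM into fixed-size packets and pads the last one with silence. The format used here (16 kHz, 16-bit, mono, 20 ms packets) is an assumption for the example only and should be checked against the Vonage Audio Connector documentation. The helper names are hypothetical and are not part of this PR.

```python
from typing import List

# Assumed audio format, for illustration only.
SAMPLE_RATE_HZ = 16000
PACKET_MS = 20
BYTES_PER_SAMPLE = 2  # 16-bit linear PCM
CHANNELS = 1


def packet_size_bytes(
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    packet_ms: int = PACKET_MS,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    channels: int = CHANNELS,
) -> int:
    """Return the number of bytes in one packet of the given duration."""
    samples = sample_rate_hz * packet_ms // 1000
    return samples * bytes_per_sample * channels


def split_into_packets(pcm: bytes, size: int) -> List[bytes]:
    """Split PCM into packets of exactly `size` bytes, zero-padding the last."""
    if size <= 0:
        raise ValueError("size must be positive")
    packets = [pcm[i:i + size] for i in range(0, len(pcm), size)]
    if packets and len(packets[-1]) < size:
        # Pad the trailing partial packet with silence (zero bytes).
        packets[-1] = packets[-1].ljust(size, b"\x00")
    return packets
```

A sender would then write one packet per `PACKET_MS` interval so that playback on the remote side stays in real time; that pacing loop is omitted here.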
Summary
This PR introduces the Vonage Audio Connector integration, including a custom serializer, the VonageAudioConnectorTransport + VonageAudioConnectorOutputTransport, and a foundational example.
Changes
- examples/foundational/50-vonage-audio-connector-openai.py
- VonageFrameSerializer under src/pipecat/serializers/vonage.py
- VonageAudioConnectorTransport and VonageAudioConnectorOutputTransport under src/pipecat/transports/vonage/audio_connector.py
- src/pipecat/transports/vonage/ with __init__.py
- env.example
- pyproject.toml and uv.lock
Why This Is Needed
This integration enables Pipecat to work with the Vonage Voice API Audio Connector, supporting real-time STT → LLM → TTS pipelines, and expands the ecosystem of community-maintained integrations.
Testing