Summary
As voice agents land in the Scenario SDK (#355 against #350), audio bytes are currently emitted inline as base64 inside input_audio.data in trace events. Instead, upload the audio to LangWatch's audio-asset endpoint (presigned PUT) and emit a stable audio_ref content part in the trace event.
This implements the SDK side of the contract defined in langwatch/langwatch#3964.
Acceptance Criteria
When a voice scenario produces audio (TTS output, captured user simulator audio, recorded responses), Scenario SDK calls the LangWatch backend at POST /api/audio-assets/presigned-put to mint a URL, PUTs the bytes, and emits the audio_ref content part in the resulting trace event.
The upload path sits behind a flag (default on) so users can revert to inline base64 if LangWatch ingest is unreachable. Flag name: something like LANGWATCH_AUDIO_INLINE=true to force the fallback.
Both directions: user simulator audio AND agent audio responses get offloaded.
Streaming: if the audio is being captured chunk-by-chunk (Pipecat, LiveKit, OpenAI Realtime), buffer + upload once per logical utterance, not per chunk. One ref per utterance.
Failure mode: if presigned PUT fails (network, auth), fall back to inline base64 emission with a warning log.
Tests: unit test for the upload path with a mocked LangWatch endpoint; integration test against staging LangWatch if feasible.
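The upload, flag, and fallback criteria above can be sketched roughly as follows. This is a minimal sketch, not the SDK's actual code: `audio_part`, `inline_part`, and the injected `mint` / `put` callables are hypothetical names, and the `{"url", "key"}` response shape of the presigned-PUT endpoint is an assumption until the spec in langwatch/langwatch#3964 is published. Injecting the HTTP calls keeps the upload path easy to unit test against a mocked endpoint.

```python
import base64
import logging
import os
from typing import Callable, Dict

logger = logging.getLogger("scenario.audio")

# Hypothetical callables standing in for the HTTP layer:
#   mint(content_type) -> {"url": ..., "key": ...}  (POST /api/audio-assets/presigned-put)
#   put(url, data, content_type) -> None            (HTTP PUT to the presigned URL)
# The response fields "url" and "key" are assumed, not confirmed by the spec.
Mint = Callable[[str], Dict[str, str]]
Put = Callable[[str, bytes, str], None]


def inline_part(wav: bytes) -> dict:
    """Current shape: base64 audio carried inside the trace event."""
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(wav).decode("ascii"),
            "format": "wav",
        },
    }


def audio_part(wav: bytes, duration_ms: int, mint: Mint, put: Put) -> dict:
    """Upload one utterance and return an audio_ref, or fall back to inline."""
    # Flag name as floated in the issue; the final name is not settled.
    if os.environ.get("LANGWATCH_AUDIO_INLINE", "").lower() == "true":
        return inline_part(wav)
    try:
        asset = mint("audio/wav")
        put(asset["url"], wav, "audio/wav")
    except Exception as exc:  # network or auth failure
        logger.warning("audio upload failed, emitting inline base64: %s", exc)
        return inline_part(wav)
    return {
        "type": "audio_ref",
        "key": asset["key"],
        "contentType": "audio/wav",
        "durationMs": duration_ms,
        "format": "wav",
    }
```

The same function serves both directions (user simulator audio and agent responses), since it only sees WAV bytes and a duration.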
Out of scope
TS SDK parity for the Scenario adapters — that's covered in langwatch/scenario#372 (Voice TS SDK parity) and depends on this issue landing first.
Dependencies
Hard dep on langwatch/langwatch#3964 for the endpoint and ref shape — but can develop against fixtures once #3964's PR description spec is published.
Why this matters
Without ref-based emission, every voice scenario will either (a) bloat trace payloads with megabytes of base64 WAV, or (b) hit payload caps and ship truncated audio. The LangWatch backend already supports audio asset storage (#3964); this issue is the consumer-side change that makes voice scenarios actually usable for replay/debug.
Part of langwatch/langwatch#1727 (audio-storage epic). Related: langwatch/langwatch#3552 (current inline player), langwatch/scenario#370 (voice epic), langwatch/scenario#350 (voice foundation).
Also out of scope: the AudioChunk format (still PCM16 @ 24kHz mono per Voice Agents #370 design lock #1).
Reference shape to emit
Replace:
{ "type": "input_audio", "input_audio": {"data": "<base64 wav>", "format": "wav"} }
with:
{ "type": "audio_ref", "key": "audio/<projectId>/<traceId>/<uuid>.wav", "contentType": "audio/wav", "durationMs": 4320, "format": "wav" }
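The one-ref-per-utterance rule and the audio_ref shape can be illustrated together. The sketch below assumes chunks arrive as raw PCM16 @ 24kHz mono bytes (the AudioChunk format named in this issue); `UtteranceBuffer` and `audio_ref_part` are hypothetical helpers, and the `key` is passed in because it would presumably be returned by the backend rather than built client-side.

```python
import io
import wave
from dataclasses import dataclass, field
from typing import List, Tuple

SAMPLE_RATE_HZ = 24_000   # PCM16 @ 24kHz mono, per the AudioChunk format
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples


@dataclass
class UtteranceBuffer:
    """Collects raw PCM16 chunks until the logical utterance ends."""
    chunks: List[bytes] = field(default_factory=list)

    def append(self, pcm_chunk: bytes) -> None:
        self.chunks.append(pcm_chunk)

    def finish(self) -> Tuple[bytes, int]:
        """Return (wav_bytes, duration_ms) for the utterance and reset."""
        pcm = b"".join(self.chunks)
        self.chunks = []
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(SAMPLE_WIDTH_BYTES)
            wav.setframerate(SAMPLE_RATE_HZ)
            wav.writeframes(pcm)
        frames = len(pcm) // SAMPLE_WIDTH_BYTES
        return out.getvalue(), frames * 1000 // SAMPLE_RATE_HZ


def audio_ref_part(key: str, duration_ms: int) -> dict:
    """Build the audio_ref content part in the shape given in this issue."""
    return {
        "type": "audio_ref",
        "key": key,
        "contentType": "audio/wav",
        "durationMs": duration_ms,
        "format": "wav",
    }
```

A caller would append every chunk from Pipecat, LiveKit, or OpenAI Realtime as it arrives, then call `finish()` once at the utterance boundary and upload the resulting WAV a single time.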