chore(examples/voice/#486): retire legacy gpt-4o-audio-preview surface, migrate supported audio examples to gpt-audio-mini by drewdrewthis · Pull Request #612 · langwatch/scenario

drewdrewthis · 2026-06-04T16:54:49Z

What

Cohesive retirement of the legacy gpt-4o-audio-preview voice/audio example surface, consolidating three threads into one PR:

Folds in fix(examples/voice): swap deleted gpt-4o-audio-preview → gpt-audio-mini #607 (model swap gpt-4o-audio-preview → gpt-audio-mini) — its commits are cherry-picked here, so fix(examples/voice): swap deleted gpt-4o-audio-preview → gpt-audio-mini #607 is superseded and will be closed.
Supersedes ci: migrate voice-agent example tests off deleted gpt-4o-audio-preview, then unskip #486 (the "unskip the voice/audio tests" campaign) — see "Why we don't unskip" below; ci: migrate voice-agent example tests off deleted gpt-4o-audio-preview, then unskip #486 will be closed with reasoning.
Original chore(examples/voice/#486): retire legacy gpt-4o-audio-preview surface, migrate supported audio examples to gpt-audio-mini #612 scope (delete the deprecated voice-to-voice test) — retained.

Why we don't unskip (the key correction)

#486 framed the goal as "remove the skipif(CI) markers and restore CI coverage." That premise was never achievable: these audio/voice example tests are live end-to-end tests — they call real OpenAI (gpt-audio-mini) and the real LangWatch backend, incur cost, and produce non-deterministic audio. They are correctly CI-skipped regardless of model (same class as the other live voice/* example tests). The dead gpt-4o-audio-preview model was only the historical reason for the skip; swapping it doesn't make these CI-runnable.

So the right end-state is migrated + intentionally CI-skipped, not "unskipped." We migrate the model so the examples work when run live/locally, keep the skip markers, and fix the stale skip comments to state the real reason.

What changed

Genuinely-dead → retired:

Tombstone docs/docs/pages/examples/multimodal/voice-to-voice.mdx and testing-voice-agents.mdx → a short pointer to /voice/getting-started. The pages' URLs still return 200 (the vocs fork has no redirect layer, so a tombstone avoids 404s on previously-public URLs).
Delete the now-unused LegacyVoiceDeprecation.mdx snippet (zero importers after the edits).
test_voice_to_voice_conversation.py deleted (it was explicitly DEPRECATED — the legacy single-call pattern).

Supported → kept + migrated to gpt-audio-mini:

audio-to-text.mdx / audio-to-audio.mdx kept and updated (prereq prose now names gpt-audio-mini; LegacyVoiceDeprecation banner removed) — these document the current supported single-call pattern, not a legacy one.
TS example tests (multimodal-audio-to-text, multimodal-audio-to-audio, multimodal-voice-to-voice-conversation, helpers/openai-voice-agent.ts) migrated to gpt-audio-mini with updated skip-comments (via fix(examples/voice): swap deleted gpt-4o-audio-preview → gpt-audio-mini #607's cherry-picked commits).
Python test_audio_to_text.py / test_audio_to_audio.py skip-comments rewritten to "live-E2E, not model"; skipif(CI) markers retained (no model-literal change — they route through the helper's gpt-audio-mini default).
_generated example partials regenerated to match the migrated test sources.
overview.mdx voice-agents link repointed to /voice/getting-started.

Verification

pnpm build (in docs/) exits 0, no broken-link/missing-import errors.
Tombstone routes build and render the /voice/getting-started pointer (confirmed in dist/examples/multimodal/voice-to-voice/index.html).
audio-to-text built page references gpt-audio-mini (x6); no live gpt-4o-audio-preview model literal remains anywhere (grep over model=/model: is empty — remaining mentions are explanatory comments/tombstone prose only).
Python/TS audio tests retain their CI-skip markers (live E2E by design).

Closes / supersedes

Closes ci: migrate voice-agent example tests off deleted gpt-4o-audio-preview, then unskip #486 (premise invalid — tests are live-E2E, correctly skipped; see "Why we don't unskip").
Supersedes fix(examples/voice): swap deleted gpt-4o-audio-preview → gpt-audio-mini #607 (commits folded in here).

🤖 Generated with Claude Code

github-actions

Approved by automation: PR qualifies as low-risk-change under the documented policy.

drewdrewthis · 2026-06-04T17:06:52Z

[grinder] READY for human review

CI: green (zero failing, zero pending)
ACs: met — deleted `test_voice_to_voice_conversation.py` (self-annotated DEPRECATED), removed its MDX import/tab, removed its `voice-integration.yml` entry, updated manifest; Closes #486 (closing link confirmed by GitHub)
Threads: zero unresolved, zero outdated

Verified by:
`gh pr checks 612` → all 17 checks pass/skip, zero pending/failing
`gh api graphql reviewThreads` → `nodes: []` (zero threads)
`gh api graphql closingIssuesReferences` → `nodes: [{number: 486}]`
`python/tests/voice/test_feature_file_contract.py` contract counts updated (127 scenarios, 79/13/35 tag split) via cherry-pick of 6ea8b8d — `test (3.12)` passes

drewdrewthis · 2026-06-04T17:55:04Z

✅ Review + prove-it: READY (after closing-ref correction)

Review: deleting test_voice_to_voice_conversation.py is correct — its own docstring marked it DEPRECATED ("legacy gpt-4o-audio-preview single-call pattern"), it pinned the deleted model, and the capability is covered by the VoiceAgentAdapter demos in python/examples/voice/ (30+ files) + the TS multimodal-voice-to-voice-conversation.test.ts. No coverage regression.

Prove-it:

Collection clean: uv run pytest examples/ --collect-only → no import errors from the deletion (the 3 errors are pre-existing Missing OPENAI_API_KEY on unrelated remote-agent SSE tests).
grep -rn test_voice_to_voice_conversation python/ → zero dangling refs.

Fixed before ready: the body said Closes #486, but #486 is a 7-file unskip issue and this PR (+#607+#610) addresses only 2; five files (test_audio_to_audio.py, test_audio_to_text.py, 3 JS voice tests) still carry the dead-model skip. Corrected the body → Part of #486, and posted the full file-by-file status on #486 so it stays open until all seven are green. (If GitHub's cached link still auto-closes #486 on merge, reopen it.)

Minor: carries a disclosed cherry-picked contract-count fix to test_feature_file_contract.py (borrowed from #610) — may trivially conflict if #610 lands first.

…gacy test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…t file Companion to the delete commit: `test_voice_to_voice_conversation.py` was removed but two references remained: - docs/scripts/mdx-examples-manifest.js: remove sourcePath entry - .github/workflows/voice-integration.yml: remove from pytest command Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ython example The deleted `test_voice_to_voice_conversation.py` was referenced in `docs/docs/pages/examples/multimodal/voice-to-voice.mdx` as: - a generated MDX import (breaking the docs build) - a Python LanguageTabs.CodeTab (now empty) - a prose GitHub link in "Complete Sources" Remove the import, the Python tab, and update the prose link to point to the helper utilities instead with a note about the legacy pattern removal. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The voice-to-voice example helper and the audio-to-text example pinned `gpt-4o-audio-preview`, which OpenAI has removed (404 model_not_found since 2026-05-19). Any user running the canonical voice example hit an immediate 404. Switch to `gpt-audio-mini` — OpenAI's current cost-efficient GA audio-chat model — matching the Python twin, which already migrated (python/scenario/config/voice_models.py:44 OPENAI_AUDIO_CHAT_MODEL, python/examples/test_audio_to_text.py:157). Verified live: gpt-audio-mini accepts the identical chat.completions shape (modalities:["text","audio"], audio:{voice,format}) and returns audio. Re-ran the voice-to-voice e2e against prod LangWatch — success: true, real 2-turn conversation, traces landed (project_bZspxwkhCD4POvqmIgOr2). SDK core was unaffected (OpenAIRealtimeAgentAdapter uses gpt-realtime-mini). This closes a py↔ts example-parity gap left by #561. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… refs after model swap Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e, keep+migrate supported audio examples Cohesive retirement of the legacy gpt-4o-audio-preview voice/audio example surface, folding in the model swap from #607 (cherry-picked) and superseding the unskip plan in #486. Genuinely-dead (retired): - Tombstone docs/docs/pages/examples/multimodal/voice-to-voice.mdx and testing-voice-agents.mdx -> pointer to /voice/getting-started (URLs still 200; the langwatch vocs fork has no redirect layer, so a tombstone is how we avoid 404s on previously-public URLs). - Delete the now-unused LegacyVoiceDeprecation.mdx snippet (no importers left). - (test_voice_to_voice_conversation.py already deleted in an earlier #486 commit.) Supported (kept + migrated to gpt-audio-mini): - audio-to-text.mdx / audio-to-audio.mdx kept and updated: prereq prose now names gpt-audio-mini; LegacyVoiceDeprecation banner removed (these document the CURRENT supported single-call pattern, not a legacy one). - Python test_audio_to_text.py / test_audio_to_audio.py: skip COMMENTS rewritten to the real reason (live E2E -- real OpenAI gpt-audio-mini + LangWatch backend, cost, non-deterministic audio); skipif(CI) markers retained by design. No model literal change (they route through the helper's gpt-audio-mini default). - _generated example partials regenerated to match the migrated test sources. overview.mdx voice-agents link repointed to /voice/getting-started. Why the audio tests stay CI-skipped: they are live end-to-end tests; #486's "unskip to restore CI coverage" premise was never achievable (cost + non-determinism). The right end-state is migrated-and-intentionally-skipped. Docs build: pnpm build exits 0, no broken-link/missing-import errors; tombstone routes render the /voice/getting-started pointer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

drewdrewthis · 2026-06-10T19:05:34Z

Review verdict: READY

Re-reviewed at HEAD 6256f4d (was NOT-READY at 98d970a). All three blocking findings from the prior pass are resolved and verified on the branch; CI is green (python-complete + javascript-complete both SUCCESS — the two required checks). Verified each fix directly against the branch this session, not from the author report.

Resolved since last review

[principles][hygiene] Spec ↔ code contradiction — specs/voice-docs-surface.feature rewritten: the retirement scenario group now asserts the tombstone-and-delete reality. AC18 → "the legacy source file is deleted from the repo" (+ canonical demos under python/examples/voice/); AC17 → "the shared LegacyVoiceDeprecation.mdx snippet no longer exists, the pointer living directly in each page"; AC15 → tombstones still resolve 200 (no 404) and the two audio pages stay live/migrated to gpt-audio-mini. Coherent Gherkin, full-read verified.
[hygiene] Dangling tombstone link — docs/docs/pages/examples/multimodal/multimodal-images.mdx:173 repointed to /voice/getting-started; grep confirms it was the only remaining un-repointed link.
[principles] Asymmetric retirement — the JS voice-to-voice twin is retired symmetrically: multimodal-voice-to-voice-conversation.test.ts + its mdx-examples-manifest.js entry + the _generated partial all deleted (manifest now has 0 voice-to-voice refs; nothing else imported the partial).

Non-blocking

[test] Dead helper exports — RESOLVED: save_conversation_audio + concatenate_wav_files removed (grep-confirmed no other consumer); encode_audio_to_base64 correctly kept (still used by 2 tests).
[test] Dead CI step (voice-integration.yml:129) + brittle judge-criteria parity — pre-existing, intentionally out of scope; tracked as separate New-Issue follow-ups.

Evidence

CI on 6256f4d: python-complete SUCCESS, javascript-complete SUCCESS.
Docs pnpm build exit 0; both tombstones render /voice/getting-started; multimodal-images.html shows 0 stale links; pytest --co 880 collected (the FIX4 trim broke no imports).

… symmetric JS twin retirement, drop dead helper exports FIX 1 [blocker]: rewrite specs/voice-docs-surface.feature deprecation group to assert the retirement reality, not the old keep-and-banner strategy. - AC15: reframed to "tombstoned pages still 200, point to /voice/getting-started" and clarifies the supported audio pages stay live+migrated (not tombstoned). - AC17: shared LegacyVoiceDeprecation.mdx snippet is gone; rewritten to an inline per-page tombstone pointer. - AC18: flipped from "source file is not deleted" to assert the file IS deleted and canonical demos live at python/examples/voice/*. Background + AC Coverage Map updated to match. FIX 2 [blocker]: repoint dangling tombstone link in docs/docs/pages/examples/multimodal/multimodal-images.mdx from ./testing-voice-agents to /voice/getting-started (mirrors audio-to-*.mdx). FIX 3 [blocker]: symmetric retirement of the JS voice-to-voice twin — delete the test, its mdx-examples-manifest.js entry, and the orphaned generated mdx. Confirmed no other page imports the generated mdx. FIX 4 [non-blocker]: drop dead helper exports save_conversation_audio + concatenate_wav_files (deleted test was their only consumer; grep shows no other importer) from helpers/__init__.py and their defs in audio_helpers.py. encode_audio_to_base64 is kept (still used by audio-to-audio/text examples). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-10T22:02:08Z

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

This PR modifies files in restricted directories that require manual review per policy.

This PR requires a manual review before merging.

github-actions Bot added the low-risk-change PR qualifies as low-risk per policy and can be merged without manual review label Jun 4, 2026

github-actions Bot previously approved these changes Jun 4, 2026

View reviewed changes

drewdrewthis dismissed github-actions[bot]’s stale review via 55deabb June 4, 2026 16:55

drewdrewthis added the grinding Grinder is actively managing this PR label Jun 4, 2026

drewdrewthis mentioned this pull request Jun 4, 2026

ci: migrate voice-agent example tests off deleted gpt-4o-audio-preview, then unskip #486

Closed

github-actions Bot removed the low-risk-change PR qualifies as low-risk per policy and can be merged without manual review label Jun 4, 2026

drewdrewthis added pr-ready and removed grinding Grinder is actively managing this PR labels Jun 4, 2026

This was referenced Jun 4, 2026

fix(voice): main python-ci red — stale feature-file contract counts (108→127) after #561 #609

Closed

refactor(voice): reconcile dangling docs/proposals/ references — the directory is absent from the repo #613

Open

rogeriochaves reviewed Jun 5, 2026

View reviewed changes

Comment thread docs/docs/pages/examples/multimodal/voice-to-voice.mdx Outdated

rogeriochaves previously approved these changes Jun 5, 2026

View reviewed changes

drewdrewthis and others added 6 commits June 5, 2026 14:10

chore(examples/voice/#486): delete deprecated gpt-4o-audio-preview le…

ca5c8ec

…gacy test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(examples/voice/#607): refresh stale gpt-4o-audio-preview comment…

4643a8a

… refs after model swap Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

drewdrewthis dismissed rogeriochaves’s stale review via 98d970a June 5, 2026 12:22

drewdrewthis force-pushed the fix/486-delete-deprecated-voice-test branch from 4acb9db to 98d970a Compare June 5, 2026 12:22

drewdrewthis changed the title ~~chore(examples/voice/#486): delete deprecated voice-to-voice legacy test~~ chore(examples/voice/#486): retire legacy gpt-4o-audio-preview surface, migrate supported audio examples to gpt-audio-mini Jun 5, 2026

drewdrewthis mentioned this pull request Jun 5, 2026

fix(examples/voice): swap deleted gpt-4o-audio-preview → gpt-audio-mini #607

Closed

drewdrewthis requested a review from rogeriochaves June 5, 2026 12:35

This was referenced Jun 10, 2026

voice-integration CI step is permanently skipped (skipif(CI)) — can never catch a regression #654

Open

audio example tests retain brittle judge criteria (audio-to-audio + py audio-to-text) — parity with #612's generic rewrite #655

Open

drewdrewthis added the slack-requested Slack PR review request posted label Jun 10, 2026

rogeriochaves approved these changes Jun 11, 2026

View reviewed changes

drewdrewthis merged commit 1ebdd1c into main Jun 11, 2026
21 checks passed

drewdrewthis deleted the fix/486-delete-deprecated-voice-test branch June 11, 2026 09:38

This was referenced Jun 11, 2026

chore(main): release javascript 0.4.13 #614

Merged

chore(main): release python 0.7.31 #656

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(examples/voice/#486): retire legacy gpt-4o-audio-preview surface, migrate supported audio examples to gpt-audio-mini#612

chore(examples/voice/#486): retire legacy gpt-4o-audio-preview surface, migrate supported audio examples to gpt-audio-mini#612
drewdrewthis merged 7 commits into
mainfrom
fix/486-delete-deprecated-voice-test

drewdrewthis commented Jun 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment

Uh oh!

drewdrewthis commented Jun 4, 2026

Uh oh!

drewdrewthis commented Jun 4, 2026

Uh oh!

Uh oh!

drewdrewthis commented Jun 10, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

drewdrewthis commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why we don't unskip (the key correction)

What changed

Verification

Closes / supersedes

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

drewdrewthis commented Jun 4, 2026

Uh oh!

drewdrewthis commented Jun 4, 2026

✅ Review + prove-it: READY (after closing-ref correction)

Uh oh!

Uh oh!

drewdrewthis commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review verdict: READY

Resolved since last review

Non-blocking

Evidence

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

drewdrewthis commented Jun 4, 2026 •

edited

Loading

drewdrewthis commented Jun 10, 2026 •

edited

Loading