fix: add SDK-level caching for Speech and Music generation by SecurityQQ · Pull Request #201 · vargHQ/sdk

SecurityQQ · 2026-04-07T00:47:29Z

Summary

Adds withCache() wrappers for Speech and Music in both the standalone (await Speech()/await Music()) and render pipeline paths
Uses computeCacheKey(element) consistently — the same canonical key format used by Image and Video
Adds pendingFiles deduplication for concurrent Speech/Music renders
Removes manual ctx.cache.get()/ctx.cache.set() + JSON.stringify cache keys from both renderers

Problem

Speech was completely missing SDK-level caching. Music had manual caching via ctx.cache.get()/ctx.cache.set() with JSON.stringify keys, which was inconsistent with Image/Video and produced different cache keys than the standalone await Music() path.

Before:

Layer	Image	Video	Music	Speech
Gateway (Redis/R2)	yes	yes	yes	yes
SDK `withCache()` render pipeline	yes	yes	manual get/set	no
SDK `withCache()` standalone await	yes	yes	yes	no
`computeCacheKey(element)`	yes	yes	no (JSON.stringify)	no (ignored)
`pendingFiles` dedup	yes	yes	no	no

After:

Layer	Image	Video	Music	Speech
Gateway (Redis/R2)	yes	yes	yes	yes
SDK `withCache()` render pipeline	yes	yes	yes	yes
SDK `withCache()` standalone await	yes	yes	yes	yes
`computeCacheKey(element)`	yes	yes	yes	yes
`pendingFiles` dedup	yes	yes	yes	yes
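
The PR does not say exactly why the JSON.stringify keys diverged between the two Music paths. One common cause is property order, and the sketch below illustrates that case only. `canonicalKey` is a hypothetical stand-in, not the SDK's `computeCacheKey(element)`, and the `music-v1` model name is made up for the example.

```typescript
// Hypothetical illustration; NOT the SDK's computeCacheKey implementation.
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

// Serialize with object keys sorted, so logically equal inputs
// always produce the same string regardless of insertion order.
function canonicalKey(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalKey(v)).join(",")}]`;
  const obj = value; // plain object at this point
  const entries = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalKey(obj[k])}`);
  return `{${entries.join(",")}}`;
}

// The same logical request, built with different property order
// (for example, by two different call sites).
const fromRenderer = { prompt: "lofi beat", model: "music-v1", duration: 30 };
const fromStandalone = { model: "music-v1", duration: 30, prompt: "lofi beat" };

// Ad hoc keys depend on property insertion order...
const adHocEqual = JSON.stringify(fromRenderer) === JSON.stringify(fromStandalone);
// ...while canonical keys do not.
const canonicalEqual = canonicalKey(fromRenderer) === canonicalKey(fromStandalone);
```

With a single key function shared by every path, a result cached by one path can be found by the other.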

Changes

src/react/resolve.ts — standalone await Speech():

Added getCachedGenerateSpeech() wrapping generateSpeechAI with withCache(), matching getCachedGenerateVideo()/getCachedGenerateMusic()
Replaced direct generateSpeechAI() call with cached wrapper
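
As a rough sketch of the wrapping pattern described above: the SDK's actual `withCache()` signature and cache interface are not shown in this PR, so `withCacheSketch`, `KeyValueCache`, `createMemoryCache`, and `fakeGenerateSpeech` are all hypothetical names, and the in-memory Map stands in for whatever storage the SDK uses.

```typescript
// Hypothetical shapes; the SDK's real withCache() and cache storage may differ.
interface KeyValueCache {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown): Promise<void>;
}

function createMemoryCache(): KeyValueCache {
  const store = new Map<string, unknown>();
  return {
    async get(key) {
      return store.get(key);
    },
    async set(key, value) {
      store.set(key, value);
    },
  };
}

// Wrap an async generation function so that identical keys reuse the stored result.
function withCacheSketch<A, R>(
  prefix: string,
  fn: (args: A) => Promise<R>,
  cache: KeyValueCache,
  keyOf: (args: A) => string,
): (args: A) => Promise<R> {
  return async (args) => {
    const key = `${prefix}:${keyOf(args)}`;
    const hit = await cache.get(key);
    if (hit !== undefined) return hit as R;
    const result = await fn(args);
    await cache.set(key, result);
    return result;
  };
}

// Stand-in for generateSpeechAI: counts upstream calls instead of calling a TTS API.
let upstreamCalls = 0;
async function fakeGenerateSpeech(args: { text: string; voice: string }): Promise<{ audio: string }> {
  upstreamCalls += 1;
  return { audio: `audio(${args.voice}:${args.text})` };
}

const cachedGenerateSpeech = withCacheSketch(
  "generateSpeech",
  fakeGenerateSpeech,
  createMemoryCache(),
  (args) => JSON.stringify([args.text, args.voice]),
);
```

Calling `cachedGenerateSpeech` twice with the same text and voice reaches the stand-in generator once; changing either input produces a new key and a new upstream call.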

src/react/renderers/context.ts:

Added generateSpeech and generateMusic fields to RenderContext, matching generateImage/generateVideo

src/react/renderers/render.ts (renderRoot()):

Creates cachedGenerateSpeech and cachedGenerateMusic via withCache(), same pattern as Image/Video
Passes them into RenderContext

src/react/renderers/speech.ts:

Rewrote to use computeCacheKey(element) + ctx.generateSpeech() + pendingFiles dedup
Removed manual ctx.cache.get()/ctx.cache.set() and JSON.stringify key

src/react/renderers/music.ts:

Same rewrite: computeCacheKey(element) + ctx.generateMusic() + pendingFiles dedup
Removed manual caching logic
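
The `pendingFiles` deduplication mentioned for both renderers can be pictured roughly as follows. This is a minimal sketch under assumptions: the real `pendingFiles` structure and renderer signatures are not shown in the PR, so `pendingFilesSketch`, `generateSketch`, and `renderOnce` are illustrative names only.

```typescript
// Hypothetical sketch of in-flight deduplication; not the SDK's renderer code.
type SketchAudio = { key: string; bytes: Uint8Array };

// Maps a cache key to the promise of a render that is still in progress.
const pendingFilesSketch = new Map<string, Promise<SketchAudio>>();

// Stand-in for ctx.generateSpeech / ctx.generateMusic; counts invocations.
let generateCalls = 0;
async function generateSketch(key: string): Promise<SketchAudio> {
  generateCalls += 1;
  await Promise.resolve(); // yield once to simulate asynchronous work
  return { key, bytes: new Uint8Array([1, 2, 3]) };
}

function renderOnce(key: string): Promise<SketchAudio> {
  // If an identical render is already in flight, share its promise.
  const inFlight = pendingFilesSketch.get(key);
  if (inFlight) return inFlight;

  const task = (async () => {
    try {
      return await generateSketch(key);
    } finally {
      // Clear the entry whether generation succeeded or failed,
      // so a later render with the same key can start fresh.
      pendingFilesSketch.delete(key);
    }
  })();
  pendingFilesSketch.set(key, task);
  return task;
}
```

Two concurrent `renderOnce` calls with the same key share one generation. The persistent cache handles repeats across renders, while this map only covers the window in which a request is still running.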

src/studio/step-renderer.ts + test fixtures:

Added generateSpeech/generateMusic to all RenderContext construction sites

Closes #200

Speech was the only media type missing withCache/ctx.cache caching at the SDK level. Every await Speech() and <Speech> in render re-hit ElevenLabs even with identical inputs, wasting API credits and adding latency.

- resolve.ts: add getCachedGenerateSpeech() wrapping generateSpeechAI with withCache(), matching getCachedGenerateVideo/Music pattern
- renderers/speech.ts: add manual ctx.cache get/set matching renderMusic() pattern for the render pipeline path

Closes #200

…nd Music renderers

Align Speech and Music render pipeline caching with the Image/Video pattern:

- Use computeCacheKey(element) for canonical cache keys (captures model provider, settings, providerOptions, children structure)
- Route generation through ctx.generateSpeech/ctx.generateMusic which are withCache() wrappers created in renderRoot(), matching ctx.generateImage/ctx.generateVideo
- Add pendingFiles deduplication for concurrent renders
- Remove manual ctx.cache.get()/set() and JSON.stringify cache keys

Also adds generateSpeech/generateMusic to RenderContext and wires them up in render.ts, step-renderer.ts, and test fixtures.

… throwing stubs

Replace 'not implemented in test' stubs with real withCache-wrapped mock functions for generateSpeech/generateMusic, matching how generateImage and generateVideo are already mocked in the same test files.

Add 5 new tests verifying render-pipeline caching for Speech and Music:

- Speech: reuses cache when only volume/id differ (ignored props)
- Speech: does NOT reuse cache when text differs
- Speech: does NOT reuse cache when voice differs
- Music: reuses cache with identical prompt/model/duration
- Music: does NOT reuse cache when prompt differs

All tests follow the same pattern as the existing Image/Video cache tests: create element, render in ctx1, render variant in ctx2, assert call count.
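
The Speech cases in these tests depend on the cache key ignoring props that do not change the generated audio. The sketch below shows one way such a key could behave. It assumes a flat props object and a hypothetical `speechKeySketch` function; the SDK's `computeCacheKey(element)` works on elements and its internals are not shown here. The voice names are placeholders.

```typescript
// Hypothetical props shape and key function; not the SDK's computeCacheKey.
interface SpeechPropsSketch {
  text: string;
  voice: string;
  volume?: number;
  id?: string;
}

// Props that affect playback or identity but not the generated audio
// (the commit message names volume and id as ignored).
const IGNORED_PROPS: ReadonlyArray<keyof SpeechPropsSketch> = ["volume", "id"];

function speechKeySketch(props: SpeechPropsSketch): string {
  const relevant = (Object.keys(props) as Array<keyof SpeechPropsSketch>)
    .filter((prop) => IGNORED_PROPS.indexOf(prop) === -1)
    .sort()
    .map((prop) => [prop, props[prop]]);
  return JSON.stringify(relevant);
}

const base = speechKeySketch({ text: "hello", voice: "voice-a", volume: 1, id: "a" });
const quieter = speechKeySketch({ text: "hello", voice: "voice-a", volume: 0.3, id: "b" });
const otherText = speechKeySketch({ text: "goodbye", voice: "voice-a" });
const otherVoice = speechKeySketch({ text: "hello", voice: "voice-b" });
```

Here `base` and `quieter` produce the same key, so the second render would be a cache hit, while `otherText` and `otherVoice` produce different keys and would trigger a new generation.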

Previously only the raw TTS API call was cached via withCache. The expensive post-processing (ffprobe duration, ffmpeg segment slicing, S3 uploads) ran on every invocation even with identical inputs. Now the entire resolved result, including segments with their sliced audio bytes, word timings, and duration, is cached under a 'resolveSpeech:' key. On cache hit, segments are reconstructed from cached binary data without calling ffmpeg or ElevenLabs.

Also removes non-deterministic upload URLs (Date.now + Math.random) from ResolvedElement serialization in computeCacheKey. These URLs were causing downstream cache misses for Video elements that take speech segments as audio input (e.g. VEED lip-sync videos).
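
A minimal sketch of the round trip this commit describes, assuming hypothetical segment shapes: the review summary names `CachedSegment` and `CachedSpeechResult`, but their fields are not shown in this PR, so `SegmentSketch`, `CachedSegmentSketch`, and the number-array byte encoding below are assumptions chosen for illustration.

```typescript
// Hypothetical shapes; the PR's real CachedSegment / CachedSpeechResult fields are not shown.
interface SegmentSketch {
  text: string;
  start: number; // seconds
  end: number; // seconds
  audio: Uint8Array; // sliced audio bytes
  url?: string; // upload URL; differs on every run
}

interface CachedSegmentSketch {
  text: string;
  start: number;
  end: number;
  audio: number[]; // bytes stored as a plain array so the entry is JSON-safe
}

function toCached(segment: SegmentSketch): CachedSegmentSketch {
  // The upload URL is deliberately left out: it is not deterministic,
  // so including it would make anything keyed on this data unstable.
  return {
    text: segment.text,
    start: segment.start,
    end: segment.end,
    audio: Array.from(segment.audio),
  };
}

function fromCached(cached: CachedSegmentSketch): SegmentSketch {
  // Rebuild the binary payload without re-running any audio tooling.
  return {
    text: cached.text,
    start: cached.start,
    end: cached.end,
    audio: Uint8Array.from(cached.audio),
  };
}

const originalSegment: SegmentSketch = {
  text: "hi",
  start: 0,
  end: 0.5,
  audio: new Uint8Array([7, 8, 9]),
  url: "https://example.invalid/upload-1712-0.42.mp3",
};
const storedSegment = JSON.stringify(toCached(originalSegment));
const restoredSegment = fromCached(JSON.parse(storedSegment) as CachedSegmentSketch);
```

Serializing `originalSegment` twice yields the same string even if its `url` changes between runs, which is the property a downstream cache key needs.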

coderabbitai · 2026-04-07T02:25:49Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 64a8725c-1514-470f-99ed-c1080e520e0a

📥 Commits

Reviewing files that changed from the base of the PR and between 1f4903d and 5d84741.

📒 Files selected for processing (10)

src/react/renderers/cache.test.ts
src/react/renderers/context.ts
src/react/renderers/music.ts
src/react/renderers/packshot.test.ts
src/react/renderers/render.ts
src/react/renderers/speech.ts
src/react/renderers/talking-head.test.ts
src/react/renderers/utils.ts
src/react/resolve.ts
src/studio/step-renderer.ts

📝 Walkthrough

the pr adds speech and music generation caching across the sdk's render and resolve layers. it implements cached wrappers for generateSpeech and generateMusic, threads them through RenderContext, adds resolve-level caching for speech with segment serialization, and removes non-deterministic file urls from cache keys.

Changes

Cohort / File(s)	Summary
context & core wiring `src/react/renderers/context.ts`, `src/react/renderers/render.ts`, `src/studio/step-renderer.ts`	added `generateSpeech` and `generateMusic` properties to `RenderContext`; wired cached versions in render setup and studio step sessions.
resolve-level speech caching `src/react/resolve.ts`	introduced `getCachedGenerateSpeech()` wrapper and resolve-level caching for `resolveSpeechElement` with `CachedSegment`/`CachedSpeechResult` serialization; reconstructs segments on cache hits.
renderer speech/music `src/react/renderers/speech.ts`, `src/react/renderers/music.ts`	swapped direct generation imports for context-based calls; added concurrent deduplication via `pendingFiles` tracking; refactored progress lifecycle and file construction.
cache serialization `src/react/renderers/utils.ts`	removed non-deterministic file upload urls from `serializeValue()` to stabilize downstream cache keys.
test coverage `src/react/renderers/cache.test.ts`, `src/react/renderers/packshot.test.ts`, `src/react/renderers/talking-head.test.ts`	extended renderer cache tests for speech/music with mock generators; added packshot renderer tests; updated test context mocks to include `generateSpeech`/`generateMusic`.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Render Pipeline
    participant RenderContext as RenderContext
    participant PendingFiles as pendingFiles<br/>(Dedup)
    participant Cache as CacheStorage
    participant Generator as ctx.generateSpeech
    participant FileStore as generatedFiles

    Client->>RenderContext: renderSpeech(element)
    RenderContext->>RenderContext: compute cacheKeyStr
    RenderContext->>PendingFiles: check if in-flight
    alt In-flight promise exists
        PendingFiles-->>Client: return existing promise
    else Cache miss or no pending
        RenderContext->>Cache: check cache.get(cacheKeyStr)
        alt Cache hit
            Cache-->>RenderContext: return cached audio
            RenderContext->>FileStore: push file metadata
            RenderContext-->>Client: resolve promise
        else Cache miss
            RenderContext->>Generator: call with params
            Generator-->>RenderContext: return audio object
            RenderContext->>Cache: cache.set(cacheKeyStr, audio)
            RenderContext->>FileStore: push generated file
            PendingFiles->>PendingFiles: clean up pending entry
            RenderContext-->>Client: resolve promise
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Speech segments are not cached at the SDK level #200 — directly addresses the root cause of speech caching regression by implementing getCachedGenerateSpeech() and resolve-level caching for speech generation, matching the proposed fix for both resolveSpeechElement() and renderSpeech().

Possibly related PRs

feat: speech segments with word-level timestamps #159 — modifies src/react/resolve.ts and speech renderer code with speech generation/resolution pipeline changes that overlap directly with segment metadata handling.
feat: support custom CacheStorage for serverless environments #81 — modifies RenderContext and src/react/renderers/render.ts to wire cached generation functions through the render pipeline, matching the caching surface setup here.
feat: async element resolution (await Speech/Video/Image/Music) #156 — modifies the same rendering/resolution pipeline (resolve.ts, speech/music renderers, RenderContext) for speech and music generation caching and reuse.

Poem

🎵 speech and music now cache their way,
through resolve and render they stay,
segments serialize, duplicates fade—
no more api calls for the same soundwave made 🎙️


SecurityQQ added 2 commits April 6, 2026 17:47

SecurityQQ changed the title ~~fix: add SDK-level caching for Speech generation~~ fix: add SDK-level caching for Speech and Music generation Apr 7, 2026

SecurityQQ added 3 commits April 6, 2026 17:59

SecurityQQ mentioned this pull request Apr 7, 2026

VEED video cache keys are non-deterministic across runs (speech segment duration drift) #202

Closed

SecurityQQ merged commit d4acc2d into main Apr 7, 2026
1 check passed

SecurityQQ merged 5 commits into main from fix/speech-caching
