fix: add SDK-level caching for Speech and Music generation #201

SecurityQQ merged 5 commits into main
Conversation
Speech was the only media type missing `withCache`/`ctx.cache` caching at the SDK level. Every `await Speech()` and `<Speech>` in render re-hit ElevenLabs even with identical inputs, wasting API credits and adding latency.

- resolve.ts: add `getCachedGenerateSpeech()` wrapping `generateSpeechAI` with `withCache()`, matching the `getCachedGenerateVideo`/`getCachedGenerateMusic` pattern
- renderers/speech.ts: add manual `ctx.cache` get/set matching the `renderMusic()` pattern for the render pipeline path

Closes #200
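The `withCache()` wrapper pattern the commit refers to can be sketched roughly like this. The names (`withCache`, `CacheStorage`) mirror the PR's identifiers, but the signatures here are assumptions for illustration, not the SDK's actual API:

```typescript
// Hypothetical sketch of a withCache-style wrapper like getCachedGenerateSpeech().
// Signatures are assumptions, not the SDK's real definitions.

interface CacheStorage {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown): Promise<void>;
}

function withCache<A extends unknown[], R>(
  keyPrefix: string,
  cache: CacheStorage,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    // Derive a key from the prefix plus the serialized arguments.
    const key = keyPrefix + JSON.stringify(args);
    const hit = await cache.get(key);
    if (hit !== undefined) return hit as R; // cache hit: skip the provider call
    const result = await fn(...args);       // cache miss: call the provider once
    await cache.set(key, result);
    return result;
  };
}
```

Under these assumptions, a `getCachedGenerateSpeech()` would amount to `withCache("generateSpeech:", cache, generateSpeechAI)`, so repeated calls with identical inputs hit the cache instead of ElevenLabs.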
…nd Music renderers

Align Speech and Music render pipeline caching with the Image/Video pattern:

- Use `computeCacheKey(element)` for canonical cache keys (captures model provider, settings, providerOptions, children structure)
- Route generation through `ctx.generateSpeech`/`ctx.generateMusic`, which are `withCache()` wrappers created in `renderRoot()`, matching `ctx.generateImage`/`ctx.generateVideo`
- Add `pendingFiles` deduplication for concurrent renders
- Remove manual `ctx.cache.get()`/`set()` and `JSON.stringify` cache keys

Also adds `generateSpeech`/`generateMusic` to `RenderContext` and wires them up in render.ts, step-renderer.ts, and test fixtures.
… throwing stubs

Replace 'not implemented in test' stubs with real `withCache`-wrapped mock functions for `generateSpeech`/`generateMusic`, matching how `generateImage` and `generateVideo` are already mocked in the same test files.
Add 5 new tests verifying render-pipeline caching for Speech and Music:

- Speech: reuses cache when only volume/id differ (ignored props)
- Speech: does NOT reuse cache when text differs
- Speech: does NOT reuse cache when voice differs
- Music: reuses cache with identical prompt/model/duration
- Music: does NOT reuse cache when prompt differs

All tests follow the same pattern as the existing Image/Video cache tests: create element, render in ctx1, render a variant in ctx2, assert the call count.
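The "ignored props" behavior in the first Speech test can be pictured with a toy key function; `speechCacheKey` and the prop shape are hypothetical stand-ins for what `computeCacheKey(element)` does, not the SDK's real code:

```typescript
// Toy stand-in for computeCacheKey: only generation-relevant props (text, voice)
// enter the key; presentation props (volume, id) are excluded, so changing them
// reuses the cache. All names here are illustrative.

interface SpeechProps {
  text: string;
  voice: string;
  volume?: number; // ignored: affects playback, not generation
  id?: string;     // ignored: element identity, not generation
}

function speechCacheKey(props: SpeechProps): string {
  return "speech:" + JSON.stringify({ text: props.text, voice: props.voice });
}
```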
Previously only the raw TTS API call was cached via `withCache`. The expensive post-processing (ffprobe duration, ffmpeg segment slicing, S3 uploads) ran on every invocation even with identical inputs. Now the entire resolved result — including segments with their sliced audio bytes, word timings, and duration — is cached under a `resolveSpeech:` key. On cache hit, segments are reconstructed from cached binary data without calling ffmpeg or ElevenLabs.

Also removes non-deterministic upload URLs (`Date.now` + `Math.random`) from `ResolvedElement` serialization in `computeCacheKey`. These URLs were causing downstream cache misses for Video elements that take speech segments as audio input (e.g. VEED lip-sync videos).
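The cache-key fix amounts to dropping volatile fields before serializing. A minimal sketch, assuming a `ResolvedSegment` shape with a `url` field (both names are illustrative, not the SDK's real types):

```typescript
// Strip non-deterministic fields (freshly minted upload URLs containing
// Date.now()/Math.random() parts) before serializing, so two identical
// generations hash to the same 'resolveSpeech:' key. Field names are assumptions.

interface ResolvedSegment {
  text: string;
  durationMs: number;
  url?: string; // non-deterministic: differs on every upload
}

function stableResolveKey(segments: ResolvedSegment[]): string {
  // Keep only the deterministic fields of each segment.
  const canonical = segments.map(({ url: _url, ...deterministic }) => deterministic);
  return "resolveSpeech:" + JSON.stringify(canonical);
}
```

With the URL excluded, a downstream Video element that keys off its speech-segment input sees the same key on every run, instead of a fresh miss per upload.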
Walkthrough

The PR adds speech and music generation caching across the SDK's render and resolve layers. It implements cached wrappers for `generateSpeech` and `generateMusic`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Render Pipeline
    participant RenderContext as RenderContext
    participant PendingFiles as pendingFiles<br/>(Dedup)
    participant Cache as CacheStorage
    participant Generator as ctx.generateSpeech
    participant FileStore as generatedFiles
    Client->>RenderContext: renderSpeech(element)
    RenderContext->>RenderContext: compute cacheKeyStr
    RenderContext->>PendingFiles: check if in-flight
    alt In-flight promise exists
        PendingFiles-->>Client: return existing promise
    else Cache miss or no pending
        RenderContext->>Cache: check cache.get(cacheKeyStr)
        alt Cache hit
            Cache-->>RenderContext: return cached audio
            RenderContext->>FileStore: push file metadata
            RenderContext-->>Client: resolve promise
        else Cache miss
            RenderContext->>Generator: call with params
            Generator-->>RenderContext: return audio object
            RenderContext->>Cache: cache.set(cacheKeyStr, audio)
            RenderContext->>FileStore: push generated file
            PendingFiles->>PendingFiles: clean up pending entry
            RenderContext-->>Client: resolve promise
        end
    end
```
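The in-flight branch of the diagram can be sketched as a small dedup helper. `pendingFiles` matches the PR's name, but the rest is an illustrative reduction, not the SDK's code:

```typescript
// Concurrent renders that compute the same cache key share one promise, so the
// generator runs at most once per key; the pending entry is cleaned up when the
// promise settles (the diagram's "clean up pending entry" step). Sketch only.

const pendingFiles = new Map<string, Promise<string>>();

async function renderWithDedup(
  cacheKeyStr: string,
  generate: () => Promise<string>,
): Promise<string> {
  const inFlight = pendingFiles.get(cacheKeyStr);
  if (inFlight) return inFlight; // another render already started this work

  const promise = generate().finally(() => {
    pendingFiles.delete(cacheKeyStr); // allow a fresh run after settle
  });
  pendingFiles.set(cacheKeyStr, promise);
  return promise;
}
```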
Summary
- `withCache()` wrappers for Speech and Music in both the standalone (`await Speech()` / `await Music()`) and render pipeline paths
- `computeCacheKey(element)` used consistently — the same canonical key format used by Image and Video
- `pendingFiles` deduplication for concurrent Speech/Music renders
- Removes manual `ctx.cache.get()`/`ctx.cache.set()` + `JSON.stringify` cache keys from both renderers

Problem
Speech was completely missing SDK-level caching. Music had manual caching via `ctx.cache.get()`/`ctx.cache.set()` with `JSON.stringify` keys, which was inconsistent with Image/Video and produced different cache keys than the standalone `await Music()` path.

Before:

- `withCache()` render pipeline: neither (Music used manual `ctx.cache` get/set)
- `withCache()` standalone await: Music only (Speech called the API directly)
- `computeCacheKey(element)`: neither
- `pendingFiles` dedup: neither

After:

- `withCache()` render pipeline: both
- `withCache()` standalone await: both
- `computeCacheKey(element)`: both
- `pendingFiles` dedup: both

Changes
- `src/react/resolve.ts` — standalone `await Speech()`: add `getCachedGenerateSpeech()` wrapping `generateSpeechAI` with `withCache()`, matching `getCachedGenerateVideo()`/`getCachedGenerateMusic()`; replace the direct `generateSpeechAI()` call with the cached wrapper
- `src/react/renderers/context.ts`: add `generateSpeech` and `generateMusic` fields to `RenderContext`, matching `generateImage`/`generateVideo`
- `src/react/renderers/render.ts` (`renderRoot()`): create `cachedGenerateSpeech` and `cachedGenerateMusic` via `withCache()`, same pattern as Image/Video, and expose them on the `RenderContext`
- `src/react/renderers/speech.ts`: `computeCacheKey(element)` + `ctx.generateSpeech()` + `pendingFiles` dedup; remove manual `ctx.cache.get()`/`ctx.cache.set()` and `JSON.stringify` key
- `src/react/renderers/music.ts`: `computeCacheKey(element)` + `ctx.generateMusic()` + `pendingFiles` dedup
- `src/studio/step-renderer.ts` + test fixtures: add `generateSpeech`/`generateMusic` to all `RenderContext` construction sites

Closes #200
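The `RenderContext` additions above might look roughly like this; every type shape here is an assumption sketched from the PR notes, not the SDK's real definitions:

```typescript
// Illustrative shape of the RenderContext change: generateSpeech/generateMusic
// join generateImage/generateVideo as pre-wrapped cached generators created in
// renderRoot(). Parameter and result types are assumptions.

type CachedGenerator<P, R> = (params: P) => Promise<R>;

interface RenderContext {
  generateImage: CachedGenerator<{ prompt: string }, { url: string }>;
  generateVideo: CachedGenerator<{ prompt: string }, { url: string }>;
  // added by this PR:
  generateSpeech: CachedGenerator<{ text: string; voice: string }, { audio: Uint8Array }>;
  generateMusic: CachedGenerator<{ prompt: string; durationSec: number }, { audio: Uint8Array }>;
}
```

Keeping all four generators behind the same `CachedGenerator`-style shape is what lets every renderer (and test fixture) wire caching the same way instead of each renderer rolling its own `ctx.cache` logic.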