Problem
Caption rendering has become noticeably slower since adding the universal font support system. The pipeline now downloads font files from S3, loads them with opentype.js for measurement, computes per-character positions, and (for emoji) may spawn ffmpeg subprocesses for pixel scanning — all before the actual video burn even starts.
Where time is spent
- Font downloads (ensureLocalFonts): downloads TTF/OTF files from s3.varg.ai/fonts/ on first use. Noto CJK fonts are 16-17MB each. Cached on disk after first download, but cold starts are slow.
- opentype.js font loading (loadFont): opentype.loadSync() parses the full font file including all tables. CJK fonts with 60K+ glyphs are especially slow to parse.
- Per-character position computation (getCharXPositions): iterates every character in every subtitle line, computing glyph advances via opentype.js. For long videos with many subtitle entries, this adds up.
- Arabic/complex script measurement (measureRenderedWidth): spawns an ffmpeg subprocess per Arabic text segment to measure the actual rendered width via pixel scanning. Each subprocess takes ~100-200ms.
- Emoji gap detection (measureEmojiGapPositions): spawns ffmpeg to render the full subtitle line and scan for gaps. ~200ms per line with emoji.
- ZIP creation for Rendi (runWithCompressedFolder): downloads all font files again into memory to create the ZIP, even though they're already cached on disk locally.
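To make the per-character cost of getCharXPositions concrete, here is a minimal sketch of cumulative glyph-advance positioning. It is not the project's implementation: charXPositionsSketch and the advanceOf callback are hypothetical names, the advance lookup stands in for whatever opentype.js call the pipeline uses, and kerning and complex-script shaping are ignored.

```typescript
// Hypothetical advance lookup: returns one character's advance width in
// font units. In the real pipeline this value would come from opentype.js.
type AdvanceFn = (char: string) => number;

/** Returns the x offset (in pixels) at which each character starts. */
function charXPositionsSketch(
  text: string,
  advanceOf: AdvanceFn,
  fontSize: number,
  unitsPerEm: number,
): number[] {
  const scale = fontSize / unitsPerEm; // font units -> pixels
  const positions: number[] = [];
  let x = 0;
  // Array.from splits by code point, so an emoji encoded as a surrogate
  // pair is treated as one character rather than two.
  for (const ch of Array.from(text)) {
    positions.push(x);
    x += advanceOf(ch) * scale; // one advance lookup per character
  }
  return positions;
}
```

The work is linear in the total number of characters across all subtitle entries, which is why it becomes noticeable on long videos.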
Optimization opportunities
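Quick wins
- runWithCompressedFolder downloads fonts from URLs even when they're already in /tmp/varg-caption-fonts/. Read from local cache instead.
- Only call loadFont() when measurement is actually needed (i.e., when emoji are present). Plain text captions don't need opentype.js at all.
- Cache getFontMetrics() results: metrics for a given font+size combo are deterministic. Cache them in a Map.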
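Two of these quick wins, caching getFontMetrics() results in a Map and skipping loadFont() for plain-text captions, could look roughly like the sketch below. Everything here is an assumption for illustration: the FontMetricsSketch shape, the getFontMetricsCached and needsGlyphMeasurement names, and the emoji heuristic, which is a rough regex and not a complete emoji detector.

```typescript
// Hypothetical metrics shape; the real getFontMetrics() return type is not
// shown in this issue.
interface FontMetricsSketch {
  ascent: number;
  descent: number;
  lineHeight: number;
}

// Cache keyed by font path + size, since metrics for that pair are deterministic.
const metricsCache = new Map<string, FontMetricsSketch>();

function getFontMetricsCached(
  fontPath: string,
  fontSize: number,
  compute: (fontPath: string, fontSize: number) => FontMetricsSketch,
): FontMetricsSketch {
  const key = `${fontPath}@${fontSize}`;
  let metrics = metricsCache.get(key);
  if (metrics === undefined) {
    metrics = compute(fontPath, fontSize); // expensive path, runs once per key
    metricsCache.set(key, metrics);
  }
  return metrics;
}

// Rough heuristic: Misc Symbols/Dingbats plus the surrogate pairs that cover
// most emoji blocks. A production check would need a fuller emoji test.
const EMOJI_HINT = /[\u2600-\u27BF]|[\uD83C-\uD83E][\uDC00-\uDFFF]/;

/** True when a caption line likely contains emoji and needs glyph measurement. */
function needsGlyphMeasurement(line: string): boolean {
  return EMOJI_HINT.test(line);
}
```

A caller would check needsGlyphMeasurement(line) before touching loadFont() at all, so plain-text captions never pay the parse cost.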
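Medium effort
- ensureLocalFonts already uses Promise.all, but the ZIP builder downloads sequentially. Parallelize.
- Batch measureRenderedWidth calls: instead of one ffmpeg subprocess per Arabic segment, render all segments in a single ffmpeg call with multiple drawtext filters.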
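One possible shape for the batched measurement is to draw each segment on its own row of a single black canvas and scan every row band for its rightmost lit pixel. The sketch below only builds the ffmpeg arguments; it does not spawn the process or scan pixels. The function names, the MeasureSegment type, and the row layout are assumptions, and the code assumes file paths contain no characters that need escaping inside a filter string (such as `:`, `,`, `'` or `\`).

```typescript
// Hypothetical input: each segment's text is written to a UTF-8 file so it can
// be passed via drawtext's textfile option instead of being escaped inline.
interface MeasureSegment {
  textFile: string;
}

/** Builds one comma-separated chain of drawtext filters, one row per segment. */
function buildBatchedDrawtext(
  segments: MeasureSegment[],
  fontFile: string,
  fontSize: number,
  rowHeight: number,
): string {
  return segments
    .map((seg, i) =>
      [
        `drawtext=fontfile=${fontFile}`,
        `textfile=${seg.textFile}`,
        `fontsize=${fontSize}`,
        "fontcolor=white",
        "x=0",
        `y=${i * rowHeight}`, // segment i occupies rows [i*rowHeight, (i+1)*rowHeight)
      ].join(":"),
    )
    .join(",");
}

/** Builds ffmpeg args that render all segments into one grayscale frame on stdout. */
function buildBatchedMeasureArgs(
  segments: MeasureSegment[],
  fontFile: string,
  fontSize: number,
  canvasWidth: number,
  rowHeight: number,
): string[] {
  if (segments.length === 0) {
    throw new Error("no segments to measure");
  }
  const canvasHeight = segments.length * rowHeight;
  return [
    "-f", "lavfi",
    "-i", `color=c=black:s=${canvasWidth}x${canvasHeight}`,
    "-vf", buildBatchedDrawtext(segments, fontFile, fontSize, rowHeight),
    "-frames:v", "1",
    "-f", "rawvideo",
    "-pix_fmt", "gray",
    "pipe:1",
  ];
}
```

With this layout the single subprocess returns canvasWidth × canvasHeight bytes, and the width of segment i is the rightmost non-zero column within its row band. rowHeight has to be large enough that tall glyphs do not spill into the next band.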
Larger changes
Impact
Primarily affects:
- Cloud rendering via the render service (every render job pays this cost)
- Local rendering with the <Captions> component
- Worst case: Japanese/Korean/Chinese text with emoji = CJK font download (17MB) + opentype parsing + per-character measurement
Related files
src/react/renderers/captions.ts — main pipeline
src/react/renderers/burn-captions.ts — font download + ZIP creation
src/react/renderers/text-measure.ts — opentype.js measurement
src/react/renderers/fonts.ts — font resolution + S3 URLs
src/ai-sdk/providers/editly/rendi/index.ts — runWithCompressedFolder ZIP builder