feat: add 4-bit quantized Qwen TTS models and Russian abbreviations #563

tsvalek wants to merge 2 commits into jamiepine:main
Conversation
Add ~40 common Russian abbreviations (т.д., т.е., т.п., т.к., г., млн., руб., ул., стр., см., etc.) to the sentence-boundary splitter so they are not treated as sentence endings during chunked TTS generation. This fixes incorrect text splitting for Russian-language input where abbreviations like 'и т.д.' or '5 млн.' were causing the chunker to break mid-sentence.
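To make the failure mode concrete, here is a minimal, illustrative sketch (a toy splitter, not the project's actual `chunked_tts` code; the abbreviation set is trimmed): a naive "split after every period" pass breaks inside abbreviations, while checking the token before each period against a known-abbreviation set keeps the sentence intact.

```python
import re

# Illustrative subset of the abbreviation list described above.
ABBREVIATIONS = {"т.д", "т.е", "т.п", "т.к", "млн", "руб", "ул", "г"}

def split_sentences(s: str) -> list[str]:
    parts, start = [], 0
    for m in re.finditer(r"[.!?]\s+", s):
        # Token immediately before the punctuation, with surrounding dots stripped.
        words = s[: m.start() + 1].rstrip(".").split()
        token = words[-1].strip(".").lower() if words else ""
        if s[m.start()] == "." and token in ABBREVIATIONS:
            continue  # abbreviation dot, not a sentence boundary
        parts.append(s[start : m.end()].strip())
        start = m.end()
    tail = s[start:].strip()
    return parts + [tail] if tail else parts

text = "Он купил хлеб, молоко и т.д. в магазине на ул. Ленина."
naive = re.split(r"(?<=[.!?])\s+", text)  # breaks after "т.д." and "ул."
smart = split_sentences(text)             # keeps the sentence whole
```

The naive pass yields three fragments for this one sentence; the abbreviation-aware pass returns it unbroken.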
…ference

Add MLX 4-bit quantized variants of Qwen3-TTS (1.7B and 0.6B), which use ~4x less memory bandwidth and generate 2-3x faster on Apple Silicon than the existing bf16 models.

Backend changes:
- Add 4-bit model paths to the MLX backend model map
- Register new model configs (MLX-only, hidden on non-Apple platforms)
- Size: 1.7B-4bit ~1.1GB (vs 3.5GB bf16), 0.6B-4bit ~400MB (vs 1.2GB)

Frontend changes:
- Add 4-bit options to the engine selector dropdown
- Update the zod schema to accept the new model size variants
- Update display names for toast notifications
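The quoted download sizes can be sanity-checked with back-of-envelope arithmetic (assumptions: weight files dominate size; bf16 stores 2 bytes per parameter; 4-bit adds roughly 20% for per-group scales and metadata; the overhead factor is an estimate, not a measured value):

```python
def approx_size_gb(params_billion: float, bits_per_param: float, overhead: float = 1.0) -> float:
    """Rough weight-file size in GB; 'overhead' covers quantization scales/metadata."""
    # params_billion * (bits/8) bytes per param; the 1e9 factors cancel.
    return params_billion * bits_per_param / 8 * overhead

bf16_1_7b = approx_size_gb(1.7, 16)    # 3.4 GB, near the ~3.5 GB quoted
q4_1_7b = approx_size_gb(1.7, 4, 1.2)  # ~1.0 GB, near the ~1.1 GB quoted
bf16_0_6b = approx_size_gb(0.6, 16)    # 1.2 GB, matching the bf16 figure
q4_0_6b = approx_size_gb(0.6, 4, 1.2)  # ~0.36 GB, near the ~400 MB quoted
```

The estimates line up with the sizes listed in the backend changes, which suggests the quoted figures are weights-dominated.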
📝 Walkthrough

This PR adds support for two new 4-bit quantized Qwen TTS model variants (1.7B-4bit and 0.6B-4bit) across the frontend and backend. Changes include adding new model options to the engine selector, updating form validation, extending the backend model configuration for MLX-only variants, and enhancing sentence splitting with Russian abbreviations.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (4)
app/src/lib/hooks/useGenerationForm.ts (1)
106-129: Replace nested ternary with a lookup map.

The displayName chain is now 7 levels deep, and adding two more 4-bit branches further hurts readability. A `Record<string, string>` keyed by `engine` (with a small per-engine `modelSize` dispatcher for qwen/qwen_custom_voice/tada) would make future model additions a one-line change and eliminate the parallel-but-divergent `modelName`/`displayName` ternaries.

♻️ Sketch

```ts
const QWEN_TTS_DISPLAY: Record<string, string> = {
  '1.7B': 'Qwen TTS 1.7B',
  '0.6B': 'Qwen TTS 0.6B',
  '1.7B-4bit': 'Qwen TTS 1.7B ⚡ Fast',
  '0.6B-4bit': 'Qwen TTS 0.6B ⚡ Fast',
};
// ...
: QWEN_TTS_DISPLAY[data.modelSize ?? '1.7B'] ?? 'Qwen TTS 1.7B';
```

Better yet, source `displayName` from the backend `/models` response so the UI doesn't need to be kept in sync with `ModelConfig.display_name` at all.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@app/src/lib/hooks/useGenerationForm.ts` around lines 106-129: the nested ternary building displayName in useGenerationForm (based on engine and data.modelSize) is hard to read and brittle; replace it with a lookup approach: create per-engine maps (e.g., QWEN_TTS_DISPLAY, QWEN_CUSTOM_VOICE_DISPLAY, TADA_DISPLAY) keyed by modelSize and a top-level ENGINE_DISPLAY map for engines that do not depend on modelSize, then compute displayName by selecting the appropriate map based on engine and falling back to a sensible default (use data.modelSize ?? '1.7B' for lookups); update the displayName assignment in the useGenerationForm code to use these maps instead of the long nested ternary.

backend/backends/__init__.py (2)
239-239: Consider hoisting `_languages` to a module-level constant.

The exact same list `["zh", "en", "ja", "ko", "de", "fr", "ru", "pt", "es", "it"]` appears here, in `_get_qwen_custom_voice_configs()` (lines 303 and 313), and in `_get_qwen_llm_configs()` (lines 460-462). Promoting it to a module-level constant (e.g., `QWEN_SUPPORTED_LANGUAGES`) would DRY this up and make future language additions a one-line change.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/backends/__init__.py` at line 239: hoist the repeated language list into a module-level constant (e.g., QWEN_SUPPORTED_LANGUAGES) and replace the local `_languages` definitions with that constant; specifically remove the local `_languages = ["zh", "en", "ja", "ko", "de", "fr", "ru", "pt", "es", "it"]` and reference QWEN_SUPPORTED_LANGUAGES from _get_qwen_custom_voice_configs and _get_qwen_llm_configs (and any other occurrences) so adding/removing languages is a one-line change.
236-237: Dead assignments on non-MLX branch.

`repo_1_7b_4bit` and `repo_0_6b_4bit` are set to `None` here but never read; the 4-bit configs are only constructed inside the `if backend_type == "mlx":` block at line 265. These two lines can be dropped to reduce noise.

♻️ Optional cleanup

```diff
 else:
     repo_1_7b = "Qwen/Qwen3-TTS-12Hz-1.7B-Base"
     repo_0_6b = "Qwen/Qwen3-TTS-12Hz-0.6B-Base"
-    repo_1_7b_4bit = None
-    repo_0_6b_4bit = None
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/backends/__init__.py` around lines 236-237: remove the dead pre-assignments for repo_1_7b_4bit and repo_0_6b_4bit; these variables are only created inside the `if backend_type == "mlx":` branch, so delete the top-level lines setting them to None to eliminate unused noise and keep variable initialization local to the mlx branch where they are actually constructed.

app/src/components/Generation/EngineModelSelector.tsx (1)
72-75: Type cast is now a lie; broaden the literal union.

`value` here can be `qwen:1.7B-4bit` or `qwen:0.6B-4bit` (added at lines 22-23), so the extracted `modelSize` may be `'1.7B-4bit'`/`'0.6B-4bit'`, but the cast still narrows to `'1.7B' | '0.6B'`. Functionally OK because the zod schema in `useGenerationForm.ts` was widened to accept the 4-bit variants, but the cast misleads readers and any future strict consumer of `form.getValues('modelSize')`. The same issue applies to line 98.

♻️ Suggested fix

```diff
 } else if (value.startsWith('qwen:')) {
   const [, modelSize] = value.split(':');
   form.setValue('engine', 'qwen');
-  form.setValue('modelSize', modelSize as '1.7B' | '0.6B');
+  form.setValue('modelSize', modelSize as GenerationFormValues['modelSize']);
```

Apply the same `GenerationFormValues['modelSize']` cast to the `qwen_custom_voice` and `tada` branches for consistency.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@app/src/components/Generation/EngineModelSelector.tsx` around lines 72-75: the code narrows extracted qwen model sizes with an incorrect cast to `'1.7B' | '0.6B'`; change the cast used when calling `form.setValue('modelSize', ...)` to `GenerationFormValues['modelSize']` so the 4-bit variants (e.g., '1.7B-4bit') are correctly represented, and apply the same change to the qwen_custom_voice and tada branches to keep types consistent with the widened zod schema.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: acf3a571-5041-4825-8541-deb6390de76c
📒 Files selected for processing (5)

- app/src/components/Generation/EngineModelSelector.tsx
- app/src/lib/hooks/useGenerationForm.ts
- backend/backends/__init__.py
- backend/backends/mlx_backend.py
- backend/utils/chunked_tts.py
```ts
{ value: 'qwen:1.7B-4bit', label: 'Qwen3-TTS 1.7B ⚡ Fast', engine: 'qwen' },
{ value: 'qwen:0.6B-4bit', label: 'Qwen3-TTS 0.6B ⚡ Fast', engine: 'qwen' },
```
4-bit options shown unconditionally — non-Apple users will see broken entries.
The PR description states 4-bit models are "hidden on non-Apple platforms", and the backend (`_get_qwen_model_configs` in `backend/backends/__init__.py`) only emits these configs when `backend_type == "mlx"`. However, this dropdown hardcodes `qwen:1.7B-4bit` / `qwen:0.6B-4bit` for every platform, so PyTorch/non-Apple users will see "⚡ Fast" entries that fail when selected (model lookup returns no config; the load endpoint will error).
Consider gating these two entries on backend capability — e.g., fetch the available models from /models (or apiClient.getModelStatus()) and only render options whose model_name is present. Filtering ENGINE_OPTIONS against the backend-reported model list also removes the need to keep the UI list and backend config in lockstep going forward.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/Generation/EngineModelSelector.tsx` around lines 22 - 23,
ENGINE_OPTIONS currently hardcodes 4-bit Qwen entries (e.g., 'qwen:1.7B-4bit',
'qwen:0.6B-4bit') which should be conditionally shown only when the backend
reports those models; update EngineModelSelector to call the backend model
listing (e.g., apiClient.getModelStatus() or GET /models) and filter
ENGINE_OPTIONS by presence in that returned model list before rendering the
dropdown. Specifically, fetch the available model names on mount (or use
existing model status hook), then derive a filteredOptions =
ENGINE_OPTIONS.filter(opt => returnedModels.includes(opt.value) ||
returnedModels.includes(opt.model_name || opt.value.split(':')[0])) and render
filteredOptions in place of ENGINE_OPTIONS so the 4-bit qwen entries only appear
when the backend exposes them.
```python
"т.д",  # и т.д. (и так далее)
"т.п",  # и т.п. (и тому подобное)
"т.е",  # т.е. (то есть)
"т.к",  # т.к. (так как)
"т.н",  # т.н. (так называемый)
"т.о",  # т.о. (таким образом)
```
Dotted Russian abbreviations won’t match with current period parser
This addition won’t work for entries like т.д, т.е, т.к, т.о, н.э because Line 162 only backtracks over letters, so at the final dot it extracts only the last segment (e.g., д) and misses _ABBREVIATIONS. The splitter can still break mid-phrase, contrary to the PR goal.
Suggested fix (parse the abbreviation stem including internal dots)

```diff
diff --git a/backend/utils/chunked_tts.py b/backend/utils/chunked_tts.py
@@
 def _find_last_sentence_end(text: str) -> int:
@@
         if char == ".":
-            # Walk backwards to find the preceding word
-            word_start = pos - 1
-            while word_start >= 0 and text[word_start].isalpha():
-                word_start -= 1
-            word = text[word_start + 1 : pos].lower()
-            if word in _ABBREVIATIONS:
+            # Walk backwards to capture abbreviation stems with internal dots,
+            # e.g. "e.g.", "u.s.", "т.д."
+            token_start = pos - 1
+            while token_start >= 0 and (
+                text[token_start].isalpha() or text[token_start] == "."
+            ):
+                token_start -= 1
+            token = text[token_start + 1 : pos].strip(".").lower()
+            if token in _ABBREVIATIONS:
                 continue
             # Skip decimal numbers (digit immediately before the period)
-            if word_start >= 0 and text[word_start].isdigit():
+            if token_start >= 0 and text[token_start].isdigit():
                 continue
```

Also applies to: 67-67
🧰 Tools
🪛 Ruff (0.15.11)
[warning] 57-57: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?
(RUF001)
[warning] 57-57: Comment contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?
(RUF003)
[warning] 60-60: String contains ambiguous о (CYRILLIC SMALL LETTER O). Did you mean o (LATIN SMALL LETTER O)?
(RUF001)
[warning] 60-60: Comment contains ambiguous о (CYRILLIC SMALL LETTER O). Did you mean o (LATIN SMALL LETTER O)?
(RUF003)
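The suggested fix can be exercised as a standalone sketch. Names mirror the review's suggestion, the abbreviation set is trimmed for brevity, and an extra internal-dot guard is added for illustration; this is not the project's exact code:

```python
# Trimmed, illustrative abbreviation set (the real list is much longer).
_ABBREVIATIONS = {"т.д", "т.е", "т.к", "т.п", "млн", "ул", "e.g", "u.s", "mr"}

def find_last_sentence_end(text: str) -> int:
    """Return the index just past the last real sentence-ending punctuation, or -1."""
    for pos in range(len(text) - 1, -1, -1):
        char = text[pos]
        if char in "!?":
            return pos + 1
        if char == ".":
            # Illustrative extra guard: a dot followed directly by a
            # non-space character is internal ("т.д", "3.14", "v2.5").
            if pos + 1 < len(text) and not text[pos + 1].isspace():
                continue
            # Walk backwards to capture abbreviation stems with internal
            # dots (the review's suggested change), e.g. "e.g.", "т.д."
            token_start = pos - 1
            while token_start >= 0 and (
                text[token_start].isalpha() or text[token_start] == "."
            ):
                token_start -= 1
            token = text[token_start + 1 : pos].strip(".").lower()
            if token in _ABBREVIATIONS:
                continue
            # Skip decimal numbers (digit immediately before the period)
            if token_start >= 0 and text[token_start].isdigit():
                continue
            return pos + 1
    return -1
```

With this stem-aware backtracking, the final dot of "т.д." extracts the token "т.д" rather than just "д", so the abbreviation check matches and the chunker no longer splits mid-phrase.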
This PR adds two performance/quality improvements for the TTS engine: MLX 4-bit quantized Qwen3-TTS models and Russian abbreviation handling in the sentence splitter.
Tested locally on an M4 Max.