
feat: add 4-bit quantized Qwen TTS models and Russian abbreviations#563

Open
tsvalek wants to merge 2 commits into jamiepine:main from tsvalek:feat/4bit-quantized-models-and-russian-abbrevs

Conversation


@tsvalek tsvalek commented Apr 26, 2026

This PR adds two performance/quality improvements for the TTS engine:

  1. Russian Abbreviations: Added ~40 common Russian abbreviations (т.д., т.е., т.п., г., млн., etc.) to chunked_tts.py to prevent incorrect sentence splitting. This fixes issues where Qwen3-TTS would add long pauses or stutter on abbreviations.
  2. 4-bit Quantized Models (Apple Silicon): Added 4-bit variants of the Qwen3-TTS models for the MLX backend. The 4-bit models require ~4x less memory bandwidth during inference, making them 2-3x faster on M-series chips while taking up significantly less disk space (1.1GB vs 3.5GB).

Tested locally on an M4 Max.
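The abbreviation-aware splitting idea can be sketched as follows. This is a minimal, illustrative Python sketch, not the actual chunked_tts.py implementation: the function name and the (much smaller) abbreviation subset here are hypothetical.

```python
import re

# Illustrative subset; the real set in chunked_tts.py has ~40 Russian entries.
ABBREVIATIONS = {"т.д", "т.е", "т.п", "г", "млн", "руб"}

def split_sentences(text: str) -> list[str]:
    """Split on '. ' boundaries, but glue back pieces ending in an abbreviation."""
    parts: list[str] = []
    buf = ""
    for piece in re.split(r"(?<=\.)\s+", text):
        buf = f"{buf} {piece}" if buf else piece
        # Stem of the word before the period, dots stripped: "т.д." -> "т.д"
        stem = piece.rstrip(".").rsplit(" ", 1)[-1].strip(".").lower()
        if stem in ABBREVIATIONS:
            continue  # the period belongs to the abbreviation: keep accumulating
        parts.append(buf)
        buf = ""
    if buf:
        parts.append(buf)
    return parts
```

Without the abbreviation check, "5 млн. рублей" would be cut after "млн.", which is exactly the mid-sentence break the PR describes.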

Summary by CodeRabbit

  • New Features
    • Added two new Qwen TTS 4-bit model variants (1.7B-4bit and 0.6B-4bit) marked as "⚡ Fast" options for quicker generation.
    • Improved Russian language support in text-to-speech processing for better sentence parsing.

tsvalek added 2 commits April 26, 2026 21:23
Add ~40 common Russian abbreviations (т.д., т.е., т.п., т.к., г., млн.,
руб., ул., стр., см., etc.) to the sentence-boundary splitter so they
are not treated as sentence endings during chunked TTS generation.

This fixes incorrect text splitting for Russian-language input where
abbreviations like 'и т.д.' or '5 млн.' were causing the chunker to
break mid-sentence.
…ference

Add MLX 4-bit quantized variants of Qwen3-TTS (1.7B and 0.6B) which use
~4x less memory bandwidth and generate 2-3x faster on Apple Silicon
compared to the existing bf16 models.

Backend changes:
- Add 4-bit model paths to MLX backend model map
- Register new model configs (MLX-only, hidden on non-Apple platforms)
- Size: 1.7B-4bit ~1.1GB (vs 3.5GB bf16), 0.6B-4bit ~400MB (vs 1.2GB)

Frontend changes:
- Add 4-bit options to engine selector dropdown
- Update zod schema to accept new model size variants
- Update display names for toast notifications
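The quoted checkpoint sizes are consistent with a back-of-envelope estimate: weights dominate on-disk size, so parameters times bits-per-weight approximates the file size. This is a rough sketch only; real 4-bit checkpoints also store per-group quantization scales and metadata, which is why the shipped 1.7B file is 1.1GB rather than the bare 0.85GB.

```python
def approx_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough on-disk size: parameter count x bits per weight, in gigabytes."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# bf16 stores 16 bits per weight; 4-bit quantization stores ~4 bits
# (plus per-group scales, which push real checkpoints somewhat higher).
assert round(approx_size_gb(1.7, 16), 2) == 3.4   # near the quoted 3.5GB bf16
assert round(approx_size_gb(1.7, 4), 2) == 0.85   # the 1.1GB file adds scales
assert round(approx_size_gb(0.6, 16), 2) == 1.2   # matches the quoted 1.2GB
```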
Contributor

coderabbitai Bot commented Apr 26, 2026

📝 Walkthrough

Walkthrough

This PR adds support for two new 4-bit quantized Qwen TTS model variants (1.7B-4bit and 0.6B-4bit) across the frontend and backend. Changes include adding new model options to the engine selector, updating form validation, extending backend model configuration for MLX-only variants, and enhancing sentence splitting with Russian abbreviations.

Changes

  • Frontend Model Selection (app/src/components/Generation/EngineModelSelector.tsx):
    Added two new Qwen 4-bit "Fast" variant options to the engine dropdown menu.
  • Frontend Form Validation (app/src/lib/hooks/useGenerationForm.ts):
    Extended modelSize validation and displayName mapping to support the new 4-bit model sizes (1.7B-4bit, 0.6B-4bit) with "⚡ Fast" indicator labels.
  • Backend Model Configuration (backend/backends/__init__.py):
    Refactored Qwen TTS model config generation to centralize the supported-language list and conditionally add two new MLX-only 4-bit quantized model variants with their HuggingFace repository IDs.
  • Backend Model Path Resolution (backend/backends/mlx_backend.py):
    Extended the _get_model_path method to resolve the new 4-bit model size identifiers to their corresponding HuggingFace model IDs.
  • Text Processing (backend/utils/chunked_tts.py):
    Added Russian-language abbreviation tokens to the sentence-splitting abbreviation set so their periods are not treated as sentence boundaries.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • PR #258: Modifies the same frontend hook and backend engine/config mappings for model validation and configuration logic.
  • PR #319: Updates the same frontend components for engine and model selection with overlapping changes to model option handling.

Suggested reviewers

  • rhmod09-dev

Poem

🐰 Hop, hop! New Qwen variants take flight,
Four-bit fast models, blazing and bright!
From UI to backend, the configs align,
Russian abbreviations and models so fine! ⚡

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 60.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately and concisely summarizes the two main changes, adding 4-bit quantized Qwen TTS models and Russian abbreviations, which align with the changeset across all modified files.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
app/src/lib/hooks/useGenerationForm.ts (1)

106-129: Replace nested ternary with a lookup map.

The displayName chain is now 7 levels deep and adding two more 4-bit branches further hurts readability. A Record<string, string> keyed by engine (with a small per-engine modelSize dispatcher for qwen/qwen_custom_voice/tada) would make future model additions a one-line change and eliminate the parallel-but-divergent modelName/displayName ternaries.

♻️ Sketch
const QWEN_TTS_DISPLAY: Record<string, string> = {
  '1.7B': 'Qwen TTS 1.7B',
  '0.6B': 'Qwen TTS 0.6B',
  '1.7B-4bit': 'Qwen TTS 1.7B ⚡ Fast',
  '0.6B-4bit': 'Qwen TTS 0.6B ⚡ Fast',
};
// ...
: QWEN_TTS_DISPLAY[data.modelSize ?? '1.7B'] ?? 'Qwen TTS 1.7B';

Better yet, source displayName from the backend /models response so the UI doesn't need to be kept in sync with ModelConfig.display_name at all.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/lib/hooks/useGenerationForm.ts` around lines 106 - 129, The nested
ternary building displayName in useGenerationForm (based on engine and
data.modelSize) is hard to read and brittle; replace it with a lookup approach:
create per-engine maps (e.g., QWEN_TTS_DISPLAY, QWEN_CUSTOM_VOICE_DISPLAY,
TADA_DISPLAY) keyed by modelSize and a top-level ENGINE_DISPLAY map for engines
that do not depend on modelSize, then compute displayName by selecting the
appropriate map based on engine and falling back to a sensible default (use
data.modelSize ?? '1.7B' for lookups); update the displayName assignment in the
useGenerationForm code to use these maps (referencing engine, data.modelSize,
and the display maps) instead of the long nested ternary.
backend/backends/__init__.py (2)

239-239: Consider hoisting _languages to a module-level constant.

The exact same list ["zh", "en", "ja", "ko", "de", "fr", "ru", "pt", "es", "it"] appears here, in _get_qwen_custom_voice_configs() (lines 303 and 313), and in _get_qwen_llm_configs() (lines 460–462). Promoting it to a module-level constant (e.g., QWEN_SUPPORTED_LANGUAGES) would DRY this up and make future language additions a one-line change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/__init__.py` at line 239, Hoist the repeated language list
into a module-level constant (e.g., QWEN_SUPPORTED_LANGUAGES) and replace the
local _languages definitions with that constant; specifically remove the local
_languages = ["zh", "en", "ja", "ko", "de", "fr", "ru", "pt", "es", "it"] and
reference QWEN_SUPPORTED_LANGUAGES from _get_qwen_custom_voice_configs and
_get_qwen_llm_configs (and any other occurrences) so adding/removing languages
is a one-line change.

236-237: Dead assignments on non-MLX branch.

repo_1_7b_4bit and repo_0_6b_4bit are set to None here but never read — the 4-bit configs are only constructed inside the if backend_type == "mlx": block at line 265. These two lines can be dropped to reduce noise.

♻️ Optional cleanup
     else:
         repo_1_7b = "Qwen/Qwen3-TTS-12Hz-1.7B-Base"
         repo_0_6b = "Qwen/Qwen3-TTS-12Hz-0.6B-Base"
-        repo_1_7b_4bit = None
-        repo_0_6b_4bit = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/__init__.py` around lines 236 - 237, Remove the dead
pre-assignments for repo_1_7b_4bit and repo_0_6b_4bit: these variables are only
created inside the if backend_type == "mlx": branch, so delete the top-level
lines setting repo_1_7b_4bit = None and repo_0_6b_4bit = None to eliminate
unused noise and keep variable initialization local to the mlx branch where they
are actually constructed.
app/src/components/Generation/EngineModelSelector.tsx (1)

72-75: Type cast is now a lie — broaden the literal union.

value here can be qwen:1.7B-4bit or qwen:0.6B-4bit (added at lines 22–23), so the extracted modelSize may be '1.7B-4bit' / '0.6B-4bit', but the cast still narrows to '1.7B' | '0.6B'. Functionally OK because the zod schema in useGenerationForm.ts was widened to accept the 4-bit variants, but the cast misleads readers and any future strict consumer of form.getValues('modelSize'). Same issue applies to line 98.

♻️ Suggested fix
   } else if (value.startsWith('qwen:')) {
     const [, modelSize] = value.split(':');
     form.setValue('engine', 'qwen');
-    form.setValue('modelSize', modelSize as '1.7B' | '0.6B');
+    form.setValue('modelSize', modelSize as GenerationFormValues['modelSize']);

Apply the same GenerationFormValues['modelSize'] cast to the qwen_custom_voice and tada branches for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/Generation/EngineModelSelector.tsx` around lines 72 - 75,
The code narrows extracted qwen model sizes with an incorrect cast to '1.7B' |
'0.6B'; change the cast used when calling form.setValue('modelSize', ...) to use
GenerationFormValues['modelSize'] so the 4-bit variants (e.g., '1.7B-4bit') are
correctly represented, and apply the same change to the qwen_custom_voice and
tada branches where form.setValue('modelSize', ...) is used to keep types
consistent with the widened zod schema.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: acf3a571-5041-4825-8541-deb6390de76c

📥 Commits

Reviewing files that changed from the base of the PR and between 7df366d and f1b2b1f.

📒 Files selected for processing (5)
  • app/src/components/Generation/EngineModelSelector.tsx
  • app/src/lib/hooks/useGenerationForm.ts
  • backend/backends/__init__.py
  • backend/backends/mlx_backend.py
  • backend/utils/chunked_tts.py

Comment on lines +22 to +23
{ value: 'qwen:1.7B-4bit', label: 'Qwen3-TTS 1.7B ⚡ Fast', engine: 'qwen' },
{ value: 'qwen:0.6B-4bit', label: 'Qwen3-TTS 0.6B ⚡ Fast', engine: 'qwen' },
Contributor


⚠️ Potential issue | 🟠 Major

4-bit options shown unconditionally — non-Apple users will see broken entries.

The PR description states 4-bit models are "hidden on non-Apple platforms" and the backend (_get_qwen_model_configs in backend/backends/__init__.py) only emits these configs when backend_type == "mlx". However, this dropdown hardcodes qwen:1.7B-4bit / qwen:0.6B-4bit for every platform, so PyTorch/non-Apple users will see "⚡ Fast" entries that fail when selected (model lookup returns no config; load endpoint will error).

Consider gating these two entries on backend capability — e.g., fetch the available models from /models (or apiClient.getModelStatus()) and only render options whose model_name is present. Filtering ENGINE_OPTIONS against the backend-reported model list also removes the need to keep the UI list and backend config in lockstep going forward.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/Generation/EngineModelSelector.tsx` around lines 22 - 23,
ENGINE_OPTIONS currently hardcodes 4-bit Qwen entries (e.g., 'qwen:1.7B-4bit',
'qwen:0.6B-4bit') which should be conditionally shown only when the backend
reports those models; update EngineModelSelector to call the backend model
listing (e.g., apiClient.getModelStatus() or GET /models) and filter
ENGINE_OPTIONS by presence in that returned model list before rendering the
dropdown. Specifically, fetch the available model names on mount (or use
existing model status hook), then derive a filteredOptions =
ENGINE_OPTIONS.filter(opt => returnedModels.includes(opt.value) ||
returnedModels.includes(opt.model_name || opt.value.split(':')[0])) and render
filteredOptions in place of ENGINE_OPTIONS so the 4-bit qwen entries only appear
when the backend exposes them.

Comment on lines +55 to +60
"т.д", # и т.д. (и так далее)
"т.п", # и т.п. (и тому подобное)
"т.е", # т.е. (то есть)
"т.к", # т.к. (так как)
"т.н", # т.н. (так называемый)
"т.о", # т.о. (таким образом)
Contributor


⚠️ Potential issue | 🟠 Major

Dotted Russian abbreviations won’t match with current period parser

This addition won’t work for entries like т.д, т.е, т.к, т.о, н.э because Line 162 only backtracks over letters, so at the final dot it extracts only the last segment (e.g., д) and misses _ABBREVIATIONS. The splitter can still break mid-phrase, contrary to the PR goal.

Suggested fix (parse abbreviation stem including internal dots)
diff --git a/backend/utils/chunked_tts.py b/backend/utils/chunked_tts.py
@@
 def _find_last_sentence_end(text: str) -> int:
@@
         if char == ".":
-            # Walk backwards to find the preceding word
-            word_start = pos - 1
-            while word_start >= 0 and text[word_start].isalpha():
-                word_start -= 1
-            word = text[word_start + 1 : pos].lower()
-            if word in _ABBREVIATIONS:
+            # Walk backwards to capture abbreviation stems with internal dots,
+            # e.g. "e.g.", "u.s.", "т.д."
+            token_start = pos - 1
+            while token_start >= 0 and (
+                text[token_start].isalpha() or text[token_start] == "."
+            ):
+                token_start -= 1
+            token = text[token_start + 1 : pos].strip(".").lower()
+            if token in _ABBREVIATIONS:
                 continue
             # Skip decimal numbers (digit immediately before the period)
-            if word_start >= 0 and text[word_start].isdigit():
+            if token_start >= 0 and text[token_start].isdigit():
                 continue

Also applies to: 67-67
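The diff above can be exercised as a self-contained sketch. Names and the small abbreviation subset are illustrative, and the "period must be followed by whitespace or end of text" guard is an extra assumption of this sketch (to skip internal dots without consulting the set), not part of the diff itself.

```python
# Illustrative stand-alone version of the suggested fix; _ABBREVIATIONS here
# is a small hypothetical subset of the module's real set.
_ABBREVIATIONS = {"т.д", "т.п", "т.е", "т.к", "e.g", "i.e", "mr", "dr"}

def find_last_sentence_end(text: str) -> int:
    """Return the index just past the last genuine sentence-ending period, or -1."""
    for pos in range(len(text) - 1, -1, -1):
        if text[pos] != ".":
            continue
        # Sketch-only guard: an internal dot (as in "т.д.") is never a boundary.
        if pos + 1 < len(text) and not text[pos + 1].isspace():
            continue
        # Walk backwards over letters *and* dots so dotted stems like
        # "т.д." or "e.g." are captured whole, then trim edge dots.
        token_start = pos - 1
        while token_start >= 0 and (text[token_start].isalpha() or text[token_start] == "."):
            token_start -= 1
        token = text[token_start + 1 : pos].strip(".").lower()
        if token in _ABBREVIATIONS:
            continue
        # Skip decimal numbers (digit immediately before the period).
        if token_start >= 0 and text[token_start].isdigit():
            continue
        return pos + 1
    return -1
```

With the letters-only backtracking, the token at the final dot of "т.д." would be just "д" and the abbreviation check would miss; including dots in the walk fixes that, as the review describes.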

🧰 Tools
🪛 Ruff (0.15.11)

[warning] 57-57: String contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?

(RUF001)


[warning] 57-57: Comment contains ambiguous е (CYRILLIC SMALL LETTER IE). Did you mean e (LATIN SMALL LETTER E)?

(RUF003)


[warning] 60-60: String contains ambiguous о (CYRILLIC SMALL LETTER O). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)


[warning] 60-60: Comment contains ambiguous о (CYRILLIC SMALL LETTER O). Did you mean o (LATIN SMALL LETTER O)?

(RUF003)

