feat: add 4-bit quantized Qwen TTS models and Russian abbreviations #563
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,6 +25,7 @@ | |
| # Lowercase for case-insensitive matching. | ||
| _ABBREVIATIONS = frozenset( | ||
| { | ||
| # English | ||
| "mr", | ||
| "mrs", | ||
| "ms", | ||
|
|
@@ -50,6 +51,44 @@ | |
| "u.s", | ||
| "u.s.a", | ||
| "u.k", | ||
| # Russian | ||
| "т.д", # и т.д. (и так далее) | ||
| "т.п", # и т.п. (и тому подобное) | ||
| "т.е", # т.е. (то есть) | ||
| "т.к", # т.к. (так как) | ||
| "т.н", # т.н. (так называемый) | ||
| "т.о", # т.о. (таким образом) | ||
|
Comment on lines +55 to +60
Contributor

Dotted Russian abbreviations won't match with the current period parser

This addition won't work for entries with internal dots. As the removed lines in the diff below show, the current parser walks backwards from the period over alphabetic characters only, so it stops at the internal dot and never sees the full stem.

Suggested fix (parse abbreviation stem including internal dots):

    diff --git a/backend/utils/chunked_tts.py b/backend/utils/chunked_tts.py
    @@
     def _find_last_sentence_end(text: str) -> int:
    @@
             if char == ".":
    -            # Walk backwards to find the preceding word
    -            word_start = pos - 1
    -            while word_start >= 0 and text[word_start].isalpha():
    -                word_start -= 1
    -            word = text[word_start + 1 : pos].lower()
    -            if word in _ABBREVIATIONS:
    +            # Walk backwards to capture abbreviation stems with internal dots,
    +            # e.g. "e.g.", "u.s.", "т.д."
    +            token_start = pos - 1
    +            while token_start >= 0 and (
    +                text[token_start].isalpha() or text[token_start] == "."
    +            ):
    +                token_start -= 1
    +            token = text[token_start + 1 : pos].strip(".").lower()
    +            if token in _ABBREVIATIONS:
                     continue
                 # Skip decimal numbers (digit immediately before the period)
    -            if word_start >= 0 and text[word_start].isdigit():
    +            if token_start >= 0 and text[token_start].isdigit():
                     continue

Also applies to: 67-67

Tools: Ruff (0.15.11)
- [warning] 57-57: String contains an ambiguous Unicode character (RUF001)
- [warning] 57-57: Comment contains an ambiguous Unicode character (RUF003)
- [warning] 60-60: String contains an ambiguous Unicode character (RUF001)
- [warning] 60-60: Comment contains an ambiguous Unicode character (RUF003)
||
| "др", # и др. (и другие) | ||
| "пр", # и пр. (и прочее) | ||
| "г", # г. (год / город) | ||
| "гг", # гг. (годы) | ||
| "в", # в. (век) | ||
| "вв", # вв. (века) | ||
| "н.э", # н.э. (нашей эры) | ||
| "ул", # ул. (улица) | ||
| "д", # д. (дом) | ||
| "корп", # корп. (корпус) | ||
| "стр", # стр. (строение / страница) | ||
| "руб", # руб. (рублей) | ||
| "коп", # коп. (копеек) | ||
| "тыс", # тыс. (тысяч) | ||
| "млн", # млн. (миллионов) | ||
| "млрд", # млрд. (миллиардов) | ||
| "трлн", # трлн. (триллионов) | ||
| "кв", # кв. (квадратный) | ||
| "см", # см. (смотри / сантиметр) | ||
| "им", # им. (имени) | ||
| "проф", # проф. (профессор) | ||
| "акад", # акад. (академик) | ||
| "доц", # доц. (доцент) | ||
| "ред", # ред. (редактор) | ||
| "изд", # изд. (издание) | ||
| "обл", # обл. (область) | ||
| "р", # р. (река / рублей) | ||
| "оз", # оз. (озеро) | ||
| "о", # о. (остров) | ||
| "м", # м. (метро / метр) | ||
| "гр", # гр. (гражданин / грамм) | ||
| } | ||
| ) | ||
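The review comment on the dotted entries (such as "т.д" and "т.е") can be illustrated with a small standalone sketch. It is not the project's `_find_last_sentence_end`; the helper name `ends_known_abbreviation`, its `allow_internal_dots` flag, and the reduced abbreviation set are assumptions made for illustration. It compares an alphabetic-only backward walk with one that also accepts internal dots.

```python
# Illustrative subset only; the real _ABBREVIATIONS set in the PR is much larger.
_ABBREVIATIONS = frozenset({"mr", "u.s", "т.е", "др"})


def ends_known_abbreviation(
    text: str, pos: int, allow_internal_dots: bool = True
) -> bool:
    """Return True if the period at text[pos] ends a known abbreviation.

    Hypothetical helper: with allow_internal_dots=False it mimics a walk that
    stops at the first non-alphabetic character; with True it also steps over
    dots, so stems such as "т.е" or "u.s" are captured whole.
    """
    if not (0 <= pos < len(text)) or text[pos] != ".":
        raise ValueError("pos must be the index of a '.' in text")

    start = pos - 1
    while start >= 0 and (
        text[start].isalpha() or (allow_internal_dots and text[start] == ".")
    ):
        start -= 1

    # Drop stray leading dots, then lowercase for case-insensitive lookup.
    token = text[start + 1 : pos].strip(".").lower()
    return token in _ABBREVIATIONS


if __name__ == "__main__":
    sample = "Это сложно, т.е. не просто."
    final_dot = sample.index(". ")  # the period that closes "т.е."
    print(ends_known_abbreviation(sample, final_dot, allow_internal_dots=False))
    print(ends_known_abbreviation(sample, final_dot, allow_internal_dots=True))
```

With the alphabetic-only walk the stem seen for "т.е." is just "е", which is not in the set; with internal dots allowed the full stem "т.е" is looked up. Single-letter entries in the full set (for example "о" or "д") could mask this problem for some dotted forms, so the behaviour is worth testing per entry.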
4-bit options shown unconditionally: non-Apple users will see broken entries.

The PR description states 4-bit models are "hidden on non-Apple platforms", and the backend (`_get_qwen_model_configs` in `backend/backends/__init__.py`) only emits these configs when `backend_type == "mlx"`. However, this dropdown hardcodes `qwen:1.7B-4bit` / `qwen:0.6B-4bit` for every platform, so PyTorch/non-Apple users will see "⚡ Fast" entries that fail when selected (model lookup returns no config; load endpoint will error).

Consider gating these two entries on backend capability, e.g., fetch the available models from `/models` (or `apiClient.getModelStatus()`) and only render options whose `model_name` is present. Filtering `ENGINE_OPTIONS` against the backend-reported model list also removes the need to keep the UI list and backend config in lockstep going forward.
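The filtering idea in this comment can be sketched as follows. The sketch is in Python for consistency with the backend code in this PR; the dropdown code itself is not shown here and may well be written in another language. The option shape, the labels, the `qwen:1.7B` entry, and the function name `filter_engine_options` are assumptions for illustration; only `ENGINE_OPTIONS`, `model_name`, and the two `-4bit` identifiers come from the comment.

```python
from typing import Iterable, List, TypedDict


class EngineOption(TypedDict):
    model_name: str
    label: str


# Hypothetical option list; labels and the non-4-bit entry are made up.
ENGINE_OPTIONS: List[EngineOption] = [
    {"model_name": "qwen:1.7B", "label": "Qwen 1.7B"},
    {"model_name": "qwen:1.7B-4bit", "label": "Qwen 1.7B (4-bit, fast)"},
    {"model_name": "qwen:0.6B-4bit", "label": "Qwen 0.6B (4-bit, fast)"},
]


def filter_engine_options(
    options: Iterable[EngineOption], available_model_names: Iterable[str]
) -> List[EngineOption]:
    """Keep only options whose model_name the backend reports as available."""
    available = set(available_model_names)
    return [opt for opt in options if opt["model_name"] in available]


if __name__ == "__main__":
    # Pretend the backend (e.g. a non-MLX one) reported only this model.
    reported = ["qwen:1.7B"]
    for opt in filter_engine_options(ENGINE_OPTIONS, reported):
        print(opt["label"])
```

Because the rendered list is derived from what the backend reports, a platform that does not emit the 4-bit configs would simply never show those entries.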