Feature request: Croatian language support (existing checkpoints + fine-tune options)

## Summary
Voicebox currently supports 23 languages across all engines, but Croatian, like most South Slavic languages, is not yet covered. The closest available options are Polish (Chatterbox Multilingual) and Russian (Qwen3), and neither produces acceptable Croatian phonetics.

## Why it matters
Croatian has ~5M native speakers and there is currently **no production-grade local-first Croatian TTS** in any open-source desktop app. Realistic use cases: audiobook production, accessibility for visually impaired users, indie game dialogue, language-learning content, podcast tooling. Voicebox would be the first.

## Existing Croatian resources (to save the maintainer/contributor work)

No one needs to train a Croatian TTS from scratch: usable groundwork already exists. Three realistic integration paths:

### Option A — wrap an existing Croatian checkpoint
- **`nikolab/speecht5_tts_hr`** — https://huggingface.co/nikolab/speecht5_tts_hr
  - Fine-tuned Microsoft SpeechT5, **MIT license**
  - Trained on VoxPopuli HR: **43 hours, 83 speakers, ~250k tokens**
  - Multi-speaker (83 trained voices) but **not zero-shot cloning** — fits Voicebox's existing `voice_type = "preset"` profile flow (same pattern Kokoro uses, see `backend/database/models.py`)
  - Known limitation: chokes on sequences > ~20 words → Voicebox's existing auto-chunker should handle this transparently
  - ~0.1B params / ~400 MB download

- **`facebook/mms-tts-hrv`** — Meta MMS, ISO 639-3 code `hrv`
  - VITS architecture, single-speaker baseline quality
  - License compatibility to be verified by maintainers before integration
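To make the ~20-word limitation noted for the SpeechT5 checkpoint concrete, here is a minimal sketch of the kind of chunking that would keep every request under that limit. It is not Voicebox's actual auto-chunker, which I have not read; `chunk_text` and `MAX_WORDS` are hypothetical names, and the 20-word cap is just the approximate figure quoted above.

```python
import re

MAX_WORDS = 20  # assumption: the approximate "~20 words" limit quoted above


def chunk_text(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Split text into chunks of at most max_words words, preferring sentence boundaries."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?…])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # Flush the running chunk if adding this sentence would overflow it.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        # A single over-long sentence is hard-split on word boundaries.
        while len(words) > max_words:
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Short sentences are packed together up to the cap, so a paragraph of brief sentences does not turn into dozens of tiny synthesis calls. Splitting on commas or conjunctions before falling back to a hard word split would probably sound more natural, but that is left out here to keep the sketch short.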

### Option B — fine-tune a multilingual cloning model on VoxPopuli HR
- Base: Coqui XTTS v2 (17 languages per the model card, Croatian NOT among them; confirmed against https://huggingface.co/coqui/XTTS-v2)
- Add-a-language fine-tune on the same 43 h VoxPopuli HR corpus that `nikolab` used → would give zero-shot cloning + Croatian in one model
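- Community precedent: https://github.com/ylacombe/finetune-hf-vits is a maintained Hugging Face recipe for fine-tuning VITS/MMS checkpoints. It targets VITS rather than XTTS, so it applies directly to `facebook/mms-tts-hrv` (Option A) and serves as a workflow reference, not a drop-in recipe, for an XTTS fine-tune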
- This aligns with the existing roadmap entry in `README.md` → "More Models: XTTS, Bark, and other open-source voice models"

### Option C — ship a "fuzzy" intermediate option
As a temporary measure, let users pick Chatterbox Multilingual and type Croatian text under a new `hr` code that internally maps to Polish phonemes. Imperfect, but better than nothing. It could ship with an "experimental" badge while Option A or B matures.
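A rough sketch of how the `hr` to Polish mapping could look, so that the UI knows when to show the "experimental" badge. Everything here is hypothetical: `ResolvedLanguage`, `NATIVE_CODES`, `FALLBACKS` and `resolve_language` are illustrative names, not Voicebox or Chatterbox APIs, and the set of native codes is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedLanguage:
    requested: str      # code the user picked, e.g. "hr"
    engine_code: str    # code actually passed to the engine
    experimental: bool  # True when a fallback language was substituted


# Placeholder set of codes the engine handles natively (illustrative only).
NATIVE_CODES = {"en", "de", "pl"}

# Hypothetical fallback table: Croatian borrows the Polish phoneme set.
FALLBACKS = {"hr": "pl"}


def resolve_language(code: str) -> ResolvedLanguage:
    """Map a user-facing language code to the code the engine should receive."""
    code = code.lower()
    if code in NATIVE_CODES:
        return ResolvedLanguage(code, code, experimental=False)
    if code in FALLBACKS:
        return ResolvedLanguage(code, FALLBACKS[code], experimental=True)
    raise ValueError(f"unsupported language code: {code!r}")
```

Keeping the requested code separate from the engine code means generated clips stay labelled as Croatian in history and exports, and swapping the fallback for a real Croatian engine later would only require moving `hr` from `FALLBACKS` to `NATIVE_CODES`.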

## Dataset availability
- **VoxPopuli HR** — 43 h, publicly downloadable (already used by the `nikolab` model above)
- **Mozilla Common Voice Croatian** — effectively empty in the current release (~0.01 h, 1 speaker). Not viable as a training source.
- **Total realistic pool** — ~40–60 h if combined with smaller academic corpora. Enough for fine-tuning, not for from-scratch SOTA.

## Offer to help
I'm a native Croatian speaker and a Voicebox user. I'm not a Python/Rust developer, but I'm happy to contribute concretely:

- Test all three options on native Croatian sentences including the harder phonemes (č, ć, š, ž, đ, dž, lj, nj) and common English loanwords ("pizza" → /pitsa/, "software" → /softver/) which multilingual models typically mispronounce
- Provide reference Croatian audio samples for QA across standard Shtokavian and one regional variant
- Record a small evaluation set (~30 min) of clean studio audio, released under CC-BY so it can live in the repo as a regression-test fixture
- Test pre-release builds on macOS and Windows
- Translate the UI to Croatian (`app/src/i18n/locales/hr/`) once an engine ships
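For the phoneme testing in the first bullet, a small helper like the one below could confirm that a set of test sentences actually exercises every grapheme listed there before anyone records or synthesizes anything. This is only a sketch; `TARGET_GRAPHEMES` and `missing_graphemes` are names I made up for illustration.

```python
import unicodedata

# The harder graphemes listed in the first bullet above.
TARGET_GRAPHEMES = ("č", "ć", "š", "ž", "đ", "dž", "lj", "nj")


def missing_graphemes(sentences: list[str]) -> list[str]:
    """Return the target graphemes that appear in none of the sentences."""
    # NFC folds "c" + combining caron into the single code point "č",
    # so decomposed input is counted the same as precomposed input.
    text = unicodedata.normalize("NFC", " ".join(sentences)).lower()
    return [g for g in TARGET_GRAPHEMES if g not in text]
```

For example, `missing_graphemes(["Čaša i ćup", "Žuta šljiva"])` reports that `đ`, `dž` and `nj` are still uncovered. The check is a plain substring match, so it treats any `n` followed by `j` as the digraph `nj` (as in "injekcija", where they are separate letters); for a 30-minute evaluation set that seems acceptable, but a native speaker should still review the list.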

Happy to coordinate over Discussions or this issue thread. Thanks for the great project!

