Can someone help me with training a new language from scratch? #453
costcuttingcz asked this question in Q&A · Unanswered
Replies: 3 comments · 3 replies
- Were you able to get this working? The Gradio UI helps a lot, along with the various video tutorials in this repo's Discussions and on YouTube.
- Do you have any progress with the Czech language? I want to get started on that too; I have about 2 hours of training data (studio-quality sound).
- Hi, maybe this will help? https://huggingface.co/fav-kky/SpeechT5-base-cs-tts I would be interested in this.
- Hello,
I have been experimenting with this for a long time and I am getting frustrated. I want to train the Czech language from scratch.
I have everything, but I am not able to make it work.
I have /workspace/dataset/cs/wav with all the wav files.
/workspace/dataset/cs/data.csv contains the wav file name and, in the second field, the text spoken in that wav. The separator is | (see the example rows below).
I created a vocabulary /workspace/dataset/cs/vocab.txt which contains words.
I want to save the results to /workspace/ckpts.
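For illustration, this is the pipe-separated layout of data.csv (the file names and sentences here are invented examples, not my actual data):

```
0001.wav|Dobrý den, toto je ukázková věta.
0002.wav|Druhá nahrávka v datasetu.
```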
At the beginning it complains that there is no model loaded.
And when I try to fix the files, everything falls apart.
I modified trainer.py into my_trainer.py.
Can someone please help me make it work?
```python
from importlib.resources import files

from f5_tts.model import CFM, DiT, Trainer, UNetT
from f5_tts.model.dataset import load_dataset
from f5_tts.model.utils import get_tokenizer

# -------------------------- Dataset Settings ---------------------------

target_sample_rate = 24000
n_mel_channels = 100
hop_length = 256
win_length = 1024
n_fft = 1024
mel_spec_type = "vocos"  # 'vocos' or 'bigvgan'

tokenizer = "custom"  # 'pinyin', 'char', or 'custom' -- set to 'custom' to use your vocab.txt
tokenizer_path = "/workspace/dataset/cs/vocab.txt"  # path to your vocab.txt
dataset_name = "/workspace/dataset/cs/data.csv"  # path to your data.csv

# -------------------------- Training Settings --------------------------

exp_name = "F5TTS_Czech"  # updated experiment name
learning_rate = 7.5e-5

batch_size_per_gpu = 1000  # frames per GPU per batch (with batch_size_type = "frame")
batch_size_type = "frame"  # "frame" or "sample"
max_samples = 64  # max sequences per batch if using frame-wise batch size; 32 for small models, 64 for base models
grad_accumulation_steps = 1  # note: updates = steps / grad_accumulation_steps
max_grad_norm = 1.0

epochs = 11  # uses linear decay, so epochs control the slope
num_warmup_updates = 20000  # warmup steps
save_per_updates = 50000  # save a checkpoint every this many steps
last_per_steps = 5000  # save the last checkpoint every this many steps

# model params
if exp_name == "F5TTS_Czech":
    wandb_resume_id = None
    model_cls = DiT
    model_cfg = dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)

# -----------------------------------------------------------------------


def main():
    vocab_char_map, vocab_size = get_tokenizer(tokenizer_path, tokenizer)


if __name__ == "__main__":
    main()
```
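For reference, in the upstream train.py this is based on, main() continues past get_tokenizer to build the model, the trainer, and the dataset. A sketch of those missing steps, assuming the same API as the upstream script (the keyword argument names may differ between f5_tts versions, so treat them as assumptions and check your installed train.py):

```python
# Sketch of the full main(), modeled on the upstream f5_tts train.py.
# Keyword names are assumptions taken from that script and may differ
# in other versions of the package.

def main():
    vocab_char_map, vocab_size = get_tokenizer(tokenizer_path, tokenizer)

    mel_spec_kwargs = dict(
        n_fft=n_fft,
        hop_length=hop_length,
        win_length=win_length,
        n_mel_channels=n_mel_channels,
        target_sample_rate=target_sample_rate,
        mel_spec_type=mel_spec_type,
    )

    # wrap the DiT backbone in the conditional flow-matching model
    model = CFM(
        transformer=model_cls(**model_cfg, text_num_embeds=vocab_size, mel_dim=n_mel_channels),
        mel_spec_kwargs=mel_spec_kwargs,
        vocab_char_map=vocab_char_map,
    )

    trainer = Trainer(
        model,
        epochs,
        learning_rate,
        num_warmup_updates=num_warmup_updates,
        save_per_updates=save_per_updates,
        checkpoint_path="/workspace/ckpts",  # where I want the checkpoints saved
        batch_size=batch_size_per_gpu,
        batch_size_type=batch_size_type,
        max_samples=max_samples,
        grad_accumulation_steps=grad_accumulation_steps,
        max_grad_norm=max_grad_norm,
        wandb_project="CFM-TTS",
        wandb_run_name=exp_name,
        wandb_resume_id=wandb_resume_id,
        last_per_steps=last_per_steps,
    )

    # note: in the upstream repo, load_dataset expects the name of a *prepared*
    # dataset (e.g. the output of prepare_csv_wavs.py), not a raw csv path --
    # this may be why passing /workspace/dataset/cs/data.csv fails
    train_dataset = load_dataset(dataset_name, tokenizer, mel_spec_kwargs=mel_spec_kwargs)
    trainer.train(train_dataset, resumable_with_seed=666)  # seed value from the upstream script
```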