-
I was able to answer some of the questions myself: on my machine, any value for max samples above 8 leads to horribly slow training, so I wouldn't go higher than that. I trained for 320 epochs, but unfortunately the resulting model doesn't perform any better than the base model in my opinion; there's no discernible difference in the generated audio files.
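As an aside on why that cap bites so hard: with frame-based batching, each batch is padded to its longest clip, so allowing more samples per batch inflates the number of frames the GPU actually processes. Below is a toy illustration of that interaction; it is not F5-TTS's actual sampler code, and the frame budget and clip lengths are made-up numbers.

```python
# Toy illustration (not F5-TTS's actual sampler): with a fixed frame
# budget per batch, a higher max-samples cap packs more short clips
# into each batch; since a batch is padded to its longest clip, the
# frames the GPU actually processes grow with the cap.
import random

random.seed(0)
clip_frames = [random.randint(100, 1000) for _ in range(2000)]  # fake dataset

def epoch_padded_frames(frame_budget, max_samples):
    total, batch = 0, []
    for f in clip_frames:
        if batch and (sum(batch) + f > frame_budget or len(batch) == max_samples):
            total += len(batch) * max(batch)  # batch padded to its longest clip
            batch = []
        batch.append(f)
    return total + len(batch) * max(batch)

actual = sum(clip_frames)  # frames with no padding at all
for cap in (2, 4, 8, 16, 32):
    print(cap, round(epoch_padded_frames(6400, cap) / actual, 2))
```

The real slowdown on a given GPU also involves VRAM pressure once padded batches outgrow memory, so treat this only as intuition for how the max-samples cap and the frame budget interact.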
-
Hello, could you share the parameters you used and any other insights from your training, such as how the 8-bit Adam optimizer affected it (if you found out)? I'm currently running a very similar training with a similar setup and getting some weird artefacts (a wet sound), and I can't tell whether the model is undertrained or my reference audio files are bad and it's picking up too much background noise. Also, on my machine (4070 laptop) it takes about a day for around 20k-25k steps depending on parameters, and I can't figure out whether that is normal or slow (I see people here training for >100k steps). I would very much appreciate some insights and results from your training. My Parameters:
I actually trained my model for only 150 epochs, 30k steps.
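For comparing runs like these, it helps to convert between epochs and updates explicitly, since the two are linked by dataset size and effective batch size. A back-of-the-envelope sketch, where every number is a hypothetical placeholder rather than a measurement from either setup:

```python
# Back-of-the-envelope conversion between epochs and optimizer updates.
# Every number here is a hypothetical placeholder; plug in your own.
dataset_hours = 1.0        # e.g. ~60 min of audio for one speaker
frames_per_sec = 100       # ~100 mel frames/s is a typical hop size
batch_frames = 6400        # "batch size per GPU", counted in frames
grad_accum = 1
num_gpus = 1

dataset_frames = dataset_hours * 3600 * frames_per_sec
updates_per_epoch = dataset_frames / (batch_frames * grad_accum * num_gpus)
print(f"~{updates_per_epoch:.0f} updates per epoch")   # ~56 with these inputs
print(f"150 epochs ≈ {150 * updates_per_epoch:,.0f} updates")
```

If 150 epochs worked out to 30k steps, that implies roughly 200 updates per epoch, i.e. more data or a smaller effective batch than these placeholders. Similarly, 20k-25k steps per day is about 0.23-0.29 updates per second; whether that is slow depends on how many frames each update carries, so frames of audio processed per second is the fairer cross-machine comparison.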
-
I've been getting great results with F5-TTS, but my first attempt at fine-tuning trained from scratch instead of starting from the pretrained model: the output started as noise and is only slowly becoming speech.
How do I correctly fine-tune instead of starting from scratch?
Do I need to set "Tokenizer File" and "Path to Pretrained Checkpoint" manually? If so, what should I put?
Does "Download corresponding dataset first, and fill in the path in scripts" (repo link) refer to this?
Project Details:
I'm working on generating voices for characters from an old game. I have 10 to 60 minutes of clean audio samples per character. Language: English.
Hardware:
GPU: NVIDIA RTX 4080 Laptop (12 GB VRAM)
I'm looking for advice on the best values to set for finetuning, given my hardware. Here’s what I’ve gathered so far, but I’d love some expert input:
Parameter Questions
Batch Size per GPU: I assume 6400 should work with 12GB VRAM, but would appreciate confirmation.
Max Samples: Not sure, but I read that 2 might be fine (reference).
Gradient Accumulation Steps & Max Gradient Norm: No idea; should I just leave them at 1?
Epochs: How many would be reasonable for my dataset size?
Warmup Updates: Not sure what value is appropriate (see the sketch after this list).
Save per Updates: I assume setting this high is better, as frequent saving would slow down training?
Last per Updates: Not sure what value to use here either.
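One way to choose Epochs, Warmup Updates, and Save per Updates together is to derive them from the total number of updates the run will perform rather than guessing each in isolation. A sketch with hypothetical numbers; the percentages are generic training heuristics, not documented F5-TTS recommendations:

```python
# Derive warmup and checkpoint cadence from the planned total updates.
# Hypothetical numbers; the percentages are generic heuristics, not
# F5-TTS recommendations.
updates_per_epoch = 200          # dataset frames / effective batch frames
epochs = 150
total_updates = epochs * updates_per_epoch          # 30,000 here

num_warmup_updates = int(0.05 * total_updates)      # ~5% warmup heuristic
save_per_updates = max(1000, total_updates // 20)   # ~20 checkpoints per run
print(total_updates, num_warmup_updates, save_per_updates)
```

On the save-frequency intuition: checkpoint I/O only hurts if it happens often relative to step time, so something on the order of a few dozen saves per run is usually negligible.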
Other Options:
Use 8-bit Adam optimizer – Should I enable this? (See the sketch after this list for what it does.)
Mixed Precision – Any recommendations based on my GPU?
Logger – Not sure what’s best here.
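For context on those first two toggles: "8-bit Adam" usually means swapping in bitsandbytes' AdamW8bit, which stores optimizer state in 8 bits and cuts its memory roughly 4x, and on an RTX 40-series GPU mixed precision is typically bf16, which the hardware supports natively. A minimal sketch of what the options do under the hood, assuming a plain PyTorch setup rather than F5-TTS's actual trainer code:

```python
# Sketch of what "8-bit Adam" and "mixed precision" typically mean,
# assuming a plain PyTorch loop (not F5-TTS's actual trainer code).
import torch
import bitsandbytes as bnb       # requires a CUDA GPU
from accelerate import Accelerator

model = torch.nn.Linear(512, 512)  # stand-in for the real model

# 8-bit Adam: optimizer states kept in 8 bits, ~4x less optimizer memory.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-5)

# Mixed precision: bf16 is the usual choice on RTX 40-series (Ada) GPUs.
accelerator = Accelerator(mixed_precision="bf16")
model, optimizer = accelerator.prepare(model, optimizer)
```

Whether 8-bit Adam changes output quality is exactly the open question in the replies above; its main draw on a 12 GB card is freeing memory for a larger batch.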
Finetuning Duration
How long should I expect finetuning to take per character? Just so I can compare against my actual training times and check if my machine is underperforming due to driver or config issues.
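One practical way to answer the underperformance question yourself is to measure update throughput directly and extrapolate, rather than comparing wall-clock days across different configs. A small timing sketch, where `train_step` is a placeholder for one optimizer update in whatever loop you run:

```python
# Measure update throughput, then extrapolate total wall-clock time.
# `train_step` is a placeholder for one optimizer update in your loop.
import time

def benchmark(train_step, warmup=10, measure=50):
    for _ in range(warmup):          # let autotuning and caches settle
        train_step()
    start = time.perf_counter()
    for _ in range(measure):
        train_step()
    return measure / (time.perf_counter() - start)   # updates per second

# e.g. with ups = benchmark(train_step):
# hours_for_30k_updates = 30_000 / ups / 3600
```

Multiplying the planned total updates by 1/throughput gives a wall-clock estimate that is comparable across parameter sets.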
Any guidance would be highly appreciated! 🚀
Thanks in advance!