Hello p0p4k,
I am deeply grateful for the code you have provided.
I have a question while adapting it to a Korean version. I am preparing to use a speech-to-text dataset with approximately 2000 speakers.
However, the dataset does not include speaker labels for the voice data. From my reading of the paper, speaker information is learned from a 3-second speech prompt rather than from explicit speaker labels.
If that is correct, speaker labels should not be necessary for training the model, and therefore should not be required in the filelist even for multi-speaker synthesis.
Yet, I noticed that the code is designed to use speaker labels at the first index of the filelist when n_spks is greater than 1.
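For context, here is a minimal sketch of the kind of filelist parsing I am referring to. The pipe-delimited format and the field order (`audio_path|speaker_id|text`) are assumptions based on common TTS filelist conventions, not necessarily the exact layout this repository uses:

```python
# Hypothetical sketch of per-line filelist parsing.
# Assumed multi-speaker format: audio_path|speaker_id|text
# Assumed single-speaker format: audio_path|text
def parse_filelist_line(line: str, n_spks: int):
    parts = line.strip().split("|")
    if n_spks > 1:
        # In multi-speaker mode, a speaker label is expected as an extra field.
        path, spk, text = parts
        return path, int(spk), text
    path, text = parts
    return path, None, text

print(parse_filelist_line("wavs/0001.wav|3|안녕하세요", n_spks=2))
```

My question is whether this extra `speaker_id` field is actually needed, given that the prompt already carries the speaker identity.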
I would be extremely grateful if you could clarify this part for me.
My understanding of this paper is still quite limited, and I apologize if my question seems naive.
Thank you.