
Conversation

@kinanmartin

Adds optional parameters to specify target split sizes by number of hours in the ReazonSpeech prepare recipe.

The user can now optionally specify target hours on the command line, like so:

lhotse prepare reazonspeech \
  -j $nj \
  --train-hours 100 \
  --dev-hours 2 \
  --test-hours 2 \
  $dl_dir/ReazonSpeech data/manifests

@pzelasko
Collaborator

What’s the rationale for this? Wouldn’t it cause different recipes to have different test and dev sets?

@kinanmartin
Author

kinanmartin commented Jul 25, 2025

@pzelasko Thanks for the comment!

The rationale is to let our icefall model training recipe create splits of whatever sizes are desired for each portion of the dataset.

We are currently working on a bilingual (English and Japanese) icefall recipe that relies on the data prepared via lhotse in the icefall prepare script here, as well as on data from an English dataset (icefall prepare script here). For the bilingual model, we want to use the icefall recipes to prepare equally sized train, dev, and test sets for both datasets, then combine them into a balanced dataset for the bilingual model. To do that, we need control over the split sizes rather than having them hardcoded.

The way I have written the code, if the new optional parameters are not specified, the train, dev, and test sets are generated identically to the current version. When the parameters are specified, the fixed random seed should ensure that the same parameter values always produce the same dev and test sets. Please let me know if I'm mistaken, though.
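The determinism argument above can be sketched as follows. This is a minimal illustration, not the actual recipe code; the function name `split_by_hours` and the `(id, duration_sec)` representation are hypothetical:

```python
import random

def split_by_hours(recordings, train_hours=None, dev_hours=None,
                   test_hours=None, seed=42):
    """Deterministically split (id, duration_sec) pairs into train/dev/test.

    A fixed seed means the same inputs and the same target hours always
    yield the same splits, so dev and test are reproducible across runs.
    """
    rng = random.Random(seed)   # fixed seed -> reproducible shuffle
    items = sorted(recordings)  # canonical order before shuffling
    rng.shuffle(items)

    def take(pool, hours):
        # Take items until the hour budget is met; None means "take the rest".
        if hours is None:
            return pool, []
        budget, taken, total = hours * 3600.0, [], 0.0
        for item in pool:
            if total >= budget:
                break
            taken.append(item)
            total += item[1]
        return taken, pool[len(taken):]

    dev, rest = take(items, dev_hours)
    test, rest = take(rest, test_hours)
    train, _ = take(rest, train_hours)
    return train, dev, test
```

With the same seed and the same target hours, two invocations return identical splits, and dev/test never overlap.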

@pzelasko
Collaborator

I meant that dev and test sets should always be the same regardless of the desired size for your training data. If you could modify it so that this property is preserved, I'd be OK to merge this. Otherwise I have concerns about non-stable test data.
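The property requested here (identical dev and test sets no matter how much training data is drawn) can be illustrated with a sketch: fix the dev/test sizes and carve them out of a seeded shuffle before selecting any training data. All names below are hypothetical, not lhotse's actual API:

```python
import random

DEV_HOURS, TEST_HOURS = 2, 2  # fixed constants, not user-tunable

def stable_splits(recordings, train_hours, seed=42):
    """Carve fixed-size dev/test sets first, then draw train from the rest.

    Because dev and test are taken before train_hours is even consulted,
    changing train_hours can never alter which utterances land in dev/test.
    """
    rng = random.Random(seed)
    items = sorted(recordings)  # (id, duration_sec) pairs
    rng.shuffle(items)

    def take(pool, hours):
        # Accumulate items until the hour budget is met.
        budget, total, n = hours * 3600.0, 0.0, 0
        while n < len(pool) and total < budget:
            total += pool[n][1]
            n += 1
        return pool[:n], pool[n:]

    dev, rest = take(items, DEV_HOURS)
    test, rest = take(rest, TEST_HOURS)
    train, _ = take(rest, train_hours)
    return train, dev, test
```

Varying `train_hours` then changes only the training set, which is the stability guarantee being asked for.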
