Currently, audio generation and voice cloning workflows appear to focus on processing one input at a time.
For users generating large datasets, audiobooks, podcasts, educational content, or multi-speaker projects, it would be helpful to support batch processing of multiple text inputs and/or multiple reference audio files in a single run.
Suggested Features
- Batch text-to-speech generation from a text file (TXT, CSV, JSON).
- Batch voice cloning using multiple reference audio files.
- Progress tracking for long-running jobs.
- Optional parallel processing when sufficient GPU resources are available.
- Automatic naming and organization of generated outputs.
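To make the input and naming features above more concrete, here is a minimal Python sketch of how prompts might be loaded and output files named. The file layouts (one prompt per line for TXT, a `text` column for CSV, a list of strings for JSON) and the helper names `load_prompts` and `output_name` are assumptions made for illustration; none of them are part of the existing project.

```python
import csv
import json
from pathlib import Path


def load_prompts(path):
    """Return a list of prompt strings from a TXT, CSV, or JSON file.

    Assumed layouts (hypothetical, not defined by the project):
      - .txt:  one prompt per non-empty line
      - .csv:  a header row that includes a "text" column
      - .json: a list of strings
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return [row["text"] for row in csv.DictReader(handle)]
    if suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        return [str(item) for item in data]
    raise ValueError(f"unsupported input format: {suffix}")


def output_name(index, speaker, width=4):
    """Build a predictable file name such as 0001_alice.wav."""
    return f"{index:0{width}d}_{speaker}.wav"
```

Zero-padded indices keep the generated files in prompt order when sorted by name, which helps when assembling audiobook chapters or dataset manifests.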
Example (proposed usage)
voxcpm batch \
    --input prompts.csv \
    --ref-audio speakers/ \
    --output generated_audio/
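One possible shape for the logic behind such a command, sketched in Python. This is an illustration under stated assumptions, not the project's implementation: it assumes every prompt is paired with every `.wav` file in the reference directory, that outputs are grouped in one subfolder per speaker, and that generation is done by a caller-supplied `synthesize(text, ref_path, out_path)` function, which is a hypothetical stand-in for the real generation call.

```python
from pathlib import Path


def plan_jobs(prompts, ref_dir, out_dir):
    """Pair every prompt with every reference clip and assign an output path.

    Assumption: reference clips are the *.wav files directly inside ref_dir.
    """
    refs = sorted(Path(ref_dir).glob("*.wav"))
    jobs = []
    for ref in refs:
        for i, text in enumerate(prompts, start=1):
            # One subfolder per speaker, named after the reference file.
            target = Path(out_dir) / ref.stem / f"{i:04d}.wav"
            jobs.append((text, ref, target))
    return jobs


def run_batch(jobs, synthesize, report=print):
    """Run jobs sequentially, report progress, and collect failures.

    `synthesize` is a hypothetical callable supplied by the caller.
    """
    failed = []
    total = len(jobs)
    for done, (text, ref, target) in enumerate(jobs, start=1):
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            synthesize(text, ref, target)
        except Exception as exc:
            # Record the error and continue so one bad item
            # does not end a long-running job.
            failed.append((target, str(exc)))
        report(f"[{done}/{total}] {target}")
    return failed
```

Returning the list of failed items, rather than stopping at the first error, lets a long run finish and be retried only for the entries that did not succeed.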
Benefits
- Faster dataset generation.
- Improved productivity for content creators.
- Easier large-scale experimentation and benchmarking.
- Better support for audiobook and podcast workflows.