[Feature Request] Batch Audio Generation and Voice Cloning Support

The audio generation and voice cloning workflows currently appear to process one input at a time.

For users generating large datasets, audiobooks, podcasts, educational content, or multi-speaker projects, it would be helpful to support batch processing of multiple text inputs and/or multiple reference audio files in a single run.

Suggested Features

- Batch text-to-speech generation from an input file (TXT, CSV, or JSON).
- Batch voice cloning using multiple reference audio files.
- Progress tracking for long-running jobs.
- Optional parallel processing when sufficient GPU resources are available.
- Automatic naming and organization of generated outputs.
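
To make the request more concrete, here is a minimal Python sketch of how the naming, progress-tracking, and batch-input pieces could fit together. Everything in it is an assumption for illustration: the CSV columns `text` and `speaker`, the `<speaker>.wav` reference-file convention, the `Job` type, and the `synthesize` callable (a stand-in for whatever the project's real generation call is) are all hypothetical and not part of any existing API.

```python
import csv
import io
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Job:
    """One unit of work in a batch (hypothetical type)."""
    index: int
    text: str
    ref_audio: Optional[Path]  # None means "no reference audio given"
    output: Path


def plan_jobs(rows: Iterable[dict], ref_dir: Path, out_dir: Path) -> Iterator[Job]:
    """Turn input rows into jobs with deterministic, sortable output names."""
    for i, row in enumerate(rows, start=1):
        speaker = (row.get("speaker") or "").strip()
        # Assumed convention: the reference clip is <ref_dir>/<speaker>.wav
        ref = ref_dir / f"{speaker}.wav" if speaker else None
        name = f"{i:04d}_{speaker or 'default'}.wav"
        yield Job(i, row["text"].strip(), ref, out_dir / name)


def run_batch(
    jobs: list,
    synthesize: Callable[[Job], None],
    on_progress: Callable[[int, int], None] = lambda done, total: None,
) -> list:
    """Run jobs one by one, report progress, and return the jobs that failed."""
    failed = []
    for done, job in enumerate(jobs, start=1):
        try:
            synthesize(job)  # placeholder for the real generation call
        except Exception:
            failed.append(job)  # keep going so one bad row does not stop the run
        on_progress(done, len(jobs))
    return failed


# Small in-memory input so the sketch runs without any files on disk.
SAMPLE_CSV = "text,speaker\nHello there.,alice\nSecond line.,\n"
rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
jobs = list(plan_jobs(rows, Path("speakers"), Path("generated_audio")))
```

Collecting failures instead of aborting seems preferable for long runs, since a single malformed row or missing reference file would otherwise discard hours of work. Parallel execution is left out of the sketch on purpose; it depends on how the model handles concurrent GPU use.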

Example (proposed command)

voxcpm batch \
  --input prompts.csv \
  --ref-audio speakers/ \
  --output generated_audio/
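
As a rough illustration of how the command above could be parsed, here is a small `argparse` sketch. It is not the project's actual CLI code: the `batch` subcommand is the one proposed in this issue, the three flags are taken from the example, and `--workers` is an extra hypothetical flag added only to show where the optional parallelism setting might live.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the proposed (not yet existing) batch subcommand."""
    parser = argparse.ArgumentParser(prog="voxcpm")
    sub = parser.add_subparsers(dest="command", required=True)

    batch = sub.add_parser("batch", help="generate audio for many inputs in one run")
    batch.add_argument("--input", type=Path, required=True,
                       help="TXT, CSV, or JSON file with the texts to synthesize")
    batch.add_argument("--ref-audio", type=Path, default=None,
                       help="directory of reference audio files for voice cloning")
    batch.add_argument("--output", type=Path, required=True,
                       help="directory where generated audio is written")
    # Hypothetical flag, not mentioned in the example above.
    batch.add_argument("--workers", type=int, default=1,
                       help="number of parallel workers (1 = sequential)")
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args([
    "batch",
    "--input", "prompts.csv",
    "--ref-audio", "speakers/",
    "--output", "generated_audio/",
])
```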

Benefits

- Faster dataset generation.
- Improved productivity for content creators.
- Easier large-scale experimentation and benchmarking.
- Better support for audiobook and podcast workflows.

[Feature Request] Batch Audio Generation and Voice Cloning Support #323