[Feature Request] Batch Audio Generation and Voice Cloning Support #323

@nayar-900

Description

Currently, audio generation and voice cloning workflows appear to focus on processing one input at a time.

For users generating large datasets, audiobooks, podcasts, educational content, or multi-speaker projects, it would be helpful to support batch processing of multiple text inputs and/or multiple reference audio files in a single run.

Suggested Features

  • Batch text-to-speech generation from an input file (TXT, CSV, or JSON).
  • Batch voice cloning using multiple reference audio files.
  • Progress tracking for long-running jobs.
  • Optional parallel processing when sufficient GPU resources are available.
  • Automatic naming and organization of generated outputs.
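To make the first item concrete, here is a minimal sketch of how the three input formats could be normalized into one list of prompts. Everything in it is an assumption for illustration, not existing project code: the function name `load_prompts`, the CSV column names `text` and `speaker`, the JSON layout (a list of objects), and the one-prompt-per-line TXT convention.

```python
import csv
import json
from pathlib import Path


def load_prompts(path):
    """Return a list of {"text": ..., "speaker": ...} dicts from a TXT, CSV, or JSON file.

    Hypothetical helper: the column and key names are assumptions, not a project API.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Assumed convention: one prompt per non-empty line, no speaker information.
        lines = path.read_text(encoding="utf-8").splitlines()
        return [{"text": line.strip(), "speaker": None} for line in lines if line.strip()]
    if suffix == ".csv":
        # Assumed header: a required "text" column and an optional "speaker" column.
        with path.open(newline="", encoding="utf-8") as f:
            return [
                {"text": row["text"], "speaker": row.get("speaker") or None}
                for row in csv.DictReader(f)
            ]
    if suffix == ".json":
        # Assumed layout: a list of objects with "text" and an optional "speaker" key.
        items = json.loads(path.read_text(encoding="utf-8"))
        return [{"text": item["text"], "speaker": item.get("speaker")} for item in items]
    raise ValueError(f"unsupported input format: {suffix}")
```

Normalizing to one shape up front would let the rest of a batch pipeline ignore which file format the user supplied.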

Example

voxcpm batch \
  --input prompts.csv \
  --ref-audio speakers/ \
  --output generated_audio/
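For discussion, a rough sketch of what such a command might do internally: run each job, name the output automatically, report progress, and optionally use several workers. This is not existing project code. `run_batch` and its parameters are hypothetical, and `synthesize` is a caller-supplied stand-in for the real model call, whose actual signature is not assumed here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def run_batch(jobs, synthesize, output_dir, workers=1):
    """Run synthesize(text, ref_audio, out_path) for each job and print progress.

    jobs: list of (text, ref_audio_path_or_None) tuples.
    synthesize: caller-supplied function, a placeholder for the real model call.
    Returns a list of (out_path_or_None, error_or_None) in job order.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    results = [None] * len(jobs)

    def one(index, text, ref_audio):
        # Automatic naming: zero-padded index plus the reference file's stem.
        stem = Path(ref_audio).stem if ref_audio else "default"
        out_path = output_dir / f"{index:04d}_{stem}.wav"
        synthesize(text, ref_audio, out_path)
        return out_path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(one, i, text, ref): i for i, (text, ref) in enumerate(jobs)}
        for done, future in enumerate(as_completed(futures), start=1):
            i = futures[future]
            try:
                results[i] = (future.result(), None)
            except Exception as exc:
                # Record the failure and continue so one bad item does not abort the batch.
                results[i] = (None, exc)
            print(f"[{done}/{len(jobs)}] finished job {i}")
    return results
```

With `workers=1` the jobs run one after another. Whether more workers help depends on the model and hardware. A real implementation might instead batch inputs at the tensor level on the GPU, which this sketch does not attempt.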

Benefits

  • Faster dataset generation.
  • Improved productivity for content creators.
  • Easier large-scale experimentation and benchmarking.
  • Better support for audiobook and podcast workflows.
