feat: add MiniMax Cloud TTS as third voiceover provider #11

Open
octo-patch wants to merge 1 commit into digitalsamba:main from octo-patch:feature/add-minimax-tts-provider

@octo-patch

Summary

Adds MiniMax Cloud TTS as a third voiceover provider alongside ElevenLabs and Qwen3-TTS.

  • No GPU required — runs entirely via MiniMax cloud API
  • Two models: speech-2.8-hd (high quality) and speech-2.8-turbo (faster)
  • 12 built-in voices: 5 English + 7 Chinese
  • Full integration with voiceover.py (--provider minimax) and standalone minimax_tts.py
  • Brand config support via voice.json minimax section

Usage

# Standalone
python tools/minimax_tts.py --text "Hello world" --voice English_Graceful_Lady --output hello.mp3
python tools/minimax_tts.py --list-voices

# Via voiceover.py (single file or per-scene)
python tools/voiceover.py --provider minimax --script script.md --output out.mp3
python tools/voiceover.py --provider minimax --minimax-voice English_Persuasive_Man --scene-dir scenes/ --json

Files Changed (8 files, ~1159 additions)

| File | Change |
| --- | --- |
| tools/minimax_tts.py | New standalone MiniMax TTS tool |
| tools/voiceover.py | Add minimax to --provider choices, MiniMax CLI options, generation dispatch |
| tools/config.py | Add get_minimax_api_key() helper |
| brands/default/voice.json | Add minimax config section |
| README.md | Document MiniMax TTS usage |
| CLAUDE.md | Document MiniMax TTS standalone tool |
| tests/test_minimax_tts.py | 28 unit tests |
| tests/test_minimax_tts_integration.py | 3 integration tests |
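
For readers who have not opened the diff, a helper like get_minimax_api_key() could look roughly like the sketch below. This is not the code from the PR: the environment variable name MINIMAX_API_KEY and the optional env parameter (added here only so the function can be exercised without touching the real environment) are assumptions.

```python
import os


def get_minimax_api_key(env=None):
    """Return the MiniMax API key, or None when it is not configured.

    Assumption: the key lives in an environment variable named
    MINIMAX_API_KEY. The real helper in tools/config.py may read it
    from somewhere else.
    """
    # Fall back to the process environment when no mapping is passed in.
    env = os.environ if env is None else env
    key = env.get("MINIMAX_API_KEY", "").strip()
    # Treat an empty or whitespace-only value the same as a missing one.
    return key or None
```

Returning None rather than raising leaves it to the caller to decide whether a missing key is fatal, for example so that a dry run can still proceed.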

Test Plan

  • 28 unit tests pass (mocked API, CLI parsing, dry-run, brand config, payload format)
  • 3 integration tests pass (real API calls: hd model, turbo model, voiceover.py integration)
  • Existing ElevenLabs and Qwen3-TTS providers unaffected
  • Verify --list-voices output
  • Test per-scene mode with MiniMax provider

Add MiniMax Cloud TTS (speech-2.8-hd / speech-2.8-turbo) as a third
voiceover provider alongside ElevenLabs and Qwen3-TTS. MiniMax offers
12 built-in voices (5 English + 7 Chinese), no GPU required — runs
entirely via cloud API.

Changes:
- tools/minimax_tts.py: standalone MiniMax TTS tool with --list-voices,
  --model hd/turbo, --voice, --speed, --volume, --pitch options
- tools/voiceover.py: add --provider minimax with --minimax-voice,
  --minimax-model, --volume, --pitch options; works in both single-file
  and per-scene modes
- tools/config.py: add get_minimax_api_key() helper
- brands/default/voice.json: add minimax config section
- README.md, CLAUDE.md: document MiniMax TTS usage
- tests/test_minimax_tts.py: 28 unit tests
- tests/test_minimax_tts_integration.py: 3 integration tests
Collaborator

@ConalMullan left a comment

Thanks so much for this contribution — really well done! You've clearly studied the codebase and followed the existing patterns closely. The test coverage is solid and the voiceover.py integration is clean. I'm keen to try MiniMax out once this is merged — still on the lookout for the best TTS provider so this is great timing.

A couple of things to address before merging:

1. --volume and --pitch should be namespaced (medium)

These are added as top-level args in voiceover.py, but they're MiniMax-specific. The Qwen3 args use --speaker, --tone, etc. — for consistency and to avoid future collisions, these should be --minimax-volume and --minimax-pitch.
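
As a rough sketch of the renaming, the MiniMax options could sit in their own argparse group with a common prefix. Only the flag names come from this thread; the defaults, the types, and the build_parser function are assumptions for illustration and not the actual voiceover.py code.

```python
import argparse


def build_parser():
    """Hypothetical, trimmed-down parser showing only the MiniMax flags."""
    parser = argparse.ArgumentParser(description="voiceover sketch")
    # Grouping keeps the provider-specific flags together in --help output.
    group = parser.add_argument_group("MiniMax options")
    group.add_argument("--minimax-voice", default="English_Graceful_Lady")
    # Defaults of 1.0 and 0 are assumed, not taken from the PR.
    group.add_argument("--minimax-volume", type=float, default=1.0)
    group.add_argument("--minimax-pitch", type=int, default=0)
    return parser


args = build_parser().parse_args(["--minimax-volume", "1.5", "--minimax-pitch", "-2"])
# argparse maps --minimax-volume to the attribute args.minimax_volume.
print(args.minimax_volume, args.minimax_pitch)
```

With this layout a bare --volume is rejected by the parser, so another provider can later claim that name without a collision.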

2. Missing toolkit-registry.json entry (gap)

Per our architecture, new tools should be added to _internal/toolkit-registry.json — that's the canonical catalog for all tools, skills, components, etc. CLAUDE.md and README are updated (nice!), but the registry needs an entry too. See the existing entries for qwen3_tts or sadtalker as examples.

Minor notes (non-blocking)

  • No input validation on speed/volume/pitch ranges — the docstring says speed is 0.5–2.0, volume 0.1–10.0, pitch -12 to 12, but values pass straight through to the API. This is consistent with how the other tools work, so not a blocker — just noting it.
  • Brand config default detection — checking args.minimax_voice == "English_Graceful_Lady" to detect "user didn't set it" is the same pattern as Qwen3's args.speaker == "Ryan" check, so it's fine for now.
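
If range checking is ever wanted, one lightweight option is an argparse type factory, sketched below. The ranges are the ones quoted from the docstring; the bounded helper, the defaults, and treating pitch as an integer are assumptions, not part of this PR.

```python
import argparse


def bounded(cast, low, high):
    """Build an argparse type that converts with `cast` and enforces [low, high]."""
    def parse(text):
        try:
            value = cast(text)
        except ValueError:
            raise argparse.ArgumentTypeError(f"expected a number, got {text!r}")
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f"{value} is outside [{low}, {high}]")
        return value
    return parse


parser = argparse.ArgumentParser(description="minimax_tts range-check sketch")
# Ranges as documented in the docstring; defaults are assumed.
parser.add_argument("--speed", type=bounded(float, 0.5, 2.0), default=1.0)
parser.add_argument("--volume", type=bounded(float, 0.1, 10.0), default=1.0)
parser.add_argument("--pitch", type=bounded(int, -12, 12), default=0)
```

Out-of-range values then fail at parse time with a normal argparse usage error instead of reaching the API.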

Thanks again — looking forward to the update! 🎙️
