feat: add MiniMax Cloud TTS as third voiceover provider #11
octo-patch wants to merge 1 commit into digitalsamba:main
Add MiniMax Cloud TTS (speech-2.8-hd / speech-2.8-turbo) as a third voiceover provider alongside ElevenLabs and Qwen3-TTS. MiniMax offers 12 built-in voices (5 English + 7 Chinese) and needs no GPU; it runs entirely via cloud API.
Changes:
- tools/minimax_tts.py: standalone MiniMax TTS tool with --list-voices, --model hd/turbo, --voice, --speed, --volume, --pitch options
- tools/voiceover.py: add --provider minimax with --minimax-voice, --minimax-model, --volume, --pitch options; works in both single-file and per-scene modes
- tools/config.py: add get_minimax_api_key() helper
- brands/default/voice.json: add minimax config section
- README.md, CLAUDE.md: document MiniMax TTS usage
- tests/test_minimax_tts.py: 28 unit tests
- tests/test_minimax_tts_integration.py: 3 integration tests
ConalMullan left a comment
Thanks so much for this contribution — really well done! You've clearly studied the codebase and followed the existing patterns closely. The test coverage is solid and the voiceover.py integration is clean. I'm keen to try MiniMax out once this is merged — still on the lookout for the best TTS provider so this is great timing.
A couple of things to address before merging:
1. --volume and --pitch should be namespaced (medium)
These are added as top-level args in voiceover.py, but they're MiniMax-specific. The Qwen3 args use --speaker, --tone, etc. — for consistency and to avoid future collisions, these should be --minimax-volume and --minimax-pitch.
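A minimal sketch of what the namespaced flags could look like in an argparse parser. The provider names other than minimax, the defaults, and the build_parser helper are assumptions for illustration, not the actual contents of voiceover.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing provider-prefixed MiniMax options."""
    parser = argparse.ArgumentParser(prog="voiceover.py")
    # Provider names other than "minimax" are assumed here.
    parser.add_argument(
        "--provider",
        choices=["elevenlabs", "qwen3", "minimax"],
        default="elevenlabs",
    )
    # Prefixed flags make it obvious which provider consumes them and
    # leave plain --volume / --pitch free for future use.
    parser.add_argument("--minimax-volume", type=float, default=1.0,
                        help="MiniMax output volume (default is assumed)")
    parser.add_argument("--minimax-pitch", type=int, default=0,
                        help="MiniMax pitch shift (default is assumed)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse maps --minimax-volume to args.minimax_volume.
    print(args.provider, args.minimax_volume, args.minimax_pitch)
```

With this layout, argparse exposes the values as args.minimax_volume and args.minimax_pitch, which mirrors how the existing --minimax-voice and --minimax-model options are named.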
2. Missing toolkit-registry.json entry (gap)
Per our architecture, new tools should be added to _internal/toolkit-registry.json — that's the canonical catalog for all tools, skills, components, etc. CLAUDE.md and README are updated (nice!), but the registry needs an entry too. See the existing entries for qwen3_tts or sadtalker as examples.
Minor notes (non-blocking)
- No input validation on speed/volume/pitch ranges — the docstring says speed is 0.5–2.0, volume 0.1–10.0, pitch -12 to 12, but values pass straight through to the API. This is consistent with how the other tools work, so not a blocker — just noting it.
- Brand config default detection: checking args.minimax_voice == "English_Graceful_Lady" to detect "user didn't set it" is the same pattern as Qwen3's args.speaker == "Ryan" check, so it's fine for now.
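If range checks are wanted later, one possible approach is an argparse type factory that rejects out-of-range values before any API call. This is only a sketch: the bounded helper is hypothetical, the ranges are the ones quoted from the docstring in the first note above, and treating pitch as an integer is an assumption.

```python
import argparse


def bounded(cast, low, high):
    """Return an argparse type that converts with `cast` and enforces [low, high]."""
    def parse(raw: str):
        try:
            value = cast(raw)
        except ValueError:
            raise argparse.ArgumentTypeError(
                f"{raw!r} is not a valid {cast.__name__}"
            )
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(
                f"{value} is outside the allowed range [{low}, {high}]"
            )
        return value
    return parse


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the standalone tool's options; defaults are assumed.
    parser = argparse.ArgumentParser(prog="minimax_tts.py")
    parser.add_argument("--speed", type=bounded(float, 0.5, 2.0), default=1.0)
    parser.add_argument("--volume", type=bounded(float, 0.1, 10.0), default=1.0)
    parser.add_argument("--pitch", type=bounded(int, -12, 12), default=0)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

An out-of-range value such as --speed 3.0 then fails at parse time with a usage error instead of being passed through to the API.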
Thanks again — looking forward to the update! 🎙️
Summary
Adds MiniMax Cloud TTS as a third voiceover provider alongside ElevenLabs and Qwen3-TTS.
- speech-2.8-hd (high quality) and speech-2.8-turbo (faster)
- voiceover.py (--provider minimax) and standalone minimax_tts.py
- voice.json minimax section
Usage
Files Changed (8 files, ~1159 additions)
- tools/minimax_tts.py
- tools/voiceover.py: minimax to --provider choices, MiniMax CLI options, generation dispatch
- tools/config.py: get_minimax_api_key() helper
- brands/default/voice.json: minimax config section
- README.md
- CLAUDE.md
- tests/test_minimax_tts.py
- tests/test_minimax_tts_integration.py
Test Plan
- --list-voices output