
EmZod/speak


                          ███████╗██████╗ ███████╗ █████╗ ██╗  ██╗
                          ██╔════╝██╔══██╗██╔════╝██╔══██╗██║ ██╔╝
                          ███████╗██████╔╝█████╗  ███████║█████╔╝ 
                          ╚════██║██╔═══╝ ██╔══╝  ██╔══██║██╔═██╗ 
                          ███████║██║     ███████╗██║  ██║██║  ██╗
                          ╚══════╝╚═╝     ╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝

Talk to your Claude.


Voice cloning. Long documents. Audiobook quality. Local & private.

speak article.md --stream → Audio starts in seconds


Install

For AI Agents (Claude Code, Cursor, Windsurf):

npx skills add EmZod/speak

CLI:

git clone https://github.com/EmZod/speak.git
cd speak && bun install
alias speak="bun run $(pwd)/src/index.ts"

Requirements: macOS Apple Silicon · Bun · Python 3.10+ · sox (brew install sox)
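The requirements above can be checked before the first run. What follows is a minimal sketch, not something shipped with speak: the helper names `missing_tools` and `check_speak_requirements` are hypothetical, and it assumes the tools are installed under the command names `bun`, `python3`, and `sox`.

```shell
#!/bin/sh
# Hypothetical preflight helpers; not part of speak itself.

# Print each given command name that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Return 0 if the listed requirements appear to be met, 1 otherwise.
check_speak_requirements() {
    status=0
    # macOS on Apple Silicon reports "Darwin" and "arm64".
    if [ "$(uname -s)" != "Darwin" ] || [ "$(uname -m)" != "arm64" ]; then
        echo "warning: not macOS on Apple Silicon" >&2
        status=1
    fi
    missing=$(missing_tools bun python3 sox)
    if [ -n "$missing" ]; then
        echo "missing tools: $(echo $missing)" >&2
        status=1
    fi
    # Python must be 3.10 or newer (assumes the interpreter is named python3).
    if command -v python3 >/dev/null 2>&1; then
        python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)' || {
            echo "python3 is older than 3.10" >&2
            status=1
        }
    fi
    return "$status"
}
```

After sourcing the file, `check_speak_requirements && echo "ready"` prints anything missing on stderr and only reports "ready" when every check passes.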


Usage

speak "Hello, world!" --play        # Generate and play
speak article.md --stream           # Stream long content  
speak document.md --output out.wav  # Save to file
speak --clipboard --play            # Read from clipboard

Voice Cloning

Clone any voice from a 10-30 second WAV sample by passing the file with --voice:

# Use your cloned voice
speak "Hello" --voice ~/.chatter/voices/morgan_freeman.wav --play
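Since sox is already a requirement, its companion tool `soxi` can confirm that a sample falls inside the 10-30 second window before it is used. This is a sketch under that assumption; `duration_in_range` and `voice_sample_ok` are hypothetical helper names, and speak may perform its own validation.

```shell
#!/bin/sh
# Hypothetical helpers; the 10-30 second bounds come from the text above.

# Succeed if the given duration in seconds lies within [10, 30].
duration_in_range() {
    awk -v d="$1" 'BEGIN { exit !(d >= 10 && d <= 30) }'
}

# Succeed if the file's duration, as reported by `soxi -D`, is in range.
voice_sample_ok() {
    seconds=$(soxi -D "$1") || return 1
    duration_in_range "$seconds"
}
```

For example, `voice_sample_ok ~/.chatter/voices/morgan_freeman.wav || echo "sample should be 10-30 seconds"` flags a clip that is too short or too long.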

Long Documents

speak book.md --auto-chunk --output book.wav    # Auto-chunk for reliability
speak --resume manifest.json                     # Resume interrupted generation
speak *.md --output-dir ~/Audio/                 # Batch processing
speak --estimate document.md                     # Estimate duration first
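The commands above combine naturally into an estimate-then-generate step. The wrapper below is a sketch only: `narrate_long` is a hypothetical name, it uses just the flags shown above, and it assumes `speak` resolves as a command or shell function inside the script (an alias defined in an interactive shell is usually not visible to a script).

```shell
#!/bin/sh
# Hypothetical wrapper; uses only the --estimate, --auto-chunk and --output flags.

# Estimate the duration of a document, then generate it with auto-chunking.
# Usage: narrate_long <input.md> <output.wav>
narrate_long() {
    doc=$1
    out=$2
    [ -f "$doc" ] || { echo "no such file: $doc" >&2; return 1; }
    # Stop early if the estimate fails, so no generation time is wasted.
    speak --estimate "$doc" || return 1
    speak "$doc" --auto-chunk --output "$out"
}
```

Calling `narrate_long book.md book.wav` prints the estimate first and starts generation only if that step succeeds.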

Commands

speak <text|file>      Generate speech
speak health           Check system status
speak models           List available models
speak concat <files>   Combine audio files
speak daemon kill      Stop TTS server

Options

--play          Play after generation
--stream        Stream as it generates
--output        Output file or directory
--voice         Custom voice file (WAV)
--auto-chunk    Chunk long documents
--estimate      Show duration estimate
--dry-run       Preview without generating

Performance

Long documents     ████████████████████  Streaming, auto-chunk
Voice cloning      ████████████████████  Any voice from sample
Emotion tags       ████████████████████  [laugh], [sigh], etc.
Quality            ████████████████████  Audiobook grade

See Also

Need instant audio (~90ms)? Try speakturbo.


Documentation

File                      Content
SKILL.md                  Full usage guide for agents
docs/usage.md             Complete CLI reference
docs/troubleshooting.md   Common issues & fixes
AGENTS.md                 Architecture & development

MIT License · Built on Chatterbox TTS

About

A fast CLI tool that lets agents convert their text output to speech using Chatterbox TTS on Apple Silicon. Agent SKILL files are included.
