High Quality Local Text-to-Speech Generator
Generate high-quality speech from text using the powerful Kokoro TTS pipeline with an intuitive web interface.
Quick Start • Download • Build • Documentation • Contributing
- High-Quality TTS: Powered by the advanced Kokoro TTS pipeline
- Multiple Voices: Choose from a wide variety of natural-sounding voices
- Customizable Output: Adjust speech speed and pitch with precision
- Batch Processing: Generate audio from multi-paragraph text input with natural pauses
- Real-time Preview: Instant audio playback within the interface
- Modern Design: Built with NiceGUI for a sleek, responsive web interface
- Intuitive Controls: Simple, user-friendly experience
- Progress Indicators: Visual feedback for pipeline loading and audio generation
- Dark Mode: Easy on the eyes for extended use
- Responsive Layout: Works across devices and screen sizes
- WAV Format: High-quality audio output
- Automatic Naming: Unique identifiers for each generated file
- Local Processing: All data processed on your machine for privacy
- Cross-Platform: Works on Windows and Linux
- Download the appropriate release for your platform from the Releases page
- Extract the downloaded archive (if applicable)
- Run the application:
  - Windows: Double-click the .exe file
  - Linux: Make the AppImage executable (chmod +x KokoroTTSGenerator.AppImage) and run it
- Wait for the TTS pipeline to initialize on first run (may take a few minutes)
# Clone the repository
git clone https://github.com/WilleIshere/KokoroTTSGenerator.git
cd KokoroTTSGenerator
# Create virtual environment and install dependencies with UV
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync
# Run the application
python app.py
- Operating System: Windows 10/11 or Linux
- Memory: 4GB RAM minimum (8GB recommended)
- Storage: 2GB+ free space for model files and audio output
- Internet: Required for initial model download
- No Python installation needed for compiled versions
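When running from source, some of the requirements above can be checked before the first launch. The following is a minimal sketch using only the standard library; the function name `check_environment` is hypothetical and not part of this project, and treating Python 3.12 as a minimum (rather than an exact version) is an assumption.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)          # source installs require Python 3.12 (assumed to be a minimum)
MIN_FREE_BYTES = 2 * 1024**3  # 2GB free space for model files and audio output


def check_environment(path=".", version=None, free_bytes=None):
    """Return a list of human-readable problems; an empty list means OK.

    `version` and `free_bytes` can be passed explicitly (useful for testing);
    otherwise they are read from the running interpreter and from `path`.
    """
    version = tuple(version or sys.version_info[:2])
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free

    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer expected"
        )
    if free_bytes < MIN_FREE_BYTES:
        problems.append("less than 2GB of free disk space")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Memory is not checked here because the standard library has no portable way to query installed RAM.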
- Version: 0.1.0
- Formats:
  - Windows: Standalone executable (.exe) - no installation required
  - Linux: AppImage (.AppImage) - runs anywhere
  - Source: Python package (requires Python 3.12 & UV)
- Size: ~300MB (includes all dependencies and runtime)
Download Latest Release
Note: For the compiled versions, no Python installation or additional dependencies are required. Everything is bundled in the executable.
- First Launch: Wait for the TTS pipeline to initialize (first run only)
- Select Voice: Choose from available voices in the dropdown
- Adjust Parameters: Set speech speed and pitch using the sliders
- Enter Text: Type or paste text into the text area
- Generate: Click "Generate Audio" to create speech
- Enjoy: Preview the audio directly in the app and download the WAV file
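For readers who want to script the same steps, here is a rough sketch of the voice, speed, and text choices expressed against the upstream `kokoro` package. It assumes the `KPipeline` interface as documented by Kokoro; the `synthesize` function is hypothetical, and how this application's own `src/tts.py` wraps the pipeline is not shown in this README. Pitch is omitted because this README does not say how the app applies it.

```python
def synthesize(text, voice="af_bella", speed=1.0, pipeline=None):
    """Return one audio chunk per newline-separated paragraph of `text`."""
    if pipeline is None:
        # Imported lazily so the function can be exercised with a stand-in
        # pipeline on machines where kokoro is not installed.
        from kokoro import KPipeline

        pipeline = KPipeline(lang_code="a")  # "a" selects American English

    chunks = []
    # Each result unpacks to (graphemes, phonemes, audio).
    for _graphemes, _phonemes, audio in pipeline(
        text, voice=voice, speed=speed, split_pattern=r"\n+"
    ):
        chunks.append(audio)
    return chunks
```

Each returned chunk can then be written to disk, for example with `soundfile.write(path, chunk, 24000)`, since Kokoro produces 24 kHz audio.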
The application includes a variety of high-quality voices:
- Female Voices: af_alloy, af_aoede, af_bella, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky
- Male Voices: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa
All voices are included in the compiled versions - no additional downloads required.
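The voice identifiers follow a visible pattern: a two-letter prefix, an underscore, then a name. As a small illustration, the sketch below groups the identifiers listed above by prefix. The mapping of `af` to female and `am` to male is inferred from the two lists in this README, and `group_voices` is a hypothetical helper, not part of the application.

```python
from collections import defaultdict

VOICES = [
    "af_alloy", "af_aoede", "af_bella", "af_jessica", "af_kore",
    "af_nicole", "af_nova", "af_river", "af_sarah", "af_sky",
    "am_adam", "am_echo", "am_eric", "am_fenrir", "am_liam",
    "am_michael", "am_onyx", "am_puck", "am_santa",
]

# Inferred from the lists above, not from an official specification.
GENDER_BY_PREFIX = {"af": "female", "am": "male"}


def group_voices(voices):
    """Group voice identifiers by the gender implied by their prefix."""
    groups = defaultdict(list)
    for voice in voices:
        prefix, _, _name = voice.partition("_")
        groups[GENDER_BY_PREFIX.get(prefix, "unknown")].append(voice)
    return dict(groups)


if __name__ == "__main__":
    for gender, names in group_voices(VOICES).items():
        print(f"{gender}: {', '.join(names)}")
```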
- Long Text: Break long texts into paragraphs for better processing
- Punctuation: Use proper punctuation for natural speech rhythm
- Speed & Pitch: Experiment with different settings for optimal results
- Browser Compatibility: Works best in modern browsers
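The first tip, breaking long text into paragraphs, can be done ahead of time if the text comes from a file or another program. This is a minimal sketch that treats blank lines as paragraph breaks; that rule and the `split_paragraphs` name are assumptions for illustration, and the application may split text differently.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines and collapse whitespace inside each paragraph."""
    parts = re.split(r"\n\s*\n", text.strip())
    # Join wrapped lines within a paragraph into a single line.
    return [" ".join(part.split()) for part in parts if part.strip()]


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph,\nwrapped over two lines."
    for number, paragraph in enumerate(split_paragraphs(sample), start=1):
        print(number, paragraph)
```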
This project has been architected with a modular design for maintainability and extensibility:
KokoroTTSGenerator/
├── app.py               # Main entry point
├── src/                 # Source code
│   ├── gui.py           # Web interface implementation
│   └── tts.py           # TTS pipeline implementation
├── final_audio/         # Output directory for generated audio
├── temp/                # Temporary working directory
├── pyproject.toml       # Dependencies and project configuration
└── uv.lock              # UV dependencies lockfile
- Modern Web Interface: Built with NiceGUI for a responsive experience
- Efficient Pipeline: Fast, high-quality audio generation
- Clean Separation: UI and TTS logic kept separate for maintainability
- Python-powered: Leverages the best Python libraries for TTS
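The `final_audio/` directory in the layout above receives one WAV file per generation, each with a unique name. The sketch below shows one way such naming could work, using a random UUID and the standard-library `wave` module. The helper names are hypothetical and the app's actual naming scheme is not documented here; the 24 kHz sample rate matches Kokoro's output but is an assumption about this application.

```python
import uuid
import wave
from pathlib import Path

SAMPLE_RATE = 24_000  # assumed; Kokoro produces 24 kHz audio


def unique_wav_path(output_dir="final_audio"):
    """Return a fresh, collision-resistant .wav path inside output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return out / f"{uuid.uuid4().hex}.wav"


def write_silence(path, seconds=0.5, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit silence to `path`; returns the number of frames written."""
    n_frames = int(seconds * sample_rate)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return n_frames
```

Silence stands in for real audio here so the example runs without the TTS model; a real pipeline would write generated samples instead.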
# Ensure you have Python 3.12 installed
python --version
# Clone the repository
git clone https://github.com/WilleIshere/KokoroTTSGenerator.git
cd KokoroTTSGenerator
# Create virtual environment and install dependencies with UV
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync
# For development tools
uv pip install -e ".[dev]"
uv run app.py
- Project Structure: Simple, modular design for easy maintenance
- Kokoro TTS: Leverages the powerful Kokoro TTS pipeline
- NiceGUI: Built with a modern web interface framework
- Compiled Versions: Standalone executables for Windows and Linux
We welcome contributions! Here's how you can help:
# Fork and clone the repository
git clone https://github.com/yourusername/KokoroTTSGenerator.git
cd KokoroTTSGenerator
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync
uv pip install -e ".[dev]"
- Bug Reports: Found an issue? Please open an issue
- Feature Requests: Have an idea? We'd love to hear it
- Code Contributions: Submit a pull request
- Documentation: Help improve our docs
- Follow PEP 8 style guidelines
- Add tests for new features
- Update documentation for changes
- Ensure cross-platform compatibility
- Frontend: NiceGUI (Python web interface framework)
- TTS Engine: Kokoro TTS pipeline (v0.9.4+)
- Audio: soundfile, numpy
- Package Management: UV (Fast, reliable Python package manager)
- Dependencies: kokoro, nicegui, torch, soundfile
- Distribution: Standalone executables for Windows and Linux
This project is licensed under the MIT License - see the LICENSE file for details.
- Kokoro TTS: Amazing TTS pipeline that powers this application
- NiceGUI: Beautiful modern web interface framework
- Python Community: For the incredible ecosystem of libraries
- Documentation: Check our comprehensive docs
- Issues: Report bugs or request features on GitHub
- Discussions: Community Q&A and general discussion
- First Run Slow: Initial pipeline loading downloads models and may take a few minutes
- Memory Usage: TTS models require significant RAM; 8GB recommended for optimal performance
- Antivirus Warnings: Some antivirus software may flag compiled executables; these are false positives
- Linux Permissions: On Linux, remember to make AppImage files executable before running
- Initial release with core functionality
- Multiple voice options
- Speed and pitch controls
- Web-based user interface
- High-quality audio output
- Compiled versions for Windows and Linux
Made with ❤️ by WilleIshere