A real-time speech-to-text and translation tool with a live terminal interface.
Built on top of the excellent [KoljaB/RealtimeSTT](https://github.com/KoljaB/RealtimeSTT) library.
- Real-time speech recognition from a microphone (with listing of available input devices)
- Asynchronous translation using deep-translator (Google Translate) or OpenAI
- Live updating table in the terminal using rich
- Device selection and device listing via command line
- Proxy support for translation requests
- Language selection for both input and translation
- Support for multiple languages
- Logging of transcripts and translations (see `app.log`, `transcript.log`, `transcript_with_translation.log`)
- Context management for improved translation accuracy
- Modular translation and rendering backends (see `translator/` and `renderer/`)
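This README does not describe how context management works internally. The sketch below only illustrates the general idea under stated assumptions: the translator receives a short rolling window of recent sentences plus a compressed summary of older ones. `TranslationContext` and `naive_compress` are hypothetical names; in the project, the summarizing step is presumably handled by the OpenAI-based compressor in `compressor/`, which this stand-in does not reproduce.

```python
from collections import deque
from typing import Callable


class TranslationContext:
    """Hypothetical rolling context: recent sentences plus a compressed summary."""

    def __init__(self, max_recent: int, compress: Callable[[str, list[str]], str]):
        self.max_recent = max_recent
        self.compress = compress  # assumed hook; an LLM summarizer could plug in here
        self.summary = ""
        self.recent: deque[str] = deque()

    def add(self, sentence: str) -> None:
        self.recent.append(sentence)
        if len(self.recent) > self.max_recent:
            # Move the oldest sentences out of the window and fold them into the summary.
            overflow = []
            while len(self.recent) > self.max_recent:
                overflow.append(self.recent.popleft())
            self.summary = self.compress(self.summary, overflow)

    def prompt_context(self) -> str:
        """Text that could be prepended to a translation request."""
        parts = []
        if self.summary:
            parts.append(f"Summary so far: {self.summary}")
        if self.recent:
            parts.append("Recent sentences: " + " ".join(self.recent))
        return "\n".join(parts)


def naive_compress(summary: str, dropped: list[str], limit: int = 80) -> str:
    """Stand-in for a real summarizer: join the text and keep the last `limit` chars."""
    text = " ".join(filter(None, [summary, *dropped]))
    return text[-limit:]


if __name__ == "__main__":
    ctx = TranslationContext(max_recent=2, compress=naive_compress)
    for sentence in ["A.", "B.", "C."]:
        ctx.add(sentence)
    print(ctx.prompt_context())
```

Keeping the window bounded means each request stays small regardless of how long the session runs; only the summary carries older information forward.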
Project structure:

```
realtime_translator/
├── realtime_stt.py                  # Main logic for real-time speech translation
├── input_devices.py                 # Audio device utilities
├── translator/                      # Translation interfaces and implementations
│   ├── __init__.py
│   ├── base.py
│   ├── factory.py
│   ├── google_translator.py
│   └── openai_translator.py
├── compressor/                      # Context compressor for OpenAI
│   ├── __init__.py
│   ├── base.py
│   └── openai_compressor.py
├── renderer/                        # Output rendering (terminal, HTML, etc.)
│   ├── __init__.py
│   ├── base.py
│   ├── factory.py
│   ├── html_fastaip_renderer.py
│   └── rich_render.py
├── requirements.txt                 # Project dependencies
├── readme.md                        # Documentation
├── app.log                          # Application log
├── transcript.log                   # Raw transcript log
├── transcript_with_translation.log  # Transcript with translation log
├── LICENSE                          # License file
```
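`translator/` pairs a `base.py` interface with a `factory.py`, but the actual class names and signatures are not shown in this README. The sketch below only illustrates that interface-plus-factory pattern and one way translation can run asynchronously so it does not block recognition. `Translator`, `EchoTranslator`, `create_translator`, and `translate_worker` are hypothetical; `EchoTranslator` stands in for the real Google/OpenAI backends and makes no network calls.

```python
import asyncio
from abc import ABC, abstractmethod


class Translator(ABC):
    """Hypothetical common interface for translation backends."""

    @abstractmethod
    def translate(self, text: str, source: str, target: str) -> str: ...


class EchoTranslator(Translator):
    """Stand-in backend that tags the text instead of calling a service."""

    def translate(self, text: str, source: str, target: str) -> str:
        return f"[{source}->{target}] {text}"


# Hypothetical registry keyed by --translator_type; real 'google'/'openai'
# backends would be registered here.
_REGISTRY: dict[str, type[Translator]] = {"echo": EchoTranslator}


def create_translator(kind: str) -> Translator:
    try:
        return _REGISTRY[kind]()
    except KeyError:
        raise ValueError(f"unknown translator type: {kind!r}") from None


async def translate_worker(queue: asyncio.Queue, translator: Translator,
                           source: str, target: str, results: list) -> None:
    """Consume recognized sentences and translate them off the event loop."""
    while True:
        text = await queue.get()
        if text is None:  # sentinel: stop the worker
            queue.task_done()
            break
        # Run the blocking translate() call in a worker thread.
        translated = await asyncio.to_thread(translator.translate, text, source, target)
        results.append((text, translated))
        queue.task_done()


async def main() -> list[tuple[str, str]]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list[tuple[str, str]] = []
    worker = asyncio.create_task(
        translate_worker(queue, create_translator("echo"), "en", "ru", results)
    )
    # In the real tool, sentences would come from the speech recognizer.
    for sentence in ["Hello there.", "How are you?"]:
        await queue.put(sentence)
    await queue.put(None)
    await worker
    return results


if __name__ == "__main__":
    for original, translated in asyncio.run(main()):
        print(original, "=>", translated)
```

A single worker preserves sentence order; adding more workers would increase throughput at the cost of possibly out-of-order results.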
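Create and activate a Python 3.12 environment with conda:

```bash
conda create -n online_speech_translate python=3.12
conda activate online_speech_translate
```

Or use `venv`:

```bash
python3 -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
```

On Linux (Debian/Ubuntu), before installing dependencies, run: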
```bash
sudo apt-get update
sudo apt-get install python3-dev portaudio19-dev
```

On macOS, before installing dependencies, run:
```bash
brew install portaudio
```

Then install the Python dependencies:
```bash
git clone https://github.com/nikkiw/realtime_translator.git
cd realtime_translator
pip install -r requirements.txt
```

To list the available audio input devices, run:

```bash
python realtime_stt.py --list_devices
```

To run the real-time speech translator, execute:
```bash
python realtime_stt.py --input_device_index <device_index> --input_lang <source_language> --translate_lang <target_language>
```

Arguments:

- `--input_device_index`: Index of the audio input device (default: 0).
- `--input_lang`: Language spoken by the speaker (2-letter code, e.g. `en`; one of: `en`, `ru`, `de`, `pt`, `it`, `es`, `fr`).
- `--translate_lang`: Target translation language (2-letter code, e.g. `ru`; one of: `en`, `ru`, `de`, `pt`, `it`, `es`, `fr`).
- `--proxy`: Optional proxy URL for translation requests.
- `--translator_type`: Type of translator to use (`google` or `openai`).
- `--openai_api_key`: OpenAI API key for using the OpenAI translator.
- `--renderer`: Output rendering engine. Options:
  - `rich`: Shows a live-updating, color table in the terminal (recommended for CLI use).
  - `html_fastaip`: Outputs results to an HTML page (convenient for web integration or browser viewing; URL: http://127.0.0.1:8090).
- `--log_level`: Logging level (default: INFO).
- `--list_devices`: List available audio devices and exit.
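For reference, the flags above map naturally onto `argparse`. This is a hypothetical sketch, not the parser in `realtime_stt.py`: the defaults for `--input_lang`, `--translate_lang`, `--translator_type`, and `--renderer` are assumptions (the README documents defaults only for `--input_device_index` and `--log_level`), and the OpenAI key check is something this sketch adds. Device listing assumes PyAudio (which builds on the PortAudio library installed above) via its `get_device_count()` / `get_device_info_by_index()` calls; `input_devices` and `query_pyaudio_devices` are illustrative names.

```python
import argparse

# Language codes listed for --input_lang / --translate_lang in this README.
LANGS = ["en", "ru", "de", "pt", "it", "es", "fr"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    p = argparse.ArgumentParser(description="Real-time speech translation")
    p.add_argument("--input_device_index", type=int, default=0)
    # Defaults for the language flags are assumptions, not documented values.
    p.add_argument("--input_lang", choices=LANGS, default="en")
    p.add_argument("--translate_lang", choices=LANGS, default="ru")
    p.add_argument("--proxy", default=None)
    # Assumed defaults: 'google' translator, 'rich' renderer.
    p.add_argument("--translator_type", choices=["google", "openai"], default="google")
    p.add_argument("--openai_api_key", default=None)
    p.add_argument("--renderer", choices=["rich", "html_fastaip"], default="rich")
    p.add_argument("--log_level", default="INFO")
    p.add_argument("--list_devices", action="store_true")
    return p


def input_devices(device_infos):
    """Keep (index, name) for devices that expose at least one input channel."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_pyaudio_devices():
    """Enumerate all audio devices with PyAudio (assumed to be installed)."""
    import pyaudio  # lazy import so the rest of the sketch works without it

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


def main(argv=None) -> None:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.list_devices:
        for index, name in input_devices(query_pyaudio_devices()):
            print(f"{index}: {name}")
        return
    # Extra validation added by this sketch only.
    if args.translator_type == "openai" and not args.openai_api_key:
        parser.error("--openai_api_key is required with --translator_type openai")
    print(f"Device {args.input_device_index}: {args.input_lang} -> {args.translate_lang}")


if __name__ == "__main__":
    main()
```

Using `choices` makes argparse reject unsupported language codes or renderer names before any audio or network setup happens.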
Example using the HTML renderer and routing translation requests through a proxy:

```bash
python realtime_stt.py --input_device_index 0 --input_lang en --translate_lang ru --renderer html_fastaip --proxy http://user:pass@host:port
```

Available renderers:

- `rich`: Displays results in a modern, colorized table directly in your terminal. Supports live updates and is ideal for command-line workflows.
- `html_fastaip`: Renders results as an HTML page for viewing in a browser (http://127.0.0.1:8090) or embedding in web apps. Useful for sharing or integrating with other tools.
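How `--proxy` is wired into the translators is not documented here. As a hedged sketch: `deep-translator`'s `GoogleTranslator` accepts a `requests`-style `proxies` mapping, so a single proxy URL can be expanded to cover both HTTP and HTTPS traffic. `proxies_from_url` and `translate_via_google` are hypothetical helpers, and the accepted schemes are an assumption (SOCKS proxies additionally need `requests[socks]`).

```python
from urllib.parse import urlparse


def proxies_from_url(proxy_url):
    """Build a requests-style proxies mapping from one --proxy URL (or None)."""
    if not proxy_url:
        return None
    parsed = urlparse(proxy_url)
    if parsed.scheme not in {"http", "https", "socks5", "socks5h"} or not parsed.hostname:
        raise ValueError(f"invalid proxy URL: {proxy_url!r}")
    # Route both plain and TLS requests through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


def translate_via_google(text, source, target, proxy_url=None):
    """Translate one sentence with deep-translator, optionally via a proxy."""
    from deep_translator import GoogleTranslator  # pip install deep-translator

    translator = GoogleTranslator(source=source, target=target,
                                  proxies=proxies_from_url(proxy_url))
    return translator.translate(text)


if __name__ == "__main__":
    print(proxies_from_url("http://user:pass@proxy.local:3128"))
```

Validating the URL up front turns a typo in `--proxy` into an immediate error instead of a failed request mid-session.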
Requirements:

- Python 3.12
- See `requirements.txt` for Python dependencies
Log files:

- `app.log`: General application log
- `transcript.log`: Raw recognized text
- `transcript_with_translation.log`: Recognized text with translation
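The README names the three log files but not their format. One way to produce a similar split with the standard `logging` module is sketched below; the logger names, the `"text | translation"` line format, and the `setup_logging` / `log_sentence` helpers are assumptions for illustration, not the project's actual output.

```python
import logging
from pathlib import Path


def _file_logger(name, path, fmt, level="INFO"):
    """Create a logger that writes only to its own file."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep transcript lines out of the root logger
    for old in list(logger.handlers):  # avoid duplicate handlers on re-setup
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    return logger


def setup_logging(log_dir=".", level="INFO"):
    """Hypothetical setup: app log with timestamps, transcripts as plain text."""
    log_dir = Path(log_dir)
    app = _file_logger("app", log_dir / "app.log",
                       "%(asctime)s %(levelname)s %(message)s", level)
    transcript = _file_logger("transcript", log_dir / "transcript.log", "%(message)s")
    both = _file_logger("transcript_with_translation",
                        log_dir / "transcript_with_translation.log", "%(message)s")
    return app, transcript, both


def log_sentence(transcript, both, text, translation):
    """Record one recognized sentence and its translation."""
    transcript.info(text)
    both.info("%s | %s", text, translation)


if __name__ == "__main__":
    app, transcript, both = setup_logging()
    app.info("started")
    log_sentence(transcript, both, "Hello", "Привет")
```

Disabling propagation keeps the transcript files free of application messages and prevents each sentence from also appearing in whatever handler the root logger has.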
Contributions are welcome! Please submit a pull request or open an issue for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for more details.