Turn every voice message into clean, readable text: automatically, on your own Telegram account, in any chat.
A self-hosted Telegram userbot that transcribes voice messages in real time using AI speech recognition. It works in 1-on-1 DMs and group chats alike, listens to incoming and outgoing voices, optionally rephrases the transcription into polished text, and can clean up the original audio afterwards, all controlled per chat with simple slash commands.
- Automatic transcription: every voice message becomes text, in DMs and groups
- Per-chat control: each 1:1 and each group keeps its own independent settings
- Two directions: transcribe what you receive, what you send, or both
- AI rephrasing: optionally clean up filler words while keeping your tone & style
- Built for speed: Groq's LPU or OpenAI Whisper, your choice per task
- Mixed mode: e.g. Groq for fast transcription, OpenAI for high-quality rephrasing
- Auto cleanup: delete the original voice note after transcription
- Self-healing: a connection watchdog restarts the bot cleanly on network failures
- Self-hosted: runs as a userbot under your account; no third-party storage
- Multilingual: transcribes in the original spoken language
Screenshots: long voice notes are split into parts (1/2, 2/2), and each transcript is posted as a reply with sender & duration.
- The bot watches your account for voice messages in all your chats (DMs and groups).
- When a voice message arrives, it downloads and transcribes the audio.
- (Optional) It rephrases the transcription for readability.
- The text is posted as a reply to the original voice message.
- (Optional) The original voice note is deleted to keep the chat tidy.
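Long transcripts are posted in numbered parts (1/2, 2/2). The snippet below is a minimal sketch of how such a split could work, not the bot's actual code. It assumes the split is driven by Telegram's 4096-character limit for a single text message; the function name `split_into_parts` and the word-boundary strategy are illustrative choices.

```python
TELEGRAM_TEXT_LIMIT = 4096  # Telegram's maximum length for one text message


def split_into_parts(text: str, limit: int = TELEGRAM_TEXT_LIMIT) -> list[str]:
    """Split text into message-sized chunks labelled "(i/n)".

    Hypothetical helper: whitespace is normalised to single spaces, and the
    label reserve assumes fewer than 100 parts.
    """
    if len(text) <= limit:
        return [text]

    # Reserve room for a label such as "(12/34) " in front of every part.
    body_limit = limit - len("(99/99) ")
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # Hard-cut words that could never fit into a single part.
        while len(word) > body_limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:body_limit])
            word = word[body_limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= body_limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)

    total = len(chunks)
    return [f"({i}/{total}) {chunk}" for i, chunk in enumerate(chunks, start=1)]
```

Each returned string can then be sent as its own reply; splitting on word boundaries keeps parts readable at the cost of slightly uneven lengths.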
Pro tip: Run any command from a chat's Scheduled Messages view to keep both the command and its reply invisible to your chat partner.
The bot reacts to voice messages in every chat type: direct messages, groups and supergroups. Each chat is configured independently (settings are keyed by chat ID), so you decide per conversation what happens.
Important: it's ON everywhere by default. Transcription is automatically active in every chat the bot hasn't seen before (DMs and groups). There is no per-chat opt-in: if you don't want it in a particular group, you have to turn it off there with `/toff`. You can flip this default with `transcription_enabled_new_chats: false` in `config.yaml` (see the Configuration section); new chats then stay silent until you `/ton` them.
| | 1-on-1 (DM) | Group / Supergroup |
|---|---|---|
| Incoming voice | your partner's voice notes | voice notes from any member |
| Outgoing voice | your own voice notes | your own voice notes |
| Settings scope | this DM only | this group only |
| Who can run commands | only you (the account owner) | only you, never other members |
| Where the reply appears | private between you two | visible to the whole group |
How to adjust a specific chat: send the command inside that exact chat; it only affects that one conversation (the setting is stored under that chat's ID). Since transcription is already ON everywhere, this is mostly about turning things off:

```text
/toff   master switch OFF for THIS chat (silences it entirely, e.g. a noisy group)
/ton    master switch ON for THIS chat
/tin    toggle only the incoming direction (others' voices) in THIS chat
/tout   toggle only the outgoing direction (your own voices) in THIS chat
```

`/toff` overrides the direction toggles: while a chat is off, `/tin` / `/tout` have no effect
until you `/ton` it again.
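The precedence between the master switch and the direction toggles can be sketched as a small predicate. This is an illustration rather than the bot's actual handler code: the function name `should_transcribe` is hypothetical, the setting keys follow the `chats.json` layout documented in this README, and treating a missing key as `1` is an assumption.

```python
def should_transcribe(settings: dict, outgoing: bool) -> bool:
    """Return True if a voice message in this chat should be transcribed.

    `settings` is one chat's entry; `outgoing` is True for your own voices.
    """
    # The master switch wins: while it is off, direction toggles are ignored.
    if not settings.get("transcription", 1):
        return False
    # Otherwise only the toggle for the message's direction matters.
    key = "transcription_out" if outgoing else "transcription_in"
    return bool(settings.get(key, 1))
```

With `{"transcription": 0, "transcription_in": 1}` the function returns `False` for incoming voices, which mirrors the rule that `/tin` and `/tout` have no effect while the chat is off.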
Group privacy: Because this is a userbot, every transcription is posted as you into the chat; in a group that means all members see it. If you only want transcriptions for yourself in a noisy group, either keep them in your DMs, or use `/toff` to mute the bot there. The Scheduled Messages trick keeps things invisible in 1-on-1 chats only.
Voice notes sent by bots are skipped automatically.
- Python 3.10+
- A Telegram account + free API credentials from my.telegram.org
- An AI provider key, pick one (or both):
  - Groq: free tier, extremely fast (recommended)
  - OpenAI: paid, very accurate
```bash
# 1. Clone
git clone https://github.com/bjspi/voxscribe.git
cd voxscribe

# 2. Virtual environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# 3. Dependencies
pip install -r requirements.txt
```

All settings live in `config.yaml` (gitignored, so your secrets stay local).

```bash
cp config.example.yaml config.yaml
```

Then fill in your credentials:
```yaml
telegram:
  api_id: "123456"
  api_hash: "your_telegram_api_hash"
  phone_nr: "+49123456789"
  account: "@your_username"

api:
  provider:
    transcription: "GROQ"   # GROQ or OPENAI
    rephrase: "GROQ"        # GROQ or OPENAI
  keys:
    openai: ""              # required if using OPENAI
    groq: ""                # required if using GROQ
  models:
    groq:
      transcription: "whisper-large-v3"
      rephrase: "openai/gpt-oss-120b"
    openai:
      transcription: "whisper-1"
      rephrase: "gpt-4o-mini"
```

The two prompts under `prompts:` in `config.yaml` have the biggest impact on output
quality, so take a minute to adapt them before relying on the bot:

- `prompts.transcription` steers the speech-to-text model. Add your own jargon, names, product/brand names and recurring topics so they get spelled correctly, and set the expected punctuation/capitalization style.
- `prompts.rephrase` steers how the transcript is cleaned up afterwards: control the tone, how aggressively filler words are removed, and how much restructuring is allowed.
Both prompts ship with sensible English defaults in config.example.yaml; treat them as a
starting point and make them yours. Per-chat overrides are also possible via /setprompt,
/setprompt_in and /setprompt_out.
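How the per-chat overrides interact with the global prompt is not spelled out here, so the following is only a sketch under an assumed precedence: the direction-specific prompt first, then the chat-wide prompt, then the global `prompts.rephrase` value. The function name `pick_rephrase_prompt` is hypothetical; the keys are the per-chat prompt fields stored in `chats.json`, where an empty string means "use the global prompt".

```python
def pick_rephrase_prompt(chat: dict, outgoing: bool, global_prompt: str) -> str:
    """Choose the rephrasing prompt for one voice message.

    Assumed order (not confirmed by the project): direction-specific
    override, then chat-wide override, then the global config prompt.
    """
    direction_key = "rephrase_prompt_out" if outgoing else "rephrase_prompt_in"
    for candidate in (chat.get(direction_key, ""), chat.get("rephrase_prompt", "")):
        # Empty or whitespace-only strings count as "not set".
        if candidate.strip():
            return candidate
    return global_prompt
```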
Use a different provider for each task, e.g. fast transcription and high-quality rephrasing:

```yaml
api:
  provider:
    transcription: "GROQ"   # fast
    rephrase: "OPENAI"      # high quality
```

Backward compatible: if you set `provider` to a single string instead of a `transcription`/`rephrase` pair, that one provider is used for both tasks:

```yaml
api:
  provider: "GROQ"   # used for transcription AND rephrasing
```
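Accepting both shapes of `provider` comes down to normalising the value right after the config is loaded. The sketch below shows one way to do that; `resolve_providers` and `VALID_PROVIDERS` are illustrative names, not taken from the project, and rejecting unknown provider names is an added assumption.

```python
VALID_PROVIDERS = {"GROQ", "OPENAI"}
TASKS = ("transcription", "rephrase")


def resolve_providers(provider) -> dict[str, str]:
    """Normalise api.provider into a {"transcription": ..., "rephrase": ...} dict.

    Accepts the legacy single string or the per-task mapping.
    """
    if isinstance(provider, str):
        # Legacy form: one provider handles both tasks.
        resolved = {task: provider for task in TASKS}
    else:
        resolved = {task: provider[task] for task in TASKS}

    for task, name in resolved.items():
        if name not in VALID_PROVIDERS:
            raise ValueError(f"Unknown provider {name!r} for task {task!r}")
    return resolved
```

The rest of the code can then always read `resolved["transcription"]` and `resolved["rephrase"]` without caring which form the user wrote.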
| Service | Where | Notes |
|---|---|---|
| Telegram API | my.telegram.org, under API development tools | Free. Copy `api_id` + `api_hash`. |
| Groq | console.groq.com | Free tier, no credit card. |
| OpenAI | platform.openai.com | Pay-as-you-go. |
A few optional toggles in config.yaml control defaults and logging:
```yaml
# Transcribe by default in chats the bot has never seen before?
#   true  (default): ON everywhere, opt OUT per chat with /toff
#   false:           silent in new chats until you opt IN with /ton
transcription_enabled_new_chats: true

logging:
  retention_days: 10
  # Log full message content (transcripts, prompts, results)?
  #   false (default): only a short, redacted preview is logged (privacy-friendly)
  #   true:            full content in the logs (useful for debugging)
  verbose: false
```

| Setting | Default | Effect |
|---|---|---|
| `transcription_enabled_new_chats` | `true` | Whether a brand-new chat (not yet in `chats.json`) transcribes automatically. Existing chats keep their own saved setting. |
| `logging.verbose` | `false` | When off, transcription/rephrasing content is logged only as a ~200-char preview. Turn on to log full content while debugging. |
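The effect of `logging.verbose` can be pictured as a small helper applied to every piece of content before it reaches the log. This is a sketch only: the name `log_preview`, the exact 200-character cut-off and the trailing `...` marker are assumptions based on the "~200-char preview" described above.

```python
def log_preview(text: str, verbose: bool, limit: int = 200) -> str:
    """Return what should be written to the log for a piece of content."""
    if verbose or len(text) <= limit:
        return text
    # Privacy-friendly mode: keep only the beginning of the content.
    return text[:limit] + "..."
```

A call such as `logger.info("Transcript: %s", log_preview(transcript, verbose=False))` would then never write more than the preview to disk.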
```bash
python bot.py
```

On the first launch you'll authenticate your Telegram account:
- Enter the verification code sent to Telegram
- Confirm the login on your other devices
The session is stored in session/ (gitignored) so you only log in once.
Send these in the chat you want to configure (a DM or a group). Only you can trigger them; other group members can't. In 1-on-1 chats, sending via Scheduled Messages keeps them invisible to your partner.
| Command | Action |
|---|---|
| `/helpv` | Show all commands + current settings |
| `/statusv` | Show current transcription settings |
| `/ton` | Enable transcription globally for this chat |
| `/toff` | Disable transcription globally for this chat |
| `/tin` | Toggle transcription of incoming voices |
| `/tout` | Toggle transcription of outgoing voices |
| `/rephrase` | Toggle AI rephrasing of transcriptions |
| `/delin` | Toggle deletion of incoming voices after transcription |
| `/delout` | Toggle deletion of outgoing voices after transcription |
| `/prompt` | Show the current rephrasing prompt |
| `/prompts` | Show prompts overview (custom / default) |
| `/setprompt` | Set a custom rephrasing prompt |
| `/setprompt_in` | Set a custom rephrasing prompt for incoming messages |
| `/setprompt_out` | Set a custom rephrasing prompt for outgoing messages |
The commands are built from small, memorable building blocks:
| Block | Means | Mnemonic |
|---|---|---|
| `t…` | transcription | `ton` `toff` `tin` `tout` |
| `del…` | delete the voice note | `delin` `delout` |
| `…in` / `…out` | direction: incoming vs outgoing | `tin` / `tout`, `delin` / `delout` |
| `on` / `off` | global on/off for the chat | `ton` / `toff` |
| `…v` | the voice-bot namespace (avoids clashing with `/help` etc.) | `helpv` `statusv` |
So tin = transcribe incoming (toggle), delout = delete outgoing, ton =
transcription on. Once it clicks, you'll never open /helpv again.
- Speed: Groq's LPU inference is blisteringly fast
- Free tier: generous limits, no credit card
- Quality: comparable accuracy to OpenAI Whisper
Free-tier limits (2025): ~10,000 requests/day, ~10 requests/min, which is plenty for personal use.
A background connection watchdog pings Telegram on an interval. After repeated failures it raises a `ConnectionHealthError` and exits with a non-zero code, so a process manager
(supervisor / systemd / PM2) can restart the bot cleanly, with no infinite reconnect loops.
Whether the bot actually exits on connection loss is controlled by
recovery.watchdog_hard_exit in config.yaml:
| Value | Behaviour | Use when |
|---|---|---|
| `true` (default) | After `telegram_healthcheck_max_failures` failed checks the bot exits with code 1 so your process manager restarts it. | You run it under supervisor / systemd / PM2. |
| `false` | The bot keeps running, resets the counter and relies on Pyrogram's built-in auto-reconnect. | You run `python bot.py` directly, without a process manager. |
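The two behaviours in the table boil down to what happens once the failure counter reaches its limit. The following sketch models just that decision, without any network code; the class `FailureCounter` and its attributes are hypothetical, while `ConnectionHealthError` is the exception name used in this README.

```python
class ConnectionHealthError(RuntimeError):
    """Raised when Telegram stays unreachable for too many checks in a row."""


class FailureCounter:
    """Tracks consecutive failed health checks (illustrative, not the bot's code)."""

    def __init__(self, max_failures: int, hard_exit: bool = True) -> None:
        self.max_failures = max_failures
        self.hard_exit = hard_exit
        self.failures = 0

    def record(self, healthy: bool) -> None:
        """Feed in the result of one health check."""
        if healthy:
            self.failures = 0  # any success resets the streak
            return
        self.failures += 1
        if self.failures < self.max_failures:
            return
        if self.hard_exit:
            # Hard mode: let the caller exit so a process manager restarts the bot.
            raise ConnectionHealthError(
                f"{self.failures} consecutive health checks failed"
            )
        # Soft mode: start counting again and rely on the client's auto-reconnect.
        self.failures = 0
```

In hard-exit mode the entry point would catch `ConnectionHealthError` and call `sys.exit(1)`, which is what lets supervisor, systemd or PM2 restart the process.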
Example supervisor config

```ini
[program:voice_transcription]
command=/path/to/venv/bin/python /path/to/bot.py
directory=/path/to/voice_transcriber
autostart=true
autorestart=true
startretries=10
stderr_logfile=/var/log/voice_transcription.err.log
stdout_logfile=/var/log/voice_transcription.out.log
```

Tune the watchdog in `config.yaml` under `recovery:` (interval, timeout, max failures, shutdown timeout, hard-exit).
```text
voxscribe/
├── bot.py                 # Entry point: python bot.py
├── requirements.txt
├── config.example.yaml    # template (committed)
├── config.yaml            # your secrets (gitignored)
├── chats.json             # per-chat settings (gitignored, auto-created)
├── src/
│   ├── handlers.py        # command + voice handlers
│   ├── transcription.py   # transcription & rephrasing logic
│   ├── helpers.py         # config loading, message utils
│   └── logging.py         # central logging setup
├── session/               # Telegram session files (gitignored)
└── logs/                  # daily rotating logs (gitignored)
```
Every chat the bot interacts with gets its own entry in `chats.json`, keyed by the numeric
Telegram chat ID. The file is created and updated automatically whenever you run a command
or a voice is transcribed, so you normally never edit it by hand. It is gitignored (it maps
your private chats) and written atomically, with a `.json.backup` kept alongside it.
- Values are `1` (on) / `0` (off); prompt fields are strings (empty = use the global prompt).
- Missing keys fall back to defaults, so a minimal `{ "transcription": 1 }` entry is valid.
- A chat with no entry at all uses `transcription_enabled_new_chats` to decide whether it starts ON or OFF (see the Configuration section).
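The fallback rules in this list can be sketched as a merge of defaults with whatever is stored for the chat. This is an illustration, not the project's loader: `effective_settings` and `DEFAULT_CHAT_SETTINGS` are hypothetical names, and the default values are assumptions (both directions on, rephrasing and deletion off, no custom prompts).

```python
# Assumed defaults for keys that are missing from a chat's entry.
DEFAULT_CHAT_SETTINGS = {
    "transcription_in": 1,
    "transcription_out": 1,
    "rephrasing": 0,
    "delete_incoming_voice": 0,
    "delete_outgoing_voice": 0,
    "rephrase_prompt": "",
    "rephrase_prompt_in": "",
    "rephrase_prompt_out": "",
}


def effective_settings(chats: dict, chat_id: int, enabled_new_chats: bool = True) -> dict:
    """Return the settings that apply to one chat.

    `chats` is the parsed chats.json; `enabled_new_chats` mirrors the
    transcription_enabled_new_chats option from config.yaml.
    """
    settings = dict(DEFAULT_CHAT_SETTINGS)
    # The master switch for unseen chats comes from the global option.
    settings["transcription"] = 1 if enabled_new_chats else 0
    # JSON object keys are always strings, so look the chat up by str(chat_id).
    settings.update(chats.get(str(chat_id), {}))
    return settings
```

For a chat stored as `{ "transcription": 1 }`, the result keeps that master switch and fills every other key from the defaults.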
| Problem | Fix |
|---|---|
| `TgCrypto is missing` warning | Harmless; `pip install tgcrypto` to silence and speed up |
| Authentication errors | Verify `api_id` / `api_hash`; phone number in international format (`+49…`) |
| API key errors | Check the key is active and the matching provider is set |
Logs: one file per day in logs/bot_YYYYMMDD.log. Retention is configurable via
logging.retention_days in config.yaml (default: 10 days).
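Retention for date-stamped log files can be decided from the file names alone. The sketch below shows the idea under stated assumptions: `expired_logs` is a hypothetical helper, and it treats a file as expired once its date is more than `retention_days` days before today; how the bot itself prunes logs may differ.

```python
import re
from datetime import date, datetime, timedelta

# Matches the documented naming scheme bot_YYYYMMDD.log.
LOG_NAME = re.compile(r"^bot_(\d{8})\.log$")


def expired_logs(filenames: list[str], today: date, retention_days: int = 10) -> list[str]:
    """Return the log file names that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in filenames:
        match = LOG_NAME.match(name)
        if not match:
            continue  # not one of the bot's daily log files
        try:
            day = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a real calendar date
        if day < cutoff:
            expired.append(name)
    return expired
```

Passing `os.listdir("logs")` and `date.today()` yields the files that could be deleted.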
- Fork the repo
- Create a feature branch
- Commit your changes
- Push and open a Pull Request
Released under the MIT License; see LICENSE.
- Pyrogram: Telegram MTProto framework
- OpenAI Whisper: speech recognition
- Groq: ultra-fast LLM inference
- PyYAML: YAML parsing
This is a userbot that automates actions on your personal Telegram account. Use it responsibly and in line with Telegram's Terms of Service. The authors are not responsible for misuse or account restrictions.


Example `chats.json` layout:

```jsonc
{
  "11122233": {                     // chat ID (a person or a group)
    "chatname": "ALICE",            // cached display name, just for readability
    "transcription": 1,             // master switch for this chat (/ton · /toff)
    "transcription_in": 1,          // transcribe incoming voices (/tin)
    "transcription_out": 1,         // transcribe your own voices (/tout)
    "rephrasing": 0,                // AI rephrasing on/off (/rephrase)
    "delete_incoming_voice": 0,     // delete incoming after text (/delin)
    "delete_outgoing_voice": 0,     // delete outgoing after text (/delout)
    "rephrase_prompt": "",          // legacy/global custom prompt
    "rephrase_prompt_in": "",       // custom prompt, incoming (/setprompt_in)
    "rephrase_prompt_out": ""       // custom prompt, outgoing (/setprompt_out)
  },
  "44455566": {                     // partial entries are fine: missing
    "transcription": 1              // keys fall back to the defaults
  }
}
```
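The settings file is described as written atomically with a backup next to it. A common way to get that behaviour is to write to a temporary file in the same directory and swap it in with `os.replace`. The sketch below follows that pattern; `save_chats` is a hypothetical name, and copying the previous file to `chats.json.backup` before each write is an assumption about how the backup is produced.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def save_chats(path: Path, chats: dict) -> None:
    """Write chats to path atomically, keeping the previous version as a backup."""
    path = Path(path)
    backup = path.with_name(path.name + ".backup")  # chats.json -> chats.json.backup
    if path.exists():
        shutil.copy2(path, backup)

    # The temp file must live in the same directory so os.replace stays atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(chats, handle, ensure_ascii=False, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the data is on disk before the swap
        os.replace(tmp_name, path)
    except BaseException:
        # Never leave a half-written temp file behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Readers of the file see either the old or the new content, never a partial write. Note that `json.dump` emits plain JSON, so the explanatory `//` comments shown in the example above would not appear in a file written this way.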