mano-asr is a local speech recognition service for vertical domains, optimized for Apple Silicon via Cider. It is built for internet / IT office scenarios and tuned for high-frequency workplace use cases such as meeting notes, technical discussions, product reviews, and engineering dictation. Through targeted optimization on domain data, mano-asr accurately recognizes English terms, acronyms, and product names (e.g. Kubernetes, FastAPI, PRD, Code Review) as well as mixed Chinese-English speech. This addresses the term misrecognition and code-switching segmentation errors common in general-purpose models, so transcripts come out clear and domain-aware. The service runs fully locally, works out of the box, and keeps audio and transcript data on your machine.
Core capabilities:
- Vertical-domain optimization: optimized on internet / IT office data; accurate on English terms, acronyms, product names, and mixed Chinese-English speech.
- Native Apple Silicon: MLX-based local inference on M-series chips, further optimized with our in-house acceleration framework Cider.
- Fully local, privacy-first: audio and transcripts never leave your machine.
- VAD segmentation: optional FSMN VAD splits long audio so it can be transcribed segment by segment.
- Pluggable engines: supports Fun-ASR-Nano, Qwen3-ASR and more base models, switchable with one command.
- @Mention replacement: auto-fix nicknames and transliterated names in transcripts via a visual page. See Mentions.
- One-command start: install via brew install, then mano-asr start.
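
To make the VAD segmentation capability concrete, here is a minimal sketch of segment-by-segment transcription. It assumes the VAD stage yields (start_ms, end_ms) intervals and that the audio is a flat sequence of samples. `transcribe_segments` and the `transcribe` callback are hypothetical names used for illustration only; they are not mano-asr's API.

```python
from typing import Callable, List, Sequence, Tuple


def transcribe_segments(
    samples: Sequence[float],
    sample_rate: int,
    intervals_ms: Sequence[Tuple[int, int]],
    transcribe: Callable[[Sequence[float]], str],
) -> str:
    """Cut `samples` at each VAD interval, transcribe each piece, join the text."""
    pieces: List[str] = []
    for start_ms, end_ms in intervals_ms:
        # Convert millisecond offsets to sample indices.
        start = start_ms * sample_rate // 1000
        end = min(end_ms * sample_rate // 1000, len(samples))
        if end <= start:
            continue  # skip empty or out-of-range intervals
        text = transcribe(samples[start:end]).strip()
        if text:
            pieces.append(text)
    return " ".join(pieces)


# Stand-in recognizer (hypothetical): reports the segment length instead of real text.
def fake_asr(segment: Sequence[float]) -> str:
    return f"<{len(segment)} samples>"


audio = [0.0] * 16000  # one second of silence at 16 kHz
print(transcribe_segments(audio, 16000, [(0, 250), (500, 1000)], fake_asr))
# -> <4000 samples> <8000 samples>
```

Joining with a space suits English output; for Chinese text an empty separator would probably read better.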
Changelog · Models · Installation · Usage · API · License · Acknowledgments
See the full release history on the Releases page.
- 2026-06-09: Added @mention replacement with a visual management page (mano-asr mentions) for editing nickname → canonical-name mappings; spoken "艾特" (the Chinese rendering of "at") is normalized to @ before replacement. (v0.1.15 fixes packaging so the web page ships in the Homebrew build.)
- 2026-05-29: Released the first ASR model for internet office scenarios, with written-style transcription output and accurate recognition of industry-specific terminology.
- 2026-05-26: First release: FastAPI transcription service, FunASR-Nano engine, FSMN VAD, hotword extraction, session logging.
mano-asr uses a pluggable engine design and supports several mainstream ASR base models. Switch with a single command: mano-asr model use <name>.
| Model | Base model | Quant | Size | Languages | Links |
|---|---|---|---|---|---|
| Mano-ASR-0.8B (default) | Fun-ASR-Nano | 8bit | 0.8 GB | ZH / EN | Hugging Face · ModelScope |
The model is downloaded automatically from HuggingFace or ModelScope (China mirror); the source is chosen by network environment on first run.
brew tap mano-asr/mano-asr
brew install mano-asr
# Start (first run auto-initializes + downloads the default model)
mano-asr start
mano-asr doctor # environment check

# 1. Dependency: ffmpeg (decodes non-WAV audio)
brew install ffmpeg
# 2. Clone + install
git clone https://github.com/Mininglamp-AI/mano-asr.git
cd mano-asr
python3 -m venv .venv && source .venv/bin/activate
pip install -U pip
pip install -e .
# 3. Download the model
hf download Mininglamp-2718/Mano-ASR-0.8B-Instruct-1.0-MLX-8bit \
--local-dir models/Mininglamp-2718/Mano-ASR-0.8B-Instruct-1.0-MLX-8bit
# Behind a China mirror:
# HF_ENDPOINT=https://hf-mirror.com hf download ...
# 4. Start the server
python3 server.py \
  --model-path models/Mininglamp-2718/Mano-ASR-0.8B-Instruct-1.0-MLX-8bit \
  --vad-model-path models/fsmn-vad-mlx \
  --host 0.0.0.0 --port 8787 --load-on-startup

Requirements: macOS (Apple Silicon) · Python 3.10+ · ffmpeg / ffprobe on PATH.
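
The requirements above can be checked by hand before a source install. The following is a rough sketch of such a preflight check using only the standard library. It is not what mano-asr doctor runs; `check_requirements` is a hypothetical helper written for this example.

```python
import platform
import shutil
import sys
from typing import List


def check_requirements() -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    # Apple Silicon Macs report Darwin / arm64 (x86_64 when running under Rosetta).
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append("macOS on Apple Silicon (arm64) is required")
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    for tool in ("ffmpeg", "ffprobe"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


issues = check_requirements()
for issue in issues:
    print("problem:", issue)
print("all checks passed" if not issues else f"{len(issues)} problem(s) found")
```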
The example below shows audio translation: transcribe speech and translate it into Chinese.
# On first run, the service auto-initializes and downloads the default model
mano-asr start
# Transcribe / translate an audio file
mano-asr transcribe assets/BAC009S0764W0129.wav
from core.auto_model import AutoModel
model = AutoModel(
model="models/Mininglamp-2718/Mano-ASR-0.8B-Instruct-1.0-MLX-8bit",
vad_model="models/fsmn-vad-mlx", # optional: auto-segment long audio
)
text = model.generate(
"assets/BAC009S0764W0129.wav",
task="translate", # translation task
target_language="zh", # target language: Chinese
merge_vad=True,
)
print(text)
# -> "甚至出现交易几乎停滞的情况" (roughly: "trading even came almost to a standstill")

curl -X POST http://127.0.0.1:8787/v1/voice/transcribe \
  -F "audio=@assets/BAC009S0764W0129.wav" \
  -F "personal_context=## Terms\n- FastAPI\n- Kubernetes" \
  -F "mode=smart"

{
  "status": 200,
  "text": "transcribed text",
  "m": "mano-asr",
  "engine": "mlx"
}

Full API fields, limits and auth are documented under API.
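
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The endpoint path, the audio / personal_context / mode fields, the Bearer token header, and the "text" response key come from this README; `build_multipart` and `transcribe` are hypothetical helper names, and the default base URL assumes the server is listening on port 8787 as in the example above.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple


def build_multipart(fields: Dict[str, str], audio_name: str, audio_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one `audio` file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(audio_name)[0] or "application/octet-stream"
    chunks = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; '
        f'filename="{audio_name}"\r\nContent-Type: {mime}\r\n\r\n'
    )
    chunks.append(file_header.encode("utf-8") + audio_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://127.0.0.1:8787",
               token: Optional[str] = None, **fields: str) -> dict:
    """POST one audio file to /v1/voice/transcribe and return the decoded JSON."""
    audio = Path(path)
    body, content_type = build_multipart(fields, audio.name, audio.read_bytes())
    headers = {"Content-Type": content_type}
    if token:  # only needed when the server was started with --auth-token
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        f"{base_url}/v1/voice/transcribe", data=body, headers=headers, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a running server, so it is left commented out):
# result = transcribe("assets/BAC009S0764W0129.wav",
#                     personal_context="## Terms\n- FastAPI\n- Kubernetes",
#                     mode="smart")
# print(result["text"])
```

Unlike the curl form, the Python string literal turns each `\n` in personal_context into a real newline before sending.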
Auto-replace casual nicknames and transliterated names in transcripts with the canonical spelling you want (e.g. @小明 → @Xiaoming). Manage entries on a visual web page; no JSON editing is required:
mano-asr start # Start the service (if not running)
mano-asr mentions # Open the management page in your browser

Full guide: Mentions
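
As an illustration of the idea, the sketch below applies a nickname-to-canonical-name mapping to a transcript after first normalizing the spoken form "艾特" to @, as the changelog describes. This is an assumption about how such a step could work, not mano-asr's implementation; `apply_mentions` is a hypothetical name.

```python
import re
from typing import Dict


def apply_mentions(text: str, mapping: Dict[str, str]) -> str:
    """Normalize spoken "艾特" to "@", then swap @nickname for @canonical-name."""
    normalized = text.replace("艾特", "@")
    if not mapping:
        return normalized
    # Longest nicknames first, so "小明哥" wins over its prefix "小明".
    nicknames = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("@(" + "|".join(re.escape(n) for n in nicknames) + ")")
    # A single regex pass avoids re-replacing text that was already substituted.
    return pattern.sub(lambda m: "@" + mapping[m.group(1)], normalized)


print(apply_mentions("艾特小明 看一下 PRD", {"小明": "Xiaoming"}))
# -> @Xiaoming 看一下 PRD
```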
POST /v1/voice/transcribe transcribes a single uploaded audio file. Request type: multipart/form-data.
| Field | Type | Required | Description |
|---|---|---|---|
| audio | file | yes | Audio file. Supported: .wav .mp3 .ogg .webm .m4a .flac |
| context_text | string | no | Existing text for append/edit modes; last 5000 chars kept |
| chat_context | string | no | Chat context; last 20000 chars kept |
| personal_context | string | no | Personal correction / hotword context; last 10000 chars kept |
| member_context | string | no | Member context; last 5000 chars kept |
| mode | string | no | smart / append_only / edit_only, default smart |
Limits: default max file 30 MiB, max duration 660 s; edit_only requires context_text.
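
A client can catch most rejected requests before uploading by checking these rules locally. The sketch below hard-codes the default limits and the field rules from the table above; `validate_request` and `trim_context` are hypothetical helpers, and a server configured with different limits would need different constants.

```python
from pathlib import Path
from typing import List, Optional

SUPPORTED_SUFFIXES = {".wav", ".mp3", ".ogg", ".webm", ".m4a", ".flac"}
MODES = {"smart", "append_only", "edit_only"}
MAX_FILE_BYTES = 30 * 1024 * 1024  # default limit: 30 MiB
MAX_DURATION_S = 660               # default limit: 660 seconds
CONTEXT_LIMITS = {                 # number of trailing characters the server keeps
    "context_text": 5000,
    "chat_context": 20000,
    "personal_context": 10000,
    "member_context": 5000,
}


def trim_context(field: str, value: str) -> str:
    """Keep only the last N characters, mirroring the documented truncation."""
    return value[-CONTEXT_LIMITS[field]:]


def validate_request(filename: str, size_bytes: int, duration_s: float,
                     mode: str = "smart", context_text: Optional[str] = None) -> List[str]:
    """Return a list of rule violations; an empty list means the request looks valid."""
    errors: List[str] = []
    if Path(filename).suffix.lower() not in SUPPORTED_SUFFIXES:
        errors.append(f"unsupported audio format: {filename}")
    if size_bytes > MAX_FILE_BYTES:
        errors.append("file exceeds 30 MiB")
    if duration_s > MAX_DURATION_S:
        errors.append("audio longer than 660 s")
    if mode not in MODES:
        errors.append(f"unknown mode: {mode}")
    if mode == "edit_only" and not context_text:
        errors.append("edit_only requires context_text")
    return errors


print(validate_request("notes.m4a", 5_000_000, 120.0))
# -> []
print(validate_request("notes.aac", 5_000_000, 120.0, mode="edit_only"))
# -> ['unsupported audio format: notes.aac', 'edit_only requires context_text']
```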
GET /v1/voice/config returns the current service limits and engine metadata.

curl http://127.0.0.1:8787/v1/voice/config

Authentication is disabled by default. If the server is started with --auth-token, requests must carry Authorization: Bearer <token>.

python3 server.py --model-path <path> --auth-token "$MANO_ASR_TOKEN"

Released under the MIT License.
Copyright (c) 2026 MININGLAMP Technology.
mano-asr would not be possible without these excellent open-source projects:
- MLX & mlx-audio: Apple's machine-learning framework and audio toolkit, the foundation of mano-asr's local inference.
- FunASR / FunAudioLLM: source of Fun-ASR-Nano and FSMN-VAD, providing strong Chinese speech recognition.
- Qwen3: the base model behind the Qwen3-ASR engine.
- mlx-community: high-quality MLX quantized models.
- ModelScope & Hugging Face: model hosting and distribution.
- FastAPI: high-performance web framework.
Thanks to everyone contributing to the open-source speech recognition community.