
Mininglamp-AI/mano-asr


mano-asr



Introduction

mano-asr is a local speech recognition service for vertical domains, deeply optimized for Apple Silicon via Cider, our in-house acceleration framework. It is purpose-built for internet / IT office work and tuned for high-frequency workplace scenarios such as meeting notes, technical discussions, product reviews, and engineering dictation. Because it is optimized on domain data, mano-asr accurately recognizes English terms, acronyms, and product names (e.g. Kubernetes, FastAPI, PRD, Code Review) as well as mixed Chinese-English speech. This addresses two problems common to general-purpose models, misrecognized terminology and poor segmentation of code-switched speech, so transcripts come out clean, domain-aware, and accurate. The service runs fully locally, works out of the box, and keeps audio and transcript data on your machine.

Core capabilities:

  • 🎯 Vertical-domain optimization: optimized on internet / IT office data; accurate on English terms, acronyms, product names, and mixed Chinese-English speech.
  • 🍎 Native Apple Silicon: MLX-based local inference on M-series chips, further accelerated by our in-house framework Cider.
  • 🔒 Fully local, privacy-first: audio and transcripts never leave your machine.
  • ✂️ VAD segmentation: optional FSMN VAD splits long audio so it can be transcribed segment by segment.
  • 🧩 Pluggable engines: supports Fun-ASR-Nano, Qwen3-ASR, and other base models, switchable with one command.
  • 🏷️ @Mention replacement: automatically fixes nicknames and transliterated names in transcripts, managed from a visual web page. See Mentions.
  • ⚡ One-command start: install with brew install mano-asr, then run mano-asr start.

Changelog · Models · Installation · Usage · API · License · Acknowledgments


Changelog

See the full release history on the Releases page.

  • 2026-06-09: Added @mention replacement with a visual management page (mano-asr mentions) for editing nickname → canonical-name mappings; the spoken form "艾特" ("at") is normalized to @ before replacement. (v0.1.15 fixes packaging so the web page ships in the Homebrew build.)
  • 2026-05-29: Released the first ASR model for internet office scenarios, with written-style transcription output and accurate recognition of industry-specific terminology.
  • 2026-05-26: First release, with a FastAPI transcription service, the Fun-ASR-Nano engine, FSMN VAD, hotword extraction, and session logging.

Models

mano-asr uses a pluggable engine design and supports several mainstream ASR base models. Switch with a single command: mano-asr model use <name>.

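| Model                   | Base model   | Quant | Size   | Languages | Links        |
|-------------------------|--------------|-------|--------|-----------|--------------|
| Mano-ASR-0.8B (default) | Fun-ASR-Nano | 8-bit | 0.8 GB | ZH / EN   | 🤗 · 🤖 · 🌟 |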

The model is downloaded automatically from HuggingFace or ModelScope (China mirror); on first run, the source is chosen based on your network environment.
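The README does not say how the network environment is detected. One plausible approach, sketched here purely under our own assumptions, is to probe HuggingFace with a short HEAD request and fall back to ModelScope if it fails. The probe URL and the names is_reachable and choose_model_source are hypothetical, not mano-asr's code.

```python
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical probe target chosen for illustration; mano-asr's real check is not documented.
HF_PROBE_URL = "https://huggingface.co"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the host behind `url` answers an HTTP HEAD request within `timeout` seconds."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # any HTTP status, even 4xx/5xx, means the host answered
    except OSError:
        return False  # DNS failure, refused connection, timeout, TLS error, ...


def choose_model_source(probe: Callable[[str], bool] = is_reachable) -> str:
    """Prefer HuggingFace; fall back to the ModelScope mirror when HuggingFace is unreachable."""
    return "huggingface" if probe(HF_PROBE_URL) else "modelscope"
```

Passing the probe as a parameter keeps the decision logic testable without network access.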


Installation

Option 1: Homebrew (recommended)

brew tap mano-asr/mano-asr
brew install mano-asr

# Start (first run auto-initializes + downloads the default model)
mano-asr start
mano-asr doctor   # environment check

Option 2: From source

# 1. Dependency: ffmpeg (decodes non-WAV audio)
brew install ffmpeg

# 2. Clone + install
git clone https://github.com/Mininglamp-AI/mano-asr.git
cd mano-asr
python3 -m venv .venv && source .venv/bin/activate
pip install -U pip
pip install -e .

# 3. Download the model
hf download Mininglamp-2718/Mano-ASR-0.8B-Instruct-1.0-MLX-8bit \
  --local-dir models/Mininglamp-2718/Mano-ASR-0.8B-Instruct-1.0-MLX-8bit

# If huggingface.co is unreachable (e.g. in mainland China), use a mirror:
# HF_ENDPOINT=https://hf-mirror.com hf download ...

# 4. Start the server
python3 server.py \
  --model-path models/Mininglamp-2718/Mano-ASR-0.8B-Instruct-1.0-MLX-8bit \
  --vad-model-path models/fsmn-vad-mlx \
  --host 0.0.0.0 --port 8787 --load-on-startup

Requirements: macOS (Apple Silicon) · Python 3.10+ · ffmpeg / ffprobe on PATH.
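mano-asr doctor runs the project's own environment check. As an independent, hypothetical sketch (not what doctor actually does), the requirements above can be verified with Python's standard library alone; check_environment is an illustrative name.

```python
from __future__ import annotations

import platform
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Check each listed requirement and report pass/fail per item."""
    return {
        # Native Apple Silicon Python reports "arm64"; an x86_64 Python under Rosetta does not.
        "macos_apple_silicon": platform.system() == "Darwin" and platform.machine() == "arm64",
        "python_3_10_plus": sys.version_info >= (3, 10),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "ffprobe_on_path": shutil.which("ffprobe") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'ok  ' if ok else 'FAIL'}  {name}")
```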


Usage

The examples below use the sample file assets/BAC009S0764W0129.wav. The Python API example shows audio translation: it transcribes the speech and translates it into Chinese.

CLI (recommended)

# On first run, the service auto-initializes and downloads the default model
mano-asr start

# Transcribe / translate an audio file
mano-asr transcribe assets/BAC009S0764W0129.wav

Python API

from core.auto_model import AutoModel

model = AutoModel(
    model="models/Mininglamp-2718/Mano-ASR-0.8B-Instruct-1.0-MLX-8bit",
    vad_model="models/fsmn-vad-mlx",   # optional: auto-segment long audio
)

text = model.generate(
    "assets/BAC009S0764W0129.wav",
    task="translate",        # translation task
    target_language="zh",    # target language: Chinese
    merge_vad=True,
)
print(text)
# -> "甚至出现交易几乎停滞的情况"  (roughly: "there were even cases where trading nearly stalled")
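merge_vad=True asks the pipeline to combine short VAD segments before recognition. mano-asr's exact merge rule and length cap are not documented here, so the sketch below shows one simple greedy rule in the spirit of FunASR's merge_vad / merge_length_s options. The 15 s cap, the function names, and the placeholder transcribe_fn standing in for the model are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms) of one VAD speech segment


def merge_segments(segments: List[Segment], max_ms: int = 15_000) -> List[Segment]:
    """Greedily merge consecutive segments while each merged span stays within max_ms.

    A single segment longer than max_ms is kept whole rather than split.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and end - merged[-1][0] <= max_ms:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # absorb into the current span
        else:
            merged.append((start, end))  # start a new span
    return merged


def transcribe_segments(
    samples: Sequence[float],
    sample_rate: int,
    segments: List[Segment],
    transcribe_fn: Callable[[Sequence[float]], str],
    max_ms: int = 15_000,
) -> str:
    """Run transcribe_fn on each merged span and concatenate the results."""
    parts = []
    for start, end in merge_segments(segments, max_ms):
        lo, hi = start * sample_rate // 1000, end * sample_rate // 1000  # ms -> sample index
        parts.append(transcribe_fn(samples[lo:hi]))
    # No separator, which suits Chinese output; use " ".join for space-delimited languages.
    return "".join(parts)
```

Merging trades a little latency for fewer, longer model calls, which usually gives the recognizer more context per call.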

HTTP API

curl -X POST http://127.0.0.1:8787/v1/voice/transcribe \
  -F "audio=@assets/BAC009S0764W0129.wav" \
  -F "personal_context=## Terms\n- FastAPI\n- Kubernetes" \
  -F "mode=smart"
{
  "status": 200,
  "text": "transcribed text",
  "m": "mano-asr",
  "engine": "mlx"
}
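The same request can be sent from Python without third-party packages. This is a minimal sketch assuming only what the curl call shows: the endpoint path, the audio file field, and optional text fields such as personal_context and mode. build_multipart and transcribe are hypothetical helper names, not part of mano-asr.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict[str, str], file_field: str, file_path: Path) -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    mime = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{file_path.name}"'.encode(),
        f"Content-Type: {mime}".encode(),
        b"",
        file_path.read_bytes(),
        f"--{boundary}--".encode(),
        b"",  # trailing CRLF after the closing boundary
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://127.0.0.1:8787", **fields: str) -> dict:
    """POST one audio file to /v1/voice/transcribe and return the decoded JSON response."""
    body, content_type = build_multipart(fields, "audio", Path(path))
    req = urllib.request.Request(
        f"{base_url}/v1/voice/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running on the default port, transcribe("assets/BAC009S0764W0129.wav", mode="smart")["text"] should return the transcript.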

Full API fields, limits and auth are documented under API.

@Mention replacement

Automatically replace casual nicknames and transliterated names in transcripts with the canonical spelling you want (e.g. @小明 → @Xiaoming). Entries are managed on a visual web page, with no JSON editing required:

mano-asr start        # Start the service (if not running)
mano-asr mentions     # Open the management page in your browser

📖 Full guide: Mentions
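The changelog notes that the spoken "艾特" ("at") is normalized to @ before nickname replacement. A minimal sketch of that two-step idea follows. The MENTIONS mapping, the rule that only @-prefixed names are replaced, and the longest-match-first ordering are our assumptions, not the documented behavior or storage format of the mentions page.

```python
import re

# Hypothetical nickname -> canonical-name mapping for illustration only.
MENTIONS = {"小明": "Xiaoming", "明哥": "Xiaoming"}

SPOKEN_AT = "艾特"  # spoken form of "@"


def replace_mentions(text: str, mentions: dict[str, str]) -> str:
    # Step 1: turn the spoken "艾特" (plus any following spaces) into a literal "@".
    text = re.sub(re.escape(SPOKEN_AT) + r"\s*", "@", text)
    if not mentions:
        return text
    # Step 2: replace "@nickname" with "@Canonical". Longer nicknames go first in the
    # alternation so a short nickname never pre-empts a longer one that contains it.
    names = sorted(mentions, key=len, reverse=True)
    pattern = re.compile("@(" + "|".join(map(re.escape, names)) + ")")
    return pattern.sub(lambda m: "@" + mentions[m.group(1)], text)
```

For example, replace_mentions("请艾特小明看一下", MENTIONS) yields "请@Xiaoming看一下".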


API

POST /v1/voice/transcribe

Transcribe a single uploaded audio file. Request type: multipart/form-data.

| Field            | Type   | Required | Description                                                            |
|------------------|--------|----------|------------------------------------------------------------------------|
| audio            | file   | yes      | Audio file. Supported formats: .wav, .mp3, .ogg, .webm, .m4a, .flac     |
| context_text     | string | no       | Existing text for append/edit modes; only the last 5000 chars are kept |
| chat_context     | string | no       | Chat context; last 20000 chars kept                                    |
| personal_context | string | no       | Personal correction / hotword context; last 10000 chars kept           |
| member_context   | string | no       | Member context; last 5000 chars kept                                   |
| mode             | string | no       | smart, append_only, or edit_only (default: smart)                      |

Limits: by default, uploads may be at most 30 MiB and 660 s long; mode=edit_only requires context_text.
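Because the server enforces these limits anyway, a client can mirror them to fail fast before uploading. This is a hypothetical sketch (prepare_fields is an illustrative name) built only from the field table and limits above. It does not check the 660 s duration, which would require decoding the audio (e.g. with ffprobe), and it keeps the tail of each context field to match the "last N chars kept" rule.

```python
from pathlib import Path

# Values copied from the documented table and limits; the server remains the authority.
SUPPORTED_SUFFIXES = {".wav", ".mp3", ".ogg", ".webm", ".m4a", ".flac"}
MAX_FILE_BYTES = 30 * 1024 * 1024  # 30 MiB
CONTEXT_CAPS = {
    "context_text": 5000,
    "chat_context": 20000,
    "personal_context": 10000,
    "member_context": 5000,
}
MODES = {"smart", "append_only", "edit_only"}


def prepare_fields(filename: str, size_bytes: int, mode: str = "smart", **contexts: str) -> dict[str, str]:
    """Validate a request against the documented limits and trim contexts to their tails."""
    if Path(filename).suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported audio format: {filename}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if mode == "edit_only" and not contexts.get("context_text"):
        raise ValueError("edit_only requires context_text")
    fields = {"mode": mode}
    for name, value in contexts.items():
        if name not in CONTEXT_CAPS:
            raise ValueError(f"unknown field: {name}")
        # The server keeps the *last* N characters, so keep the same tail here.
        fields[name] = value[-CONTEXT_CAPS[name]:]
    return fields
```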

GET /v1/voice/config

Returns current service limits and engine metadata.

curl http://127.0.0.1:8787/v1/voice/config

Authentication

Disabled by default. If started with --auth-token, requests must carry Authorization: Bearer <token>.

python3 server.py --model-path <path> --auth-token "$MANO_ASR_TOKEN"
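From Python, an authenticated client only needs to add the Bearer header; when auth is disabled the header is simply omitted. In this sketch, config_request and fetch_config are hypothetical helpers, and reading the token from the same MANO_ASR_TOKEN variable used to start the server is our convenience assumption.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:8787"  # default host/port used in the examples


def config_request(base_url: str = BASE_URL, token: Optional[str] = None) -> urllib.request.Request:
    """Build GET /v1/voice/config, adding 'Authorization: Bearer <token>' only when a token is given."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(f"{base_url}/v1/voice/config", headers=headers)


def fetch_config(base_url: str = BASE_URL) -> dict:
    """Fetch the service limits and engine metadata, reusing $MANO_ASR_TOKEN if it is set."""
    req = config_request(base_url, os.environ.get("MANO_ASR_TOKEN"))
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```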

License

Released under the MIT License.

Copyright (c) 2026 MININGLAMP Technology.


Acknowledgments

mano-asr would not be possible without these excellent open-source projects:

  • MLX & mlx-audio: Apple's machine-learning framework and the MLX-based audio toolkit, the foundation of mano-asr's local inference.
  • FunASR / FunAudioLLM: source of Fun-ASR-Nano and FSMN-VAD, providing strong Chinese speech recognition.
  • Qwen3: the base model behind the Qwen3-ASR engine.
  • mlx-community: high-quality MLX quantized models.
  • ModelScope & Hugging Face: model hosting and distribution.
  • FastAPI: high-performance web framework.

Thanks to everyone contributing to the open-source speech recognition community.
