# Transcribe‑ASR

Client‑side Automatic Speech Recognition powered by Whisper (ONNX) & Hugging Face Transformers.

## Table of Contents
- Overview
- Features
- Demo
- Getting Started
- How It Works
- User Guide
- Privacy, Offline Use & Caching
- Project Structure
- Troubleshooting & Known Limitations
- Contributing
- License
- Acknowledgements
## Overview

Transcribe‑ASR is a pure‑client web application that runs Whisper speech‑to‑text models directly in the browser.
All heavy lifting (model download, audio decoding, inference) happens locally; no audio data ever leaves the user's device, making the tool privacy‑first and usable offline after the initial model download.
## Features

- **Zero‑Server Architecture**: Runs entirely in the browser; no backend required.
- **Multilingual & English‑Only Whisper**: Tiny, Base, and their `.en` variants from Hugging Face.
- **ONNX Optimisation**: Fast inference with `fp32`, `fp16`, `q4`, and `q4f16` quantised weights.
- **Dynamic Resampling**: Automatic 16 kHz conversion for any input sample rate.
- **Fine‑Grained Decoding Controls**: Temperature, top‑p, top‑k, repetition penalty, max tokens.
- **Language & Task Switching**: `transcribe` (speech‑to‑text) or `translate` (to English).
- **Progress & Logging UI**: Real‑time status, download progress, and inference feedback.
- **Clipboard & Clear Utilities**: One‑click copy of the transcript and quick reset.
- **Responsive Design**: Works on desktop, tablet, and mobile browsers.
- **Offline‑Ready**: Model files are cached via the Cache API; subsequent runs need no network.
- **Accessibility‑Friendly**: Keyboard‑operable controls, focusable tooltips, and ARIA‑ready markup.
## Demo

A live demo is hosted on GitHub Pages:
https://harisnae.github.io/transcribe-asr
## Getting Started

### Prerequisites

| Requirement | Details |
|---|---|
| Browser | Chrome ≥ 89, Edge ≥ 89, Firefox ≥ 86, Safari ≥ 14 (WebAssembly, Web Audio API, Cache API) |
| Internet | Required once to download the selected ONNX model (≈ 80‑200 MB depending on dtype) |
| Static server | Browsers block fetch of local files, so the app must be served over HTTP(S) |
### Installation

- **Clone the repository**

  ```bash
  git clone https://github.com/harisnae/transcribe-asr.git
  cd transcribe-asr
  ```

- **Start a static file server** (any of the following will work):

  ```bash
  # Python 3
  python -m http.server 8000

  # Node.js (http-server)
  npx http-server -p 8000

  # VS Code Live Server extension:
  # right‑click index.html → "Open with Live Server"
  ```

- **Open the app in your browser**

  ```
  http://localhost:8000/
  ```

- Click **Load model**. The selected ONNX files will be downloaded and cached. Subsequent loads are instantaneous and work offline.
## How It Works

- **Model Loading** – The UI calls `pipeline('automatic-speech-recognition', repoId, {...})` from the Hugging Face `transformers` CDN. The pipeline fetches the ONNX encoder/decoder files and creates an ONNX Runtime session (`device: 'auto'` → CPU, or WebGPU when available).
- **Audio Handling** – The chosen audio file (or sample) is read as an `ArrayBuffer`, decoded with the Web Audio API, and resampled to 16 kHz (Whisper’s required sample rate) using an `OfflineAudioContext` when needed.
- **Inference** – The resampled `Float32Array` is passed to the pipeline together with user‑defined generation options. An `AbortController` enables graceful cancellation.
- **Result Rendering** – The pipeline returns `{ text: "…" }`. The text is displayed in the **Transcription result** panel and can be copied to the clipboard.
- **Caching** – After a successful load, the ONNX files are stored in the browser’s Cache Storage (`caches.open('whisper-model')`). The UI checks this cache on start‑up to indicate offline readiness.

The two sketches below illustrate the loading and resampling steps.
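To make the loading step concrete, here is a minimal sketch using the `@huggingface/transformers` ESM build; the package import, model id, and option values are illustrative assumptions, not the exact code in `app.js`:

```js
import { pipeline } from '@huggingface/transformers';

// Hypothetical repo id and options; the app derives these from the UI selections.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'onnx-community/whisper-tiny.en',
  {
    dtype: 'q4',    // one of fp32 | fp16 | q4 | q4f16
    device: 'auto', // WebGPU when available, otherwise WASM/CPU
    progress_callback: (p) => console.log(p.status, p.progress ?? ''),
  }
);
```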
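The decode‑and‑resample step needs only standard Web Audio APIs. A self‑contained sketch (the helper name is hypothetical):

```js
// Decode an uploaded file and resample it to the 16 kHz mono Float32Array
// that Whisper expects, regardless of the file's original sample rate.
async function fileToWhisperInput(file) {
  const arrayBuffer = await file.arrayBuffer();
  const ctx = new AudioContext();
  const decoded = await ctx.decodeAudioData(arrayBuffer);
  await ctx.close();

  if (decoded.sampleRate === 16000 && decoded.numberOfChannels === 1) {
    return decoded.getChannelData(0); // already in the right format
  }

  // OfflineAudioContext renders the buffer at the target rate and mixes to mono.
  const offline = new OfflineAudioContext(1, Math.ceil(decoded.duration * 16000), 16000);
  const source = offline.createBufferSource();
  source.buffer = decoded;
  source.connect(offline.destination);
  source.start();
  const resampled = await offline.startRendering();
  return resampled.getChannelData(0);
}
```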
## User Guide

- **Model** – Choose a Whisper model from the dropdown (tiny/base, multilingual or English‑only).
- **Data** – Upload a local audio file (`.wav`, `.mp3`, …) or pick a pre‑hosted sample from the **Sample** selector. The audio player will preview the file.
| Input | Description | Default |
|---|---|---|
| Temperature (`tempInput`) | Controls randomness; lower = more deterministic. | 0.6 |
| Top‑p (`toppInput`) | Nucleus sampling threshold. | 1.0 |
| Top‑k (`topkInput`) | Limits sampling to the k most likely tokens (0 = disabled). | 0 |
| Repetition Penalty (`repPenInput`) | Penalises repeated tokens. | 1.0 |
| Max New Tokens (`maxTokensInput`) | Upper bound on generated tokens per chunk (max 448). | 20 |
Adjust these values to trade off speed vs. transcription quality.
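These controls map onto standard Hugging Face generation options in the pipeline call. A sketch of the wiring, assuming `transcriber` and `audio` from the steps above and DOM inputs with the ids shown in the table; the exact option plumbing in `app.js` may differ:

```js
// Helper to read a control by its id (ids taken from the table above).
const val = (id) => document.getElementById(id).value;

const output = await transcriber(audio, {
  do_sample: true,                                     // assumption: sampling enabled so temperature/top-p/top-k apply
  temperature: parseFloat(val('tempInput')),           // default 0.6
  top_p: parseFloat(val('toppInput')),                 // default 1.0
  top_k: parseInt(val('topkInput'), 10),               // 0 = disabled
  repetition_penalty: parseFloat(val('repPenInput')),  // default 1.0
  max_new_tokens: parseInt(val('maxTokensInput'), 10), // capped at 448
});
console.log(output.text); // rendered into the result panel
```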
- **Language** – Auto‑detect (default) or force a specific language from the extensive list.
- **Task** – `transcribe` (default) or `translate` (output in English). The task selector is enabled only for multilingual models.
| Button | Action |
|---|---|
| Load model | Downloads and prepares the selected Whisper model. |
| Transcribe | Starts inference on the loaded audio. |
| Stop | Cancels the current inference via `AbortController` (see the sketch after this table). |
| Copy | Copies the transcript to the clipboard. |
| Clear | Clears the transcript area. |
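The **Stop** button follows the usual `AbortController` pattern. A simplified sketch; the chunk loop and element names are assumptions about how `app.js` schedules work, which is also why cancellation only takes effect between chunks:

```js
let controller = null;

transcribeBtn.addEventListener('click', async () => {
  controller = new AbortController();
  try {
    for (const chunk of audioChunks) { // hypothetical per-chunk scheduling
      if (controller.signal.aborted) {
        throw new DOMException('Transcription cancelled', 'AbortError');
      }
      await transcriber(chunk, generationOptions);
    }
  } catch (err) {
    if (err.name !== 'AbortError') throw err; // real errors still surface
    // Aborted: leave any partial transcript in place.
  }
});

stopBtn.addEventListener('click', () => controller?.abort());
```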
## Privacy, Offline Use & Caching

- **Privacy‑First** – Audio never leaves the client; all processing occurs locally in the browser.
- **Offline‑Ready** – After the first download the ONNX files are cached. The app can be opened and used without any network connection (a readiness check is sketched below).
- **No Telemetry** – The code does not send analytics or usage data to any third‑party service.
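A minimal sketch of that offline‑readiness check, using the cache name from the How It Works section (the helper name is hypothetical):

```js
// True when at least one model file is already in Cache Storage, so the UI
// can signal that the app will run without a network connection.
async function isModelCached() {
  if (!('caches' in window)) return false;          // Cache API unsupported
  const cache = await caches.open('whisper-model'); // name used by app.js
  const keys = await cache.keys();
  return keys.length > 0;
}
```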
> **Tip:** If you want a completely self‑hosted solution, download the `transformers.min.js` bundle and the ONNX files once, then serve them from your own domain. Adding an `integrity` attribute (SRI) to the `<script>` tag is recommended for extra security.
## Project Structure

```
├─ index.html   # Main UI markup
├─ style.css    # Responsive styling and tooltip handling
├─ app.js       # Core logic (model loading, audio handling, UI)
├─ README.md    # You are reading this file
└─ assets/      # Optional: icons, screenshots, etc.
```
## Troubleshooting & Known Limitations

| Symptom | Possible Cause | Fix |
|---|---|---|
| Model download stalls or fails | CORS / network issue, or the CDN is temporarily unavailable. | Retry later or host the ONNX files yourself. |
| “Web Audio API not supported” | Using an outdated browser (e.g., Safari < 14). | Upgrade to a modern browser. |
| UI freezes while loading a long audio file | Decoding/resampling runs on the main thread. | Keep audio files under ~30 s, or split longer files into chunks (future improvement). |
| Abort button does nothing | The pipeline only checks the abort signal between chunks. | Wait a few seconds; the abort will be processed as soon as the current chunk finishes. |
| Tooltips not read by screen readers | Tooltips are CSS‑only. | Use ARIA `aria-describedby` with a hidden `<div role="tooltip">` (planned enhancement). |
| Model size too large for a metered connection | `fp32` models are ~200 MB. | Choose a lower‑precision dtype (`fp16`, `q4`, `q4f16`) to reduce download size. |
| Out‑of‑memory on low‑end devices | Very long audio or high‑precision models. | Limit file size (e.g., 10 MB) or use a quantised model. |
## Contributing

Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a feature branch (`git checkout -b feat/your-feature`).
- Make your changes. Ensure the app still works in the latest browsers.
- (Optional) Run a linter/formatter (`npm run lint` or your preferred tool).
- Open a Pull Request with a clear description of the change.
For major changes, open an issue first to discuss the design.
## License

This code is released under the MIT License.
The model weights referenced in the repository are released under the Apache License, Version 2.0 (as provided by the original Hugging Face model authors).
## Acknowledgements

- **OpenAI Whisper** – the underlying speech‑to‑text model.
- **Hugging Face Transformers** – JavaScript pipeline and model hub.
- **ONNX Runtime Web** – fast WebAssembly inference.
Happy transcribing!