TrOCR / Donut — Offline OCR Demo (Transformers.js)

Overview

This project demonstrates a privacy-focused Optical Character Recognition (OCR) application built using Transformers.js. It allows users to extract text from images entirely offline after loading a pre-trained model. No internet connection is required for OCR processing once the model is loaded. This ensures complete data privacy, as no images or data are ever transmitted to a server. The application is designed to be easily deployable as a static website, such as on GitHub Pages, requiring no backend infrastructure or inference servers.

Key Features

Offline OCR: The core functionality is performing OCR locally in the browser, guaranteeing data privacy. Once the model is downloaded, the application functions without an internet connection.
Privacy-Focused: All image data stays in the browser and no data is sent to external servers.
Model Variety: Supports both single-line (TrOCR) and multi-line (Donut) OCR models. Choose the appropriate model based on your needs:
- TrOCR: Ideal for extracting text from single lines, such as scanned documents or cropped images.
- Donut: Designed for document OCR, capable of handling full-page screenshots and complex layouts.
Model Selection: A dropdown menu allows users to select from several pre-trained models.
Image Input: Users can upload images directly from their computer or load a sample screenshot.
Cropping (Optional): A basic cropping tool allows users to select a specific region of the image for OCR, improving accuracy and performance.
Raw Output & Decoded Text: Displays both the raw JSON output from the OCR pipeline and the decoded text.
Copy to Clipboard: Easily copy the extracted text to the clipboard.
No Backend Required: The application is entirely client-side and does not require any server-side components or inference APIs.
Easy Deployment: Can be deployed as a static website on platforms like GitHub Pages.

How it Works

Model Loading: The application uses the @xenova/transformers library to load a pre-trained OCR model directly in the browser. This process may take some time, especially for larger models. A progress bar indicates the loading status.
Offline Functionality: Once the model is loaded, it is cached in the browser. The application can then perform OCR on images even without an internet connection.
Image Processing: When an image is uploaded, it is displayed on a canvas element. The user can optionally crop the image to focus on the relevant text.
OCR Pipeline: The selected OCR model (TrOCR or Donut) is used to extract text from the image.
Output: The extracted text is displayed in a text area, along with the raw JSON output from the OCR pipeline.

Technologies Used

HTML, CSS, javascript-protocol: The core web technologies.
Transformers.js (@xenova/transformers): A JavaScript library for running transformer models in the browser. https://xenova.io/transformers/
Web Workers: Used to offload the computationally intensive OCR tasks from the main thread, ensuring a responsive user interface.

Demo

https://harisnae.github.io/OfflineWebOCR/

Getting Started

Clone the Repository:

git clone https://github.com/harisnae/OfflineWebOCR/
cd OfflineWebOCR

Open index.html in your browser: Simply open the index.html file in any modern web browser.

Deployment

This application is designed to be easily deployed as a static website. Here's how to deploy it to GitHub Pages:

Create a GitHub Repository: Create a new public repository on GitHub.
Push the Code: Push the contents of this project to the repository.
Enable GitHub Pages:
- Go to your repository's settings.
- Navigate to the "Pages" section.
- Select the main branch (or the branch containing your code) as the source.
- GitHub Pages will automatically deploy your website.

Acknowledgements

@xenova/transformers – JavaScript port of Hugging Face Transformers, enabling on‑device inference.
TrOCR – Microsoft’s Transformer‑based OCR model for handwritten/printed single lines.
Donut – Layout‑aware document OCR model from the same research group.
GitHub Pages – For providing a free static‑site hosting platform.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

The accuracy of the OCR results depends on the quality of the image and the selected model. The application is provided "as is" without any warranty. Use at your own risk.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
LICENSE		LICENSE
README.md		README.md
index.html		index.html
sample_screenshot.png		sample_screenshot.png
script.js		script.js
styles.css		styles.css
worker.js		worker.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TrOCR / Donut — Offline OCR Demo (Transformers.js)

Overview

Key Features

How it Works

Technologies Used

Demo

Getting Started

Deployment

Acknowledgements

License

Disclaimer

About

Uh oh!

Releases

Packages

Languages

License

harisnae/OfflineWebOCR

Folders and files

Latest commit

History

Repository files navigation

TrOCR / Donut — Offline OCR Demo (Transformers.js)

Overview

Key Features

How it Works

Technologies Used

Demo

Getting Started

Deployment

Acknowledgements

License

Disclaimer

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages