A comprehensive collection of examples and configurations for getting the best results from the Nanonets-OCR2 model. This repository provides practical implementations for various OCR use cases including financial documents, complex tables, and multilingual content.
- Image to Markdown Converter: a notebook demonstrating best practices for the Nanonets-OCR2-3B model across various document types, including:
- Bank statements and financial documents
- Complex tables and structured data
- Photos captured from mobile devices
- Multilingual images and documents
- Python 3.11
- CUDA-compatible GPU (recommended for optimal performance)
- uv package manager
- Install the uv package manager:

  ```shell
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```
- Create a virtual environment and install dependencies:

  ```shell
  # Create a virtual environment with Python 3.11
  uv venv --python=3.11

  # Activate the virtual environment
  source .venv/bin/activate

  # Install dependencies
  uv pip install -r requirements.txt
  ```
- Start Jupyter Lab:

  ```shell
  jupyter lab
  ```
This project uses the following key dependencies:
- PyTorch: Deep learning framework with CUDA support
- Transformers: Hugging Face transformers library for model inference
- Jupyter: Interactive notebook environment
- PDF2Image: PDF document processing
- Accelerate: Model acceleration and optimization
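To illustrate where PDF2Image fits in the pipeline, here is a minimal sketch of rendering each PDF page to an image that can then be passed to the OCR model. The function and file names are hypothetical; `pdf2image.convert_from_path` requires the poppler binaries to be installed on the system.

```python
from pathlib import Path


def pdf_to_page_images(pdf_path: str, dpi: int = 200) -> list:
    """Render each page of a PDF to a PIL image (requires poppler on PATH)."""
    # Deferred import: pdf2image needs poppler installed to be usable.
    from pdf2image import convert_from_path
    return convert_from_path(pdf_path, dpi=dpi)


def page_image_names(pdf_path: str, n_pages: int) -> list:
    """Derive per-page output filenames from the source PDF name."""
    stem = Path(pdf_path).stem
    return [f"{stem}_page_{i + 1}.png" for i in range(n_pages)]


# Usage (assumes a local PDF named "statement.pdf"):
# pages = pdf_to_page_images("statement.pdf")
# for name, page in zip(page_image_names("statement.pdf", len(pages)), pages):
#     page.save(name)
```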
- Clone this repository
- Follow the installation steps above
- Open the notebook at Nanonets-OCR2-Cookbook/image2md.ipynb
- Run the cells to see examples of OCR processing
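For readers who want to go straight to code, the flow inside the notebook can be sketched roughly as below. This is a minimal, hedged example using the standard Hugging Face Transformers chat-template API; the model id `nanonets/Nanonets-OCR2-3B` and the prompt text are assumptions, so check the model card for the exact id and recommended prompt. Heavy imports are deferred into the function so the sketch can be read without the model installed.

```python
def build_messages(image_path: str,
                   instruction: str = "Convert this image to markdown.") -> list:
    """Build a chat-style message list pairing an image with an instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


def image_to_markdown(image_path: str,
                      model_id: str = "nanonets/Nanonets-OCR2-3B") -> str:
    """Run one image through the model and return the generated markdown."""
    # Deferred imports: these pull in large dependencies and model weights.
    import torch
    from PIL import Image
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    prompt = processor.apply_chat_template(
        build_messages(image_path), tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[prompt], images=[Image.open(image_path)], return_tensors="pt"
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=4096)
    # Strip the prompt tokens, keep only the newly generated text.
    new_tokens = output[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]


# Usage (downloads the model weights on first run):
# print(image_to_markdown("sample_invoice.png"))
```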
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.