End-to-end OCR + HTR (handwritten text recognition) pipeline for the Kaggle Handwritten to Data 2026 hackathon, using the RUKOPYS handwritten document dataset.
This repository contains a complete project scaffold:
- Document preprocessing
- Layout/text-region detection
- Region classification
- OCR/HTR recognition
- Ukrainian language correction hooks
- Kaggle submission generation
- Evaluation utilities
- Streamlit demo app
- Hackathon documentation and architecture notes
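The first two stages can be illustrated with a pure-Python sketch that binarizes a grayscale page with Otsu's method and then returns bounding boxes of 8-connected ink components. It was written for this README and is not the code in preprocessing.py or detection.py. The function names and the list-of-lists image representation are hypothetical. A real implementation would more likely call OpenCV (cv2.threshold with cv2.THRESH_OTSU, then cv2.findContours).

```python
from collections import deque

# Hypothetical representation: a page is a 2-D list of 8-bit grayscale values
# (0 = black ink, 255 = white paper).
Image = list[list[int]]
Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def otsu_threshold(img: Image) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the dark class (values <= t)
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized; fine for argmax
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(img: Image) -> list[list[bool]]:
    """Mark pixels at or below the Otsu threshold as ink (True)."""
    t = otsu_threshold(img)
    return [[value <= t for value in row] for row in img]


def ink_regions(mask: list[list[bool]], min_pixels: int = 1) -> list[Box]:
    """Bounding boxes of 8-connected ink components, in row-major discovery order."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: list[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting at this unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0, y0, x1, y1, count = x, y, x, y, 0
            while queue:
                cy, cx = queue.popleft()
                count += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if count >= min_pixels:  # drop specks smaller than min_pixels
                boxes.append((x0, y0, x1, y1))
    return boxes
```

On a real page these components are individual glyphs or strokes, so they would still need to be merged into lines or regions. That grouping step is where a learned layout detector earns its cost.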
.
├── app/
│   └── streamlit_app.py
├── configs/
│   └── default.yaml
├── data/
│   ├── raw/
│   ├── processed/
│   └── submissions/
├── docs/
│   ├── architecture.md
│   └── hackathon_report.md
├── scripts/
│   ├── run_inference.py
│   └── train_baseline.py
├── src/
│   └── rukopys_ocr/
│       ├── config.py
│       ├── correction.py
│       ├── detection.py
│       ├── metrics.py
│       ├── pipeline.py
│       ├── preprocessing.py
│       ├── recognition.py
│       ├── schemas.py
│       └── submission.py
└── tests/
    ├── test_metrics.py
    └── test_pipeline_smoke.py
Create an environment and install dependencies:
python -m venv .venv
.venv\Scripts\activate
(This activation command is for Windows; on macOS/Linux, run: source .venv/bin/activate)
pip install -r requirements.txt
Run the demo app:
streamlit run app/streamlit_app.py
Run batch inference:
python scripts/run_inference.py --input data/raw/test_images --output data/submissions/submission.csv
Run tests:
pytest
Place data exported from Kaggle or Hugging Face in data/raw/:
data/raw/
├── train_images/
├── test_images/
├── train.jsonl
├── silver.jsonl
└── sample_submission.csv
The pipeline accepts common image formats: .png, .jpg, .jpeg, .tif, .tiff, .bmp.
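Input discovery for this layout might look like the sketch below, which collects supported images from a folder and parses JSONL files such as train.jsonl and silver.jsonl. The function names are hypothetical, matching extensions case-insensitively is an assumption, and no particular JSONL field schema is assumed.

```python
from __future__ import annotations

import json
from pathlib import Path

# Extensions listed in this README; case-insensitive matching is an assumption.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def list_images(folder: str | Path) -> list[Path]:
    """Return supported image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def read_jsonl(path: str | Path) -> list[dict]:
    """Parse one JSON object per non-blank line; field names are left untouched."""
    records: list[dict] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))
    return records
```

For example, list_images("data/raw/test_images") would return the test pages, and read_jsonl("data/raw/train.jsonl") would return the records of train.jsonl as plain dictionaries.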
The default implementation is intentionally modular. It includes a lightweight classical fallback so the project runs without downloading large models, while leaving clear extension points for production models:
- Detector: contour-based fallback, replaceable with YOLO/Detectron2
- Recognizer: optional Tesseract/PaddleOCR-style adapter, replaceable with TrOCR or a Qwen/Gemma vision-language model (VLM)
- Corrector: dictionary and normalization hooks for Ukrainian post-processing
- Submission: Kaggle-ready CSV generation
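These extension points can be expressed as small structural interfaces, so a contour-based detector and a YOLO model are interchangeable as long as they expose the same method. The sketch below uses hypothetical names (Detector, Recognizer, Corrector, DictionaryCorrector, Pipeline, write_submission) that are not taken from pipeline.py, correction.py, or submission.py. Its corrector normalizes text to Unicode NFC, unifies apostrophe variants to U+02BC (an assumption: the target character should match whatever the ground-truth transcriptions use), and snaps each word to its closest lexicon entry with difflib. The id and text CSV columns are placeholders; the real header should follow sample_submission.csv.

```python
from __future__ import annotations

import csv
import difflib
import unicodedata
from dataclasses import dataclass
from typing import IO, Protocol

Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


class Detector(Protocol):
    def detect(self, image) -> list[Box]: ...


class Recognizer(Protocol):
    def recognize(self, image, box: Box) -> str: ...


class Corrector(Protocol):
    def correct(self, text: str) -> str: ...


class DictionaryCorrector:
    """Hypothetical fallback: normalize Unicode, then snap words to a lexicon."""

    # Assumption: ASCII ' and U+2019 are unified to U+02BC (modifier letter apostrophe).
    APOSTROPHES = str.maketrans({"'": "\u02bc", "\u2019": "\u02bc"})

    def __init__(self, lexicon: set[str], cutoff: float = 0.8) -> None:
        self.lexicon = sorted(lexicon)
        self.cutoff = cutoff

    def correct(self, text: str) -> str:
        # NFC composes sequences such as "и" + U+0306 into the single letter "й".
        text = unicodedata.normalize("NFC", text).translate(self.APOSTROPHES)
        out = []
        for word in text.split():  # note: runs of whitespace collapse to one space
            match = difflib.get_close_matches(word, self.lexicon, n=1, cutoff=self.cutoff)
            out.append(match[0] if match else word)
        return " ".join(out)


@dataclass
class Pipeline:
    detector: Detector
    recognizer: Recognizer
    corrector: Corrector

    def run(self, image) -> list[str]:
        """Detect regions, read each one, and post-correct the recognized text."""
        return [
            self.corrector.correct(self.recognizer.recognize(image, box))
            for box in self.detector.detect(image)
        ]


def write_submission(rows: list[tuple[str, str]], out: IO[str]) -> None:
    """Write (id, text) rows; these column names are placeholders, not Kaggle's."""
    writer = csv.writer(out)
    writer.writerow(["id", "text"])
    writer.writerows(rows)
```

Swapping in a production model then means passing a different object with a detect or recognize method; the Pipeline class itself does not change.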
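The optimization target is the combined score below, where CER is the character error rate and PageCER is the page-level character error rate. The weights sum to 1.0, so page-level accuracy alone carries half of the score:
Score = 0.15 * DetectionF1
      + 0.05 * ClassificationAccuracy
      + 0.30 * (1 - CER)
      + 0.50 * (1 - PageCER)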
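The score can be computed directly once the component metrics are known. The sketch below uses the standard definition of CER (Levenshtein edit distance divided by the reference length) with hypothetical function names; it is not necessarily what metrics.py does. This README does not specify how per-region CER values are averaged, whether CER is clipped to [0, 1] before entering (1 - CER), or how an empty reference is scored, so the empty-reference rule below is an assumption and no clipping is applied.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = cur
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate; can exceed 1.0 when the prediction is much longer."""
    if not reference:
        return 0.0 if not prediction else 1.0  # assumption for empty references
    return levenshtein(prediction, reference) / len(reference)


def combined_score(detection_f1: float, classification_accuracy: float,
                   cer_value: float, page_cer: float) -> float:
    """Weighted sum from the formula above (no clipping applied)."""
    return (
        0.15 * detection_f1
        + 0.05 * classification_accuracy
        + 0.30 * (1 - cer_value)
        + 0.50 * (1 - page_cer)
    )
```

For example, the prediction привт against the reference привіт needs one insertion, so its CER is 1/6.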
Shivam Singh
Department of Computer Science Engineering
AI Research and Development
2026