Description
Summary
Add an OCR (Optical Character Recognition) backend to Docker Model Runner so that text can be extracted from documents locally.
Motivation
- Expand Docker Model Runner beyond text generation to include vision/document processing
- Enable privacy-focused local OCR without cloud dependencies
- Leverage existing model distribution and scheduling infrastructure
Proposed Implementation
- Create a new OCR backend following the existing patterns in pkg/inference/backends/
- Integrate with popular document AI models, e.g., LayoutLMv3 and Donut
- Support common input formats (PNG, JPEG, PDF)
- Expose OCR functionality through OpenAI-compatible API endpoints
Technical Considerations
- Follow the existing backend interface in pkg/inference/backends/llamacpp/llamacpp.go
- Leverage model distribution system for OCR model downloads
- Integrate with resource management for memory allocation
- Support both CPU and GPU acceleration where available
Questions for Maintainers
- Preferred document AI models?
- API endpoint design preferences?
- Model packaging/distribution strategy?
Comment
It would be a great help in my work if I could easily test document AI and OCR locally!