
Feature Request: Add OCR Backend Support for Local Document Processing #64

Open
@20yuto20

Description


Summary

This issue proposes adding an OCR (Optical Character Recognition) backend to enable local document text extraction within Docker Model Runner.

Motivation

  • Expand Docker Model Runner beyond text generation to include vision/document processing
  • Enable privacy-focused local OCR without cloud dependencies
  • Leverage existing model distribution and scheduling infrastructure

Proposed Implementation

  1. Create new OCR backend following existing patterns in pkg/inference/backends/
  2. Integrate with popular document AI models (e.g., LayoutLMv3, Donut)
  3. Support common image formats (PNG, JPEG, PDF)
  4. Expose OCR functionality through OpenAI-compatible API endpoints (see the request sketch after this list)
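To illustrate item 4 from a client's perspective, here is a minimal sketch of posting a document image to an OpenAI-style chat completions endpoint. The endpoint path, port, model name, and payload shape are assumptions modeled on the OpenAI vision message format, not a confirmed Docker Model Runner API.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Read a scanned page and base64-encode it for the request body.
	img, err := os.ReadFile("invoice.png")
	if err != nil {
		panic(err)
	}
	encoded := base64.StdEncoding.EncodeToString(img)

	// OpenAI-style vision chat payload; the model reference is a placeholder.
	payload := map[string]any{
		"model": "ai/donut",
		"messages": []map[string]any{
			{
				"role": "user",
				"content": []map[string]any{
					{"type": "text", "text": "Extract all text from this document."},
					{"type": "image_url", "image_url": map[string]string{
						"url": "data:image/png;base64," + encoded,
					}},
				},
			},
		},
	}

	body, _ := json.Marshal(payload)
	// The local endpoint below is assumed for illustration.
	resp, err := http.Post(
		"http://localhost:12434/engines/v1/chat/completions",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```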

Technical Considerations

  • Follow the existing backend interface in pkg/inference/backends/llamacpp/llamacpp.go (see the backend sketch after this list)
  • Leverage model distribution system for OCR model downloads
  • Integrate with resource management for memory allocation
  • Support both CPU and GPU acceleration where available
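To make the first consideration concrete, below is a minimal sketch of what an OCR backend could look like. Since the actual backend interface from pkg/inference/backends/llamacpp/llamacpp.go is not reproduced in this issue, the Backend interface shown here (Name/Install/Run), the server binary, and its flags are assumptions for illustration only.

```go
package ocr

import (
	"context"
	"fmt"
	"os/exec"
)

// Backend loosely mirrors the shape of the existing llama.cpp backend;
// the real interface in pkg/inference/backends is assumed, not copied.
type Backend interface {
	Name() string
	Install(ctx context.Context) error
	Run(ctx context.Context, modelPath, socket string) error
}

// ocrBackend wraps an external OCR-capable inference server (e.g. one
// serving Donut or LayoutLMv3) behind the assumed Backend interface.
type ocrBackend struct {
	serverBinary string // path to the OCR inference server binary (hypothetical)
}

func New(serverBinary string) Backend {
	return &ocrBackend{serverBinary: serverBinary}
}

func (b *ocrBackend) Name() string { return "ocr" }

func (b *ocrBackend) Install(ctx context.Context) error {
	// A real implementation would download or verify the OCR runtime
	// via the existing model distribution system.
	return nil
}

func (b *ocrBackend) Run(ctx context.Context, modelPath, socket string) error {
	// Launch the OCR server against the downloaded model and the socket
	// handed over by the scheduler (flag names are illustrative).
	cmd := exec.CommandContext(ctx, b.serverBinary,
		"--model", modelPath,
		"--listen", socket,
	)
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("ocr backend exited: %w", err)
	}
	return nil
}
```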

Questions for Maintainers

  • Preferred document AI models?
  • API endpoint design preferences?
  • Model packaging/distribution strategy?

Comment

It would be very helpful for my work to be able to easily test document AI / OCR locally!
