Request: Nougat OCR Integration #3294

Open
@OhadRubin

I suggest integrating Nougat OCR into llama.cpp so that it can process scientific PDF documents.
This could serve as a first step towards supporting multimodal models in this project!

Implementation:
It seems that Nougat is built from standard transformer components (a Swin Transformer encoder and a BART-style decoder), so most of the work would be figuring out how to add the image processing.
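
To give a rough idea of what the image-processing side involves, here is a minimal sketch of the kind of preprocessing a Swin-style encoder expects: pad the page image, normalize the pixels, and cut it into non-overlapping patches. Everything in it is an assumption for illustration: the function names are hypothetical, the patch size of 4 and the mean/std of 0.5 are placeholders rather than Nougat's actual settings, and a real port would do this in C/C++ on decoded PDF page images.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values in 0..255


def pad_to_multiple(image: Image, patch: int, fill: float = 255.0) -> Image:
    """Pad bottom/right with `fill` (white) so both sides divide by `patch`."""
    height, width = len(image), len(image[0])
    new_h = -(-height // patch) * patch  # ceiling to the next multiple
    new_w = -(-width // patch) * patch
    padded = [list(row) + [fill] * (new_w - width) for row in image]
    padded += [[fill] * new_w for _ in range(new_h - height)]
    return padded


def normalize(image: Image, mean: float = 0.5, std: float = 0.5) -> Image:
    """Scale pixels to 0..1, then standardize (mean/std are placeholders)."""
    return [[(px / 255.0 - mean) / std for px in row] for row in image]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split into non-overlapping patch x patch tiles, flattened row-major."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                image[top + dy][left + dx]
                for dy in range(patch)
                for dx in range(patch)
            ])
    return patches


def preprocess(image: Image, patch: int = 4) -> List[List[float]]:
    """Hypothetical pipeline: pad, normalize, then split into patches."""
    return patchify(normalize(pad_to_multiple(image, patch)), patch)
```

Each flattened patch would then go through a learned linear projection before the transformer blocks; that projection and the windowed attention are not shown here.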

Let me know what you think!
P.S.: Love this repo! I hope to add my own retrieval-pretrained transformer to it at some point.

Metadata

Assignees

No one assigned

Labels

help wanted (Extra attention is needed), model (Model specific)
