Open
Description
Request: Nougat OCR Integration
I suggest adding Nougat OCR into llama.cpp to enable the processing of scientific PDF documents.
This can act as a first step towards adding multimodal models to this project!
Implementation:
It seems that Nougat is based on standard transformer architecture (like Bart and Swin Transformer) and most of the work would be on figuring out how to add the image processing.
Let me know what you think!
P.S.: Love this repo! I hope to add my own retrieval-pretrained transformer at some point to this repo.