The OCR-PDF project extracts text from PDF documents using Optical Character Recognition (OCR). It supports several OCR models, so you can choose between speed and accuracy depending on your documents and hardware.
- Support for multiple OCR models, including:
  - Gemini
  - OpenAI
  - Qwen2.5-VL (local model)
- Ability to convert extracted text into Word documents.
- Easy integration with existing workflows.
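Because several models are supported, it helps to keep each one behind the same small interface so the rest of the pipeline does not care which model is used. The sketch below is a hypothetical illustration, not this project's actual code: the names `OcrBackend`, `register_backend` and `ocr_pages` are assumptions, and the demo backend only decodes bytes instead of calling a real model.

```python
from typing import Callable, Dict, List

# Assumed (hypothetical) interface: a backend receives the raw bytes of one
# rendered page image and returns the text it recognised.
OcrBackend = Callable[[bytes], str]

_BACKENDS: Dict[str, OcrBackend] = {}


def register_backend(name: str, backend: OcrBackend) -> None:
    """Make a backend selectable by a case-insensitive name."""
    _BACKENDS[name.lower()] = backend


def ocr_pages(pages: List[bytes], model: str) -> str:
    """Run the selected backend on each page and join the results in page order."""
    backend = _BACKENDS.get(model.lower())
    if backend is None:
        raise ValueError(f"unknown OCR model {model!r}; registered: {sorted(_BACKENDS)}")
    # Strip stray whitespace per page and separate pages with a blank line,
    # so page boundaries survive when the text is later written to a document.
    return "\n\n".join(backend(page).strip() for page in pages)


# Usage with a stand-in backend that simply decodes the bytes; a real
# backend would call Gemini, OpenAI or a local Qwen2.5-VL model instead.
register_backend("echo", lambda page: page.decode("utf-8"))
print(ocr_pages([b"Page one ", b"Page two"], "ECHO"))
```

Swapping models then only means registering another function with the same signature.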
After testing the models, I obtained the following results (the ratings are qualitative):

| Model | Accuracy | Speed |
|---|---|---|
| Gemini | Low | Fast |
| OpenAI | Low | Fast |
| Qwen2.5-VL (local model) | High | Very slow |
(Tested on an NVIDIA RTX 4080.)
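The ratings above are qualitative. If you want to compare models on your own documents, one common approach is to measure the character error rate (CER, the edit distance divided by the length of a reference transcript) together with the time per page. This is a minimal sketch assuming you have hand-checked reference text for a few pages; it is not how the results above were produced, and `evaluate_backend` and the backend signature are hypothetical.

```python
import time
from typing import Callable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(predicted: str, reference: str) -> float:
    """Edit distance normalised by reference length (0.0 means a perfect match)."""
    if not reference:
        # Convention used here: any output for an empty reference counts as 100% error.
        return 0.0 if not predicted else 1.0
    return levenshtein(predicted, reference) / len(reference)


def evaluate_backend(
    backend: Callable[[bytes], str], pages: List[bytes], references: List[str]
) -> Tuple[float, float]:
    """Return (mean CER, mean seconds per page); only the backend call is timed."""
    if not pages or len(pages) != len(references):
        raise ValueError("need one reference transcript per page")
    cers: List[float] = []
    total_seconds = 0.0
    for page, reference in zip(pages, references):
        start = time.perf_counter()
        text = backend(page)
        total_seconds += time.perf_counter() - start
        cers.append(character_error_rate(text, reference))
    return sum(cers) / len(cers), total_seconds / len(pages)
```

Timing only the backend call keeps the scoring cost out of the speed figure.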
I don't provide a requirements.txt file because each model requires different dependencies, so install the ones needed by the model you plan to use. I recommend creating a new environment (for example with conda) using Python 3.11. If you want to use a GPU, install the PyTorch build that matches your CUDA version (https://pytorch.org/get-started/locally/).
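Since there is no requirements file, a quick self-check can confirm the environment before running a model. This is a hypothetical helper, not part of the project: it reports whether you are on the recommended Python 3.11, which of the listed packages can be imported, and, only if PyTorch is installed, whether it can see a CUDA GPU. The package names in the usage line are examples; adjust them to your chosen backend.

```python
import importlib
import importlib.util
import sys
from typing import Dict, Iterable


def check_environment(modules: Iterable[str]) -> Dict[str, object]:
    """Summarise Python version, installed packages and CUDA availability."""
    report: Dict[str, object] = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_3_11": sys.version_info[:2] == (3, 11),
        # find_spec returns None when a top-level package cannot be found.
        "installed": {name: importlib.util.find_spec(name) is not None for name in modules},
        "cuda": None,  # stays None when PyTorch is not installed
    }
    if importlib.util.find_spec("torch") is not None:
        torch = importlib.import_module("torch")
        # False here usually means a CPU-only PyTorch build or a CUDA/driver mismatch.
        report["cuda"] = torch.cuda.is_available()
    return report


# Example package names only; list whatever your chosen backend needs.
print(check_environment(["torch", "transformers"]))
```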
Contributions are welcome! Please submit a pull request or open an issue for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for more details.