Description
Some PDF files use Type 3 (T3) fonts that have no embedded ToUnicode mapping. Text set in such fonts cannot be extracted reliably, because the glyph codes cannot be mapped back to Unicode characters. In these cases OCR may help: an OCR library such as Tesseract, or something similar, could be used to recover the text from the rendered glyphs. Supporting this as an option would be useful in such scenarios. Any library used must have a license that is compatible with the MIT Expat License of PDFIO.
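
To make the request concrete, here is a minimal sketch of the fallback logic, written in Python for illustration only. It is not PDFIO's API: the font dictionary is assumed to be available as a plain `dict` with `Subtype` and `ToUnicode` keys, the helper names `needs_ocr`, `ocr_command`, and `ocr_page` are hypothetical, and the page is assumed to have been rendered to an image by some other tool. The only external piece is the `tesseract` command-line program, invoked with `stdout` as the output base so the recognized text is printed rather than written to a file.

```python
import shutil
import subprocess
from typing import List, Optional


def needs_ocr(font: dict) -> bool:
    """Return True for a Type 3 font whose dictionary has no ToUnicode entry.

    `font` is a hypothetical plain-dict view of a PDF font dictionary.
    """
    return font.get("Subtype") == "Type3" and "ToUnicode" not in font


def ocr_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build the tesseract CLI call that prints recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_page(image_path: str, lang: str = "eng") -> Optional[str]:
    """Run tesseract on an already rendered page image.

    Returns None when the tesseract executable is not installed, so the
    caller can fall back to the normal (non-OCR) extraction path.
    """
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ocr_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping the OCR step behind an optional external program, rather than linking a library directly, is one way to avoid pulling third-party code into PDFIO itself; whether that is the right trade-off is open for discussion.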