We've seen 13 instances of PDFs whose colour space causes the error below. To deal with this, I think we would have to run ocrmypdf, detect this type of failure, and if it happens retry with --color-conversion-strategy set to RGB or something similar.
However, 13 is not that many, so maybe this isn't a priority.
Error in 'OcrMyPdfExtractor processing ArbVREi_dU0nQSH-GoRwgq1dRqSEXvtVh_QiycqaCKmADPSJnbgjf3bCWI6l3KZdx1Mas2U_jVveS5j1nofZTw': java.lang.IllegalStateException: OcrMyPdfExtractor error openjpeg warning: unspec CS. 1 component so assuming gray.
openjpeg warning: unspec CS. 1 component so assuming gray.
openjpeg warning: unspec CS. 1 component so assuming gray.
Start processing 2 pages concurrently
1 redoing OCR
2 redoing OCR
3 redoing OCR
4 redoing OCR
Postprocessing...
ColorConversionNeededError: The input PDF has an unusual color space. Use
--color-conversion-strategy to convert to a common color space
such as RGB, or use --output-type pdf to skip PDF/A conversion
and retain the original color space.
at extraction.ocr.BaseOcrExtractor.extract(BaseOcrExtractor.scala:36)
at extraction.FileExtractor.extract(FileExtractor.scala:22)
at extraction.Worker.safeInvokeExtractor(Worker.scala:150)
at extraction.Worker.$anonfun$executeBatch$3(Worker.scala:94)
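
For reference, the retry idea could look roughly like the sketch below. It is written in Python only to keep it short and is not the extractor's actual code. The function name `ocr_with_color_fallback`, the `runner` parameter and the `extra_args` parameter are all made up for illustration. It assumes ocrmypdf writes the `ColorConversionNeededError` text to stderr, as the log above suggests, and it matches on that text rather than on an exit code because I am not sure which exit code ocrmypdf uses for this error.

```python
import subprocess

# Text taken from the log above; assumed to appear on stderr when the failure happens.
COLOR_ERROR_MARKER = "ColorConversionNeededError"


def _run(cmd):
    """Run a command and capture its output as text."""
    return subprocess.run(cmd, capture_output=True, text=True)


def ocr_with_color_fallback(input_pdf, output_pdf, extra_args=(), runner=_run):
    """Run ocrmypdf once; retry with RGB conversion only on the colour space error.

    Hypothetical helper: `runner` is injectable so the retry logic can be
    exercised without ocrmypdf installed.
    """
    first = runner(["ocrmypdf", *extra_args, input_pdf, output_pdf])

    # Success, or a failure unrelated to colour spaces: return the result unchanged.
    if first.returncode == 0 or COLOR_ERROR_MARKER not in (first.stderr or ""):
        return first

    # Colour space failure: retry once, converting to RGB as the error message suggests.
    retry_cmd = [
        "ocrmypdf",
        *extra_args,
        "--color-conversion-strategy",
        "RGB",
        input_pdf,
        output_pdf,
    ]
    return runner(retry_cmd)
```

Retrying only on this specific error leaves the other PDFs on the current code path. The error message also offers --output-type pdf as an alternative, which would skip PDF/A conversion and keep the original colour space.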