Address problems with colour conversion in ocrmypdf for certain PDFs

We've seen 13 PDFs whose colour space causes the error below. To handle this, we would have to run ocrmypdf, detect this specific failure (`ColorConversionNeededError`) and, when it occurs, retry with `--color-conversion-strategy` set to a common colour space such as RGB.

However, 13 is not that many, so this may not be a priority.


```
Error in 'OcrMyPdfExtractor processing ArbVREi_dU0nQSH-GoRwgq1dRqSEXvtVh_QiycqaCKmADPSJnbgjf3bCWI6l3KZdx1Mas2U_jVveS5j1nofZTw': java.lang.IllegalStateException: OcrMyPdfExtractor error openjpeg warning: unspec CS. 1 component so assuming gray.
openjpeg warning: unspec CS. 1 component so assuming gray.
openjpeg warning: unspec CS. 1 component so assuming gray.
Start processing 2 pages concurrently
    1 redoing OCR
    2 redoing OCR
    3 redoing OCR
    4 redoing OCR
Postprocessing...
ColorConversionNeededError: The input PDF has an unusual color space. Use
--color-conversion-strategy to convert to a common color space
such as RGB, or use --output-type pdf to skip PDF/A conversion
and retain the original color space.

	at extraction.ocr.BaseOcrExtractor.extract(BaseOcrExtractor.scala:36)
	at extraction.FileExtractor.extract(FileExtractor.scala:22)
	at extraction.Worker.safeInvokeExtractor(Worker.scala:150)
	at extraction.Worker.$anonfun$executeBatch$3(Worker.scala:94)
```
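
The retry described above could be sketched roughly as follows. This is a minimal illustration in Python, not the extractor's actual Scala code: the function name `run_ocrmypdf_with_retry`, the `runner` parameter and the `extra_args` parameter are all hypothetical. It assumes the failure can be recognised by the `ColorConversionNeededError` text that appears in the log above, and it retries once with `--color-conversion-strategy RGB`, as the error message suggests.

```python
import subprocess
from typing import Callable, Sequence

# Marker copied from the error text in the log above.
COLOR_ERROR_MARKER = "ColorConversionNeededError"


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command and capture its output as text."""
    return subprocess.run(list(cmd), capture_output=True, text=True)


def run_ocrmypdf_with_retry(
    input_pdf: str,
    output_pdf: str,
    extra_args: Sequence[str] = (),
    runner: Callable[[Sequence[str]], subprocess.CompletedProcess] = _run,
) -> subprocess.CompletedProcess:
    """Run ocrmypdf; retry once with RGB conversion on a colour space error.

    `runner` is injectable (hypothetical design choice) so the retry logic
    can be exercised without ocrmypdf installed.
    """
    base = ["ocrmypdf", *extra_args]
    first = runner([*base, input_pdf, output_pdf])

    # Check both streams, since we only assume where the message is written.
    output = (first.stdout or "") + (first.stderr or "")
    if first.returncode == 0 or COLOR_ERROR_MARKER not in output:
        # Success, or an unrelated failure: do not retry.
        return first

    # Retry with the option named in the error message itself.
    retry_cmd = [*base, "--color-conversion-strategy", "RGB", input_pdf, output_pdf]
    return runner(retry_cmd)
```

The error message also offers `--output-type pdf` as an alternative that skips PDF/A conversion and keeps the original colour space. Which of the two fits better depends on whether we need PDF/A output.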

Address problems with colour conversion in ocrmypdf for certain PDFs #679
