fix(pdf-extraction): improve handling of unreadable PDF content #224

Open
Luis-manzur wants to merge 5 commits into main from 219-bad-processing-of-pdfs-in-texas

@Luis-manzur
Contributor

This pull request improves the extraction of text from PDF files by adding a check to ensure that the extracted text is actually readable and not just binary or corrupt data. It also updates the interface and tests to reflect this new behavior.
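
For illustration only, a readability check of the kind described might look like the sketch below. The function name `is_readable_text`, the `min_ratio` threshold, and the character-class heuristic are all assumptions made for this example; they are not taken from the changes in this pull request.

```python
import string

# Hypothetical helper: names and thresholds are illustrative assumptions,
# not the implementation in doctor/tasks.py.
_PUNCTUATION = set(string.punctuation)


def is_readable_text(text: str, min_ratio: float = 0.85) -> bool:
    """Return True if most characters look like ordinary text.

    A character counts as readable if it is alphanumeric, whitespace,
    or ASCII punctuation. Control bytes and replacement characters,
    which are typical of binary or corrupt extraction output, do not.
    """
    stripped = text.strip()
    if not stripped:
        # Empty or whitespace-only output is treated as unreadable.
        return False
    readable = sum(
        1
        for ch in stripped
        if ch.isalnum() or ch.isspace() or ch in _PUNCTUATION
    )
    return readable / len(stripped) >= min_ratio
```

A caller could use a check like this to decide whether to keep the extracted text or fall back to another path such as OCR; which fallback the project actually uses is not stated here.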

Issue - #219

@Luis-manzur Luis-manzur requested review from flooie and grossir October 30, 2025 15:54
@Luis-manzur Luis-manzur linked an issue Oct 30, 2025 that may be closed by this pull request
Comment thread doctor/tasks.py Outdated

Labels

None yet

Projects

Status: PRs to Review

Development

Successfully merging this pull request may close these issues.

Bad Processing of PDFs in Texas

3 participants