Skip to content

Add async Vision LLM extraction module (task 05)#10

Open
zalun wants to merge 3 commits intomainfrom
05-vision-extraction
Open

Add async Vision LLM extraction module (task 05)#10
zalun wants to merge 3 commits intomainfrom
05-vision-extraction

Conversation

@zalun
Copy link
Copy Markdown
Owner

@zalun zalun commented Apr 3, 2026

Summary

  • Add src/docproc/vision.py — async Vision extraction via DeepFellow OpenAI-compatible API
  • Convert PDF pages to images locally using PyMuPDF (zero system deps)
  • Send base64-encoded images to Vision LLM via AsyncOpenAI chat completions
  • Return VisionResult with extracted markdown content
  • Retry with exponential backoff on 5xx/connection errors, fail fast on 4xx
  • 27 tests covering validation, conversion, API calls, retries, and integration

Test plan

  • All 138 tests pass
  • Coverage at 95.9% (≥80% threshold)
  • ruff check + format clean
  • ty check passes
  • Vision module at 97% coverage

Closes #9

zalun added 3 commits April 3, 2026 12:42
Implements src/docproc/vision.py that converts PDF pages to images
using PyMuPDF locally, sends base64-encoded images to DeepFellow's
OpenAI-compatible chat completions endpoint, and returns VisionResult.

Closes #9
- Wrap PDF page rendering errors in VisionError (was propagating raw)
- Add test for connection error exhausting all retries
- Add test for image file read failure (OSError)
- Add test for PDF rendering failure with doc.close() guarantee
- Raise VisionError on empty choices list instead of IndexError
- Log warning when Vision API returns None/empty content
- Add tests for both edge cases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Vision LLM extraction module (task 05)

1 participant