Extract structured data from screenshots. OCR + entity extraction + summary in one step.
What it does:
- Input: Screenshot image (png, jpg, webp)
- Output: Extracted text, entities, and structured summary
# Extract data from a screenshot
python extract.py --file receipt.png
# Output as JSON
python extract.py --file form.png --format json
# Process multiple images
python extract.py --file img1.png --file img2.png
- Text: Full OCR of visible text
- Entities: Dates, amounts, names, emails, phone numbers, URLs
- Structure: Tables, forms, lists detected and formatted
- Summary: What the screenshot contains and key information
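As a rough illustration of how these four outputs could fit together, here is a minimal sketch of a result container and its JSON form. The class names, field names, and sample values are hypothetical and are not taken from `extract.py`; the real `--format json` output may be shaped differently.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Entities:
    # One list per entity category named in the list above (names are assumed).
    dates: list[str] = field(default_factory=list)
    amounts: list[str] = field(default_factory=list)
    names: list[str] = field(default_factory=list)
    emails: list[str] = field(default_factory=list)
    phone_numbers: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)


@dataclass
class ExtractionResult:
    text: str                                             # full OCR of visible text
    entities: Entities                                    # typed values found in the text
    structure: list[dict] = field(default_factory=list)   # tables, forms, lists
    summary: str = ""                                     # what the screenshot contains


def to_json(result: ExtractionResult) -> str:
    """Serialize a result (including nested entities) to pretty-printed JSON."""
    return json.dumps(asdict(result), indent=2, ensure_ascii=False)


# Made-up sample data, only to show the shape.
example = ExtractionResult(
    text="ACME Store\nTotal: $12.50\n2024-03-01",
    entities=Entities(dates=["2024-03-01"], amounts=["$12.50"], names=["ACME Store"]),
    structure=[{"type": "list", "items": ["Total: $12.50"]}],
    summary="Receipt from ACME Store for $12.50.",
)
print(to_json(example))
```

Keeping entities in per-category lists makes it easy to merge results when several images are processed in one run.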
Receipt → Items, prices, total, date, merchant
Form → Field names and values as key-value pairs
Chat/Email → Sender, date, subject, body text
Error message → Error type, message, stack trace
Dashboard → Metrics, charts described, key numbers
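The type-to-fields mapping above can be written down as a small lookup table, which is handy for checking whether an extraction returned everything expected for a given screenshot type. This is only a sketch: the dictionary keys, field names, and the `missing_fields` helper are assumptions made for illustration, not part of the tool.

```python
# Hypothetical lookup table mirroring the screenshot types listed above.
FIELDS_BY_TYPE: dict[str, list[str]] = {
    "receipt": ["items", "prices", "total", "date", "merchant"],
    "form": ["fields"],  # field names and values as key-value pairs
    "chat_email": ["sender", "date", "subject", "body"],
    "error_message": ["error_type", "message", "stack_trace"],
    "dashboard": ["metrics", "charts", "key_numbers"],
}


def missing_fields(kind: str, extracted: dict) -> list[str]:
    """Return the expected fields that are absent or None in `extracted`."""
    expected = FIELDS_BY_TYPE.get(kind)
    if expected is None:
        raise ValueError(f"unknown screenshot type: {kind!r}")
    return [name for name in expected if extracted.get(name) is None]


# A receipt where only the items and total were found.
print(missing_fields("receipt", {"items": ["Coffee"], "total": 0}))
```

Note that a value of `0` counts as present; only a missing key or `None` is reported.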
- Python 3.11+
- google-genai (Gemini 2.0 Flash with vision)
- Pillow (image handling)
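For orientation, the two libraries above are typically combined along these lines: Pillow opens the image, and the `google-genai` client sends it to the model together with a text prompt. This is a minimal sketch under stated assumptions, not the contents of `extract.py`: the prompt wording, the `extract` function name, and reading the key from a `GEMINI_API_KEY` environment variable are all illustrative choices.

```python
import os

# Assumed prompt; the tool's real prompt is not shown in this README.
PROMPT = (
    "Extract all visible text from this screenshot, list the entities "
    "(dates, amounts, names, emails, phone numbers, URLs), describe any "
    "tables, forms, or lists, and finish with a short summary."
)

MODEL = "gemini-2.0-flash"


def extract(path: str, prompt: str = PROMPT) -> str:
    """Send one image plus the prompt to Gemini and return the text reply."""
    # Imported lazily so this module loads even if the dependencies are missing.
    from google import genai
    from PIL import Image

    # Assumption: the API key is supplied via the GEMINI_API_KEY variable.
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    with Image.open(path) as image:
        response = client.models.generate_content(
            model=MODEL,
            contents=[image, prompt],
        )
    return response.text
```

Calling `extract("receipt.png")` requires network access and a valid API key.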
This is one tool in the Reify Studio collection — AI tools that feed into your personal knowledge vault.