
reification-labs/screenshot-agent


Screenshot Agent

Extract structured data from screenshots. OCR + entity extraction + summary in one step.

Status: MVP

What it does:

  • Input: Screenshot image (png, jpg, webp)
  • Output: Extracted text, entities, and structured summary

Quick Start

```bash
# Extract data from a screenshot
python extract.py --file receipt.png

# Output as JSON
python extract.py --file form.png --format json

# Process multiple images
python extract.py --file img1.png --file img2.png
```

What It Extracts

  • Text: Full OCR of visible text
  • Entities: Dates, amounts, names, emails, phone numbers, URLs
  • Structure: Tables, forms, lists detected and formatted
  • Summary: What the screenshot contains and key information
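The README attributes entity extraction to Gemini. As a swapped-in, deterministic cross-check (not the tool's own method), a regex pass over the OCR text can recover the simpler entity types. The patterns and the `extract_entities` name are hypothetical and intentionally narrow: phone numbers are US-style only, amounts need a currency symbol, and person names are not covered.

```python
import re

# Deliberately simple patterns; real-world formats have many more variants.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # The final character class keeps trailing punctuation (".", ")", ...) out of URLs.
    "url": re.compile(r"https?://[^\s<>\"']*[^\s<>\"'.,;:!?)]"),
    "amount": re.compile(r"[$€£]\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    # ISO dates (2024-03-15) or M/D/Y dates (3/15/2024).
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    # US-style numbers such as (555) 123-4567 or 555-123-4567.
    "phone": re.compile(r"\(?\d{3}\)?[\s.-]\d{3}[-.]\d{4}"),
}


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return the matches for each entity type, in order of appearance, without duplicates."""
    found: dict[str, list[str]] = {}
    for kind, pattern in PATTERNS.items():
        matches = list(dict.fromkeys(pattern.findall(text)))
        if matches:
            found[kind] = matches
    return found
```

Comparing this output with the model's entities is a cheap way to flag hallucinated amounts or dates.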

Examples

  • Receipt → Items, prices, total, date, merchant
  • Form → Field names and values as key-value pairs
  • Chat/Email → Sender, date, subject, body text
  • Error message → Error type, message, stack trace
  • Dashboard → Metrics, descriptions of charts, key numbers
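For the form case, the key-value shape can be illustrated with a heuristic over plain OCR lines. This assumes fields come out as `Label: value` lines, which the README does not state; `parse_form_fields` is a hypothetical fallback, not how the tool builds its output:

```python
def parse_form_fields(ocr_text: str, max_label_len: int = 40) -> dict[str, str]:
    """Turn 'Label: value' lines into a dict, splitting on the first colon only."""
    fields: dict[str, str] = {}
    for line in ocr_text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        # Skip lines without a colon, with an empty label, or with a label
        # too long to plausibly be a field name (likely a sentence instead).
        if not sep or not label or len(label) > max_label_len:
            continue
        fields[label] = value.strip()
    return fields
```

Splitting on the first colon keeps values like `10:30` intact, but a bare URL on its own line would still be misread as a `https` field, which is one reason a vision model handles forms better than line heuristics.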

Stack

  • Python 3.11+
  • google-genai (Gemini 2.0 Flash with vision)
  • Pillow (image handling)
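The stack suggests a single multimodal call per screenshot. A minimal sketch, assuming the `google-genai` SDK and the model id `gemini-2.0-flash`; the prompt wording, the JSON keys, and the `extract`/`parse_model_json` helpers are hypothetical, not taken from `extract.py`:

```python
import json
import re

MODEL = "gemini-2.0-flash"  # assumed model id for "Gemini 2.0 Flash"

PROMPT = (
    "Extract the contents of this screenshot. Respond with a JSON object with keys "
    '"text" (full OCR), "entities" (object of lists), "structure" (tables, forms, '
    'lists), and "summary" (string).'
)

FENCE = "`" * 3
# Matches a Markdown code-fence wrapper around the JSON, if the model adds one.
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_model_json(raw: str) -> dict:
    """Parse the model reply, tolerating an optional Markdown code-fence wrapper."""
    match = FENCED_JSON.search(raw)
    return json.loads(match.group(1) if match else raw)


def extract(path: str, client) -> dict:
    """Send one screenshot plus PROMPT to Gemini and return the parsed JSON reply."""
    # Imported lazily so parse_model_json works without Pillow or google-genai installed.
    from PIL import Image
    from google.genai import types

    with Image.open(path) as image:
        response = client.models.generate_content(
            model=MODEL,
            contents=[image, PROMPT],
            config=types.GenerateContentConfig(response_mime_type="application/json"),
        )
    return parse_model_json(response.text)
```

Usage might look like `extract("receipt.png", genai.Client())` after `from google import genai`, with an API key set in the environment. Setting `response_mime_type` asks the model for bare JSON, so the fence stripping in `parse_model_json` is only a fallback.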

Part of Reify Studio

This is one tool in the Reify Studio collection — AI tools that feed into your personal knowledge vault.
