clawd-conroy/document-cleanup-agent

Document Cleanup Agent

Turn messy documents into clean, structured output. The dad test: if he can use it, anyone can.

Status: MVP in progress

Phase 1 Goal: Prove the engine works

  • Input: Messy document (meeting notes, contracts, research, etc.)
  • Output: Cleaned, structured document (markdown or JSON)

Quick Start

# Clean up a document from stdin
cat messy-notes.txt | python cleanup.py

# From a file
python cleanup.py --file meeting-notes.txt

# Output as JSON
python cleanup.py --file notes.txt --format json
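
The commands above suggest a small command-line surface: an optional `--file`, an optional `--format`, and a fallback to stdin. The sketch below shows one way such a parser could look; it is not the actual `cleanup.py`. The `markdown` choice, its use as the default, and the helper names `build_parser` and `read_input` are assumptions made for illustration.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in Quick Start (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="cleanup.py",
        description="Clean up a messy document.",
    )
    parser.add_argument(
        "--file",
        help="path to the input document; stdin is read when omitted",
    )
    parser.add_argument(
        "--format",
        choices=["markdown", "json"],  # "markdown" as a value is an assumption
        default="markdown",            # so is using it as the default
        help="output format",
    )
    return parser


def read_input(args: argparse.Namespace, stdin=sys.stdin) -> str:
    """Return the document text from --file if given, otherwise from stdin."""
    if args.file:
        with open(args.file, encoding="utf-8") as handle:
            return handle.read()
    return stdin.read()


if __name__ == "__main__":
    arguments = build_parser().parse_args()
    text = read_input(arguments)
    print(f"read {len(text)} characters, output format: {arguments.format}")
```

Reading from stdin when `--file` is absent keeps the piped form (`cat messy-notes.txt | python cleanup.py`) and the file form on the same code path.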

What It Does

  1. Extract text from various formats (txt, md, pdf, docx, html)
  2. Identify document structure (headings, lists, paragraphs)
  3. Fix formatting issues (spacing, bullets, numbering)
  4. Generate clean, consistent output
  5. Optionally extract metadata (dates, names, action items)
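
Step 3 (fixing spacing, bullets, and numbering) can be illustrated without any model call. The following is a rule-based sketch using only the standard library; it is an assumption about how such a pass might work, not the project's implementation, and `normalize_formatting` is a hypothetical name.

```python
import re

# A bullet marker (-, *, •, ·, ▪) followed by whitespace and the item text.
BULLET = re.compile(r"^[-*•·▪]\s+(.*)$")
# A numbered item such as "1.", "2)" or "(3)" followed by the item text.
NUMBERED = re.compile(r"^\(?\d+[.)]\s+(.*)$")


def normalize_formatting(text: str) -> str:
    """Collapse stray whitespace, unify bullet markers, and renumber lists."""
    out: list[str] = []
    number = 0  # position within the current numbered list
    for raw in text.splitlines():
        line = " ".join(raw.split())  # trim and collapse inner whitespace
        if not line:
            # Keep at most one blank line in a row, and none at the top.
            if out and out[-1] != "":
                out.append("")
            continue
        numbered = NUMBERED.match(line)
        if numbered:
            number += 1
            line = f"{number}. {numbered.group(1)}"
        else:
            number = 0  # any other non-blank line ends the numbered list
            bullet = BULLET.match(line)
            if bullet:
                line = f"- {bullet.group(1)}"
        out.append(line)
    while out and out[-1] == "":
        out.pop()  # drop trailing blank lines
    return "\n".join(out)


if __name__ == "__main__":
    messy = "Agenda\n\n\n  *  budget   review\n• hiring\n3) ship v1\n7. write docs\n"
    print(normalize_formatting(messy))
```

A deterministic pass like this is cheap and predictable; structure detection and metadata extraction (steps 2 and 5) are harder to express as rules, which is presumably where the Gemini API listed in the stack comes in.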

Examples

Input: Rambling meeting notes with inconsistent formatting
Output: Structured summary with attendees, decisions, action items

Input: Scanned contract with OCR errors
Output: Clean text with sections properly identified
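
To make the first example concrete, here is a sketch of what a structured meeting summary could look like in code, with both output formats mentioned in the Phase 1 goal. The `MeetingSummary` class, its field names, and the sample values are all hypothetical; the real output schema is not specified in this README.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MeetingSummary:
    """Hypothetical container for the fields named in the meeting-notes example."""

    title: str
    attendees: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the summary as pretty-printed JSON."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)

    def to_markdown(self) -> str:
        """Render the summary as markdown, skipping empty sections."""
        parts = [f"# {self.title}"]
        sections = (
            ("Attendees", self.attendees),
            ("Decisions", self.decisions),
            ("Action Items", self.action_items),
        )
        for heading, items in sections:
            if items:
                parts.append("")
                parts.append(f"## {heading}")
                parts.extend(f"- {item}" for item in items)
        return "\n".join(parts)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    summary = MeetingSummary(
        title="Weekly sync",
        attendees=["Ana", "Raj"],
        decisions=["Ship v1 on Friday"],
        action_items=["Raj: update the docs"],
    )
    print(summary.to_markdown())
    print(summary.to_json())
```

Keeping one in-memory structure and rendering it two ways means the markdown and JSON outputs cannot drift apart.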

Stack

  • Python 3.11+
  • google-genai (Gemini API)
  • python-docx (Word docs)
  • PyMuPDF (PDFs)
  • beautifulsoup4 (HTML)
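
The libraries above map naturally onto the input formats listed under What It Does. The sketch below shows one possible way to dispatch on file extension; `extract_text` is a hypothetical helper, not the project's code. The third-party imports are deferred so that plain-text inputs work even when PyMuPDF, python-docx, or beautifulsoup4 is not installed.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of a document, chosen by file extension."""
    suffix = Path(path).suffix.lower()

    if suffix in {".txt", ".md"}:
        return Path(path).read_text(encoding="utf-8")

    if suffix == ".pdf":
        import fitz  # PyMuPDF

        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)

    if suffix == ".docx":
        from docx import Document  # python-docx

        return "\n".join(p.text for p in Document(path).paragraphs)

    if suffix in {".html", ".htm"}:
        from bs4 import BeautifulSoup  # beautifulsoup4

        html = Path(path).read_text(encoding="utf-8")
        soup = BeautifulSoup(html, "html.parser")
        return soup.get_text(separator="\n", strip=True)

    raise ValueError(f"unsupported format: {suffix or path}")
```

Scanned PDFs with no text layer would return little or nothing from this path; handling them would need an OCR step that is outside this sketch.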

Part of Reify Studio

This is one tool in the Reify Studio collection: AI tools that feed into your personal knowledge vault.
