brandongell/para


Legal Document Organizer

An AI-powered legal document organizer that uses Gemini multimodal AI and OpenAI GPT-4o to automatically classify and organize legal documents with advanced PDF signature detection.

🎯 Key Features

  • 🔮 Gemini Multimodal PDF Extraction: Visual signature detection that finds signers missing from the PDF text layer
  • 🤖 Dual AI Classification: Uses OpenAI GPT-4o for document classification and metadata extraction
  • 🧠 Memory System: Pre-indexes information for instant answers to business questions
  • 📝 Automatic Template Identification: Intelligently identifies and categorizes template documents
  • 📁 Structured Organization: Organizes documents into a 10-folder business function structure
  • 👀 Real-time Monitoring: Continuously monitors folders for new files and organizes them instantly
  • 📄 Advanced PDF Processing: Detects visual signatures, filled form fields, and annotations
  • 🔄 Batch Processing: Can organize large numbers of existing files
  • ⚡ CLI Interface: Simple command-line interface for easy use
  • 🤖 Discord Bot: Natural language document search and template requests
  • 🔗 Documenso Integration: Upload templates to Documenso for e-signature workflows

🚀 What Makes This Special

This system uses Google Gemini's multimodal capabilities to analyze PDFs visually, not just as text. This allows it to:

  • ✅ Find hidden signers: Detects signatures that appear visually but aren't in the text layer
  • ✅ Extract complete metadata: Names, addresses, emails, contract values, dates
  • ✅ Handle complex legal docs: SAFE agreements, employment contracts, investment documents
  • ✅ Process annotations: Stamps, form fields, electronic signatures

Example Success: On the test document NMM.pdf, traditional text extraction found only one signer (Dan Shipper), while Gemini found both signers (Dan Shipper and Nashilu Mouen) with complete contact information.

🧠 Enhanced Memory System with Claude Code Search

The enhanced memory system captures comprehensive information from documents and aggregates it into searchable markdown files, now with Claude Code autonomous search capabilities:

Core Memory Files:

  • Company Information: EIN, addresses, formation details, milestones
  • People Directory: Employees, contractors, advisors, investors with contact info
  • Financial Summary: Total capital raised, investment rounds, SAFE agreements, financial impact analysis
  • Revenue & Sales: Customer contracts, revenue streams, business context, payment terms, obligations
  • Contracts Summary: Active contracts, expiring soon, business context, key provisions, obligations
  • Key Dates: Contract expirations, renewal deadlines, vesting schedules

Enhanced Information Capture:

The system now extracts and preserves:

  • Business Context: 3-5 sentence narratives explaining strategic importance and implications
  • Key Terms: 5-15 most important contractual provisions and conditions
  • Obligations: All specific deliverables, milestones, and requirements
  • Financial Terms: Payment schedules, pricing models, revenue shares, minimum commitments
  • Critical Facts: Document-specific information like EINs, policy numbers, addresses

🤖 Claude Code Autonomous Search (NEW):

When you ask a question, Claude Code:

  1. Searches memory files for quick answers
  2. Identifies information gaps and what's still needed
  3. Reads full documents to extract missing details
  4. Continues searching until it finds complete answers
  5. Provides comprehensive responses with source citations

Example: "How much has Austin Rief invested?" → Claude finds the amount in memory, then reads the full SAFE agreement to extract valuation cap, discount rate, and other terms.
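
The search steps above can be sketched as a bounded loop. This is a minimal illustration, not the project's actual implementation: the `Finding`, `SearchSources`, and `autonomousSearch` names are hypothetical, lookups are synchronous to keep the sketch self-contained, and the default of 10 iterations simply mirrors the `CLAUDE_CODE_MAX_ITERATIONS=10` value shown in the configuration section.

```typescript
// Hypothetical shapes; the real project may model findings differently.
interface Finding {
  field: string;  // e.g. "amount" or "valuation_cap"
  value: string;
  source: string; // file the value came from, used for citations
}

interface SearchSources {
  // Step 1: quick lookup in the pre-indexed memory files.
  searchMemory(question: string): Finding[];
  // Step 3: read one more full document; an empty result means nothing is left to read.
  readNextDocument(question: string, missing: string[]): Finding[];
}

interface SearchResult {
  findings: Finding[];
  missing: string[];
  documentsRead: number;
}

function autonomousSearch(
  question: string,
  neededFields: string[],
  sources: SearchSources,
  maxIterations: number = 10
): SearchResult {
  const findings: Finding[] = [];
  const has = (field: string): boolean => findings.some((f) => f.field === field);
  const record = (items: Finding[]): void => {
    for (const item of items) {
      if (!has(item.field)) findings.push(item); // keep the first source found
    }
  };
  // Step 2: work out which fields are still unanswered.
  const stillMissing = (): string[] => neededFields.filter((field) => !has(field));

  record(sources.searchMemory(question));
  let missing = stillMissing();
  let documentsRead = 0;

  // Step 4: keep reading until everything is answered or the budget runs out.
  while (missing.length > 0 && documentsRead < maxIterations) {
    const extracted = sources.readNextDocument(question, missing);
    documentsRead++;
    if (extracted.length === 0) break;
    record(extracted);
    missing = stillMissing();
  }

  // Step 5: each finding carries its source so the answer can cite it.
  return { findings, missing, documentsRead };
}
```

Capping the loop keeps a question that cannot be answered from the available documents from running indefinitely; the caller can inspect `missing` to report what was not found.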

Memory files are automatically updated when new documents are added and enable instant detailed answers via Discord.

📝 Template Identification & Documenso Integration

The system automatically identifies template documents using intelligent pattern matching:

  • High Confidence: Documents with [BLANK], [FORM], (Form), or Template in filename
  • Medium Confidence: Generic documents without specific party names
  • Smart Exclusion: Won't mark EXECUTED or signed documents as templates
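
The rules above can be sketched as a small filename check. This is an illustration only: `checkTemplate` and its types are hypothetical names, the executed/signed test is assumed to work on the filename alone, and the caller is assumed to know whether the document names specific parties.

```typescript
type TemplateConfidence = "HIGH" | "MEDIUM" | "NONE";

interface TemplateCheck {
  isTemplate: boolean;
  confidence: TemplateConfidence;
  indicators: string[];
}

// Filename markers listed above as high-confidence template signals.
const TEMPLATE_MARKERS: { pattern: RegExp; label: string }[] = [
  { pattern: /\[BLANK\]/i, label: "[BLANK] in filename" },
  { pattern: /\[FORM\]/i, label: "[FORM] in filename" },
  { pattern: /\(Form\)/i, label: "(Form) in filename" },
  { pattern: /template/i, label: "Template in filename" },
];

// Assumption: executed documents can be spotted from the filename.
// The letter boundaries avoid matching words such as "unsigned".
const EXECUTED_PATTERN = /(^|[^a-z])(executed|signed)([^a-z]|$)/i;

function checkTemplate(filename: string, hasNamedParties: boolean): TemplateCheck {
  // Smart exclusion: executed or signed documents are never templates.
  if (EXECUTED_PATTERN.test(filename)) {
    return { isTemplate: false, confidence: "NONE", indicators: ["executed or signed"] };
  }
  const indicators = TEMPLATE_MARKERS
    .filter((marker) => marker.pattern.test(filename))
    .map((marker) => marker.label);
  if (indicators.length > 0) {
    return { isTemplate: true, confidence: "HIGH", indicators };
  }
  // Medium confidence: a generic document without specific party names.
  if (!hasNamedParties) {
    return { isTemplate: true, confidence: "MEDIUM", indicators: ["no specific party names"] };
  }
  return { isTemplate: false, confidence: "NONE", indicators: [] };
}
```

Running the exclusion check first means a file such as "SAFE Template (signed).pdf" is not treated as a template even though its name contains "Template".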

🆕 Automatic Documenso Upload Prompt

When you upload a template file to Discord, Para automatically detects it and offers to upload it to Documenso for e-signature workflows. Just reply "yes" to upload!

When a template is detected and Documenso is configured, the system will:

  1. Prompt you to upload the template to Documenso
  2. Upload the document if you confirm
  3. Return a configuration link to set up signature fields
  4. Update metadata with Documenso document ID and URL

Example prompt:

🎯 Template Document Detected!
📄 File: Employment Agreement [FORM].pdf
📋 Type: Employment Agreement
🏷️  Category: People_and_Employment
🔍 Confidence: HIGH

🤔 Would you like to upload this template to Documenso for configuration? (yes/no): yes

📤 Uploading template to Documenso...
✅ Template uploaded successfully!
🔗 Configure your template here:
   https://app.documenso.com/documents/12345/convert-to-template
📌 Document ID: 12345
💡 Tip: Add signature fields, text fields, and other elements in the Documenso interface.
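
The configuration link and metadata update from steps 3 and 4 can be sketched as follows. The URL pattern is inferred from the sample output above rather than from Documenso's documentation, and the function names and the `documenso_document_id` / `documenso_url` field names are hypothetical.

```typescript
// Builds the configuration link, following the pattern seen in the sample output.
function buildTemplateConfigUrl(appUrl: string, documentId: number | string): string {
  const base = appUrl.replace(/\/+$/, ""); // tolerate a trailing slash in DOCUMENSO_APP_URL
  return `${base}/documents/${encodeURIComponent(String(documentId))}/convert-to-template`;
}

// Returns a copy of the metadata with the Documenso details added (hypothetical field names).
function withDocumensoInfo(
  metadata: Record<string, unknown>,
  appUrl: string,
  documentId: number | string
): Record<string, unknown> {
  return {
    ...metadata,
    documenso_document_id: String(documentId),
    documenso_url: buildTemplateConfigUrl(appUrl, documentId),
  };
}
```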

Templates are:

  • Automatically categorized with status: "template"
  • Organized into the 09_Templates folder
  • Searchable via Discord with natural language
  • Tagged with metadata including placeholders and use cases

📦 Installation

  1. Clone or download this project
  2. Install dependencies:
    npm install
  3. Set up your API keys:
    cp .env.example .env
    # Edit .env and add your API keys

⚙️ Configuration

Edit the .env file to configure:

# Required for document classification
OPENAI_API_KEY=your_openai_api_key_here

# Required for advanced PDF extraction
GEMINI_API_KEY=your_gemini_api_key_here

# Optional for Discord bot
DISCORD_BOT_TOKEN=your_discord_bot_token_here
ORGANIZE_FOLDER_PATH=/path/to/your/documents

# Optional for Documenso integration
DOCUMENSO_API_URL=https://api.documenso.com
DOCUMENSO_API_TOKEN=your_documenso_api_token_here
DOCUMENSO_APP_URL=https://app.documenso.com

# Optional for Claude Code enhanced search
ANTHROPIC_API_KEY=your_anthropic_api_key_here
CLAUDE_CODE_MAX_ITERATIONS=10
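
As a rough illustration of how these variables might map to features, the sketch below resolves an environment object into feature flags and warnings. The `resolveConfig` function, its types, and the fallback of 10 iterations are assumptions made for this example; the project's real startup code may differ. Note that placeholder values such as `your_openai_api_key_here` would count as "set" in this sketch.

```typescript
interface Features {
  classification: boolean; // OPENAI_API_KEY
  pdfExtraction: boolean;  // GEMINI_API_KEY
  discordBot: boolean;     // DISCORD_BOT_TOKEN
  documenso: boolean;      // DOCUMENSO_API_URL + DOCUMENSO_API_TOKEN
  claudeSearch: boolean;   // ANTHROPIC_API_KEY
}

interface ResolvedConfig {
  features: Features;
  maxIterations: number;
  warnings: string[];
}

function resolveConfig(env: Record<string, string | undefined>): ResolvedConfig {
  const isSet = (key: string): boolean => {
    const value = env[key];
    return value !== undefined && value.trim() !== "";
  };
  const warnings: string[] = [];
  // Keys marked "Required" above produce a warning instead of a crash when missing.
  const requireKey = (key: string, feature: string): boolean => {
    if (isSet(key)) return true;
    warnings.push(`${key} is not set; ${feature} is disabled`);
    return false;
  };

  const features: Features = {
    classification: requireKey("OPENAI_API_KEY", "document classification"),
    pdfExtraction: requireKey("GEMINI_API_KEY", "advanced PDF extraction"),
    discordBot: isSet("DISCORD_BOT_TOKEN"),
    documenso: isSet("DOCUMENSO_API_URL") && isSet("DOCUMENSO_API_TOKEN"),
    claudeSearch: isSet("ANTHROPIC_API_KEY"),
  };

  // Fall back to 10 (the value in the sample above) when the setting is missing or invalid.
  const parsed = parseInt(env["CLAUDE_CODE_MAX_ITERATIONS"] || "", 10);
  const maxIterations = isNaN(parsed) || parsed < 1 ? 10 : parsed;

  return { features, maxIterations, warnings };
}
```

In a Node.js process this could be called as `resolveConfig(process.env)` after the .env file has been loaded.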

🎮 Usage

Quick Start (CLI)

# Development mode
npm run dev

# Production mode
npm run build
npm start

Memory System

# Refresh memory files from all documents
npm run refresh-memory

The memory system creates pre-indexed markdown files that enable instant answers to questions like:

  • "What is our EIN number?"
  • "How much revenue do we have?"
  • "Who are our investors?"
  • "What contracts expire this month?"
  • "Who are our key partners?"

Discord Bot

npm run discord

Thread-Based Communication

The Discord bot uses a thread-based conversation model:

  • Activation: Bot only responds when @mentioned (e.g., @para help)
  • Thread Creation: Automatically creates a thread from your message
  • Continued Conversation: In threads, no @ mention needed - just type naturally
  • Auto-Archive: Threads archive after 24 hours of inactivity
  • Clean Channels: Keeps main channels clutter-free
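
The activation rules above amount to a small decision function. The sketch below is library-agnostic and uses hypothetical names (`BotMessage`, `decideAction`); it does not show how the bot actually talks to Discord.

```typescript
// Hypothetical, simplified view of an incoming Discord message.
interface BotMessage {
  authorIsBot: boolean; // message was sent by a bot (including this one)
  mentionsBot: boolean; // message @mentions the bot
  inBotThread: boolean; // message is inside a thread the bot created
}

type BotAction = "ignore" | "create_thread_and_reply" | "reply_in_thread";

// 24 hours of inactivity, expressed in minutes.
const AUTO_ARCHIVE_MINUTES = 24 * 60;

function decideAction(message: BotMessage): BotAction {
  if (message.authorIsBot) return "ignore";                      // never answer bots
  if (message.inBotThread) return "reply_in_thread";             // no @mention needed in threads
  if (message.mentionsBot) return "create_thread_and_reply";     // activation in a main channel
  return "ignore";                                               // keeps main channels clutter-free
}
```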

Template Commands

  • "Show me all templates"
  • "I need an employment agreement template"
  • "Find SAFE template"
  • "Get me a blank NDA"

File Organization

  • Upload files with @para organize this to start a thread
  • Bot processes files and responds with organization details in thread
  • Multiple files are handled with progress updates

Documenso Integration

  • "Upload this template to Documenso"
  • "Show templates not in Documenso"
  • "Upload employment agreement to Documenso"
  • "Get Documenso link for NDA template"

Testing PDF Extraction

# Test Gemini multimodal extraction
npx ts-node test-gemini.ts

# Test simplified workflow
npx ts-node test-simplified-workflow.ts

# Test production workflow
npx ts-node test-production-workflow.ts

# Test template identification
npx ts-node test-template-prompt.ts
npx ts-node test-mnda-form.ts

# Test enhanced information capture
npx ts-node test-enhanced-information-capture.ts <path-to-document>

The enhanced information capture test will show:

  • All extracted metadata fields including business context
  • Key terms and obligations found in the document
  • Financial terms and critical facts
  • Which memory files were updated with the information

📁 Folder Structure

The organizer creates a structured folder system based on business functions:

  • 01_Corporate_and_Governance - Formation, governance, board documents
  • 02_People_and_Employment - Employment agreements, consulting, equity
  • 03_Finance_and_Investment - Investment documents, banking, tax
  • 04_Sales_and_Revenue - Customer agreements, sales contracts
  • 05_Operations_and_Vendors - Vendor agreements, supplier contracts
  • 06_Technology_and_IP - Patents, licenses, development agreements
  • 07_Marketing_and_Partnerships - Partnership agreements, marketing
  • 08_Risk_and_Compliance - Regulatory compliance, litigation
  • 09_Templates - Document templates and forms
  • 10_Archive - Expired or terminated documents
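
A category name such as People_and_Employment can be mapped to its numbered folder with a simple lookup. The sketch below assumes categories match the folder names minus the numeric prefix; `folderForCategory` is a hypothetical helper, not part of the project's documented API.

```typescript
// The ten top-level folders listed above.
const FOLDERS: string[] = [
  "01_Corporate_and_Governance",
  "02_People_and_Employment",
  "03_Finance_and_Investment",
  "04_Sales_and_Revenue",
  "05_Operations_and_Vendors",
  "06_Technology_and_IP",
  "07_Marketing_and_Partnerships",
  "08_Risk_and_Compliance",
  "09_Templates",
  "10_Archive",
];

// Returns the numbered folder for a category, or undefined when nothing matches.
function folderForCategory(category: string): string | undefined {
  const wanted = category.toLowerCase();
  for (const folder of FOLDERS) {
    const withoutPrefix = folder.replace(/^\d{2}_/, ""); // drop the "NN_" prefix
    if (withoutPrefix.toLowerCase() === wanted) return folder;
  }
  return undefined;
}
```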

🔧 How It Works

1. PDF Processing with Gemini

  • Multimodal Analysis: Gemini analyzes the entire PDF including visual elements
  • Signature Detection: Finds both company and individual signatures
  • Metadata Extraction: Extracts names, dates, addresses, contract values
  • Form Field Analysis: Reads filled form fields that appear blank in text

2. Document Classification

  • AI Analysis: GPT-4o analyzes document content for classification
  • Category Assignment: Determines primary folder and subfolder
  • Confidence Scoring: Provides classification confidence levels

3. Organization Workflow

  1. File Detection: System detects new or existing files
  2. Content Extraction: Uses appropriate extraction method (Gemini for PDFs)
  3. AI Classification: Determines document category and destination
  4. File Movement: Moves files to organized folder structure
  5. Metadata Generation: Creates companion .metadata.json files
  6. Continuous Monitoring: Watches for new files
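
Steps 2 through 5 can be sketched as a single function with its dependencies injected. The interfaces and the `organizeFile` name are hypothetical, the calls are synchronous for brevity, and the companion file is assumed to be named by appending `.metadata.json` to the full file name; the project may name it differently.

```typescript
interface Classification {
  folder: string;     // e.g. "03_Finance_and_Investment"
  confidence: string; // e.g. "HIGH"
}

// Hypothetical dependencies, injected so the pipeline can be tested without real services.
interface OrganizerDeps {
  extractContent(filePath: string): string;
  classify(content: string): Classification;
  moveFile(from: string, to: string): void;
  writeMetadata(metadataPath: string, metadata: Record<string, unknown>): void;
}

function organizeFile(filePath: string, rootDir: string, deps: OrganizerDeps): string {
  const content = deps.extractContent(filePath);                  // step 2: content extraction
  const result = deps.classify(content);                          // step 3: AI classification
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1); // assumes "/" path separators
  const destination = `${rootDir}/${result.folder}/${fileName}`;
  deps.moveFile(filePath, destination);                           // step 4: file movement
  deps.writeMetadata(`${destination}.metadata.json`, {            // step 5: metadata generation
    filename: fileName,
    category: result.folder,
    confidence: result.confidence,
  });
  return destination;
}
```

Steps 1 and 6 (detecting and watching for files) would sit outside this function and call it once per file.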

📊 Supported Formats

  • PDF: Advanced Gemini multimodal extraction
  • DOCX: Microsoft Word documents
  • DOC: Legacy Word documents
  • TXT: Plain text files
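
Choosing an extraction method by file extension might look like the sketch below. The `Extractor` labels and `extractorFor` helper are hypothetical and only mirror the four formats listed above.

```typescript
type Extractor = "gemini-multimodal" | "docx" | "doc" | "plain-text";

// Extension-to-extractor table for the four supported formats.
const EXTRACTORS: Record<string, Extractor> = {
  ".pdf": "gemini-multimodal",
  ".docx": "docx",
  ".doc": "doc",
  ".txt": "plain-text",
};

// Returns the extractor for a file name, or undefined for unsupported formats.
function extractorFor(fileName: string): Extractor | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined; // no extension at all
  return EXTRACTORS[fileName.slice(dot).toLowerCase()];
}
```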

🎯 API Endpoints

If using the optional web server:

  • Health Check: GET /health
  • Google Auth: GET /auth/google
  • Organizations: GET /organizations
  • Documents: GET /documents

🔍 Advanced Features

Metadata Schema

Each organized document gets a companion .metadata.json file with:

{
  "filename": "document.pdf",
  "status": "executed",
  "category": "Investment_Fundraising",
  "signers": [
    {"name": "Dan Shipper", "date_signed": "2023-06-15"},
    {"name": "Nashilu Mouen", "date_signed": "2023-06-15"}
  ],
  "primary_parties": [...],
  "effective_date": "2023-06-15",
  "contract_value": "$50000",
  "governing_law": "Delaware",
  "template_analysis": {
    "is_template": false,
    "confidence": "HIGH",
    "indicators": ["specific party names", "executed"]
  }
}

For templates:

{
  "filename": "Employment Agreement [BLANK].pdf",
  "status": "template",
  "template_analysis": {
    "is_template": true,
    "confidence": "HIGH",
    "indicators": ["[BLANK] in filename"],
    "template_type": "Employment Agreement",
    "field_placeholders": ["[EMPLOYEE NAME]", "[START DATE]", "[SALARY]"],
    "typical_use_case": "Standard employment agreement for new hires"
  }
}
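
For code that reads these files, the two examples above suggest roughly the following TypeScript shape. The interfaces are inferred from the samples only, so every field beyond `filename` and `status` is treated as optional, and `parseMetadata` and `signerNames` are hypothetical helpers.

```typescript
interface Signer {
  name: string;
  date_signed?: string;
}

interface TemplateAnalysis {
  is_template: boolean;
  confidence: string;
  indicators: string[];
  template_type?: string;
  field_placeholders?: string[];
  typical_use_case?: string;
}

interface DocumentMetadata {
  filename: string;
  status: string; // "executed" and "template" appear in the examples
  category?: string;
  signers?: Signer[];
  primary_parties?: unknown[]; // element shape is not shown in the example
  effective_date?: string;
  contract_value?: string;
  governing_law?: string;
  template_analysis?: TemplateAnalysis;
}

// Parses a .metadata.json payload and checks the two fields both examples share.
function parseMetadata(json: string): DocumentMetadata {
  const data = JSON.parse(json);
  if (data === null || typeof data !== "object") {
    throw new Error("metadata must be a JSON object");
  }
  if (typeof data.filename !== "string" || typeof data.status !== "string") {
    throw new Error("metadata requires string 'filename' and 'status' fields");
  }
  return data as DocumentMetadata;
}

// Lists signer names, returning an empty array for templates and unsigned documents.
function signerNames(metadata: DocumentMetadata): string[] {
  return (metadata.signers || []).map((signer) => signer.name);
}
```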

Process Management with Overmind

cd para/
./start.sh  # Starts all services
overmind ps # View running processes

🚨 Error Handling

  • Missing API Keys: Graceful degradation with warnings
  • Unsupported Files: Moved to default folder with error logging
  • Large Files: Files >50MB ignored during monitoring
  • Duplicate Names: Auto-resolved with counter suffixes
  • System Files: Automatically ignored (.DS_Store, temp files)
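
Three of these rules (large files, system files, duplicate names) can be sketched as pure helpers. The names, the temp-file patterns, the 1024 × 1024 definition of a megabyte, and the `_1`, `_2` suffix format are all assumptions made for illustration; the README only states the behaviors, not their exact form.

```typescript
// Assumption: 1 MB = 1024 * 1024 bytes.
const MAX_MONITORED_BYTES = 50 * 1024 * 1024;

// .DS_Store comes from the list above; the temp-file patterns are illustrative guesses.
const SYSTEM_FILE_PATTERNS: RegExp[] = [/^\.DS_Store$/, /^~\$/, /\.tmp$/i];

// True when the monitor should skip the file (system file or larger than 50 MB).
function shouldIgnore(fileName: string, sizeBytes: number): boolean {
  if (sizeBytes > MAX_MONITORED_BYTES) return true;
  return SYSTEM_FILE_PATTERNS.some((pattern) => pattern.test(fileName));
}

// Appends a counter suffix until the name is free, e.g. nda.pdf -> nda_1.pdf -> nda_2.pdf.
function resolveDuplicateName(
  fileName: string,
  exists: (candidate: string) => boolean
): string {
  if (!exists(fileName)) return fileName;
  const dot = fileName.lastIndexOf(".");
  const base = dot > 0 ? fileName.slice(0, dot) : fileName;
  const ext = dot > 0 ? fileName.slice(dot) : "";
  let counter = 1;
  while (exists(`${base}_${counter}${ext}`)) counter++;
  return `${base}_${counter}${ext}`;
}
```

Passing `exists` as a callback keeps the helper independent of the file system, so it can be backed by a directory listing in production and by a plain array in tests.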

🧪 Testing

The project includes comprehensive test scripts:

  • test-gemini.ts - Test Gemini PDF extraction
  • test-simplified-workflow.ts - Test streamlined processing
  • test-production-workflow.ts - Test full production pipeline

📋 Requirements

  • Node.js 16+
  • OpenAI API key (for document classification)
  • Google Gemini API key (for advanced PDF extraction)
  • Discord Bot Token (optional, for Discord integration)

🏗️ Architecture

Legal Document Organizer
├── Core Services
│   ├── GeminiPdfService (multimodal PDF extraction)
│   ├── FileReaderService (file content extraction)
│   ├── MetadataService (metadata generation)
│   ├── DocumentClassifierService (AI classification)
│   └── FileOrganizerService (file organization)
├── Monitoring
│   └── FileMonitorService (real-time file watching)
├── Integrations
│   └── DiscordBotService (Discord bot interface)
└── Utilities
    └── CLIUtils (command-line interface)

🤝 Contributing

This project demonstrates advanced AI integration for legal document processing. Feel free to extend it with additional features or file formats.

📄 License

MIT License
