Matanga1-2/pdf-gmail-processor

PDF Document Processor

A Python automation tool that collects PDF attachments from Gmail and files them in Google Drive. It can unlock password-protected PDFs, route each document by its Gmail label, and store it in the Drive folder configured for that document type.

Features

  • 📧 Automatically fetches emails with specific labels from Gmail
  • 🔐 Handles password-protected PDF attachments
  • 📁 Organizes processed documents in Google Drive
  • 🏷️ Supports multiple document types with different processing rules
  • 📅 Date-based file naming from a configurable format string
  • 🔄 Marks processed emails as read and archives them
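As a rough illustration of the password-handling feature, the sketch below unlocks an encrypted PDF with the third-party pypdf package and returns an unencrypted copy. Both pypdf and the remove_pdf_password helper are assumptions for illustration; the project's own pdf/pdf_manager.py may work differently.

```python
import io
from typing import Optional


def remove_pdf_password(pdf_bytes: bytes, password: Optional[str]) -> bytes:
    """Return an unencrypted copy of pdf_bytes, assuming the pypdf package."""
    # Imported inside the function so the module still loads without pypdf.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(io.BytesIO(pdf_bytes))
    if not reader.is_encrypted:
        return pdf_bytes  # nothing to unlock; keep the original bytes
    if not password:
        raise ValueError("PDF is encrypted but no pdf_password is configured")
    # decrypt() returns a PasswordType value; NOT_DECRYPTED is 0, so it is falsy.
    if not reader.decrypt(password):
        raise ValueError("pdf_password did not unlock the PDF")

    # Copy the decrypted pages into a writer that has no encryption set.
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    out = io.BytesIO()
    writer.write(out)
    return out.getvalue()
```

Returning the original bytes for unencrypted files means the same call can be applied to every attachment, whether or not pdf_password is set.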

Use Cases

  1. Financial Documents

    • Salary slips
    • Bank statements
    • Investment reports
    • Tax documents
    • Insurance policies
  2. Business Documents

    • Invoices
    • Purchase orders
    • Expense reports
    • Contracts
    • Vendor agreements
  3. Educational/Professional

    • Academic transcripts
    • Certificates
    • Professional licenses
    • Training documents
  4. Healthcare

    • Medical reports
    • Lab results
    • Insurance claims
    • Prescription records

Setup

  1. Install Dependencies

    pip install -r requirements.txt
  2. Google API Setup

    • Create a Google Cloud Project
    • Enable Gmail API and Google Drive API
    • Create OAuth 2.0 credentials
    • Download the credentials file and save it as credentials.json in the project root
  3. Configuration

    • Copy config.example.json to config.json
    • Update the configuration with your specific settings:
      • Gmail label IDs
      • Google Drive folder IDs
      • PDF passwords (if applicable)
      • File naming formats
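A minimal sketch of reading config.json and checking each document type before any email is touched, assuming the document_types structure from the Configuration Example section. The load_config name and the choice of required keys are hypothetical; only pdf_password is treated as optional, matching the field list.

```python
import json
from typing import Any, Dict

# Hypothetical choice of which fields must be present for every document type.
REQUIRED_KEYS = ("gmail_label_id", "drive_folder_id", "prefix", "filename_format")


def load_config(path: str = "config.json") -> Dict[str, Dict[str, Any]]:
    """Load config.json and return its document_types mapping after basic checks."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)

    doc_types = config.get("document_types")
    if not isinstance(doc_types, dict) or not doc_types:
        raise ValueError("config.json needs a non-empty 'document_types' object")

    for name, settings in doc_types.items():
        # An empty string counts as missing, not just an absent key.
        missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
        if missing:
            raise ValueError(f"document type {name!r} is missing: {', '.join(missing)}")
        # pdf_password is optional; normalize it to None when absent.
        settings.setdefault("pdf_password", None)
    return doc_types
```

Failing early on a missing folder or label ID avoids marking emails as read when their files could not be uploaded.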

Configuration Example

{
    "document_types": {
        "salary": {
            "gmail_label_id": "YOUR_LABEL_ID",
            "drive_folder_id": "YOUR_FOLDER_ID",
            "prefix": "salary",
            "pdf_password": "YOUR_PASSWORD",
            "filename_format": "{prefix}_{year}_{month}"
        }
    }
}

Configuration Fields

  • gmail_label_id: ID of the Gmail label that marks emails of this document type
  • drive_folder_id: ID of the Google Drive folder where processed files are uploaded
  • prefix: Prefix for processed file names (available as {prefix} in filename_format)
  • pdf_password: Password for encrypted PDFs (optional)
  • filename_format: Format string for processed file names; the example uses the {prefix}, {year}, and {month} placeholders
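One plausible way to fill filename_format is Python's str.format with values taken from the email's Date header. This is a sketch under assumptions: the README does not say which date field is used, and the zero-padding and the build_filename name are hypothetical.

```python
from email.utils import parsedate_to_datetime


def build_filename(filename_format: str, prefix: str, date_header: str) -> str:
    """Render filename_format using the email's Date header (assumed source of the date)."""
    sent = parsedate_to_datetime(date_header)
    # Zero-padded year and month keep files sorted chronologically by name.
    stem = filename_format.format(
        prefix=prefix,
        year=f"{sent.year:04d}",
        month=f"{sent.month:02d}",
    )
    return f"{stem}.pdf"
```

For example, build_filename("{prefix}_{year}_{month}", "salary", "Tue, 05 Mar 2024 09:15:00 +0200") gives "salary_2024_03.pdf". A format containing any other placeholder would raise KeyError here, since only the three from the example are supplied.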

Usage

  1. Run the processor

    python main.py
  2. First Run

    • On first run, you'll be prompted to authorize the application
    • Follow the OAuth flow in your browser
    • The authorization token is saved to token.json and reused on later runs
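The first-run behavior matches the usual installed-app OAuth pattern from Google's Python client libraries. The sketch assumes the google-auth, google-auth-oauthlib and google-api-python-client packages; the SCOPES list is an assumption about what reading mail, archiving it, and uploading to existing Drive folders require, not something taken from the project.

```python
import os

# Assumed scopes: gmail.modify allows reading messages and removing the
# UNREAD/INBOX labels; the full drive scope allows uploading into folders
# the app did not create. The project's actual scopes may differ.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/drive",
]


def load_credentials(token_path="token.json", secrets_path="credentials.json", scopes=SCOPES):
    """Reuse token.json if possible, otherwise run the browser OAuth flow once."""
    # Imported lazily so this module loads even if the Google libraries are absent.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        creds = Credentials.from_authorized_user_file(token_path, scopes)
    if creds and creds.valid:
        return creds  # cached token is still good

    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())  # renew with the stored refresh token
    else:
        # First run: opens the consent screen in the browser.
        flow = InstalledAppFlow.from_client_secrets_file(secrets_path, scopes)
        creds = flow.run_local_server(port=0)

    with open(token_path, "w", encoding="utf-8") as token:
        token.write(creds.to_json())  # saved for future runs
    return creds
```

With the credentials in hand, build("gmail", "v1", credentials=creds) and build("drive", "v3", credentials=creds) from googleapiclient.discovery return the service objects for the two APIs.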

How It Works

  1. The script checks for unread emails with the configured labels
  2. Downloads PDF attachments to a temporary directory
  3. Processes PDFs (removes passwords if needed)
  4. Renames files according to the configured format
  5. Uploads to specified Google Drive folders
  6. Marks the original emails as read and archives them
  7. Cleans up temporary files
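Steps 1, 2 and 6 map onto a handful of Gmail API calls. The sketch below assumes a service object from googleapiclient.discovery.build("gmail", "v1", ...); the helper names are hypothetical, and "archive" is taken to mean removing the INBOX label, which is how Gmail models archiving.

```python
import base64
from typing import Iterator, List, Tuple


def list_unread_ids(gmail, label_id: str) -> List[str]:
    """Step 1: IDs of unread messages carrying label_id, following pagination."""
    ids: List[str] = []
    page_token = None
    while True:
        resp = gmail.users().messages().list(
            userId="me", labelIds=[label_id, "UNREAD"], pageToken=page_token
        ).execute()
        ids.extend(m["id"] for m in resp.get("messages", []))
        page_token = resp.get("nextPageToken")
        if not page_token:
            return ids


def iter_pdf_parts(payload: dict) -> Iterator[Tuple[str, str]]:
    """Walk the MIME tree and yield (filename, attachmentId) for PDF parts."""
    stack = [payload]
    while stack:
        part = stack.pop()
        stack.extend(reversed(part.get("parts", [])))
        name = part.get("filename", "")
        attachment_id = part.get("body", {}).get("attachmentId")
        if name.lower().endswith(".pdf") and attachment_id:
            yield name, attachment_id


def download_pdfs(gmail, msg_id: str) -> List[Tuple[str, bytes]]:
    """Step 2: fetch each PDF attachment's bytes (base64url-encoded by the API)."""
    msg = gmail.users().messages().get(userId="me", id=msg_id).execute()
    files = []
    for name, att_id in iter_pdf_parts(msg["payload"]):
        att = gmail.users().messages().attachments().get(
            userId="me", messageId=msg_id, id=att_id
        ).execute()
        data = att["data"]
        # Re-add any stripped padding before decoding.
        files.append((name, base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))))
    return files


def mark_processed(gmail, msg_id: str) -> None:
    """Step 6: mark as read (drop UNREAD) and archive (drop INBOX)."""
    gmail.users().messages().modify(
        userId="me", id=msg_id, body={"removeLabelIds": ["UNREAD", "INBOX"]}
    ).execute()
```

These helpers keep attachments in memory; if files need to exist on disk for the PDF and upload steps, tempfile.TemporaryDirectory() gives a scratch directory that is removed automatically, which also covers step 7.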

Directory Structure

.
├── main.py              # Main script
├── config.json          # Your configuration
├── requirements.txt     # Python dependencies
├── credentials.json     # Google API credentials
├── token.json           # OAuth token (generated)
├── google/              # Google API helpers
│   ├── gmail.py
│   └── drive.py
├── pdf/                 # PDF processing utilities
│   └── pdf_manager.py
└── utils/               # Utility functions
    └── utils.py

License

This project is licensed under the MIT License - see the LICENSE file for details.
