A Python-based automation tool that processes PDF documents from Gmail attachments and organizes them in Google Drive. It can handle password-protected PDFs, automatically process them based on labels, and store them in organized folders.
- 📧 Automatically fetches emails with specific labels from Gmail
- 🔐 Handles password-protected PDF attachments
- 📁 Organizes processed documents in Google Drive
- 🏷️ Supports multiple document types with different processing rules
- 📅 Smart file naming based on email dates
- 🔄 Marks processed emails as read and archives them
- Financial Documents
  - Salary slips
  - Bank statements
  - Investment reports
  - Tax documents
  - Insurance policies
- Business Documents
  - Invoices
  - Purchase orders
  - Expense reports
  - Contracts
  - Vendor agreements
- Educational/Professional
  - Academic transcripts
  - Certificates
  - Professional licenses
  - Training documents
- Healthcare
  - Medical reports
  - Lab results
  - Insurance claims
  - Prescription records
- Install Dependencies
    pip install -r requirements.txt
- Google API Setup
  - Create a Google Cloud Project
  - Enable Gmail API and Google Drive API
  - Create OAuth 2.0 credentials
  - Download credentials and save as credentials.json in the project root
- Configuration
  - Copy config.example.json to config.json
  - Update the configuration with your specific settings:
    - Gmail label IDs
    - Google Drive folder IDs
    - PDF passwords (if applicable)
    - File naming formats
{
  "document_types": {
    "salary": {
      "gmail_label_id": "YOUR_LABEL_ID",
      "drive_folder_id": "YOUR_FOLDER_ID",
      "prefix": "salary",
      "pdf_password": "YOUR_PASSWORD",
      "filename_format": "{prefix}_{year}_{month}"
    }
  }
}
- gmail_label_id: The Gmail label ID used to filter emails
- drive_folder_id: The Google Drive folder ID where processed files will be stored
- prefix: Prefix for the processed file names
- pdf_password: Password for encrypted PDFs (optional)
- filename_format: Format string for naming processed files
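To illustrate how these fields might fit together, here is a minimal sketch that parses the example configuration and expands filename_format using an email's date. The helper name build_filename, the zero-padded month, and the appended .pdf extension are assumptions made for illustration; the actual logic in main.py may differ.

```python
import json
from datetime import datetime

# Same structure as the example configuration above.
EXAMPLE_CONFIG = """
{
  "document_types": {
    "salary": {
      "gmail_label_id": "YOUR_LABEL_ID",
      "drive_folder_id": "YOUR_FOLDER_ID",
      "prefix": "salary",
      "pdf_password": "YOUR_PASSWORD",
      "filename_format": "{prefix}_{year}_{month}"
    }
  }
}
"""


def build_filename(doc_config, email_date):
    """Expand filename_format with the prefix and the email's date.

    Hypothetical helper: zero-padding the month and appending ".pdf"
    are assumptions, not behavior confirmed by the project.
    """
    name = doc_config["filename_format"].format(
        prefix=doc_config["prefix"],
        year=email_date.year,
        month=f"{email_date.month:02d}",
    )
    return f"{name}.pdf"


config = json.loads(EXAMPLE_CONFIG)
salary = config["document_types"]["salary"]
print(build_filename(salary, datetime(2024, 3, 15)))  # salary_2024_03.pdf
```

Since pdf_password is optional, code reading the configuration would likely use salary.get("pdf_password") rather than indexing the key directly.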
- Run the processor
    python main.py
- First Run
  - On first run, you'll be prompted to authorize the application
  - Follow the OAuth flow in your browser
  - The authorization token will be saved for future use
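A first-run flow like this is commonly built with the google-auth and google-auth-oauthlib packages. The sketch below follows the pattern used in Google's Python quickstarts; the function name get_credentials and the two scopes are assumptions chosen to match the features described here, not code taken from this project.

```python
import os

# Assumed scopes: gmail.modify allows reading and relabeling messages,
# drive.file allows uploading files created by this application.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/drive.file",
]


def get_credentials(credentials_path="credentials.json", token_path="token.json"):
    """Load a saved token, or run the browser OAuth flow and save one."""
    # Imported here so the module can be read without the Google
    # packages installed; they are required once the function is called.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if creds and creds.valid:
        return creds

    if creds and creds.expired and creds.refresh_token:
        # Silent refresh: no browser prompt needed.
        creds.refresh(Request())
    else:
        # First run: opens a browser window for user consent.
        flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
        creds = flow.run_local_server(port=0)

    # Save the token so later runs skip the prompt.
    with open(token_path, "w") as fh:
        fh.write(creds.to_json())
    return creds
```

If the scopes change after token.json has been created, deleting the token file forces a fresh authorization.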
- The script checks for unread emails with specified labels
- Downloads PDF attachments to a temporary directory
- Processes PDFs (removes passwords if needed)
- Renames files according to the configured format
- Uploads to specified Google Drive folders
- Marks the original emails as read and archives them
- Cleans up temporary files
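The Gmail side of the steps above can be sketched with two calls from the Gmail API v1 Python client. The function names are hypothetical, and service is assumed to be the object returned by googleapiclient.discovery.build("gmail", "v1", credentials=...); pagination and error handling are left out for brevity.

```python
def find_unread(service, label_id):
    """Return the IDs of unread messages that carry the given label.

    messages.list only returns messages matching all of the label IDs
    passed in labelIds, so adding "UNREAD" restricts the result to
    unread mail. Only the first page of results is read here.
    """
    response = (
        service.users()
        .messages()
        .list(userId="me", labelIds=[label_id, "UNREAD"])
        .execute()
    )
    return [message["id"] for message in response.get("messages", [])]


def mark_read_and_archive(service, message_id):
    """Mark a message as read and archive it.

    In Gmail, removing the UNREAD label marks a message as read, and
    removing the INBOX label archives it.
    """
    body = {"removeLabelIds": ["UNREAD", "INBOX"]}
    return (
        service.users()
        .messages()
        .modify(userId="me", id=message_id, body=body)
        .execute()
    )
```

A processing loop would call find_unread once per configured gmail_label_id, handle each message's attachments, and call mark_read_and_archive only after the upload to Drive has succeeded, so that a failed run leaves the email unread for the next attempt.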
.
├── main.py              # Main script
├── config.json          # Your configuration
├── requirements.txt     # Python dependencies
├── credentials.json     # Google API credentials
├── token.json           # OAuth token (generated)
├── google/              # Google API helpers
│   ├── gmail.py
│   └── drive.py
├── pdf/                 # PDF processing utilities
│   └── pdf_manager.py
└── utils/               # Utility functions
    └── utils.py
This project is licensed under the MIT License - see the LICENSE file for details.