Document Generator & Audio Transcriber

A web application that combines HTML-to-PDF generation using wkhtmltopdf with AI-powered audio transcription using Whisper.

🚀 Features

PDF Generator

Real-time HTML/CSS Editing: Simple web interface to edit the HTML and CSS.
Live Preview: See your document as you edit
PDF Output: High-quality PDF generation with custom styling
Sample Templates: Pre-loaded examples to get you started
Responsive Design: Works well on desktop and mobile

Audio Transcriber

Web Audio Recording: Record directly in your browser
File Upload Support: Upload existing audio files for transcription
AI-Powered Transcription: Uses OpenAI Whisper for accurate speech-to-text
Language Detection: Automatically detects the spoken language
Copy to Clipboard: Easy text extraction and sharing

🛠 Technologies Used

Backend: Flask (Python) with Gunicorn
PDF Generation: wkhtmltopdf with pdfkit
Audio Processing: OpenAI Whisper + FFmpeg
Frontend: Modern HTML5, CSS3, JavaScript
Code Editors: CodeMirror with syntax highlighting
Container: Docker with Docker Compose

📦 Quick Start

Prerequisites

Docker and Docker Compose installed
At least 4 GB of RAM (for the Whisper model)

Installation

Clone the repository:

git clone https://github.com/sscotti/wkhtmltopdf_whisper.git
cd wkhtmltopdf_whisper

Build and start the application:
```
docker-compose up -d utility
```
Access the application:
- Open your browser and go to: http://localhost:5001
- The application will be ready to use immediately!

🎯 Usage

PDF Generation

Navigate to the PDF Generator tab
Edit the HTML content in the left editor
Customize styling in the CSS editor on the right
See live preview in the preview panel
Click Generate PDF to download your document

Audio Transcription

Switch to the Audio Transcriber tab
For recording: Click "Start Recording" and speak into your microphone
For file upload: Drag and drop or select an audio file
Click the transcription button to process
Copy the transcribed text when ready
Note that transcription uses Whisper's default base model, which is downloaded when the image is built. See RUN python -c "import whisper; whisper.load_model('base')" in the Dockerfile.
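
For reference, the transcription step can be sketched in a few lines of Python. This is a minimal sketch and not the code in app.py: the function names transcribe_file and summarize are hypothetical, and it assumes the openai-whisper package and FFmpeg are installed.

```python
def transcribe_file(path: str, model_name: str = "base") -> dict:
    """Transcribe an audio file with openai-whisper.

    whisper is imported lazily so that summarize() below can be used
    without the package installed. Requires openai-whisper and FFmpeg.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    # transcribe() returns a dict that includes "text", "segments" and "language".
    return model.transcribe(path)


def summarize(result: dict) -> dict:
    """Keep only the fields a simple UI would typically display."""
    return {
        "text": result.get("text", "").strip(),
        "language": result.get("language"),
    }
```

Calling summarize(transcribe_file("audio.wav")) would return the transcript together with the detected language code. Whisper also ships other model sizes (tiny, small, medium, large); larger ones generally need more memory and time.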

🔧 API Endpoints

The application also provides REST API endpoints:

GET / - Web interface
POST /generate-pdf - Generate PDF from HTML/CSS
POST /transcribe - Transcribe uploaded audio file
POST /transcribe-blob - Transcribe recorded audio blob
POST /ffmpeg-info - Get media file information

Example API Usage

Generate PDF:

curl -X POST http://localhost:5001/generate-pdf \
  -H "Content-Type: application/json" \
  -d '{"html": "<h1>Hello World</h1>", "css": "h1 { color: blue; }"}' \
  --output document.pdf
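
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names build_pdf_request and save_pdf are illustrative and not part of the application, and the JSON body simply mirrors the curl example above.

```python
import json
import urllib.request


def build_pdf_request(html: str, css: str,
                      base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Build (but do not send) a POST request for /generate-pdf."""
    body = json.dumps({"html": html, "css": css}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate-pdf",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_pdf(request: urllib.request.Request, path: str = "document.pdf") -> int:
    """Send the request and write the response body to disk.

    Returns the number of bytes written. Assumes the server is running
    and replies with the PDF bytes, as the curl example suggests.
    """
    with urllib.request.urlopen(request) as response:
        data = response.read()
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)
```

With the container running, save_pdf(build_pdf_request("<h1>Hello World</h1>", "h1 { color: blue; }")) should write document.pdf to the current directory.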

Transcribe Audio:

curl -X POST http://localhost:5001/transcribe \
  -F "[email protected]"
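
A file upload can also be built from Python without extra dependencies. This is a rough sketch: build_transcribe_request is a hypothetical helper, and field_name is a required argument because the form field must match whatever app.py expects, which this sketch does not assume.

```python
import uuid
import urllib.request


def build_transcribe_request(audio: bytes, filename: str, field_name: str,
                             base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for /transcribe."""
    boundary = uuid.uuid4().hex
    # One form part carrying the file, followed by the closing boundary.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=head + audio + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

The returned request can be sent with urllib.request.urlopen once the container is running; the response format depends on app.py.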

🏗 Architecture

utilities/
├── app.py                 # Main Flask application
├── requirements.txt       # Python dependencies
├── Dockerfile            # Container configuration
├── templates/
│   └── index.html        # Web interface
└── static/
    ├── css/
    │   └── style.css     # Modern UI styles
    └── js/
        └── app.js        # Frontend functionality

🎨 Features in Detail

PDF Generator

CodeMirror Integration: Professional code editing experience
Live Preview: Real-time HTML rendering
Professional Styling: Beautiful default templates
Custom PDF Options: Configurable page size, margins, and encoding
Error Handling: Clear feedback for any issues
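
As an illustration of how page size, margins, and encoding might be passed to wkhtmltopdf through pdfkit, here is a minimal sketch. It is not the code in app.py: build_document, render_pdf, and the PDF_OPTIONS values are assumptions chosen for the example. Because pdfkit's css argument expects a stylesheet file rather than a CSS string, the sketch inlines the CSS in a style tag.

```python
def build_document(html: str, css: str) -> str:
    """Wrap an HTML fragment and a CSS string in one standalone document."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">\n'
        f"<style>{css}</style>\n"
        f"</head><body>{html}</body></html>"
    )


# Example wkhtmltopdf options; values are illustrative, not the app's defaults.
PDF_OPTIONS = {
    "page-size": "A4",
    "margin-top": "20mm",
    "margin-right": "15mm",
    "margin-bottom": "20mm",
    "margin-left": "15mm",
    "encoding": "UTF-8",
}


def render_pdf(html: str, css: str, options: dict = PDF_OPTIONS) -> bytes:
    """Render to PDF bytes; requires pdfkit and the wkhtmltopdf binary."""
    import pdfkit  # imported lazily so build_document works without it

    # Passing False as the output path makes pdfkit return the PDF bytes.
    return pdfkit.from_string(build_document(html, css), False, options=options)
```

The option keys correspond to wkhtmltopdf command-line flags without the leading dashes, so other flags can be added to the dictionary in the same way.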

Audio Transcriber

Browser Recording: Uses Web Audio API for recording
Multiple Formats: Supports various audio file formats
Whisper Integration: State-of-the-art AI transcription
Processing Feedback: Real-time status updates
Language Support: Multi-language transcription capabilities

🔄 Development

Local Development

# Run in development mode (requires wkhtmltopdf and FFmpeg installed locally)
cd utilities
pip install -r requirements.txt
export FLASK_ENV=development
python app.py

Rebuilding the Container

docker-compose build utility
docker-compose up -d utility

Viewing Logs

docker-compose logs -f utility

📝 Configuration

Environment Variables

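FLASK_ENV: Set to development for debug mode. Flask 2.3 no longer reads this variable itself, so it only has an effect if app.py checks it.
Custom PDF options (page size, margins, encoding) can be modified in app.py.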

Dependencies

All Python dependencies are managed in requirements.txt:

Flask 2.3.3
Gunicorn 21.2.0
pdfkit 1.0.0
openai-whisper 20231117
ffmpeg-python 0.2.0

🚀 Future Enhancements

User authentication and document storage
Template library with more designs
Batch audio processing
Export transcriptions in multiple formats
Integration with cloud storage services
Multi-language UI support

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📞 Support

For support, please open an issue in the GitHub repository or contact the development team.
