
sscotti/wkhtmltopdf_whisper


Document Generator & Audio Transcriber

A web application that combines HTML-to-PDF generation using wkhtmltopdf with AI-powered audio transcription using Whisper.

🚀 Features

PDF Generator

  • Real-time HTML/CSS Editing: Simple web interface to edit the HTML and CSS.
  • Live Preview: See your document as you edit
  • PDF Output: High-quality PDF generation with custom styling
  • Sample Templates: Pre-loaded examples to get you started
  • Responsive Design: Works well on desktop and mobile

Audio Transcriber

  • Web Audio Recording: Record directly in your browser
  • File Upload Support: Upload existing audio files for transcription
  • AI-Powered Transcription: Uses OpenAI Whisper for accurate speech-to-text
  • Language Detection: Automatically detects the spoken language
  • Copy to Clipboard: Easy text extraction and sharing

🛠 Technologies Used

  • Backend: Flask (Python) with Gunicorn
  • PDF Generation: wkhtmltopdf with pdfkit
  • Audio Processing: OpenAI Whisper + FFmpeg
  • Frontend: Modern HTML5, CSS3, JavaScript
  • Code Editors: CodeMirror with syntax highlighting
  • Container: Docker with Docker Compose

📦 Quick Start

Prerequisites

  • Docker and Docker Compose installed
  • At least 4 GB of RAM (for the Whisper model)

Installation

  1. Clone the repository:

    git clone https://github.com/sscotti/wkhtmltopdf_whisper.git
    cd wkhtmltopdf_whisper
  2. Build and start the application:

    docker-compose up -d utility
  3. Access the application:

    • Open your browser and go to: http://localhost:5001
    • The application will be ready to use immediately!

🎯 Usage

PDF Generation

  1. Navigate to the PDF Generator tab
  2. Edit the HTML content in the left editor
  3. Customize styling in the CSS editor on the right
  4. See live preview in the preview panel
  5. Click Generate PDF to download your document

Audio Transcription

  1. Switch to the Audio Transcriber tab
  2. For recording: Click "Start Recording" and speak into your microphone
  3. For file upload: Drag and drop or select an audio file
  4. Click the transcription button to process
  5. Copy the transcribed text when ready
  6. Note: transcription uses Whisper's base model, which the Dockerfile loads at build time. See RUN python -c "import whisper; whisper.load_model('base')" in the Dockerfile.
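
For reference, the transcription step can be sketched with the openai-whisper Python API. This is an illustration under assumptions, not the code in app.py: transcribe_audio and format_result are hypothetical names, and the choice of returned fields is a guess at what the UI needs. whisper.load_model and model.transcribe are the library's own calls, and the result dictionary includes the text and the detected language.

```python
def format_result(result: dict) -> dict:
    """Keep only the transcript text and detected language (hypothetical helper)."""
    return {
        "text": result.get("text", "").strip(),
        "language": result.get("language", "unknown"),
    }


def transcribe_audio(path: str, model_name: str = "base") -> dict:
    """Transcribe an audio file with Whisper; requires openai-whisper and FFmpeg."""
    import whisper  # imported lazily so format_result works without the package

    model = whisper.load_model(model_name)  # downloads the model if it is not cached
    result = model.transcribe(path)         # language is detected automatically by default
    return format_result(result)
```

Passing a larger model name such as "small" or "medium" to transcribe_audio would trade speed and memory for accuracy.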

🔧 API Endpoints

The application also provides REST API endpoints:

  • GET / - Web interface
  • POST /generate-pdf - Generate PDF from HTML/CSS
  • POST /transcribe - Transcribe uploaded audio file
  • POST /transcribe-blob - Transcribe recorded audio blob
  • POST /ffmpeg-info - Get media file information

Example API Usage

Generate PDF:

curl -X POST http://localhost:5001/generate-pdf \
  -H "Content-Type: application/json" \
  -d '{"html": "<h1>Hello World</h1>", "css": "h1 { color: blue; }"}' \
  --output document.pdf
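
The same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names build_pdf_request and save_pdf are hypothetical, and it assumes the endpoint replies with the raw PDF bytes, as the --output flag in the curl example suggests.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # port taken from the Quick Start section


def build_pdf_request(html: str, css: str = "",
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /generate-pdf request with a JSON body (hypothetical helper)."""
    payload = json.dumps({"html": html, "css": css}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate-pdf",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_pdf(html: str, css: str = "", out_path: str = "document.pdf") -> int:
    """Send the request and write the response body to out_path.

    Assumes the server replies with raw PDF bytes; returns the byte count.
    """
    with urllib.request.urlopen(build_pdf_request(html, css), timeout=60) as resp:
        data = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return len(data)
```

With the container running, save_pdf("<h1>Hello World</h1>", "h1 { color: blue; }") should write document.pdf to the current directory.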

Transcribe Audio:

curl -X POST http://localhost:5001/transcribe \
  -F "[email protected]"
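
A file upload like the one above can also be sent from Python without third-party packages. This is a sketch under assumptions: encode_multipart and transcribe_file are hypothetical helpers, the form field name that /transcribe expects is not legible here, so it is a required argument (check app.py for the real name), and the response is returned as raw bytes because its format is not documented in this README.

```python
import os
import uuid
import urllib.request


def encode_multipart(field_name: str, filename: str, content: bytes,
                     content_type: str = "application/octet-stream"):
    """Encode one file as multipart/form-data; returns (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, field_name: str,
                    base_url: str = "http://localhost:5001") -> bytes:
    """POST an audio file to /transcribe and return the raw response body."""
    with open(path, "rb") as fh:
        body, header = encode_multipart(field_name, os.path.basename(path), fh.read())
    req = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    # Transcription can be slow on CPU, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return resp.read()
```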

πŸ— Architecture

utilities/
β”œβ”€β”€ app.py                 # Main Flask application
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ Dockerfile            # Container configuration
β”œβ”€β”€ templates/
β”‚   └── index.html        # Web interface
└── static/
    β”œβ”€β”€ css/
    β”‚   └── style.css     # Modern UI styles
    └── js/
        └── app.js        # Frontend functionality

🎨 Features in Detail

PDF Generator

  • CodeMirror Integration: Professional code editing experience
  • Live Preview: Real-time HTML rendering
  • Professional Styling: Beautiful default templates
  • Custom PDF Options: Configurable page size, margins, and encoding
  • Error Handling: Clear feedback for any issues
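
To illustrate the custom PDF options mentioned above, here is a hedged sketch of how page size, margins, and encoding can be passed to pdfkit. It is not the code in app.py: build_document, render_pdf, and the option values are assumptions chosen for illustration. The option keys mirror wkhtmltopdf's command-line flags. Because pdfkit's css argument expects stylesheet file paths, the sketch inlines the CSS string in a style element instead.

```python
from typing import Optional


def build_document(html: str, css: str = "") -> str:
    """Wrap an HTML fragment and a CSS string into one standalone document."""
    return (
        "<!DOCTYPE html><html><head>"
        '<meta charset="utf-8">'
        f"<style>{css}</style>"
        f"</head><body>{html}</body></html>"
    )


# Option names follow wkhtmltopdf's command-line flags; the values are examples.
PDF_OPTIONS = {
    "page-size": "A4",
    "margin-top": "20mm",
    "margin-right": "15mm",
    "margin-bottom": "20mm",
    "margin-left": "15mm",
    "encoding": "UTF-8",
}


def render_pdf(html: str, css: str = "", options: Optional[dict] = None) -> bytes:
    """Render to PDF bytes; requires pdfkit and the wkhtmltopdf binary."""
    import pdfkit  # imported lazily so build_document works without it

    # Passing False as the output path makes pdfkit return the PDF as bytes.
    return pdfkit.from_string(build_document(html, css), False,
                              options=options or PDF_OPTIONS)
```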

Audio Transcriber

  • Browser Recording: Uses Web Audio API for recording
  • Multiple Formats: Supports various audio file formats
  • Whisper Integration: State-of-the-art AI transcription
  • Processing Feedback: Real-time status updates
  • Language Support: Multi-language transcription capabilities

🔄 Development

Local Development

# Run in development mode
cd utilities
export FLASK_ENV=development
python app.py

Rebuilding the Container

docker-compose build utility
docker-compose up -d utility

Viewing Logs

docker-compose logs -f utility

πŸ“ Configuration

Environment Variables

  • FLASK_ENV: Set to development for debug mode
  • Custom PDF options can be modified in app.py
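
One way an application can honor FLASK_ENV is to read the variable itself and pass the result to Flask. The sketch below assumes that approach; debug_enabled is a hypothetical helper, and app.py may implement this differently.

```python
import os


def debug_enabled(environ=None) -> bool:
    """Return True when FLASK_ENV is set to 'development'."""
    env = os.environ if environ is None else environ
    return env.get("FLASK_ENV", "").strip().lower() == "development"
```

A development entry point could then call app.run(debug=debug_enabled()), leaving debug mode off whenever the variable is unset.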

Dependencies

All Python dependencies are managed in requirements.txt:

  • Flask 2.3.3
  • Gunicorn 21.2.0
  • pdfkit 1.0.0
  • openai-whisper 20231117
  • ffmpeg-python 0.2.0

🚀 Future Enhancements

  • User authentication and document storage
  • Template library with more designs
  • Batch audio processing
  • Export transcriptions in multiple formats
  • Integration with cloud storage services
  • Multi-language UI support

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📞 Support

For support, please open an issue in the GitHub repository or contact the development team.
