A web application that combines HTML-to-PDF generation using wkhtmltopdf with AI-powered audio transcription using Whisper.
- Real-time HTML/CSS Editing: Simple web interface to edit the HTML and CSS.
- Live Preview: See your document as you edit
- PDF Output: High-quality PDF generation with custom styling
- Sample Templates: Pre-loaded examples to get you started
- Responsive Design: Works well on desktop and mobile
- Web Audio Recording: Record directly in your browser
- File Upload Support: Upload existing audio files for transcription
- AI-Powered Transcription: Uses OpenAI Whisper for accurate speech-to-text
- Language Detection: Automatically detects the spoken language
- Copy to Clipboard: Easy text extraction and sharing
- Backend: Flask (Python) with Gunicorn
- PDF Generation: wkhtmltopdf with pdfkit
- Audio Processing: OpenAI Whisper + FFmpeg
- Frontend: Modern HTML5, CSS3, JavaScript
- Code Editors: CodeMirror with syntax highlighting
- Container: Docker with Docker Compose
- Docker and Docker Compose installed
- At least 4GB RAM (for Whisper model)
- Clone the repository:
  git clone https://github.com/sscotti/wkhtmltopdf_whisper.git
  cd wkhtmltopdf_whisper
- Build and start the application:
  docker-compose up -d utility
- Access the application:
  - Open your browser and go to: http://localhost:5001
  - The application will be ready to use immediately!
- Navigate to the PDF Generator tab
- Edit the HTML content in the left editor
- Customize styling in the CSS editor on the right
- See live preview in the preview panel
- Click Generate PDF to download your document
- Switch to the Audio Transcriber tab
- For recording: Click "Start Recording" and speak into your microphone
- For file upload: Drag and drop or select an audio file
- Click the transcription button to process
- Copy the transcribed text when ready
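- Note: transcription uses Whisper's default base model, which the Dockerfile downloads at build time with RUN python -c "import whisper; whisper.load_model('base')". To use a different model, change the model name there and wherever app.py loads it.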
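The transcription steps above can be sketched in Python. This is a minimal illustration, not the code in app.py: the helper names `load_default_model` and `transcribe_file` are hypothetical, and only `whisper.load_model` and the model's `transcribe` method come from the openai-whisper package. The `whisper` import is deferred so the second helper can be exercised without the package installed.

```python
from typing import Any, Dict


def load_default_model(name: str = "base") -> Any:
    """Load a Whisper model by name; 'base' matches the Dockerfile's pre-download."""
    import whisper  # lazy import: needs the openai-whisper package and FFmpeg

    return whisper.load_model(name)


def transcribe_file(model: Any, path: str) -> Dict[str, str]:
    """Transcribe one audio file and keep only the text and detected language."""
    result = model.transcribe(path)
    return {
        "text": result["text"].strip(),
        "language": result.get("language", ""),
    }


# Hypothetical usage (requires openai-whisper and an audio file on disk):
#     model = load_default_model()
#     print(transcribe_file(model, "recording.wav"))
```

Loading the model once and reusing it across requests avoids paying the load cost on every transcription, which is presumably why the Dockerfile pre-downloads it.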
The application also provides REST API endpoints:
- GET / - Web interface
- POST /generate-pdf - Generate PDF from HTML/CSS
- POST /transcribe - Transcribe uploaded audio file
- POST /transcribe-blob - Transcribe recorded audio blob
- POST /ffmpeg-info - Get media file information
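The PDF endpoint can also be called from Python using only the standard library. The sketch below assumes the endpoint accepts a JSON body with `html` and `css` keys and returns the PDF bytes in the response body, as this README's curl example suggests; the function names `build_pdf_request` and `generate_pdf` are illustrative and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # port taken from the quick-start steps


def build_pdf_request(html: str, css: str = "", base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /generate-pdf with a JSON body of html and css."""
    body = json.dumps({"html": html, "css": css}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate-pdf",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_pdf(html: str, css: str = "", out_path: str = "document.pdf") -> str:
    """Send the request and write the response body to out_path."""
    with urllib.request.urlopen(build_pdf_request(html, css)) as resp:
        data = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return out_path
```

With the container running, `generate_pdf("<h1>Hello World</h1>", "h1 { color: blue; }")` should write `document.pdf` to the current directory, provided the assumptions above hold.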
Generate PDF:
curl -X POST http://localhost:5001/generate-pdf \
  -H "Content-Type: application/json" \
  -d '{"html": "<h1>Hello World</h1>", "css": "h1 { color: blue; }"}' \
  --output document.pdf
Transcribe Audio:
curl -X POST http://localhost:5001/transcribe \
  -F "[email protected]"
utilities/
├── app.py               # Main Flask application
├── requirements.txt     # Python dependencies
├── Dockerfile           # Container configuration
├── templates/
│   └── index.html       # Web interface
└── static/
    ├── css/
    │   └── style.css    # Modern UI styles
    └── js/
        └── app.js       # Frontend functionality
- CodeMirror Integration: Professional code editing experience
- Live Preview: Real-time HTML rendering
- Professional Styling: Beautiful default templates
- Custom PDF Options: Configurable page size, margins, and encoding
- Error Handling: Clear feedback for any issues
- Browser Recording: Uses Web Audio API for recording
- Multiple Formats: Supports various audio file formats
- Whisper Integration: State-of-the-art AI transcription
- Processing Feedback: Real-time status updates
- Language Support: Multi-language transcription capabilities
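As an illustration of the Custom PDF Options feature listed above, here is one way page size, margins, and encoding could be passed to wkhtmltopdf through pdfkit. The option values and the names `DEFAULT_PDF_OPTIONS`, `build_document`, and `render_pdf` are assumptions for this sketch; the actual settings live in app.py and may differ. Because pdfkit's `css` argument expects stylesheet file paths, the sketch embeds the CSS string in a `<style>` element instead.

```python
from typing import Dict, Optional

# Hypothetical defaults; the real values live in app.py and may differ.
DEFAULT_PDF_OPTIONS: Dict[str, str] = {
    "page-size": "A4",
    "margin-top": "20mm",
    "margin-right": "20mm",
    "margin-bottom": "20mm",
    "margin-left": "20mm",
    "encoding": "UTF-8",
}


def build_document(html: str, css: str = "") -> str:
    """Wrap an HTML fragment and a CSS string into one standalone document."""
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<style>{css}</style></head><body>{html}</body></html>"
    )


def render_pdf(html: str, css: str = "", overrides: Optional[Dict[str, str]] = None) -> bytes:
    """Render to PDF bytes; requires pdfkit and the wkhtmltopdf binary."""
    import pdfkit  # lazy import so build_document works without pdfkit installed

    options = {**DEFAULT_PDF_OPTIONS, **(overrides or {})}
    # Passing False as the output path makes pdfkit return the PDF as bytes.
    return pdfkit.from_string(build_document(html, css), False, options=options)
```

Keeping the defaults in a single dictionary and merging per-request overrides on top means a caller can change one setting, for example `{"page-size": "Letter"}`, without restating the rest.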
# Run in development mode
cd utilities
export FLASK_ENV=development
python app.py
# Rebuild and restart the container
docker-compose build utility
docker-compose up -d utility
# Follow the container logs
docker-compose logs -f utility
- FLASK_ENV: Set to development for debug mode
- Custom PDF options can be modified in app.py
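How app.py turns FLASK_ENV into debug mode is not shown in this README. A minimal sketch of one possible approach follows, assuming the application reads the variable itself; the helper name `debug_enabled` is hypothetical. Reading it explicitly matters because, as far as the Flask changelog indicates, Flask 2.3 no longer interprets FLASK_ENV on its own.

```python
import os
from typing import Mapping


def debug_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when FLASK_ENV is set to 'development' (case-insensitive)."""
    return env.get("FLASK_ENV", "").strip().lower() == "development"


# Hypothetical use at the bottom of app.py:
#     app.run(debug=debug_enabled())
```

Accepting the environment as a parameter keeps the check easy to test with a plain dictionary instead of the real process environment.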
All Python dependencies are managed in requirements.txt:
- Flask 2.3.3
- Gunicorn 21.2.0
- pdfkit 1.0.0
- openai-whisper 20231117
- ffmpeg-python 0.2.0
- User authentication and document storage
- Template library with more designs
- Batch audio processing
- Export transcriptions in multiple formats
- Integration with cloud storage services
- Multi-language UI support
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
For support, please open an issue in the GitHub repository or contact the development team.