A modern, user-friendly Streamlit web interface for OpenAI's Whisper speech-to-text model, containerized with Docker for easy deployment.
- 🖱️ Drag & Drop Interface: Simply drag audio files into the browser or click to browse
- 🧠 Model Selection: Choose from 5 different Whisper models based on your accuracy/speed needs:
  - tiny.en: Fastest, least accurate (~39MB)
  - base.en: Good balance (~74MB)
  - small.en: Better accuracy (~244MB)
  - medium.en: High accuracy (~769MB)
  - large: Best accuracy (~1550MB)
- 📁 Flexible Output: Save files to default location or specify custom paths
- 📥 Download Buttons: Download transcript files (TXT, SRT, VTT) directly from the web interface
- 🐳 Docker Volume Support: Map container directories to host machine for persistent file access
- 💾 Memory Optimized: CPU-only processing with memory-friendly options
- 📱 Responsive UI: Clean, modern interface that works on desktop and mobile
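The model trade-off in the feature list can be sketched as a small lookup. This is only an illustration, not the app's own code: `MODEL_SIZES_MB` and `largest_model_within` are hypothetical names, and the figures are the approximate download sizes quoted above, which are not the same as the RAM a model needs at runtime.

```python
from typing import Optional

# Approximate download sizes in MB, copied from the feature list above.
# These are NOT runtime memory requirements.
MODEL_SIZES_MB = {
    "tiny.en": 39,
    "base.en": 74,
    "small.en": 244,
    "medium.en": 769,
    "large": 1550,
}


def largest_model_within(limit_mb: int) -> Optional[str]:
    """Return the largest listed model whose download size fits limit_mb.

    Returns None when even the smallest model exceeds the limit.
    """
    candidates = [name for name, size in MODEL_SIZES_MB.items() if size <= limit_mb]
    if not candidates:
        return None
    # Larger models are listed as more accurate, so prefer the biggest that fits.
    return max(candidates, key=lambda name: MODEL_SIZES_MB[name])


print(largest_model_within(300))
```

Picking by size is a rough proxy; on a CPU-only container, transcription time grows with model size as well, so starting small and moving up is usually the safer path.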
git clone https://github.com/jlonge4/whisperAI-flask-docker.git
cd whisperAI-flask-docker/whisperapp
docker build -t whisper-streamlit .
Basic usage (download files via web interface):
docker run -p 8501:8501 whisper-streamlit
With volume mapping (optional - for direct file access):
docker run -p 8501:8501 -v ~/transcripts:./whisper_files whisper-streamlit
With extra memory (for larger models):
docker run --memory=4g -p 8501:8501 -v ~/transcripts:./whisper_files whisper-streamlit
Open your browser and go to: http://localhost:8501
- Upload Audio: Drag and drop your audio file or click to browse
- Supported formats: MP3, WAV, MP4, M4A, FLAC, OGG, WMA
- Select Model: Choose the Whisper model that fits your needs
- Choose Output Location: Use the default ./whisper_files or specify a custom path
- Start Transcription: Click the transcription button and wait for processing
- Get Results: Download files instantly using the download buttons (or access via volume mapping if configured)
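As a rough sketch of the upload step, a file name can be checked against the supported formats before any transcription starts. The names `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical and the app may validate uploads differently (for example through the upload widget itself); the extension list is taken from the steps above.

```python
from pathlib import Path

# Extensions for the formats listed above: MP3, WAV, MP4, M4A, FLAC, OGG, WMA.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".mp4", ".m4a", ".flac", ".ogg", ".wma"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file's extension (case-insensitive) is supported."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("interview.MP3"))
print(is_supported_audio("notes.txt"))
```

An extension check only looks at the file name; it does not confirm that the file's contents can actually be decoded.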
The app generates three transcript formats:
- .txt: Plain text transcript
- .srt: SubRip subtitle format with timestamps
- .vtt: WebVTT subtitle format
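The two subtitle formats differ mainly in their header and timestamp separator: SRT numbers each cue and writes `HH:MM:SS,mmm`, while WebVTT starts with a `WEBVTT` line and writes `HH:MM:SS.mmm`. The sketch below shows that difference under one assumption: each segment is a dict with `start`, `end` (seconds) and `text` keys. The helper names are hypothetical and this is not necessarily how the app writes its files.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as SubRip: numbered cues, comma before milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """Render segments as WebVTT: WEBVTT header, dot before milliseconds."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines.extend([f"{start} --> {end}", seg["text"].strip(), ""])
    return "\n".join(lines)


# Hypothetical segment, used only to exercise the helpers.
example = [{"start": 0.0, "end": 2.5, "text": " Hello "}]
print(to_srt(example))
print(to_vtt(example))
```

The plain .txt output needs no timestamps at all; it is just the segment texts joined together.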
Volume mapping is optional, because the built-in download buttons already give you every transcript file. If you want direct file system access:
# Map container's ./whisper_files to host's ~/transcripts
docker run -p 8501:8501 -v ~/transcripts:./whisper_files whisper-streamlit
# Map to any host directory
docker run -p 8501:8501 -v /path/to/your/folder:./whisper_files whisper-streamlit
- Start with smaller models (tiny.en or base.en) to test functionality
- Use larger models (medium.en or large) for better accuracy on complex audio
- Increase Docker memory if you get out-of-memory errors with larger models
- Download buttons provide instant file access without any setup required
- Volume mapping is optional - only needed if you want files to persist on your host machine
- Docker Desktop (with sufficient memory allocation)
- At least 2GB RAM (4GB+ recommended for larger models)
- ARM64 or AMD64 architecture support
This project has been upgraded from Flask to Streamlit with major improvements:
- Modern drag-and-drop interface
- Real-time model selection
- Built-in download functionality
- Better error handling and user feedback
- Memory optimization for Docker containers
- Multi-platform Docker support (ARM64/AMD64)
Original concept: A user-friendly way to upload files to a dockerized Flask web form and have Whisper transcribe them. Now with a modern Streamlit interface! 🎉