Effortlessly capture and transcribe your thoughts with intelligent voice recording and powerful AI.
GPU Support Available: For faster transcription, install with CUDA support. See the GPU installation instructions below.
A professional voice recording system with PySide6 GUI, real-time transcription, intelligent hotkey control, and automatic text pasting. Features both command-line and graphical interfaces with robust threaded architecture for seamless performance. Now includes AI-powered text reformatting with Google Gemini integration.
- Modern GUI Interface: Intuitive PySide6 interface for seamless configuration and control.
- Global Hotkey Support: Instantly toggle recording with customizable keyboard shortcuts.
- High-Quality Audio Recording: Professional WAV output (16-bit depth, 16kHz sample rate).
- Real-time Transcription: Accurate speech-to-text powered by HuggingFace Whisper models.
- Intelligent Text Pasting: Automatic clipboard management and text insertion.
- Program Controls: GUI buttons for start/stop with real-time status indicators.
- AI Text Reformatter: Intelligent text enhancement with Google Gemini AI - hold Ctrl to reformat selected text with grammar fixes.
- GPU/CUDA Acceleration: Optimized performance with NVIDIA GPUs using float16 precision.
- Automatic Model Conversion: Seamlessly converts HuggingFace PyTorch models to CTranslate2 format for faster-whisper compatibility.
- Optimized Distil-Whisper: Integrates Distil-Whisper models with enhanced parameters (`beam_size=5`, `language='en'`, `condition_on_previous_text=False`) for improved accuracy and speed (see the sketch after this list).
- Smart Configuration: GUI-based settings with automatic GPU detection and model management.
- Threaded Architecture: Non-blocking operations and real-time status monitoring.
- Thread Safety: Robust concurrent operation handling with proper locking mechanisms.
- Comprehensive Logging: Detailed debugging and monitoring logs for issue diagnosis.
- Graceful Shutdown: Clean handling of interrupts and system termination.
- Model Management: Download and manage any HuggingFace Whisper model directly from the GUI.
- AI-Powered Text Enhancement: Background service for grammar and spelling correction.
- Reformatter Configuration: Adjustable hold duration and enable/disable controls in Settings.
- Independent Operation: Reformatter runs alongside main dictation system without interference.
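As a rough illustration of the Distil-Whisper parameters listed above, here is a minimal faster-whisper sketch (not the project's actual code; the model name and audio path are placeholders):

```python
# Minimal sketch: transcribe a recording with faster-whisper using the
# parameters named above. Model name and audio path are placeholders.
from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "outputs/recording.wav",
    beam_size=5,
    language="en",
    condition_on_previous_text=False,
)
print(" ".join(segment.text.strip() for segment in segments))
```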
- Python 3.8 or higher must be installed on your system
  - Windows: Download from python.org
  - Linux: Usually pre-installed, or use `sudo apt install python3`
  - macOS: Use python.org or `brew install python3`
# Clone the repository
git clone <repository-url>
cd dictationer
# Run the automated setup script
python setup.py # Windows
python3 setup.py # Linux/macOS

The setup script will:
- Check your Python version
- Create a virtual environment
- Guide you through GPU setup options
- Install all dependencies
- Verify the installation
After setup completes, launch the program:
./start.bat # Windows
./start.sh # Linux/macOS

The setup script will explain GPU requirements, but here's a summary:
- Install CUDA Toolkit from NVIDIA
- Check your CUDA version: `nvidia-smi`
- Install PyTorch with CUDA (after activating the venv):
# Activate the virtual environment first
venv\Scripts\activate # Windows
source venv/bin/activate # Linux/macOS
# Then install PyTorch based on your CUDA version:
# For CUDA 12.1 (most recent GPUs):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 11.8 (older GPUs):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# For CPU only (if no GPU):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Note: The GUI will automatically detect and use your GPU if available. You can verify this in Settings → Device.
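A quick way to confirm the install worked (run inside the activated venv; this is just a verification snippet, not part of the project):

```python
# Verify that PyTorch can see the GPU after installation.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA runtime:", torch.version.cuda)
```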
1. Launch the GUI:

   ./start.bat # Windows
   ./start.sh # Linux/macOS

2. Configure Your Hotkey:
   - Go to the Settings tab
   - Find Audio Settings → Hotkey field
   - Enter your hotkey in the exact format: `ctrl+win+shift+l`

3. Format Requirements:
   - Lowercase letters only: `ctrl`, not `Ctrl`
   - Plus signs with no spaces: `ctrl+alt+r`, not `ctrl + alt + r`
   - Use exact modifier names: `ctrl`, `alt`, `shift`, `win`

Valid examples:
- `ctrl+win+shift+l` → Default hotkey
- `ctrl+alt+r` → Simple combination
- `shift+f1` → Function key combo
- `ctrl+shift+space` → With space key

Invalid examples:
- `Ctrl+Win+Shift+L` → Uppercase letters
- `ctrl + alt + r` → Spaces around plus signs
- `control+alt+r` → Wrong modifier name
- `ctrl-alt-r` → Wrong separator
After setting your hotkey in the GUI:
- Click Start Program button
- Try pressing your hotkey combination
- Look for "Recording state: ON/OFF" message in the log output
- If nothing happens, check the format and try again

Tip: The default `ctrl+win+shift+l` is tested to work reliably across platforms!
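For reference, this is roughly how a global toggle hotkey behaves with the keyboard library; a minimal standalone sketch, not the project's KeyboardRecorder implementation:

```python
# Minimal sketch of a global toggle hotkey using the keyboard library.
import keyboard

recording = False

def toggle_recording():
    global recording
    recording = not recording
    print(f"Recording state: {'ON' if recording else 'OFF'}")

keyboard.add_hotkey("ctrl+win+shift+l", toggle_recording)
print("Press ctrl+win+shift+l to toggle, esc to quit.")
keyboard.wait("esc")
```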
- CPU: Capable of running base or small Whisper models.
- GPU: Highly recommended for optimal performance, especially with larger models. Supports NVIDIA GPUs with CUDA.
# Audio Recording & Processing
pyaudio # High-quality audio capture
faster-whisper # Advanced speech recognition engine
watchdog # File system event monitoring
# System & Utilities
keyboard # Global hotkey detection
pyperclip # Clipboard automation
python-dotenv # Environment variable management
# GUI Framework (v1.1+)
PySide6 # Modern GUI toolkit
# AI & GPU Acceleration
transformers # HuggingFace model integration
torch # PyTorch for GPU support (install with CUDA)
ctranslate2 # For optimized model inference

# Testing and code quality
pytest>=6.0 # Unit testing framework
pytest-cov # Coverage reporting
black # Code formatting
mypy # Type checking
flake8 # Linting

# Windows
# Install Microsoft Visual C++ Build Tools if needed
# Run as administrator for keyboard hook permissions

# Linux (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install portaudio19-dev python3-dev
pip install dictationer

# macOS
brew install portaudio
pip install dictationer
# Grant accessibility permissions when prompted

# Launch GUI with proper virtual environment
./start.bat # Windows
./start.sh # Linux/macOS
# Or manually activate and run
source venv/bin/activate # Linux/macOS
# venv\Scripts\activate # Windows
python gui_main.py

- Settings Configuration: Device selection (CPU/GPU), model selection, hotkey customization
- Model Download: Download any HuggingFace Whisper model with progress tracking
- Model Management: View cached models and open models folder
- Program Control: Start/Stop the main recording program with real-time status
- AI Text Reformatter: Configure and control intelligent text enhancement
- Live Logs: Real-time log output with scrolling display
The integrated AI Text Reformatter enhances any selected text using Google Gemini AI. Perfect for improving dictated text, emails, documents, and more.
Setup:
- Get Gemini API Key: Visit Google AI Studio for a free API key
- Configure Environment: Create a `.env` file in the project root: `GEMINI_API_KEY=your-api-key-here`
- Enable in Settings: Go to Settings tab → Text Reformatter section → Enable checkbox
Usage:
- Select text in any application (browser, editor, email, etc.)
- Hold Ctrl key for 2 seconds (configurable 1-5 seconds)
- Watch the magic - text is automatically copied, corrected for grammar and spelling, and pasted back
Status Monitoring:
- Settings shows: `Recording System: Ready | Reformatter: Active`
- Independent operation alongside voice recording
- Real-time status updates in GUI
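Under the hood the round-trip looks roughly like the sketch below. This is for illustration only, assuming the google-generativeai and pyperclip packages with a placeholder model name and prompt; it is not the project's actual implementation:

```python
# Illustrative only: copy selected text, ask Gemini for corrections, paste back.
import os

import google.generativeai as genai
import pyperclip

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

selected = pyperclip.paste()  # text copied from the active application
response = model.generate_content(
    "Fix spelling, grammar, and punctuation. Return only the corrected text:\n\n"
    + selected
)
pyperclip.copy(response.text.strip())  # corrected text, ready to paste back
```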
# Custom hotkey and output
python main.py --output "recordings/session.wav"
# Use different Whisper model
python main.py # Configure via GUI or config files

We welcome contributions from the community! This project has huge potential and there are many exciting features we'd love to add:
- What: Add a "Record Hotkey" button in GUI that captures key presses and auto-formats them correctly
- Why: Current hotkey format is error-prone and causes silent failures for users
- Impact: Eliminates the #1 setup frustration and improves user experience dramatically
- What: Thorough testing and optimization for macOS users
- Why: Many users want native macOS support with proper accessibility permissions
- Impact: Expands user base significantly
- What: Add support for OpenAI's official Whisper API as an alternative to local processing
- Why: Cloud processing for users without powerful hardware
- Impact: Makes transcription accessible to everyone
- What: Create a plugin system for custom transcription processors
- Why: Allow community to add new AI models and processing pipelines
- Impact: Extensible platform for innovation
- What: System tray support, real-time waveform display, custom themes
- Why: Better user experience and professional polish
- Impact: More intuitive and visually appealing
- What: UI translations and language-specific optimizations
- Why: Global accessibility for non-English users
- Impact: Worldwide adoption
- What: IDE plugins (VS Code, IntelliJ), browser extensions, mobile companion apps
- Why: Seamless workflow integration
- Impact: Professional productivity boost
1. Fork the Repository

   git clone https://github.com/yourusername/dictationer.git
   cd dictationer

2. Create a Feature Branch

   git checkout -b feature/amazing-new-feature

3. Make Your Changes
   - Follow the coding standards in `docs/PLANNING.md`
   - Add tests for new functionality
   - Update documentation

4. Test Thoroughly

   # Run existing tests
   pytest
   # Test your feature manually
   ./start.bat # or .sh

5. Submit a Pull Request
   - Clear description of changes
   - Reference any related issues
   - Include screenshots for UI changes
- Start Small: Begin with documentation improvements or bug fixes
- Read the Docs: Check `docs/PLANNING.md` for architecture details
- Discuss First: Open an issue for major features before implementing
- Test Everything: Ensure cross-platform compatibility
- Document Changes: Update README and docs as needed
If Dictationer has made your life easier and saved you time, I'd be incredibly grateful for a small donation!

Creating and maintaining open-source projects like this takes countless hours of development, testing, debugging, and support. Your contribution helps me:

- Keep innovating with new features and improvements
- Fix bugs and provide ongoing support
- Buy better hardware for testing across different platforms
- Stay caffeinated during those late-night coding sessions!

Even if you can't donate, there are other amazing ways to support the project:

- Star the repository - helps others discover the project
- Report bugs - help make it better for everyone
- Improve documentation - share your knowledge
- Spread the word - tell your friends and colleagues
- Test on different platforms - especially macOS users!

Every donation, no matter how small, means the world to me. It's not just about the money - it's knowing that something I built is genuinely helping people in their daily work and life. Whether you're a student taking notes, a professional writing documentation, or someone with accessibility needs, your support motivates me to keep building amazing tools for everyone.

Thank you for being awesome!
from dictationer import RecordingController
# Simple recording with defaults
controller = RecordingController()
controller.start()

from dictationer import RecordingController
# Customized recording setup
controller = RecordingController(
output_file="outputs/meeting_notes.wav",
hotkey="ctrl+alt+r",
enable_transcription=True,
model_size="base",
auto_paste=True
)
# Start the recording system
controller.start()
# Access individual components
if controller.audio.is_recording():
print("Currently recording...")
# Graceful shutdown
controller.stop()

from dictationer.audio import AudioRecorder
from dictationer.processor import AudioProcessor
from dictationer.paster import ClipboardPaster
# Individual component usage
audio = AudioRecorder("output.wav")
audio.start_recording()
# ... perform operations
audio.stop_recording()
# Transcription processor
processor = AudioProcessor(model_size="base", watch_directory="outputs")
processor.start_monitoring()
# Clipboard operations
paster = ClipboardPaster()
success = paster.paste_text("Hello, world!")
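For context, the directory monitoring that `start_monitoring()` performs can be pictured with a plain watchdog observer; the handler below is a simplified stand-in, not the project's AudioProcessor internals:

```python
# Simplified illustration of watching outputs/ for new WAV files.
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class NewWavHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory and event.src_path.endswith(".wav"):
            print(f"New recording detected: {event.src_path}")
            # a real handler would hand the file to the Whisper model here

observer = Observer()
observer.schedule(NewWavHandler(), "outputs", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```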
dictationer/
├── src/dictationer/        # Main package source
│   ├── __init__.py         # Package exports & version
│   ├── main.py             # RecordingController (orchestrator)
│   ├── keyboard.py         # KeyboardRecorder (hotkey detection)
│   ├── audio.py            # AudioRecorder (audio capture)
│   ├── processor.py        # AudioProcessor (transcription)
│   ├── paster.py           # ClipboardPaster (text automation)
│   ├── gui.py              # PySide6 GUI interface
│   └── config.py           # Configuration management
├── docs/                   # Documentation
│   ├── PLANNING.md         # Architecture & design
│   ├── TASK.md             # Task management
│   └── API.md              # API documentation
├── tests/                  # Unit tests
├── logs/                   # Application logs
├── outputs/                # Recording outputs
├── config/                 # Configuration files
├── main.py                 # CLI entry point
├── gui_main.py             # GUI entry point
├── start.bat               # Windows GUI launcher
├── start.sh                # Linux/macOS GUI launcher
├── pyproject.toml          # Package configuration
├── requirements.txt        # Dependencies
└── README.md               # This file
| Module | Purpose | Key Features |
|---|---|---|
| main.py | System orchestrator | Lifecycle management, signal handling, logging |
| keyboard.py | Hotkey detection | Global shortcuts, thread-safe state management |
| audio.py | Audio recording | PyAudio integration, WAV output, threading |
| processor.py | Speech-to-text | Whisper models, direct transcription, batch processing |
| paster.py | Text automation | Clipboard management, keyboard simulation |
| gui.py | GUI interface | PySide6 interface, model downloads, program control |
| config.py | Configuration | Settings management, device detection, model caching |
| Setting | Default Value | Description |
|---|---|---|
| Hotkey | `ctrl+win+shift+l` | Global shortcut to toggle recording |
| Audio Format | WAV (16-bit, 16kHz, Mono) | High-quality audio output |
| Output Directory | `outputs/` | Recording storage location |
| Log Directory | `logs/` | Debug and monitoring logs |
| Whisper Model | `base` | Speech recognition model |
| Auto-paste | `True` | Automatic text insertion |
# Set custom defaults
export DICTATIONER_HOTKEY="ctrl+shift+r"
export WHISPER_MODEL_SIZE="large-v3"
export OUTPUT_DIRECTORY="recordings"

controller = RecordingController(
output_file="meetings/2025-07-19_team_meeting.wav",
hotkey="ctrl+alt+r",
enable_transcription=True,
model_size="large-v3",
auto_paste=False
)

# dictationer.yml
audio:
sample_rate: 16000
channels: 1
format: "wav"
transcription:
model_size: "base"
language: "auto"
auto_paste: true
hotkeys:
toggle_recording: "ctrl+win+shift+l"
emergency_stop: "ctrl+win+shift+x"

The keyboard module is very sensitive to hotkey format. Even small deviations will cause complete failure!
- Lowercase only: `ctrl`, `alt`, `shift`, `win` (never `Ctrl`, `Alt`, etc.)
- No spaces around plus signs: `ctrl+alt+r` (never `ctrl + alt + r`)
- Exact modifier names: `ctrl` not `control`, `win` not `windows`
- Plus sign separator: Use `+` only (never `-`, `_`, or spaces)
- Modifiers: `ctrl`, `alt`, `shift`, `win`
- Letters: `a-z` (lowercase only)
- Numbers: `0-9`
- Function: `f1`-`f12`
- Special: `space`, `enter`, `esc`, `tab`, `backspace`
# ✅ CORRECT - These will work
"ctrl+shift+r" # Ctrl + Shift + R
"alt+f1" # Alt + F1
"ctrl+win+shift+l" # Ctrl + Win + Shift + L (default)
"ctrl+alt+space" # Ctrl + Alt + Space
"shift+f10" # Shift + F10
"win+alt+d" # Windows + Alt + D
# ❌ WRONG - These will fail silently
"Ctrl+Shift+R" # Uppercase modifiers
"ctrl + shift + r" # Spaces around plus signs
"control+shift+r" # Wrong modifier name
"ctrl-shift-r" # Wrong separator
"CTRL+SHIFT+R" # All uppercase

- Test immediately: After setting a hotkey, test it right away
- Check logs: Look for "Recording state: ON/OFF" messages when pressing keys
- Use defaults first: Try `ctrl+win+shift+l` to verify basic functionality
- No silent failures: The keyboard module won't warn you about invalid formats
- Windows: All combinations supported (run as administrator for global hooks)
- Linux: May require X11 permissions for global hooks
- macOS: Requires accessibility permissions for global keyboard access
If your hotkey doesn't work:
- Check format exactly against the rules above
- Try the default: `ctrl+win+shift+l` should always work
- Look at logs: No "Recording state" messages = bad format
- Test modifiers: Some combinations may conflict with system shortcuts
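Because the keyboard module fails silently on malformed strings, a small pre-check can save time. A minimal validator sketch based on the rules above (not part of Dictationer):

```python
# Validate a hotkey string against the format rules described above.
VALID_KEYS = (
    {"ctrl", "alt", "shift", "win"}                      # modifiers
    | {chr(c) for c in range(ord("a"), ord("z") + 1)}    # letters
    | {str(d) for d in range(10)}                        # numbers
    | {f"f{n}" for n in range(1, 13)}                    # function keys
    | {"space", "enter", "esc", "tab", "backspace"}      # special keys
)

def is_valid_hotkey(hotkey: str) -> bool:
    """True if every part is a known lowercase key joined by '+'."""
    if not hotkey or " " in hotkey:
        return False
    return all(part in VALID_KEYS for part in hotkey.split("+"))

print(is_valid_hotkey("ctrl+win+shift+l"))  # True
print(is_valid_hotkey("Ctrl + Alt + R"))    # False
```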
The application creates comprehensive logs for debugging and monitoring:
| Log File | Level | Purpose |
|---|---|---|
| Console | INFO+ | Real-time user feedback |
| voice_recorder_debug.log | DEBUG+ | Main system operations |
| audio_processor.log | DEBUG+ | Transcription and processing |
HH:MM:SS | LEVEL | MODULE | MESSAGE
14:30:25 | INFO | AudioRecorder | Recording started...
14:30:30 | DEBUG | AudioProcessor | Transcription completed
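If you want your own scripts to log in the same shape, a minimal setup is sketched below (the exact format string is an assumption matched to the sample above, not copied from the project):

```python
# Emit log lines in the HH:MM:SS | LEVEL | MODULE | MESSAGE shape shown above.
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
    datefmt="%H:%M:%S",
)

logging.getLogger("AudioRecorder").info("Recording started...")
```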
- Real-time Status: Recording state and system health
- Performance Metrics: Processing times and resource usage
- Error Tracking: Detailed exception handling and recovery
- Thread Monitoring: Multi-threaded operation tracking
# Clone and setup
git clone <repository-url>
cd dictationer
python -m venv venv_linux
source venv_linux/bin/activate # Linux/macOS
# Install with development tools
pip install -e .[dev]
# Install pre-commit hooks
pre-commit install

# Run all tests
pytest
# Run with coverage
pytest --cov=src/dictationer --cov-report=html
# Run specific test categories
pytest tests/test_audio.py -v
pytest -k "test_recording" -v

# Format code (required before commit)
black src/ tests/
# Type checking
mypy src/
# Linting
flake8 src/
# Run all quality checks
make lint # or equivalent script

- Create Feature Branch: `git checkout -b feature/amazing-feature`
- Write Tests: Add tests for new functionality
- Implement Feature: Follow coding standards and patterns
- Run Quality Checks: Ensure all checks pass
- Update Documentation: Add/update relevant docs
- Submit PR: Include tests and documentation
# Problem: "No module named 'faster_whisper'" or similar import errors
# Solution: Run the setup script or ensure you're using the virtual environment
# Option 1: Run setup script (recommended)
python setup.py # Windows
python3 setup.py # Linux/macOS
# Option 2: Manual activation and install
# Windows
venv\Scripts\activate
pip install -r requirements.txt
# Linux/macOS
source venv/bin/activate
pip install -r requirements.txt
# Always use the launcher scripts for GUI
./start.bat # Windows
./start.sh # Linux/macOS

# Problem: GPU not detected even when available
# Solution: Ensure PyTorch is installed with CUDA support
# Check current PyTorch installation
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# Install PyTorch with CUDA (Windows/Linux)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CPU-only systems
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Problem: Model download fails or gets stuck
# Solution: Clear cache and retry
# Clear HuggingFace cache
rm -rf ~/.cache/huggingface/
# Windows: rmdir /s "%USERPROFILE%\.cache\huggingface"
# Test model download manually
python -c "from faster_whisper import WhisperModel; WhisperModel('base')"
# For network issues, try different model names:
# - openai/whisper-base
# - openai/whisper-large-v3
# - distil-whisper/distil-large-v3

# Problem: "No audio input device" or microphone not working
# Solution: Check audio device permissions and availability
# List available audio devices
python -c "
import pyaudio
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(f'{i}: {info[\"name\"]} - Inputs: {info[\"maxInputChannels\"]}')
"
# Test microphone access
python -c "
import pyaudio
p = pyaudio.PyAudio()
try:
    info = p.get_default_input_device_info()
    print(f'Default microphone: {info[\"name\"]} - OK')
except Exception:
    print('No default microphone found')
"

# Windows - Install Microsoft Visual C++ Build Tools
# Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Then: pip install pyaudio
# Linux (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install portaudio19-dev python3-dev build-essential
pip install pyaudio
# macOS
brew install portaudio
pip install pyaudio
# Alternative: Use conda
conda install pyaudio

# Windows - Run as administrator (required for global hotkeys)
# Right-click Command Prompt โ "Run as administrator"
# Or run GUI from elevated terminal
# Linux - Add user to input group
sudo usermod -a -G input $USER
# Logout and login again
# macOS - Grant accessibility permissions
# System Preferences โ Security & Privacy โ Privacy โ Accessibility
# Add Terminal or your IDE to allowed applications

# Problem: Hotkey doesn't trigger recording, no response when pressed
# Solution: Check format and test systematically
# 1. Verify exact format in GUI settings
# Open GUI โ Settings โ Audio Settings โ Hotkey field
# Should look exactly like: ctrl+win+shift+l
# 2. Common format issues to check:
echo "Checking common hotkey format problems..."
# ❌ Wrong: Uppercase letters
# "Ctrl+Win+Shift+L"
# ❌ Wrong: Spaces around plus signs
# "ctrl + win + shift + l"
# ❌ Wrong: Different modifier names
# "control+windows+shift+l"
# ❌ Wrong: Different separators
# "ctrl-win-shift-l" or "ctrl_win_shift_l"
# ✅ Correct format:
# "ctrl+win+shift+l"
# 3. Test with known working hotkey
# Set hotkey to exactly: ctrl+win+shift+l
# This is the tested default that should work
# 4. Check for conflicts with system shortcuts
# Try a simple combination like: ctrl+alt+f1
# Some complex combinations may conflict with OS hotkeys
# 5. Verify program is running and listening
# In GUI logs, look for:
# "[KEYBOARD] Hotkey registered successfully"
# "[KEYBOARD] Starting keyboard event loop"
# 6. Test hotkey detection manually
python -c "
import keyboard
print('Testing hotkey detection...')
print('Press Ctrl+Alt+T to test (or your hotkey)')
try:
    keyboard.wait('ctrl+alt+t')
    print('Hotkey detected successfully!')
    print('Your format is correct, check Dictationer settings')
except Exception as e:
    print(f'Hotkey detection failed: {e}')
    print('Check format and admin permissions')
"
# 7. Platform-specific debugging
# Windows: Ensure running as administrator
# Linux: Check X11 permissions and input group membership
# macOS: Verify accessibility permissions granted

# Problem: Recording works but no transcription appears
# Solution: Check processor initialization
# Verify dependencies are installed correctly
python -c "
try:
    from faster_whisper import WhisperModel
    print('faster-whisper: OK')
except Exception as e:
    print(f'faster-whisper: FAILED - {e}')
try:
    from watchdog.observers import Observer
    print('watchdog: OK')
except Exception as e:
    print(f'watchdog: FAILED - {e}')
"
# Check if audio files are being created
ls -la outputs/ # Should show .wav files after recording
# Manually test transcription
python -c "
from src.dictationer.processor import AudioProcessor
processor = AudioProcessor('base', 'outputs', True)
result = processor.transcribe_file('outputs/recording.wav')
print(f'Transcription result: {result}')
"

# Problem: 'charmap' codec errors in Windows console
# Solution: Use UTF-8 encoding
# Set environment variable (Windows)
set PYTHONIOENCODING=utf-8
# Or use PowerShell instead of Command Prompt
# All GUI launchers handle this automatically

# Problem: GUI fails to launch or crashes immediately
# Solution: Check PySide6 installation and dependencies
# Verify PySide6 installation
python -c "
try:
    from PySide6.QtWidgets import QApplication
    print('PySide6: OK')
except Exception as e:
    print(f'PySide6: FAILED - {e}')
    print('Install with: pip install PySide6')
"
# Check if display is available (Linux)
echo $DISPLAY # Should show :0 or similar
# For headless systems, use Xvfb
sudo apt-get install xvfb
xvfb-run python gui_main.py

# Problem: Slow transcription or high CPU usage
# Solution: Optimize model and device settings
# Use smaller model for faster processing
# In GUI: Select "tiny" or "base" instead of "large"
# Enable GPU if available
# GUI will auto-detect, or manually verify:
python -c "
import torch
if torch.cuda.is_available():
    print(f'GPU available: {torch.cuda.get_device_name()}')
    print('Enable GPU in GUI settings for faster processing')
else:
    print('No GPU available, using CPU')
"
# For very slow systems, disable real-time transcription
# Edit config/settings.json: "enable_transcription": false

# Check log files for errors
tail -f logs/voice_recorder_debug.log
tail -f logs/audio_processor.log
# Look for specific error patterns
grep -i error logs/*.log
grep -i failed logs/*.log
grep -i exception logs/*.log

# Test individual components
# 1. Test audio recording
python -c "
from src.dictationer.audio import AudioRecorder
recorder = AudioRecorder('test.wav')
print('Press Enter to start recording...')
input()
recorder.start_recording()
input('Press Enter to stop...')
recorder.stop_recording()
print('Check test.wav file')
"
# 2. Test transcription
python -c "
from src.dictationer.processor import AudioProcessor
processor = AudioProcessor()
result = processor.transcribe_file('test.wav')
print(f'Result: {result}')
"
# 3. Test clipboard pasting
python -c "
from src.dictationer.paster import ClipboardPaster
paster = ClipboardPaster()
success = paster.paste_text('Test message')
print(f'Paste success: {success}')
"

If you're still experiencing issues:
- Check log files in the `logs/` directory for detailed error messages
- Run GUI launcher scripts (`start.bat`/`start.sh`) instead of calling Python directly
- Verify virtual environment is activated and all dependencies are installed
- Test components individually using the manual testing scripts above
- Check system requirements (microphone access, admin permissions, etc.)
For additional support:
- Read: `docs/PLANNING.md` for architecture details
- Report bugs: Create an issue with log files and error messages
- Feature requests: Describe your use case and requirements
import logging
logging.basicConfig(level=logging.DEBUG)
# Or set environment variable
export DICTATIONER_LOG_LEVEL=DEBUG

# Real-time log monitoring
tail -f logs/voice_recorder_debug.log
# Search for specific issues
grep ERROR logs/*.log
grep "CRITICAL\|FATAL" logs/*.log

# Enable performance profiling
controller = RecordingController(profile_performance=True)
# Check system resources
import psutil
print(f"CPU: {psutil.cpu_percent()}%, Memory: {psutil.virtual_memory().percent}%")

We welcome contributions! Here's how to get started:
- Bug Reports: Report issues with detailed reproduction steps
- Feature Requests: Suggest new features or improvements
- Documentation: Improve docs, tutorials, or examples
- Testing: Add tests or improve test coverage
- Code: Fix bugs or implement new features
1. Fork & Clone

   git clone https://github.com/yourusername/dictationer.git
   cd dictationer

2. Setup Development Environment

   python -m venv venv_linux
   source venv_linux/bin/activate
   pip install -e .[dev]

3. Create Feature Branch

   git checkout -b feature/amazing-feature

4. Make Changes
   - Follow the coding standards in `docs/PLANNING.md`
   - Add tests for new functionality
   - Update documentation as needed

5. Quality Checks

   black src/ tests/              # Format code
   mypy src/                      # Type checking
   pytest --cov=src/dictationer   # Run tests

6. Submit Pull Request
- Include clear description of changes
- Reference any related issues
- Ensure all checks pass
- Be respectful and inclusive
- Provide constructive feedback
- Help others learn and grow
MIT License - see LICENSE file for details.
- Architecture & PRD - System design, requirements, and architecture overview
- Task Management - Project roadmap and tasks
- API Reference - Detailed API documentation
- Development Guide - Setup, standards, testing, and debugging
- PySide6 GUI Interface: Complete graphical user interface with modern design
- Settings Management: GUI-based configuration with device detection
- Model Download Manager: Download any HuggingFace Whisper model with progress tracking
- Configuration System: JSON-based settings with automatic GPU detection
- Launcher Scripts: Cross-platform GUI launchers with proper virtual environment handling
- Program Control: Start/Stop main program from GUI with real-time status
- Live Log Display: Real-time log output with scrolling and filtering
- Model Cache Access: Direct access to model storage folder
- Simplified Architecture: Removed dual processing paths for better reliability
- Enhanced Debugging: Comprehensive logging and error handling
- Complete Documentation: Updated docs with troubleshooting guide
- Initial Release: Complete voice recording and transcription system
- Modular Architecture: Clean separation of concerns with threaded design
- High-Quality Audio: 16-bit WAV recording with PyAudio integration
- Advanced Transcription: Faster-Whisper integration with multiple model sizes
- Smart Text Pasting: Intelligent clipboard management and automation
- Global Hotkeys: Configurable keyboard shortcuts for system control
- Comprehensive Logging: Detailed monitoring and debugging capabilities
- Thread Safety: Robust concurrent operation handling
- Professional Packaging: Proper Python package structure and installation
- AI Text Reformatter: Complete integration of Google Gemini AI for intelligent text enhancement
- Reformatter Settings: New Settings UI section with enable/disable and hold duration controls
- Grammar Fix Mode: Automatically corrects spelling, grammar, and punctuation in selected text
- Background Service: Independent daemon thread operation alongside main dictation system
- Ctrl+Hold Trigger: Configurable hold duration (1-5 seconds) for reformatting selected text
- Enhanced Status Display: Combined status showing both Recording System and Reformatter service states
- Professional Error Handling: Comprehensive exception handling with informative GUI dialogs for missing dependencies
- Smart Lifecycle Management: Automatic service startup/shutdown with clean thread management
- Configuration Integration: Full persistence of reformatter settings in JSON config file
- Clipboard Integration: Seamless text copying, reformatting, and pasting workflow
- Hotkey Monitoring Lock: Prevents multiple simultaneous reformatting operations
- Status Window Integration: Visual feedback during reformatting process with API-synced timing
- Unit Testing: Comprehensive test suite with 90% coverage
- System Tray: Background operation with system tray controls
- Cross-Platform: Enhanced macOS and Linux support
- Audio Formats: Support for MP3, FLAC, and other audio formats
- Dynamic Settings Restart: Restart reformatter service when settings change without GUI restart
Made with ❤️ for the voice recording community
Star this repo • Report Bug • Request Feature
