🎤 Dictationer - Advanced Voice Recording & Transcription System

Effortlessly capture and transcribe your thoughts with intelligent voice recording and powerful AI.

💡 GPU Support Available: For faster transcription, install with CUDA support. See GPU installation instructions below.

A professional voice recording system with PySide6 GUI, real-time transcription, intelligent hotkey control, and automatic text pasting. Features both command-line and graphical interfaces with robust threaded architecture for seamless performance. Now includes AI-powered text reformatting with Google Gemini integration.

✨ Key Features

🚀 Core Functionality

🖥️ Modern GUI Interface: Intuitive PySide6 interface for seamless configuration and control.
🔥 Global Hotkey Support: Instantly toggle recording with customizable keyboard shortcuts.
🎵 High-Quality Audio Recording: Professional WAV output (16-bit depth, 16kHz sample rate).
🤖 Real-time Transcription: Accurate speech-to-text powered by HuggingFace Whisper models.
📋 Intelligent Text Pasting: Automatic clipboard management and text insertion.
🖱️ Program Controls: GUI buttons for start/stop with real-time status indicators.
🔧 AI Text Reformatter: Intelligent text enhancement with Google Gemini AI - hold Ctrl to reformat selected text with grammar fixes.

⚡ Advanced Capabilities

💻 GPU/CUDA Acceleration: Optimized performance with NVIDIA GPUs using float16 precision.
🔄 Automatic Model Conversion: Seamlessly converts HuggingFace PyTorch models to CTranslate2 format for faster-whisper compatibility.
⚡ Optimized Distil-Whisper: Integrates Distil-Whisper models with enhanced parameters (beam_size=5, language='en', condition_on_previous_text=False) for improved accuracy and speed.
⚙️ Smart Configuration: GUI-based settings with automatic GPU detection and model management.
⚡ Threaded Architecture: Non-blocking operations and real-time status monitoring.
🛡️ Thread Safety: Robust concurrent operation handling with proper locking mechanisms.
📊 Comprehensive Logging: Detailed debugging and monitoring logs for issue diagnosis.
🔄 Graceful Shutdown: Clean handling of interrupts and system termination.
📦 Model Management: Download and manage any HuggingFace Whisper model directly from the GUI.
🤖 AI-Powered Text Enhancement: Background service for grammar and spelling correction.
🎛️ Reformatter Configuration: Adjustable hold duration and enable/disable controls in Settings.
⚡ Independent Operation: Reformatter runs alongside main dictation system without interference.
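
The Distil-Whisper parameters listed above correspond to keyword arguments of faster-whisper's `WhisperModel.transcribe`. The following is a minimal sketch, assuming faster-whisper is installed; the helper name `transcribe_file`, the default model name, and the device defaults are illustrative and are not taken from Dictationer's own code.

```python
# Decoding options quoted in the feature list above.
TRANSCRIBE_KWARGS = {
    "beam_size": 5,
    "language": "en",
    "condition_on_previous_text": False,
}


def transcribe_file(path: str, model_name: str = "distil-large-v3",
                    device: str = "cuda", compute_type: str = "float16") -> str:
    """Transcribe one audio file and return the joined segment text."""
    # Imported lazily so this module loads even without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device=device, compute_type=compute_type)
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, _info = model.transcribe(path, **TRANSCRIBE_KWARGS)
    return " ".join(segment.text.strip() for segment in segments)
```

On a machine without an NVIDIA GPU, pass `device="cpu"` and a CPU-friendly compute type such as `"int8"`.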

📦 Installation

Prerequisites

Python 3.8 or higher must be installed on your system
- Windows: Download from python.org
- Linux: Usually pre-installed, or use sudo apt install python3
- macOS: Use python.org or brew install python3

Quick Start (Recommended)

# Clone the repository
git clone <repository-url>
cd dictationer

# Run the automated setup script
python setup.py    # Windows
python3 setup.py   # Linux/macOS

The setup script will:

✅ Check your Python version
✅ Create a virtual environment
✅ Guide you through GPU setup options
✅ Install all dependencies
✅ Verify the installation

After setup completes, launch the program:

.\start.bat    # Windows
./start.sh     # Linux/macOS

GPU Support (Optional but Recommended)

The setup script will explain GPU requirements, but here's a summary:

Install CUDA Toolkit from NVIDIA
Check your CUDA version: nvidia-smi
Install PyTorch with CUDA (after activating the venv):

# Activate the virtual environment first
venv\Scripts\activate     # Windows
source venv/bin/activate  # Linux/macOS

# Then install PyTorch based on your CUDA version:
# For CUDA 12.1 (most recent GPUs):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# For CUDA 11.8 (older GPUs):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CPU only (if no GPU):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Note: The GUI will automatically detect and use your GPU if available. You can verify this in Settings → Device.
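
Automatic GPU detection of this kind can be sketched as follows. This is an illustration only: the function names are hypothetical, and the `int8` CPU fallback is an assumption (the README only states that float16 is used on NVIDIA GPUs).

```python
from typing import Tuple


def choose_device(cuda_available: bool) -> Tuple[str, str]:
    """Map CUDA availability to a (device, compute_type) pair."""
    if cuda_available:
        return "cuda", "float16"  # float16 on NVIDIA GPUs, per the feature list
    return "cpu", "int8"          # assumption: a quantized CPU fallback


def detect_cuda() -> bool:
    """Return True only if PyTorch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    device, compute_type = choose_device(detect_cuda())
    print(f"Using device={device}, compute_type={compute_type}")
```

Keeping the decision in `choose_device` separate from the probe in `detect_cuda` makes the mapping easy to test on machines without a GPU.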

⌨️ Setting Up Your Hotkey (Important!)

⚠️ CRITICAL: The hotkey format must be EXACTLY correct or it won't work!

Quick Setup Steps

Launch the GUI:

.\start.bat    # Windows
./start.sh     # Linux/macOS

Configure Your Hotkey:
- Go to Settings tab
- Find Audio Settings → Hotkey field
- Enter your hotkey in the exact format: ctrl+win+shift+l
Format Requirements:
- Lowercase letters only: ctrl not Ctrl
- Plus signs with no spaces: ctrl+alt+r not ctrl + alt + r
- Use exact modifier names: ctrl, alt, shift, win

✅ Valid Examples

ctrl+win+shift+l     ✓ Default hotkey
ctrl+alt+r           ✓ Simple combination  
shift+f1             ✓ Function key combo
ctrl+shift+space     ✓ With space key

❌ Common Mistakes

Ctrl+Win+Shift+L     ✗ Uppercase letters
ctrl + alt + r       ✗ Spaces around plus signs
control+alt+r        ✗ Wrong modifier name
ctrl-alt-r           ✗ Wrong separator

🧪 Testing Your Hotkey

After setting your hotkey in the GUI:

Click Start Program button
Try pressing your hotkey combination
Look for "Recording state: ON/OFF" message in the log output
If nothing happens, check the format and try again

💡 Tip: The default ctrl+win+shift+l is tested to work reliably across platforms!

🔧 System Requirements

CPU: Capable of running base or small Whisper models.
GPU: Highly recommended for optimal performance, especially with larger models. Supports NVIDIA GPUs with CUDA.

Core Dependencies

# Audio Recording & Processing
pyaudio          # High-quality audio capture
faster-whisper   # Advanced speech recognition engine
watchdog         # File system event monitoring

# System & Utilities
keyboard         # Global hotkey detection
pyperclip       # Clipboard automation
python-dotenv    # Environment variable management

# GUI Framework (v1.1+)
PySide6          # Modern GUI toolkit

# AI & GPU Acceleration
transformers     # HuggingFace model integration
torch            # PyTorch for GPU support (install with CUDA)
ctranslate2     # For optimized model inference

Development Dependencies

# Testing and code quality
pytest>=6.0     # Unit testing framework
pytest-cov      # Coverage reporting
black           # Code formatting
mypy            # Type checking
flake8          # Linting

🖥️ Platform-Specific Setup

Windows

# Install Microsoft Visual C++ Build Tools if needed
# Run as administrator for keyboard hook permissions

Linux (Ubuntu/Debian)

sudo apt-get update
sudo apt-get install portaudio19-dev python3-dev
pip install dictationer

macOS

brew install portaudio
pip install dictationer
# Grant accessibility permissions when prompted

🚀 Usage

🖥️ GUI Interface (Recommended)

Quick Start

# Launch GUI with proper virtual environment
.\start.bat    # Windows
./start.sh     # Linux/macOS

# Or manually activate and run
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows
python gui_main.py

GUI Features

🎛️ Settings Configuration: Device selection (CPU/GPU), model selection, hotkey customization
📦 Model Download: Download any HuggingFace Whisper model with progress tracking
📁 Model Management: View cached models and open models folder
🎮 Program Control: Start/Stop the main recording program with real-time status
🔧 AI Text Reformatter: Configure and control intelligent text enhancement
📊 Live Logs: Real-time log output with scrolling display

🔧 AI Text Reformatter Usage

The integrated AI Text Reformatter enhances any selected text using Google Gemini AI. Perfect for improving dictated text, emails, documents, and more.

Setup:

Get Gemini API Key: Visit Google AI Studio for a free API key
Configure Environment: Create a .env file in the project root:
```
GEMINI_API_KEY=your-api-key-here
```
Enable in Settings: Go to Settings tab → 🔧 Text Reformatter section → Enable checkbox

Usage:

Select text in any application (browser, editor, email, etc.)
Hold Ctrl key for 2 seconds (configurable 1-5 seconds)
Watch the magic - text is automatically copied, corrected for grammar and spelling, and pasted back
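
The hold-to-trigger behaviour described above can be modelled as a small state machine. This is a sketch under stated assumptions, not the reformatter's actual implementation: the class name `HoldDetector` and its methods are hypothetical, and timestamps are passed in explicitly so the logic can be tested without a real keyboard.

```python
from typing import Optional


class HoldDetector:
    """Fires once when a key has been held for at least `hold_seconds`."""

    def __init__(self, hold_seconds: float = 2.0) -> None:
        # Clamp to the 1-5 second range the Settings tab allows.
        self.hold_seconds = min(5.0, max(1.0, hold_seconds))
        self._pressed_at: Optional[float] = None
        self._fired = False

    def on_press(self, now: float) -> None:
        # Key-repeat events arrive while the key is held; keep the first timestamp.
        if self._pressed_at is None:
            self._pressed_at = now
            self._fired = False

    def on_release(self) -> None:
        self._pressed_at = None

    def should_fire(self, now: float) -> bool:
        """Return True exactly once per hold, when the threshold is reached."""
        if self._pressed_at is None or self._fired:
            return False
        if now - self._pressed_at >= self.hold_seconds:
            self._fired = True
            return True
        return False
```

In a real listener the `now` values would come from `time.monotonic()`, and `should_fire` would be polled from a timer or background thread.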

Status Monitoring:

Settings shows: Recording System: Ready | Reformatter: Active
Independent operation alongside voice recording
Real-time status updates in GUI

Advanced Configuration

# Custom output file
python main.py --output "recordings/session.wav"

# Use different Whisper model
python main.py  # Configure via GUI or config files

🤝 Contributions

We welcome contributions from the community! This project has huge potential and there are many exciting features we'd love to add:

🌟 High-Impact Contribution Ideas

⌨️ Hotkey Recorder Widget

What: Add a "Record Hotkey" button in GUI that captures key presses and auto-formats them correctly
Why: Current hotkey format is error-prone and causes silent failures for users
Impact: Eliminates the #1 setup frustration and improves user experience dramatically
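
As a starting point for contributors, the auto-formatting half of such a widget could look like the sketch below. The function name `normalize_hotkey` and the alias table are hypothetical; the target format is the strict lowercase, plus-separated form this README requires.

```python
import re

# Assumed alias table; extend it to match whatever names the GUI captures.
_ALIASES = {"control": "ctrl", "windows": "win", "escape": "esc"}


def normalize_hotkey(raw: str) -> str:
    """Convert loosely formatted input into the strict `ctrl+alt+r` form."""
    # Split on '+', '-', '_' or whitespace, dropping empty pieces.
    parts = [p for p in re.split(r"[+\-_\s]+", raw.strip().lower()) if p]
    return "+".join(_ALIASES.get(p, p) for p in parts)
```

For example, `normalize_hotkey("Ctrl + Alt + R")` returns `"ctrl+alt+r"`. Because `+`, `-`, and `_` are treated as separators, this sketch cannot express hotkeys that use those characters as the key itself.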

🍎 macOS Support & Testing

What: Thorough testing and optimization for macOS users
Why: Many users want native macOS support with proper accessibility permissions
Impact: Expands user base significantly

🌐 OpenAI Whisper API Integration

What: Add support for OpenAI's official Whisper API as an alternative to local processing
Why: Cloud processing for users without powerful hardware
Impact: Makes transcription accessible to everyone

🔌 Plugin Architecture

What: Create a plugin system for custom transcription processors
Why: Allow community to add new AI models and processing pipelines
Impact: Extensible platform for innovation

🎨 Advanced UI Features

What: System tray support, real-time waveform display, custom themes
Why: Better user experience and professional polish
Impact: More intuitive and visually appealing

🌍 Multi-Language Support

What: UI translations and language-specific optimizations
Why: Global accessibility for non-English users
Impact: Worldwide adoption

🔧 Advanced Integrations

What: IDE plugins (VS Code, IntelliJ), browser extensions, mobile companion apps
Why: Seamless workflow integration
Impact: Professional productivity boost

📋 How to Contribute

🍴 Fork the Repository

git clone https://github.com/yourusername/dictationer.git
cd dictationer

🌱 Create a Feature Branch

git checkout -b feature/amazing-new-feature

🛠️ Make Your Changes
- Follow the coding standards in docs/PLANNING.md
- Add tests for new functionality
- Update documentation

✅ Test Thoroughly

# Run existing tests
pytest

# Test your feature manually
.\start.bat  # Windows (use ./start.sh on Linux/macOS)

📤 Submit a Pull Request
- Clear description of changes
- Reference any related issues
- Include screenshots for UI changes

💡 Contribution Guidelines

🎯 Start Small: Begin with documentation improvements or bug fixes
📖 Read the Docs: Check docs/PLANNING.md for architecture details
💬 Discuss First: Open an issue for major features before implementing
🧪 Test Everything: Ensure cross-platform compatibility
📝 Document Changes: Update README and docs as needed

💖 Donations

If Dictationer has made your life easier and saved you time, I'd be incredibly grateful for a small donation! ✨

Creating and maintaining open-source projects like this takes countless hours of development, testing, debugging, and support. Your contribution helps me:

🚀 Keep innovating with new features and improvements
🐛 Fix bugs and provide ongoing support
💻 Buy better hardware for testing across different platforms
☕ Stay caffeinated during those late-night coding sessions!

🌟 Other Ways to Help

Even if you can't donate, there are other amazing ways to support the project:

⭐ Star the repository - helps others discover the project
🐛 Report bugs - help make it better for everyone
📖 Improve documentation - share your knowledge
💬 Spread the word - tell your friends and colleagues
🧪 Test on different platforms - especially macOS users!

💌 A Personal Note

Every donation, no matter how small, means the world to me. It's not just about the money - it's knowing that something I built is genuinely helping people in their daily work and life. Whether you're a student taking notes, a professional writing documentation, or someone with accessibility needs, your support motivates me to keep building amazing tools for everyone.

Thank you for being awesome! 🙏

🐍 Programmatic API

Basic Recording

from dictationer import RecordingController

# Simple recording with defaults
controller = RecordingController()
controller.start()

Advanced Configuration

from dictationer import RecordingController

# Customized recording setup
controller = RecordingController(
    output_file="outputs/meeting_notes.wav",
    hotkey="ctrl+alt+r",
    enable_transcription=True,
    model_size="base",
    auto_paste=True
)

# Start the recording system
controller.start()

# Access individual components
if controller.audio.is_recording():
    print("Currently recording...")

# Graceful shutdown
controller.stop()

Component-Level Access

from dictationer.audio import AudioRecorder
from dictationer.processor import AudioProcessor
from dictationer.paster import ClipboardPaster

# Individual component usage
audio = AudioRecorder("output.wav")
audio.start_recording()
# ... perform operations
audio.stop_recording()

# Transcription processor
processor = AudioProcessor(model_size="base", watch_directory="outputs")
processor.start_monitoring()

# Clipboard operations
paster = ClipboardPaster()
success = paster.paste_text("Hello, world!")

📁 Project Architecture

dictationer/
├── 📦 src/dictationer/           # Main package source
│   ├── 🔧 __init__.py            # Package exports & version
│   ├── 🎛️ main.py               # RecordingController (orchestrator)
│   ├── ⌨️ keyboard.py            # KeyboardRecorder (hotkey detection)
│   ├── 🎵 audio.py              # AudioRecorder (audio capture)
│   ├── 🤖 processor.py          # AudioProcessor (transcription)
│   ├── 📋 paster.py             # ClipboardPaster (text automation)
│   ├── 🖥️ gui.py                # PySide6 GUI interface
│   └── ⚙️ config.py             # Configuration management
├── 📚 docs/                      # Documentation
│   ├── 🏗️ PLANNING.md           # Architecture & design
│   ├── 📋 TASK.md               # Task management
│   └── 📖 API.md                # API documentation
├── 🧪 tests/                     # Unit tests
├── 📊 logs/                      # Application logs
├── 🎵 outputs/                   # Recording outputs
├── 🔧 config/                    # Configuration files
├── 🚀 main.py                    # CLI entry point
├── 🖥️ gui_main.py               # GUI entry point
├── 🏃 start.bat             # Windows GUI launcher
├── 🏃 start.sh              # Linux/macOS GUI launcher
├── ⚙️ pyproject.toml             # Package configuration
├── 📦 requirements.txt           # Dependencies
└── 📖 README.md                 # This file

🧩 Module Overview

| Module | Purpose | Key Features |
| --- | --- | --- |
| main.py | System orchestrator | Lifecycle management, signal handling, logging |
| keyboard.py | Hotkey detection | Global shortcuts, thread-safe state management |
| audio.py | Audio recording | PyAudio integration, WAV output, threading |
| processor.py | Speech-to-text | Whisper models, direct transcription, batch processing |
| paster.py | Text automation | Clipboard management, keyboard simulation |
| gui.py | GUI interface | PySide6 interface, model downloads, program control |
| config.py | Configuration | Settings management, device detection, model caching |
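
The "thread-safe state management" attributed to keyboard.py can be illustrated with a lock-guarded flag. This is a minimal sketch, assuming a single on/off recording state shared between the hotkey thread and the audio thread; the class name `RecordingState` is hypothetical.

```python
import threading


class RecordingState:
    """Thread-safe on/off flag shared by the hotkey and audio threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._recording = False

    def toggle(self) -> bool:
        """Flip the flag atomically and return the new value."""
        with self._lock:
            self._recording = not self._recording
            return self._recording

    def is_recording(self) -> bool:
        with self._lock:
            return self._recording
```

Returning the new value from `toggle` inside the lock lets the caller act on the state it just set, rather than re-reading a value another thread may already have changed.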

⚙️ Configuration

📋 Default Settings

| Setting | Default Value | Description |
| --- | --- | --- |
| Hotkey | `ctrl+win+shift+l` | Global shortcut to toggle recording |
| Audio Format | WAV (16-bit, 16kHz, Mono) | High-quality audio output |
| Output Directory | `outputs/` | Recording storage location |
| Log Directory | `logs/` | Debug and monitoring logs |
| Whisper Model | `base` | Speech recognition model |
| Auto-paste | `True` | Automatic text insertion |
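
To make the default audio format concrete, the sketch below writes a WAV file with those parameters using only the standard library. The helper `write_silence` is hypothetical and exists purely to illustrate the format; Dictationer itself records through PyAudio.

```python
import wave

SAMPLE_RATE = 16_000   # 16 kHz
SAMPLE_WIDTH = 2       # bytes per sample, i.e. 16-bit
CHANNELS = 1           # mono


def write_silence(path: str, seconds: float) -> int:
    """Write `seconds` of silence in the default format; return the audio byte count."""
    n_frames = int(SAMPLE_RATE * seconds)
    frames = b"\x00" * (n_frames * SAMPLE_WIDTH * CHANNELS)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return len(frames)
```

At these settings one second of audio occupies 16,000 × 2 × 1 = 32,000 bytes, so a one-minute recording is roughly 1.9 MB before the WAV header.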

🎛️ Customization Options

Environment Variables

# Set custom defaults
export DICTATIONER_HOTKEY="ctrl+shift+r"
export WHISPER_MODEL_SIZE="large-v3"
export OUTPUT_DIRECTORY="recordings"
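
One way such variables could be read from Python is sketched below. The variable names come from the export lines above and the fallbacks from the default settings table; the function `load_env_settings` is hypothetical, and how Dictationer itself consumes these variables is not shown in this README.

```python
import os
from typing import Dict, Mapping

# Fallback values taken from the default settings table.
DEFAULTS = {
    "DICTATIONER_HOTKEY": "ctrl+win+shift+l",
    "WHISPER_MODEL_SIZE": "base",
    "OUTPUT_DIRECTORY": "outputs/",
}


def load_env_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each setting from the environment, falling back to its default."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

Passing the mapping as a parameter keeps the function easy to test, for example `load_env_settings({"WHISPER_MODEL_SIZE": "large-v3"})`.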

Programmatic Configuration

from dictationer import RecordingController

controller = RecordingController(
    output_file="meetings/2025-07-19_team_meeting.wav",
    hotkey="ctrl+alt+r",
    enable_transcription=True,
    model_size="large-v3",
    auto_paste=False
)

Configuration File (Future)

# dictationer.yml
audio:
  sample_rate: 16000
  channels: 1
  format: "wav"
  
transcription:
  model_size: "base"
  language: "auto"
  auto_paste: true
  
hotkeys:
  toggle_recording: "ctrl+win+shift+l"
  emergency_stop: "ctrl+win+shift+x"

⌨️ Hotkey Configuration

⚠️ FORMAT CRITICAL: The Python keyboard module is very sensitive to hotkey format. Even small deviations will cause complete failure!

Required Format Rules

Lowercase only: ctrl, alt, shift, win (never Ctrl, Alt, etc.)
No spaces around plus signs: ctrl+alt+r (never ctrl + alt + r)
Exact modifier names: ctrl not control, win not windows
Plus sign separator: Use + only (never -, _, or spaces)

Supported Keys

Modifiers: ctrl, alt, shift, win
Letters: a-z (lowercase only)
Numbers: 0-9
Function: f1-f12
Special: space, enter, esc, tab, backspace
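
The format rules and supported keys above can be checked before a hotkey is saved. The validator below is a sketch that mirrors the rules as stated in this README, not the keyboard module's own parser; the function name `is_valid_hotkey` is hypothetical, and requiring the last part to be a non-modifier key is an assumption.

```python
import re

MODIFIERS = {"ctrl", "alt", "shift", "win"}
SPECIAL_KEYS = {"space", "enter", "esc", "tab", "backspace"}
# A single letter or digit, or a function key f1 through f12.
_KEY_PATTERN = re.compile(r"[a-z0-9]|f(?:[1-9]|1[0-2])")


def is_valid_hotkey(hotkey: str) -> bool:
    """Check a hotkey string against the format rules listed above."""
    # Reject empty strings, any uppercase character, and any whitespace.
    if not hotkey or hotkey != hotkey.lower() or re.search(r"\s", hotkey):
        return False
    # Everything before the final '+' must be a known modifier.
    *modifiers, key = hotkey.split("+")
    if any(m not in MODIFIERS for m in modifiers):
        return False
    return key in SPECIAL_KEYS or _KEY_PATTERN.fullmatch(key) is not None
```

Running a check like this when the Settings field changes would surface a bad format immediately instead of leaving the user with a hotkey that never fires.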

Hotkey Examples

# ✅ CORRECT - These will work
"ctrl+shift+r"           # Ctrl + Shift + R
"alt+f1"                 # Alt + F1
"ctrl+win+shift+l"       # Ctrl + Win + Shift + L (default)
"ctrl+alt+space"         # Ctrl + Alt + Space
"shift+f10"              # Shift + F10
"win+alt+d"              # Windows + Alt + D

# ❌ WRONG - These will fail silently
"Ctrl+Shift+R"           # Uppercase modifiers
"ctrl + shift + r"       # Spaces around plus signs
"control+shift+r"        # Wrong modifier name
"ctrl-shift-r"           # Wrong separator
"CTRL+SHIFT+R"           # All uppercase

Format Validation Tips

Test immediately: After setting a hotkey, test it right away
Check logs: Look for "Recording state: ON/OFF" messages when pressing keys
Use defaults first: Try ctrl+win+shift+l to verify basic functionality
Beware silent failures: The keyboard module won't warn you about invalid formats

Platform Considerations

Windows: All combinations supported (run as administrator for global hooks)
Linux: May require X11 permissions for global hooks
macOS: Requires accessibility permissions for global keyboard access

Troubleshooting Invalid Hotkeys

If your hotkey doesn't work:

Check format exactly against the rules above
Try the default: ctrl+win+shift+l should always work
Look at logs: No "Recording state" messages = bad format
Test modifiers: Some combinations may conflict with system shortcuts

📊 Logging & Monitoring

Log Structure

The application creates comprehensive logs for debugging and monitoring:

| Log File | Level | Purpose |
| --- | --- | --- |
| Console | INFO+ | Real-time user feedback |
| voice_recorder_debug.log | DEBUG+ | Main system operations |
| audio_processor.log | DEBUG+ | Transcription and processing |

Log Format

HH:MM:SS | LEVEL    | MODULE          | MESSAGE
14:30:25 | INFO     | AudioRecorder   | Recording started...
14:30:30 | DEBUG    | AudioProcessor  | Transcription completed
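
A layout like the one above can be produced with the standard `logging` module. The format string below is an assumption reconstructed from the sample lines (8-character level column, 15-character module column); the helper name `make_logger` is hypothetical.

```python
import logging

LOG_FORMAT = "%(asctime)s | %(levelname)-8s | %(name)-15s | %(message)s"
TIME_FORMAT = "%H:%M:%S"


def make_logger(name: str) -> logging.Logger:
    """Return a logger whose console output matches the layout shown above."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=TIME_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger
```

Calling `make_logger("AudioRecorder").info("Recording started...")` then prints a line in the same column layout as the sample.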

Monitoring Features

Real-time Status: Recording state and system health
Performance Metrics: Processing times and resource usage
Error Tracking: Detailed exception handling and recovery
Thread Monitoring: Multi-threaded operation tracking

🛠️ Development

🚀 Quick Development Setup

# Clone and setup
git clone <repository-url>
cd dictationer
python -m venv venv_linux
source venv_linux/bin/activate  # Linux/macOS

# Install with development tools
pip install -e ".[dev]"   # quotes keep zsh from expanding the brackets

# Install pre-commit hooks
pre-commit install

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src/dictationer --cov-report=html

# Run specific test categories
pytest tests/test_audio.py -v
pytest -k "test_recording" -v

🎨 Code Quality

# Format code (required before commit)
black src/ tests/

# Type checking
mypy src/

# Linting
flake8 src/

# Run all quality checks
make lint  # or equivalent script

📋 Development Workflow

Create Feature Branch: git checkout -b feature/amazing-feature
Write Tests: Add tests for new functionality
Implement Feature: Follow coding standards and patterns
Run Quality Checks: Ensure all checks pass
Update Documentation: Add/update relevant docs
Submit PR: Include tests and documentation

🔧 Troubleshooting

🚨 Common Issues & Solutions

Virtual Environment Issues

# Problem: "No module named 'faster_whisper'" or similar import errors
# Solution: Run the setup script or ensure you're using the virtual environment

# Option 1: Run setup script (recommended)
python setup.py    # Windows
python3 setup.py   # Linux/macOS

# Option 2: Manual activation and install
# Windows
venv\Scripts\activate
pip install -r requirements.txt

# Linux/macOS  
source venv/bin/activate
pip install -r requirements.txt

# Always use the launcher scripts for GUI
.\start.bat    # Windows
./start.sh     # Linux/macOS

GPU Detection Problems

# Problem: GPU not detected even when available
# Solution: Ensure PyTorch is installed with CUDA support

# Check current PyTorch installation
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Install PyTorch with CUDA (Windows/Linux)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# For CPU-only systems
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Model Download Failures

# Problem: Model download fails or gets stuck
# Solution: Clear cache and retry

# Clear HuggingFace cache
rm -rf ~/.cache/huggingface/
# Windows: rmdir /s "%USERPROFILE%\.cache\huggingface"

# Test model download manually
python -c "from faster_whisper import WhisperModel; WhisperModel('base')"

# For network issues, try different model names:
# - openai/whisper-base
# - openai/whisper-large-v3
# - distil-whisper/distil-large-v3

Audio Recording Issues

# Problem: "No audio input device" or microphone not working
# Solution: Check audio device permissions and availability

# List available audio devices
python -c "
import pyaudio
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(f'{i}: {info[\"name\"]} - Inputs: {info[\"maxInputChannels\"]}')
"

# Test microphone access
python -c "
import pyaudio
p = pyaudio.PyAudio()
try:
    info = p.get_default_input_device_info()
    print(f'Default microphone: {info[\"name\"]} - OK')
except OSError:
    print('No default microphone found')
"

PyAudio Installation Problems

# Windows - Install Microsoft Visual C++ Build Tools
# Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Then: pip install pyaudio

# Linux (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install portaudio19-dev python3-dev build-essential
pip install pyaudio

# macOS
brew install portaudio
pip install pyaudio

# Alternative: Use conda
conda install pyaudio

Keyboard Hook Permissions

# Windows - Run as administrator (required for global hotkeys)
# Right-click Command Prompt → "Run as administrator"
# Or run GUI from elevated terminal

# Linux - Add user to input group
sudo usermod -a -G input $USER
# Logout and login again

# macOS - Grant accessibility permissions
# System Preferences → Security & Privacy → Privacy → Accessibility
# Add Terminal or your IDE to allowed applications

Hotkey Not Working / Not Responding

# Problem: Hotkey doesn't trigger recording, no response when pressed
# Solution: Check format and test systematically

# 1. Verify exact format in GUI settings
# Open GUI → Settings → Audio Settings → Hotkey field
# Should look exactly like: ctrl+win+shift+l

# 2. Common format issues to check:
echo "Checking common hotkey format problems..."

# ❌ Wrong: Uppercase letters
# "Ctrl+Win+Shift+L" 

# ❌ Wrong: Spaces around plus signs  
# "ctrl + win + shift + l"

# ❌ Wrong: Different modifier names
# "control+windows+shift+l"

# ❌ Wrong: Different separators
# "ctrl-win-shift-l" or "ctrl_win_shift_l"

# ✅ Correct format:
# "ctrl+win+shift+l"

# 3. Test with known working hotkey
# Set hotkey to exactly: ctrl+win+shift+l
# This is the tested default that should work

# 4. Check for conflicts with system shortcuts
# Try a simple combination like: ctrl+alt+f1
# Some complex combinations may conflict with OS hotkeys

# 5. Verify program is running and listening
# In GUI logs, look for:
# "[KEYBOARD] Hotkey registered successfully"
# "[KEYBOARD] Starting keyboard event loop"

# 6. Test hotkey detection manually
python -c "
import keyboard
print('Testing hotkey detection...')
print('Press Ctrl+Alt+T to test (or your hotkey)')
try:
    keyboard.wait('ctrl+alt+t')
    print('✅ Hotkey detected successfully!')
    print('Your format is correct, check Dictationer settings')
except Exception as e:
    print(f'❌ Hotkey detection failed: {e}')
    print('Check format and admin permissions')
"

# 7. Platform-specific debugging
# Windows: Ensure running as administrator
# Linux: Check X11 permissions and input group membership
# macOS: Verify accessibility permissions granted

Transcription Not Working

# Problem: Recording works but no transcription appears
# Solution: Check processor initialization

# Verify dependencies are installed correctly
python -c "
try:
    from faster_whisper import WhisperModel
    print('faster-whisper: OK')
except Exception as e:
    print(f'faster-whisper: FAILED - {e}')

try:
    from watchdog.observers import Observer
    print('watchdog: OK')
except Exception as e:
    print(f'watchdog: FAILED - {e}')
"

# Check if audio files are being created
ls -la outputs/  # Should show .wav files after recording

# Manually test transcription
python -c "
from src.dictationer.processor import AudioProcessor
processor = AudioProcessor('base', 'outputs', True)
result = processor.transcribe_file('outputs/recording.wav')
print(f'Transcription result: {result}')
"

Unicode/Encoding Errors

# Problem: 'charmap' codec errors in Windows console
# Solution: Use UTF-8 encoding

# Set environment variable (Windows)
set PYTHONIOENCODING=utf-8

# Or use PowerShell instead of Command Prompt
# All GUI launchers handle this automatically

GUI Won't Start

# Problem: GUI fails to launch or crashes immediately
# Solution: Check PySide6 installation and dependencies

# Verify PySide6 installation
python -c "
try:
    from PySide6.QtWidgets import QApplication
    print('PySide6: OK')
except Exception as e:
    print(f'PySide6: FAILED - {e}')
    print('Install with: pip install PySide6')
"

# Check if display is available (Linux)
echo $DISPLAY  # Should show :0 or similar

# For headless systems, use Xvfb
sudo apt-get install xvfb
xvfb-run python gui_main.py

Performance Issues

# Problem: Slow transcription or high CPU usage
# Solution: Optimize model and device settings

# Use smaller model for faster processing
# In GUI: Select "tiny" or "base" instead of "large"

# Enable GPU if available
# GUI will auto-detect, or manually verify:
python -c "
import torch
if torch.cuda.is_available():
    print(f'GPU available: {torch.cuda.get_device_name()}')
    print('Enable GPU in GUI settings for faster processing')
else:
    print('No GPU available, using CPU')
"

# For very slow systems, disable real-time transcription
# Edit config/settings.json: "enable_transcription": false

🔍 Debug Mode & Logging

Inspect Log Files

# Check log files for errors
tail -f logs/voice_recorder_debug.log
tail -f logs/audio_processor.log

# Look for specific error patterns
grep -i error logs/*.log
grep -i failed logs/*.log
grep -i exception logs/*.log
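
Where grep is unavailable (for example in a plain Windows Command Prompt), a similar summary can be produced in Python. This sketch assumes the log files use the pipe-separated `HH:MM:SS | LEVEL | MODULE | MESSAGE` layout described in the Logging & Monitoring section; the function name `count_levels` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as "14:30:25 | INFO     | AudioRecorder   | Recording started..."
_LINE = re.compile(r"^\d{2}:\d{2}:\d{2} \| (?P<level>\w+)\s*\| (?P<module>\S+)\s*\| ")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log lines per level, skipping lines that don't match the format."""
    counts: Counter = Counter()
    for line in lines:
        match = _LINE.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts
```

A file object can be passed directly, for example `count_levels(open("logs/voice_recorder_debug.log", encoding="utf-8"))`.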

Manual Component Testing

# Test individual components

# 1. Test audio recording
python -c "
from src.dictationer.audio import AudioRecorder
recorder = AudioRecorder('test.wav')
print('Press Enter to start recording...')
input()
recorder.start_recording()
input('Press Enter to stop...')
recorder.stop_recording()
print('Check test.wav file')
"

# 2. Test transcription
python -c "
from src.dictationer.processor import AudioProcessor
processor = AudioProcessor()
result = processor.transcribe_file('test.wav')
print(f'Result: {result}')
"

# 3. Test clipboard pasting
python -c "
from src.dictationer.paster import ClipboardPaster
paster = ClipboardPaster()
success = paster.paste_text('Test message')
print(f'Paste success: {success}')
"

📞 Getting Help

If you're still experiencing issues:

Check log files in the logs/ directory for detailed error messages
Run GUI launcher scripts (start.bat/start.sh) instead of calling Python directly
Verify virtual environment is activated and all dependencies are installed
Test components individually using the manual testing scripts above
Check system requirements (microphone access, admin permissions, etc.)

For additional support:

📖 Read: docs/PLANNING.md for architecture details
🐛 Report bugs: Create an issue with log files and error messages
💡 Feature requests: Describe your use case and requirements

🔍 Debug Mode

Enable Detailed Logging

import logging
logging.basicConfig(level=logging.DEBUG)

# Or set the environment variable in your shell before launching
export DICTATIONER_LOG_LEVEL=DEBUG

Log Analysis

# Real-time log monitoring
tail -f logs/voice_recorder_debug.log

# Search for specific issues
grep ERROR logs/*.log
grep "CRITICAL\|FATAL" logs/*.log

Performance Debugging

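import psutil
from dictationer import RecordingController

# Enable performance profiling
controller = RecordingController(profile_performance=True)

# Check system resources (interval=1 samples CPU usage over one second;
# without an interval the first call returns a meaningless 0.0)
print(f"CPU: {psutil.cpu_percent(interval=1)}%, Memory: {psutil.virtual_memory().percent}%")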

🤝 Contributing

We welcome contributions! Here's how to get started:

🌟 Ways to Contribute

🐛 Bug Reports: Report issues with detailed reproduction steps
💡 Feature Requests: Suggest new features or improvements
📖 Documentation: Improve docs, tutorials, or examples
🧪 Testing: Add tests or improve test coverage
💻 Code: Fix bugs or implement new features

📋 Contribution Guidelines

Fork & Clone

git clone https://github.com/yourusername/dictationer.git
cd dictationer

Setup Development Environment

python -m venv venv_linux
source venv_linux/bin/activate
pip install -e ".[dev]"

Create Feature Branch
```
git checkout -b feature/amazing-feature
```
Make Changes
- Follow the coding standards in docs/PLANNING.md
- Add tests for new functionality
- Update documentation as needed

Quality Checks

black src/ tests/          # Format code
mypy src/                  # Type checking
pytest --cov=src/dictationer  # Run tests

Submit Pull Request
- Include clear description of changes
- Reference any related issues
- Ensure all checks pass

📝 Code of Conduct

Be respectful and inclusive
Provide constructive feedback
Help others learn and grow

📄 License

MIT License - see LICENSE file for details.

📚 Documentation

🏗️ Architecture & PRD - System design, requirements, and architecture overview
📋 Task Management - Project roadmap and tasks
📖 API Reference - Detailed API documentation
🛠️ Development Guide - Setup, standards, testing, and debugging

🆕 Changelog

Version 1.1.0 (2025-07-19) - GUI Release

🖥️ PySide6 GUI Interface: Complete graphical user interface with modern design
🎛️ Settings Management: GUI-based configuration with device detection
📦 Model Download Manager: Download any HuggingFace Whisper model with progress tracking
🔧 Configuration System: JSON-based settings with automatic GPU detection
🚀 Launcher Scripts: Cross-platform GUI launchers with proper virtual environment handling
🎮 Program Control: Start/Stop main program from GUI with real-time status
📊 Live Log Display: Real-time log output with scrolling and filtering
📁 Model Cache Access: Direct access to model storage folder
🛠️ Simplified Architecture: Removed dual processing paths for better reliability
🔍 Enhanced Debugging: Comprehensive logging and error handling
📚 Complete Documentation: Updated docs with troubleshooting guide

Version 1.0.0 (2025-07-19) - Initial Release

✨ Initial Release: Complete voice recording and transcription system
🏗️ Modular Architecture: Clean separation of concerns with threaded design
🎵 High-Quality Audio: 16-bit WAV recording with PyAudio integration
🤖 Advanced Transcription: Faster-Whisper integration with multiple model sizes
📋 Smart Text Pasting: Intelligent clipboard management and automation
⌨️ Global Hotkeys: Configurable keyboard shortcuts for system control
📊 Comprehensive Logging: Detailed monitoring and debugging capabilities
🛡️ Thread Safety: Robust concurrent operation handling
📦 Professional Packaging: Proper Python package structure and installation

Version 1.1.0 (2025-07-20) - AI Text Reformatter Integration

🔧 AI Text Reformatter: Complete integration of Google Gemini AI for intelligent text enhancement
🎛️ Reformatter Settings: New Settings UI section with enable/disable and hold duration controls
🤖 Grammar Fix Mode: Automatically corrects spelling, grammar, and punctuation in selected text
⚡ Background Service: Independent daemon thread operation alongside main dictation system
🖱️ Ctrl+Hold Trigger: Configurable hold duration (1-5 seconds) for reformatting selected text
📊 Enhanced Status Display: Combined status showing both Recording System and Reformatter service states
🛡️ Professional Error Handling: Comprehensive exception handling with informative GUI dialogs for missing dependencies
🔄 Smart Lifecycle Management: Automatic service startup/shutdown with clean thread management
⚙️ Configuration Integration: Full persistence of reformatter settings in JSON config file
📋 Clipboard Integration: Seamless text copying, reformatting, and pasting workflow
🔒 Hotkey Monitoring Lock: Prevents multiple simultaneous reformatting operations
🎨 Status Window Integration: Visual feedback during reformatting process with API-synced timing
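
The hotkey monitoring lock listed above can be pictured with a small sketch. This is not the project's actual code: `ReformatGuard` and `run_exclusive` are hypothetical names used only for illustration, and the sketch assumes a non-blocking `threading.Lock` is an acceptable way to drop a second trigger while a reformat is already in progress.

```python
import threading


class ReformatGuard:
    """Hypothetical helper: allows only one reformat operation at a time."""

    def __init__(self):
        self._lock = threading.Lock()

    def run_exclusive(self, operation):
        """Run `operation` if no other reformat is active.

        Returns True if the operation ran, False if it was skipped
        because another reformat currently holds the lock.
        """
        # Non-blocking acquire: a second trigger is dropped, not queued.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            operation()
            return True
        finally:
            # Always release, even if the operation raised.
            self._lock.release()


if __name__ == "__main__":
    guard = ReformatGuard()
    ran = guard.run_exclusive(lambda: print("reformatting selected text"))
    print("operation ran:", ran)
```

Dropping the overlapping trigger instead of queuing it keeps a long Ctrl hold from firing several clipboard operations back to back.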

Upcoming in 1.2.0

🧪 Unit Testing: Comprehensive test suite targeting 90% coverage
🎨 System Tray: Background operation with system tray controls
🌐 Cross-Platform: Enhanced macOS and Linux support
🔊 Audio Formats: Support for MP3, FLAC, and other audio formats
🔄 Dynamic Settings Restart: Restart reformatter service when settings change without GUI restart

Made with ❤️ for the voice recording community

⭐ Star this repo • 🐛 Report Bug • 💡 Request Feature
