Voice-powered development workflows - Push-to-talk → Whisper transcription → AI enhancement → Instant feature specifications
Transform voice into detailed development specifications in seconds.
Press F12 → Speak your feature brief → Release → Get an enhanced prompt with objectives, risks, acceptance criteria, and more.
lazy-ptt-enhancer is a globally-installable voice-to-prompt toolkit that:
- Captures your voice via push-to-talk (F12 by default)
- Transcribes locally with GPU-accelerated Whisper (offline capable)
- Enhances with AI - OpenAI turns the transcript into structured specifications
- Saves to your workspace - prompts appear directly in project-management/prompts/
- Works everywhere - install once, use in any project directory
No copy-paste. No context switching. Just speak and code.
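Conceptually the flow is capture → transcribe → enhance → save. The sketch below illustrates that shape in plain Python; the function names and the stubbed transcription/enhancement steps are illustrative only, not the package's actual API:

```python
from datetime import datetime
from pathlib import Path


def transcribe(audio: bytes) -> str:
    # Stub: the real tool runs local Whisper here.
    return "Add user authentication with OAuth2 and session management"


def enhance(brief: str) -> str:
    # Stub: the real tool calls the OpenAI API here.
    return f"# FEATURE Plan\n\n**Summary**: {brief}\n"


def save_prompt(markdown: str, workspace: Path) -> Path:
    # Prompts land in <workspace>/project-management/prompts/.
    prompts_dir = workspace / "project-management" / "prompts"
    prompts_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out = prompts_dir / f"PROMPT-{stamp}.md"
    out.write_text(markdown, encoding="utf-8")
    return out


def run_pipeline(audio: bytes, workspace: Path) -> Path:
    return save_prompt(enhance(transcribe(audio)), workspace)
```

The real tool replaces the two stubs with faster-whisper and the OpenAI API, respectively.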
pip install lazy-ptt-enhancer
cd ~/my-awesome-project
lazy-ptt init

This will:
- ✅ Check dependencies (Python, audio devices, etc.)
- ✅ Create the project-management/prompts/ directory
- ✅ Generate a .env configuration template
- ✅ Download the Whisper model (optional)
# Edit .env file
OPENAI_API_KEY=sk-your-actual-api-key

lazy-ptt daemon --verbose-cycle

- Press F12
- Say: "Add user authentication with OAuth2 and session management"
- Release F12
Result: Enhanced prompt saved to ./project-management/prompts/PROMPT-{timestamp}.md
# FEATURE Plan
**Summary**: Add user authentication with OAuth2 and session management
## Objectives
- Implement OAuth2 authentication flow
- Add JWT-based session management
- Create user profile management
## Acceptance Criteria
- [ ] Users can sign in with Google/GitHub
- [ ] Sessions persist across browser restarts
- [ ] Users can view and edit their profile
---
🎤 Generated with lazy-ptt-enhancer by @therouxe

- ✅ Global installation - Install once with pip, use anywhere
- ✅ Per-project initialization - `lazy-ptt init` in any directory
- ✅ Push-to-talk audio capture - F12 (configurable via CLI)
- ✅ Local Whisper transcription - GPU-accelerated, offline capable
- ✅ AI prompt enhancement - Structured output with objectives, risks, criteria
- ✅ Workspace-aware storage - Saves to current directory's project-management/prompts/
- ✅ Auto-move by default - No staging folder (configurable via --no-auto-move)
- ✅ Always-on daemon mode - Background process for any project
- ✅ Claude Code integration - Designed for plugin compatibility
- ✅ Branded output - Attribution to @therouxe in all generated prompts
- ⚡ GPU acceleration - CUDA support for faster transcription
- 🌍 Multi-language - Transcribe in English, Spanish, French, German, etc.
- 🎛️ Fully configurable - Environment variables, CLI flags, or YAML config
- 🔒 Privacy-first - Whisper runs locally, only enhancement hits the API
- 📊 Metadata tracking - JSON metadata alongside each prompt
- 🌐 REST API - FastAPI server for non-Python clients
- 🎙️ Device selection - Choose your microphone with `lazy-ptt devices`
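The metadata-tracking feature means each prompt can be picked up programmatically. A sketch of grabbing the newest prompt plus its JSON sidecar, assuming (this is an assumption, not documented behavior) the sidecar shares the prompt's basename:

```python
import json
from pathlib import Path
from typing import Tuple


def latest_prompt(prompts_dir: Path) -> Tuple[Path, dict]:
    """Newest PROMPT-*.md plus its JSON metadata sidecar, if present."""
    prompts = sorted(
        prompts_dir.glob("PROMPT-*.md"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    if not prompts:
        raise FileNotFoundError(f"no prompts found in {prompts_dir}")
    newest = prompts[0]
    sidecar = newest.with_suffix(".json")
    metadata = json.loads(sidecar.read_text()) if sidecar.exists() else {}
    return newest, metadata
```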
- Python 3.9+ (3.11+ recommended)
- PortAudio (for audio capture)
  - macOS: brew install portaudio
  - Debian/Ubuntu: sudo apt-get install libportaudio2
  - Windows: included with pip packages
- CUDA Toolkit (optional, for GPU acceleration)
- OpenAI API Key (for prompt enhancement)
# Using pip (recommended)
pip install lazy-ptt-enhancer
# Or using uv (faster)
uv pip install lazy-ptt-enhancer
# Verify installation
lazy-ptt --help

cd ~/my-project
lazy-ptt init
# This creates:
# - project-management/prompts/ directory
# - .lazy-ptt/staging/ directory
# - .env configuration template
# - Downloads Whisper model (optional)

Edit the generated .env file:
# REQUIRED
OPENAI_API_KEY=sk-your-key
# OPTIONAL (defaults shown)
WHISPER_MODEL_SIZE=medium
WHISPER_DEVICE=auto
PTT_HOTKEY=<f12>

Run once per work session:

lazy-ptt daemon --verbose-cycle

Then press F12 anytime to capture voice input in ANY directory.
Output:
🎤 Daemon started. Press <f12> to capture voice anytime.
Auto-move: ✅ ENABLED (saves to project-management)
Working directory: /home/user/my-project
[✅ project-management] Prompt: ./project-management/prompts/PROMPT-20251030.md (FEATURE)
Tip: The daemon works across all projects. Change directories and press F12 - prompts save to the new directory's project-management/.
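Because the daemon simply writes files, other tooling can react to new prompts by polling the output directory. A minimal stdlib sketch of one polling step (a hypothetical helper, not part of the package):

```python
from pathlib import Path
from typing import List, Set, Tuple


def new_prompts(prompts_dir: Path, seen: Set[Path]) -> Tuple[List[Path], Set[Path]]:
    # One polling step: report prompts that appeared since `seen`,
    # and return the updated snapshot for the next call.
    current = set(prompts_dir.glob("PROMPT-*.md"))
    return sorted(current - seen), current
```

Call it in a loop with a short `time.sleep` between steps to tail the directory.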
Capture one voice input and exit:
lazy-ptt listen

Press F12, speak, release F12.
Output:
Push-to-talk active. Hold the configured hotkey, speak, and release to process.
Prompt saved to: ./project-management/prompts/PROMPT-20251030-143022.md
✅ Prompt saved to project-management workspace (auto-move enabled)
Detected work type: FEATURE
Summary: Add payment processing with Stripe integration
Disable auto-move (keep in staging):
lazy-ptt listen --no-auto-move

Have a text brief already? Enhance it directly:

lazy-ptt enhance-text --text "Add payment processing with Stripe"

Or from a file:

lazy-ptt enhance-text --file brief.txt

Already have a recording?

lazy-ptt process-audio recording.wav

Supports: .wav, .mp3, .flac, .ogg
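If you have a whole folder of recordings, you can shell out to the CLI once per file. A sketch, assuming `lazy-ptt` is on your PATH (the batching helper itself is hypothetical, not part of the package):

```python
import subprocess
from pathlib import Path
from typing import List

# Formats the CLI accepts, per the list above.
SUPPORTED = {".wav", ".mp3", ".flac", ".ogg"}


def audio_files(folder: Path) -> List[Path]:
    # Recordings eligible for `lazy-ptt process-audio`.
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in SUPPORTED)


def process_folder(folder: Path) -> None:
    for recording in audio_files(folder):
        subprocess.run(["lazy-ptt", "process-audio", str(recording)], check=True)
```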
All settings configurable via CLI:
# Disable auto-move (keep in staging)
lazy-ptt listen --no-auto-move
# Custom story ID
lazy-ptt listen --story-id US-3.4 --story-title "User Authentication"
# Verbose logging
lazy-ptt daemon --verbose-cycle

# Required
export OPENAI_API_KEY=sk-...
# Optional (defaults shown)
export WHISPER_MODEL_SIZE=medium # tiny, base, small, medium, large
export WHISPER_DEVICE=auto # auto, cpu, cuda
export PTT_HOTKEY="<f12>"
export PROJECT_MANAGEMENT_ROOT=./project-management
export PTT_OUTPUT_ROOT=./project-management/prompts

Create .lazy-ptt.yaml in project root (optional):
openai:
api_key: ${OPENAI_API_KEY} # Reference env vars
model: gpt-4
temperature: 0.7
whisper:
model_size: medium
language: en
device: auto
ptt:
hotkey: "<f12>"
output_root: project-management/prompts
paths:
project_management_root: ./project-management

| Command | Description |
|---|---|
| `lazy-ptt init` | Initialize lazy-ptt in current directory |
| `lazy-ptt listen` | Capture single voice input |
| `lazy-ptt enhance-text` | Enhance text brief (no voice) |
| `lazy-ptt process-audio` | Transcribe + enhance audio file |
| `lazy-ptt daemon` | Run always-on background listener |
| `lazy-ptt devices` | List available microphones |
| `lazy-ptt --help` | Show help message |
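The YAML config shown earlier references `${OPENAI_API_KEY}`; values like that resolve from the environment. If you want the same substitution in your own tooling, here is a minimal stdlib sketch (the package's actual interpolation rules may differ):

```python
import os
import re

# Matches ${VAR} references such as ${OPENAI_API_KEY}.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str) -> str:
    # Substitute each ${VAR} with its environment value (empty string if unset).
    return _ENV_REF.sub(lambda m: os.environ.get(m.group(1), ""), value)
```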
--no-auto-move # Keep in staging (auto-move is DEFAULT)
--story-id ID # Override story ID (default: auto-generate)
--story-title "Title" # Add story title metadata
--verbose # Enable verbose logging
--verbose-cycle # Log each daemon capture cycle
--no-download         # Skip Whisper model download (init only)

# Initialize in new project
cd ~/new-project
lazy-ptt init
# List available microphones
lazy-ptt devices
# Start daemon with verbose output
lazy-ptt daemon --verbose-cycle
# Capture voice with metadata
lazy-ptt listen --story-id US-3.4 --story-title "User Authentication"
# Enhance text brief
lazy-ptt enhance-text --text "Fix login timeout bug"
# Process pre-recorded audio
lazy-ptt process-audio demo.wav
# Keep prompt in staging (disable auto-move)
lazy-ptt listen --no-auto-move

Terminal 1 (run once per session):

lazy-ptt daemon --verbose-cycle

Terminal 2 (use Claude Code):
cd ~/my-project
claude-code
# Voice workflow:
# 1. Press F12 anywhere, say "Add OAuth2 authentication"
# 2. Release F12
# 3. Prompt auto-saved to ./project-management/prompts/
# 4. In Claude Code: /lazy create-feature project-management/prompts/PROMPT-{timestamp}.md

Add to your plugin's .claude/commands/voice.md:
# /voice - Capture voice input
## Implementation
```bash
lazy-ptt listen --verbose
# Get the last prompt path
PROMPT_PATH=$(ls -t project-management/prompts/PROMPT-*.md | head -1)
echo "✅ Prompt saved to: $PROMPT_PATH"
echo "Next: /lazy create-feature $PROMPT_PATH"
```
Usage in Claude Code:
```bash
/voice
# → Press F12, speak
# → Prompt auto-saved
# → Follow suggested command to create feature
```
Run daemon as systemd service (Linux):
# Copy service file
sudo cp ops/systemd/lazy-ptt-daemon.service /etc/systemd/system/
# Edit paths and environment
sudo nano /etc/systemd/system/lazy-ptt-daemon.service
# Enable and start
sudo systemctl enable lazy-ptt-daemon
sudo systemctl start lazy-ptt-daemon
sudo systemctl status lazy-ptt-daemon

Or launchd (macOS):
# Copy plist
cp ops/launchd/io.lazy.ptt.daemon.plist ~/Library/LaunchAgents/
# Edit paths
nano ~/Library/LaunchAgents/io.lazy.ptt.daemon.plist
# Load and start
launchctl load ~/Library/LaunchAgents/io.lazy.ptt.daemon.plist
launchctl start io.lazy.ptt.daemon

See CLAUDE_CODE_INTEGRATION.md for the complete integration guide.
Run the API server:
lazy-ptt-api   # Serves on http://127.0.0.1:8000

# Enhance text
curl -X POST http://127.0.0.1:8000/enhance-text \
-H 'Content-Type: application/json' \
-d '{"text":"Add OAuth2 authentication"}' | jq .
# Process audio file
curl -X POST http://127.0.0.1:8000/process-audio \
-F '[email protected]' | jq .
# Trigger PTT capture (requires active desktop session)
curl -X POST http://127.0.0.1:8000/listen-once | jq .

**Problem**: `lazy-ptt` command not found. Solution:
pip install lazy-ptt-enhancer
which lazy-ptt   # Verify installation

**Problem**: no audio captured, or the wrong microphone is used. Solution:
# List available audio devices
lazy-ptt devices
# Select device by index
PTT_INPUT_DEVICE_INDEX=1 lazy-ptt listen

**Problem**: missing OpenAI API key. Solution:
# Set in environment
export OPENAI_API_KEY=sk-...
# Or create .env file
echo "OPENAI_API_KEY=sk-..." > .env

**Problem**: Whisper model unavailable or the download fails. Solution:
# Install faster-whisper
pip install faster-whisper
# Or skip download during init
lazy-ptt init --no-download

**Problem**: transcription is slow or runs out of GPU memory. Solution:
# Use smaller Whisper model
export WHISPER_MODEL_SIZE=small # or base, tiny
# Or force CPU mode
export WHISPER_DEVICE=cpu

**Problem**: prompts are not saved to project-management/prompts/. Solution:
# Check working directory
pwd
# Auto-move is DEFAULT, but verify:
lazy-ptt daemon --verbose-cycle   # Should show "Auto-move: ✅ ENABLED"
# If needed, re-initialize
lazy-ptt init

- README.md (this file) - User guide and quick start
- CLAUDE_CODE_INTEGRATION.md - Plugin integration patterns
- DEV_SPEC.md - Development specification and roadmap
- PROJECT_STATUS.md - Current implementation status
- examples/EXAMPLE_OUTPUT.md - Sample branded output
- docs/TROUBLESHOOTING.md - Detailed troubleshooting
- ✅ Global pip install workflow
- ✅ Per-project initialization (`lazy-ptt init`)
- ✅ Auto-move by default (configurable)
- ✅ Push-to-talk audio capture
- ✅ Local Whisper transcription
- ✅ AI prompt enhancement
- ✅ Always-on daemon mode
- ✅ REST API server
- ✅ Branding footer
- ✅ systemd/launchd service configs
- Local LLM support (Ollama, llama.cpp)
- Custom enhancement profiles (security, marketing, etc.)
- Profile hot-reload
- Multi-language transcription (auto-detect)
- Multi-language enhancement (French, Spanish, German, etc.)
- Qt/Electron desktop app
- Live audio levels + transcription preview
- Session history browser
- Visual configuration editor
Contributions welcome! See CONTRIBUTING.md for guidelines.
git clone https://github.com/MacroMan5/STT-Devellopement-Prompt-Enhancer.git
cd STT-Devellopement-Prompt-Enhancer
python -m venv .venv
source .venv/bin/activate
pip install -e ".[api,ui,stt]"
pytest tests/

- Formatter: Black (line length 100)
- Linter: Ruff
- Type Checker: Mypy (planned)
- Docstrings: Google style
MIT License - See LICENSE for details.
Copyright (c) 2025 @therouxe
- OpenAI Whisper - Fast, accurate speech recognition
- faster-whisper - GPU-accelerated Whisper implementation
- OpenAI API - Powerful prompt enhancement
- Claude Code - AI-assisted development workflows
- GitHub Issues: Report bugs or request features
- Documentation: Complete guides
- Twitter/X: @therouxe
lazy-ptt-enhancer - Voice-powered development workflows Created by @therouxe
⭐ Star on GitHub | 📖 Documentation | 🐛 Report Issues