MacroMan5/STT-Devellopement-Prompt-Enhancer

lazy-ptt-enhancer

Voice-powered development workflows - Push-to-talk → Whisper transcription → AI enhancement → Instant feature specifications

PyPI version Python 3.9+ License: MIT Code style: black

Transform voice into detailed development specifications in seconds.

Press F12 → Speak your feature brief → Release → Get enhanced prompt with objectives, risks, acceptance criteria, and more.


🎯 What is This?

lazy-ptt-enhancer is a globally-installable voice-to-prompt toolkit that:

  1. Captures your voice via push-to-talk (F12 default)
  2. Transcribes locally with GPU-accelerated Whisper (offline capable)
  3. Enhances the transcript into a structured specification using the OpenAI API
  4. Saves to your workspace - Prompts appear directly in project-management/prompts/
  5. Works everywhere - Install once, use in any project directory

No copy-paste. No context switching. Just speak and code.


⚡ Quick Start (5 Minutes)

1. Install Globally

pip install lazy-ptt-enhancer

2. Initialize in Your Project

cd ~/my-awesome-project
lazy-ptt init

This will:

  • ✅ Check dependencies (Python, audio devices, etc.)
  • ✅ Create project-management/prompts/ directory
  • ✅ Generate .env configuration template
  • ✅ Download Whisper model (optional)

3. Configure API Key

# Edit .env file
OPENAI_API_KEY=sk-your-actual-api-key

4. Start Daemon (Always-On Mode)

lazy-ptt daemon --verbose-cycle

5. Use Voice Input Anytime

  • Press F12
  • Say: "Add user authentication with OAuth2 and session management"
  • Release F12

Result: Enhanced prompt saved to ./project-management/prompts/PROMPT-{timestamp}.md

# FEATURE Plan

**Summary**: Add user authentication with OAuth2 and session management

## Objectives
- Implement OAuth2 authentication flow
- Add JWT-based session management
- Create user profile management

## Acceptance Criteria
- [ ] Users can sign in with Google/GitHub
- [ ] Sessions persist across browser restarts
- [ ] Users can view and edit their profile

---

🎤 Generated with lazy-ptt-enhancer by @therouxe
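
Because the saved prompt is plain Markdown, downstream tooling can read it back. The sketch below pulls the checklist out of a prompt shaped like the example above. The function name `acceptance_criteria` is hypothetical and not part of lazy-ptt-enhancer, and it assumes the section heading is exactly `## Acceptance Criteria`, as in the sample output.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^- \[( |x|X)\] (.+)$")


def acceptance_criteria(markdown: str) -> list[tuple[bool, str]]:
    """Return (done, text) pairs found under '## Acceptance Criteria'.

    Hypothetical helper: assumes the heading text used in the sample prompt.
    """
    items = []
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Entering a new level-2 section; track whether it is the one we want.
            in_section = stripped == "## Acceptance Criteria"
            continue
        if in_section:
            match = CHECKBOX.match(stripped)
            if match:
                items.append((match.group(1) != " ", match.group(2)))
    return items
```

Each returned pair holds the checkbox state and the criterion text, which is enough to build a simple progress report per prompt.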

🚀 Features

Core Features

  • ✅ Global installation - Install once with pip, use anywhere
  • ✅ Per-project initialization - lazy-ptt init in any directory
  • ✅ Push-to-talk audio capture - F12 (configurable via CLI)
  • ✅ Local Whisper transcription - GPU-accelerated, offline capable
  • ✅ AI prompt enhancement - Structured output with objectives, risks, criteria
  • ✅ Workspace-aware storage - Saves to current directory's project-management/prompts/
  • ✅ Auto-move by default - Prompts bypass the staging folder (disable with --no-auto-move)
  • ✅ Always-on daemon mode - Background process for any project
  • ✅ Claude Code integration - Designed for plugin compatibility
  • ✅ Branded output - Attribution to @therouxe in all generated prompts

Advanced Features

  • ⚡ GPU acceleration - CUDA support for faster transcription
  • 🌐 Multi-language - Transcribe in English, Spanish, French, German, etc.
  • 🎛️ Fully configurable - Environment variables, CLI flags, or YAML config
  • 🔒 Privacy-first - Whisper runs locally; only the enhancement step calls the OpenAI API
  • 📊 Metadata tracking - JSON metadata alongside each prompt
  • 🔌 REST API - FastAPI server for non-Python clients
  • 🎚️ Device selection - Choose your microphone with lazy-ptt devices

📦 Installation

Prerequisites

  • Python 3.9+ (3.11+ recommended)
  • PortAudio (for audio capture)
    • macOS: brew install portaudio
    • Debian/Ubuntu: sudo apt-get install libportaudio2
    • Windows: Included with pip packages
  • CUDA Toolkit (optional, for GPU acceleration)
  • OpenAI API Key (for prompt enhancement)

Install Package

# Using pip (recommended)
pip install lazy-ptt-enhancer

# Or using uv (faster)
uv pip install lazy-ptt-enhancer

# Verify installation
lazy-ptt --help

First-Time Setup in a Project

cd ~/my-project
lazy-ptt init

# This creates:
# - project-management/prompts/ directory
# - .lazy-ptt/staging/ directory
# - .env configuration template
# - Downloads Whisper model (optional)

Configure Environment

Edit the generated .env file:

# REQUIRED
OPENAI_API_KEY=sk-your-key

# OPTIONAL (defaults shown)
WHISPER_MODEL_SIZE=medium
WHISPER_DEVICE=auto
PTT_HOTKEY=<f12>

🎤 Usage

Mode 1: Always-On Daemon (Recommended)

Run once per work session:

lazy-ptt daemon --verbose-cycle

Then press F12 anytime to capture voice input in ANY directory.

Output:

🎤 Daemon started. Press <f12> to capture voice anytime.
Auto-move: ✅ ENABLED (saves to project-management)
Working directory: /home/user/my-project

[✅ project-management] Prompt: ./project-management/prompts/PROMPT-20251030.md (FEATURE)

Tip: The daemon works across all projects. Change directories and press F12 - prompts save to the new directory's project-management/.
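
Since prompts accumulate under each project's project-management/prompts/ directory, a script often needs the most recent one. Here is a minimal sketch, assuming the `PROMPT-*.md` naming shown in this README. `latest_prompt` is a hypothetical helper, not a lazy-ptt-enhancer API.

```python
from pathlib import Path
from typing import Optional


def latest_prompt(project_dir: str = ".") -> Optional[Path]:
    """Return the most recently modified PROMPT-*.md file, or None.

    Hypothetical helper: assumes the default output location
    <project_dir>/project-management/prompts/.
    """
    prompts_dir = Path(project_dir) / "project-management" / "prompts"
    if not prompts_dir.is_dir():
        return None
    candidates = list(prompts_dir.glob("PROMPT-*.md"))
    if not candidates:
        return None
    # Pick by modification time rather than by name, so the result does not
    # depend on the exact timestamp format in the file name.
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

Calling `latest_prompt()` from a project root returns the newest prompt path, or `None` if nothing has been captured there yet.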


Mode 2: Single Voice Capture

Capture one voice input and exit:

lazy-ptt listen

Press F12, speak, release F12.

Output:

Push-to-talk active. Hold the configured hotkey, speak, and release to process.
Prompt saved to: ./project-management/prompts/PROMPT-20251030-143022.md
✅ Prompt saved to project-management workspace (auto-move enabled)
Detected work type: FEATURE
Summary: Add payment processing with Stripe integration

Disable auto-move (keep in staging):

lazy-ptt listen --no-auto-move

Mode 3: Enhance Text Brief (No Voice)

Have a text brief already? Enhance it directly:

lazy-ptt enhance-text --text "Add payment processing with Stripe"

Or from a file:

lazy-ptt enhance-text --file brief.txt
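
The same command can be driven from a Python script. The sketch below only uses the `--text` and `--file` flags shown above. The wrapper names are hypothetical, and because this README does not specify what the command prints, the standard output is returned unparsed.

```python
import subprocess
from typing import List, Optional


def enhance_text_command(text: Optional[str] = None,
                         file: Optional[str] = None) -> List[str]:
    """Build the argv for `lazy-ptt enhance-text` (hypothetical wrapper)."""
    if (text is None) == (file is None):
        raise ValueError("pass exactly one of text or file")
    if text is not None:
        return ["lazy-ptt", "enhance-text", "--text", text]
    return ["lazy-ptt", "enhance-text", "--file", file]


def run_enhance_text(text: Optional[str] = None,
                     file: Optional[str] = None) -> str:
    """Run the CLI and return its standard output as-is.

    Assumes lazy-ptt is on PATH; raises CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(
        enhance_text_command(text=text, file=file),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the brief contains quotes or other special characters.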

Mode 4: Process Existing Audio File

Already have a recording?

lazy-ptt process-audio recording.wav

Supports: .wav, .mp3, .flac, .ogg


🔧 Configuration

CLI Flags (Highest Priority)

All settings configurable via CLI:

# Disable auto-move (keep in staging)
lazy-ptt listen --no-auto-move

# Custom story ID
lazy-ptt listen --story-id US-3.4 --story-title "User Authentication"

# Verbose logging
lazy-ptt daemon --verbose-cycle

Environment Variables (Medium Priority)

# Required
export OPENAI_API_KEY=sk-...

# Optional (defaults shown)
export WHISPER_MODEL_SIZE=medium  # tiny, base, small, medium, large
export WHISPER_DEVICE=auto        # auto, cpu, cuda
export PTT_HOTKEY="<f12>"
export PROJECT_MANAGEMENT_ROOT=./project-management
export PTT_OUTPUT_ROOT=./project-management/prompts

YAML Config (Lowest Priority)

Create .lazy-ptt.yaml in project root (optional):

openai:
  api_key: ${OPENAI_API_KEY}  # Reference env vars
  model: gpt-4
  temperature: 0.7

whisper:
  model_size: medium
  language: en
  device: auto

ptt:
  hotkey: "<f12>"
  output_root: project-management/prompts

paths:
  project_management_root: ./project-management
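
The three layers above resolve in a fixed order: CLI flag first, then environment variable, then YAML value. The following sketch illustrates that precedence rule only. It is not the package's actual loader, and `resolve_setting` is a hypothetical name.

```python
import os
from typing import Any, Mapping, Optional


def resolve_setting(cli_value: Any,
                    env_var: str,
                    yaml_value: Any,
                    default: Any = None,
                    environ: Optional[Mapping[str, str]] = None) -> Any:
    """Pick a setting by precedence: CLI flag > environment variable > YAML.

    Illustrative only; `environ` can be injected to make the function testable.
    """
    env = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    if env_var in env:
        return env[env_var]
    if yaml_value is not None:
        return yaml_value
    return default
```

For example, with `WHISPER_MODEL_SIZE=small` exported and `model_size: medium` in .lazy-ptt.yaml, this rule yields `small` unless a CLI flag overrides it.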

🎛️ CLI Reference

Commands

| Command | Description |
| --- | --- |
| lazy-ptt init | Initialize lazy-ptt in current directory |
| lazy-ptt listen | Capture single voice input |
| lazy-ptt enhance-text | Enhance text brief (no voice) |
| lazy-ptt process-audio | Transcribe + enhance audio file |
| lazy-ptt daemon | Run always-on background listener |
| lazy-ptt devices | List available microphones |
| lazy-ptt --help | Show help message |

Common Flags

--no-auto-move           # Keep in staging (auto-move is DEFAULT)
--story-id ID            # Override story ID (default: auto-generate)
--story-title "Title"    # Add story title metadata
--verbose                # Enable verbose logging
--verbose-cycle          # Log each daemon capture cycle
--no-download            # Skip Whisper model download (init only)

Examples

# Initialize in new project
cd ~/new-project
lazy-ptt init

# List available microphones
lazy-ptt devices

# Start daemon with verbose output
lazy-ptt daemon --verbose-cycle

# Capture voice with metadata
lazy-ptt listen --story-id US-3.4 --story-title "User Authentication"

# Enhance text brief
lazy-ptt enhance-text --text "Fix login timeout bug"

# Process pre-recorded audio
lazy-ptt process-audio demo.wav

# Keep prompt in staging (disable auto-move)
lazy-ptt listen --no-auto-move

🔌 Claude Code Integration

Pattern 1: Standalone Daemon (Simplest)

Terminal 1 (run once per session):

lazy-ptt daemon --verbose-cycle

Terminal 2 (use Claude Code):

cd ~/my-project
claude-code

# Voice workflow:
# 1. Press F12 anywhere, say "Add OAuth2 authentication"
# 2. Release F12
# 3. Prompt auto-saved to ./project-management/prompts/
# 4. In Claude Code: /lazy create-feature project-management/prompts/PROMPT-{timestamp}.md

Pattern 2: Plugin Command

Add to your plugin's .claude/commands/voice.md:

# /voice - Capture voice input

## Implementation

```bash
lazy-ptt listen --verbose

# Get the last prompt path
PROMPT_PATH=$(ls -t project-management/prompts/PROMPT-*.md | head -1)

echo "✅ Prompt saved to: $PROMPT_PATH"
echo "Next: /lazy create-feature $PROMPT_PATH"
```

Usage in Claude Code:
```bash
/voice
# → Press F12, speak
# → Prompt auto-saved
# → Follow suggested command to create feature
```

Pattern 3: Background Daemon (Production)

Run daemon as systemd service (Linux):

# Copy service file
sudo cp ops/systemd/lazy-ptt-daemon.service /etc/systemd/system/

# Edit paths and environment
sudo nano /etc/systemd/system/lazy-ptt-daemon.service

# Enable and start
sudo systemctl enable lazy-ptt-daemon
sudo systemctl start lazy-ptt-daemon
sudo systemctl status lazy-ptt-daemon

Or launchd (macOS):

# Copy plist
cp ops/launchd/io.lazy.ptt.daemon.plist ~/Library/LaunchAgents/

# Edit paths
nano ~/Library/LaunchAgents/io.lazy.ptt.daemon.plist

# Load and start
launchctl load ~/Library/LaunchAgents/io.lazy.ptt.daemon.plist
launchctl start io.lazy.ptt.daemon

See CLAUDE_CODE_INTEGRATION.md for complete integration guide.


🌐 REST API (Optional)

Run the API server:

lazy-ptt-api  # Serves on http://127.0.0.1:8000

Endpoints

# Enhance text
curl -X POST http://127.0.0.1:8000/enhance-text \
  -H 'Content-Type: application/json' \
  -d '{"text":"Add OAuth2 authentication"}' | jq .

# Process audio file
curl -X POST http://127.0.0.1:8000/process-audio \
  -F '[email protected]' | jq .

# Trigger PTT capture (requires active desktop session)
curl -X POST http://127.0.0.1:8000/listen-once | jq .
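
The /enhance-text call can also be made from Python with only the standard library. This is a sketch, assuming the server is running at the default address shown above and replies with JSON (the curl examples pipe the response to jq). The function names are hypothetical, and the response fields are not documented here, so the decoded body is returned untouched.

```python
import json
import urllib.request


def build_enhance_request(text: str,
                          base_url: str = "http://127.0.0.1:8000"
                          ) -> urllib.request.Request:
    """Build the POST request for /enhance-text (mirrors the curl example)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/enhance-text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enhance_text(text: str, timeout: float = 60.0):
    """Send the request and return the decoded JSON body.

    Assumption: the server replies with JSON; its schema is not specified
    in this README, so no fields are read here.
    """
    request = build_enhance_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a running server.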

🛠️ Troubleshooting

Issue: lazy-ptt command not found

Solution:

pip install lazy-ptt-enhancer
which lazy-ptt  # Verify installation

Issue: No audio input detected

Solution:

# List available audio devices
lazy-ptt devices

# Select device by index
PTT_INPUT_DEVICE_INDEX=1 lazy-ptt listen

Issue: OpenAI API key not found

Solution:

# Set in environment
export OPENAI_API_KEY=sk-...

# Or add it to the .env file (>> appends, so existing settings are kept)
echo "OPENAI_API_KEY=sk-..." >> .env

Issue: Whisper model download fails

Solution:

# Install faster-whisper
pip install faster-whisper

# Or skip download during init
lazy-ptt init --no-download

Issue: CUDA out of memory

Solution:

# Use smaller Whisper model
export WHISPER_MODEL_SIZE=small  # or base, tiny

# Or force CPU mode
export WHISPER_DEVICE=cpu

Issue: Prompts not saving to project-management

Solution:

# Check working directory
pwd

# Auto-move is DEFAULT, but verify:
lazy-ptt daemon --verbose-cycle  # Should show "Auto-move: ✅ ENABLED"

# If needed, re-initialize
lazy-ptt init

📖 Documentation


🚦 Roadmap

v1.0.0 (Q1 2025) - Production Release ✅

  • ✅ Global pip install workflow
  • ✅ Per-project initialization (lazy-ptt init)
  • ✅ Auto-move by default (configurable)
  • ✅ Push-to-talk audio capture
  • ✅ Local Whisper transcription
  • ✅ AI prompt enhancement
  • ✅ Always-on daemon mode
  • ✅ REST API server
  • ✅ Branding footer
  • ✅ systemd/launchd service configs

v1.1.0 (Q2 2025) - Local Models

  • Local LLM support (Ollama, llama.cpp)
  • Custom enhancement profiles (security, marketing, etc.)
  • Profile hot-reload

v1.2.0 (Q2 2025) - Multi-Language

  • Multi-language transcription (auto-detect)
  • Multi-language enhancement (French, Spanish, German, etc.)

v2.0.0 (Q3 2025) - Desktop UI

  • Qt/Electron desktop app
  • Live audio levels + transcription preview
  • Session history browser
  • Visual configuration editor

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

Development Setup

git clone https://github.com/MacroMan5/STT-Devellopement-Prompt-Enhancer.git
cd STT-Devellopement-Prompt-Enhancer
python -m venv .venv
source .venv/bin/activate
pip install -e ".[api,ui,stt]"
pytest tests/

Code Style

  • Formatter: Black (line length 100)
  • Linter: Ruff
  • Type Checker: Mypy (planned)
  • Docstrings: Google style

📄 License

MIT License - See LICENSE for details.

Copyright (c) 2025 @therouxe


🙏 Acknowledgments

  • OpenAI Whisper - Fast, accurate speech recognition
  • faster-whisper - GPU-accelerated Whisper implementation
  • OpenAI API - Powerful prompt enhancement
  • Claude Code - AI-assisted development workflows

📞 Support


lazy-ptt-enhancer - Voice-powered development workflows. Created by @therouxe

⭐ Star on GitHub | 📖 Documentation | 🐛 Report Issues

About

A useful tool for developers who are tired of typing prompts for their coding agents. Still in development.
