A cross-platform (Windows/macOS) voice-to-text tool with RAG-enhanced transcription using state-of-the-art speech recognition, vector database embeddings, and keyboard-driven interaction.
- 🎤 High-Quality Speech Recognition: Uses faster-whisper with support for large-v3 and distil-large-v3 models
- 🧠 RAG-Enhanced Transcription: Retrieves relevant context from your documents to improve transcription accuracy
- ⌨️ Keyboard-Driven: Global hotkeys for hands-free operation
- 🚀 Fast & Efficient: Optimized with INT8/FP16 quantization for GPU acceleration
- 🔄 Cross-Platform: Works on both Windows and macOS (not tested on macOS yet, but Claude promises it works 😄)
- 📝 Smart Output: Types transcribed text directly into any application
RayWhisper2 follows Clean Architecture principles with clear separation of concerns:
- Domain Layer: Core business logic (entities, value objects, interfaces)
- Application Layer: Use cases and application services
- Infrastructure Layer: External implementations (Whisper, ChromaDB, audio, keyboard)
- Presentation Layer: CLI and user interface
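As a rough illustration of the dependency rule between these layers (the names below are hypothetical, not the actual RayWhisper2 source): the domain layer owns an interface, the application layer depends only on it, and the infrastructure layer would supply the Whisper-backed implementation.

```python
# Hypothetical sketch of the layering: inner layers define interfaces,
# outer layers implement them.
from typing import Protocol


class Transcriber(Protocol):
    """Domain-layer port: what the application needs, not how it's done."""

    def transcribe(self, audio: bytes) -> str: ...


class TranscribeRecording:
    """Application-layer use case: depends only on the domain interface."""

    def __init__(self, transcriber: Transcriber) -> None:
        self._transcriber = transcriber

    def execute(self, audio: bytes) -> str:
        return self._transcriber.transcribe(audio)
```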
- Python 3.11 or higher
- For GPU acceleration:
- NVIDIA GPU with CUDA 12 and cuDNN 9 support
- If you don't have a compatible NVIDIA GPU, you must use CPU mode (see Configuration below)
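Unsure whether your setup qualifies? Once the dependencies below are installed, a quick check with CTranslate2 (the inference engine behind faster-whisper) will tell you:

```python
# GPU sanity check: faster-whisper runs on CTranslate2, which reports
# how many CUDA devices it can use. A count of 0 means you should set
# device: "cpu" in the configuration.
import ctranslate2

print(ctranslate2.get_cuda_device_count())
```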
```bash
# Clone the repository
git clone https://github.com/Fredrik-C/RayWhisper2.git
cd RayWhisper2

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e ".[dev]"
```

Copy the example configuration files:
```bash
cp .env.example .env
cp config/config.example.yaml config/config.yaml
```

Edit .env or config/config.yaml to customize settings:
- Whisper Model: Default is Systran/faster-whisper-medium.en (or use: tiny, base, small, medium, large-v2, large-v3, distil-large-v3)
- Device: Default is cuda (change to cpu if you don't have CUDA 12/cuDNN 9)
- Compute Type: Default is float16 (change to int8 for CPU mode)
- Embedding Model: Default is BAAI/bge-base-en-v1.5
- Hotkeys: Default is super+o (Windows key + O; customize as needed)
- The default configuration uses GPU mode (device: "cuda"), which requires CUDA 12 and cuDNN 9 with a compatible NVIDIA GPU
- If you don't have CUDA 12/cuDNN 9, you must switch to the CPU configuration:
```yaml
whisper:
  model_size: "base"    # or "small" for better accuracy
  device: "cpu"         # REQUIRED if no CUDA 12/cuDNN 9
  compute_type: "int8"  # int8 is fastest on CPU
```
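For contrast, the default GPU-mode settings assembled into the same shape (the whisper keys match the block above; the embedding and hotkey key names are guesses, so defer to config.example.yaml):

```yaml
whisper:
  model_size: "Systran/faster-whisper-medium.en"
  device: "cuda"           # requires CUDA 12 + cuDNN 9
  compute_type: "float16"  # switch to "int8" on CPU

# Key names below are assumptions; check config.example.yaml for the
# real schema.
embedding:
  model: "BAAI/bge-base-en-v1.5"

hotkeys:
  record: "super+o"
```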
Before using RAG-enhanced transcription, populate the vector database with your documents:

```bash
raywhisper populate ./docs2ingest --clear
```

This will parse and embed all Markdown files from the specified directories.
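Conceptually, ingestion boils down to chunking the Markdown, embedding the chunks, and storing them in ChromaDB. A minimal sketch of that idea (not the actual RayWhisper2 code; the chunking strategy, storage path, and collection name are made up):

```python
# Illustrative ingestion pipeline: chunk Markdown files, embed the
# chunks with a BGE model, and store them in a persistent ChromaDB
# collection. Paths and names here are hypothetical.
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
client = chromadb.PersistentClient(path="./chroma")     # hypothetical path
collection = client.get_or_create_collection("docs")    # hypothetical name

for doc in Path("./docs2ingest").rglob("*.md"):
    text = doc.read_text(encoding="utf-8")
    # Naive fixed-size chunking; a real splitter respects headings/sentences.
    chunks = [text[i : i + 1000] for i in range(0, len(text), 1000)]
    if not chunks:
        continue
    collection.add(
        ids=[f"{doc.as_posix()}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
    )
```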
```bash
raywhisper run
```

Hold your configured hotkey (default: super+o) to start recording, and release it to stop and transcribe. The transcribed text is typed into the active application.
Note: On some platforms a status indicator may provide visual feedback while recording; otherwise, check the logs to verify the recording state.
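For the curious, the hold-to-record interaction can be approximated with pynput and sounddevice along these lines (a simplified sketch, not the project's implementation; it watches only the super/cmd key and just reports the captured buffer):

```python
# Simplified hold-to-record loop: buffer microphone audio while a key
# is held, hand the buffer off on release.
import numpy as np
import sounddevice as sd
from pynput import keyboard

frames: list[np.ndarray] = []
recording = False


def audio_callback(indata, frame_count, time_info, status):
    # Buffer audio only while the hotkey is held.
    if recording:
        frames.append(indata.copy())


def on_press(key):
    global recording
    if key == keyboard.Key.cmd:  # "super"; the real hotkey also checks "o"
        recording = True


def on_release(key):
    global recording
    if key == keyboard.Key.cmd and recording:
        recording = False
        audio = np.concatenate(frames) if frames else np.empty((0, 1))
        frames.clear()
        # Hand the buffer to transcription here; this sketch just reports.
        print(f"captured {len(audio)} samples")


# 16 kHz mono matches what Whisper models expect as input.
with sd.InputStream(samplerate=16000, channels=1, callback=audio_callback):
    with keyboard.Listener(on_press=on_press, on_release=on_release) as listener:
        listener.join()
```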
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=raywhisper --cov-report=html

# Run a specific test
pytest tests/unit/domain/test_transcription.py -v
```

```bash
# Linting
ruff check src/

# Type checking
mypy src/

# Format code
ruff format src/
```

Install and run the pre-commit hooks:

```bash
pre-commit install
pre-commit run --all-files
```
Project structure:

```
raywhisper2/
├── src/raywhisper/        # Source code
│   ├── domain/            # Domain layer
│   ├── application/       # Application layer
│   ├── infrastructure/    # Infrastructure layer
│   ├── presentation/      # Presentation layer
│   └── config/            # Configuration
├── tests/                 # Tests
├── config/                # Configuration files
├── scripts/               # Utility scripts
└── docs2ingest/           # Documents to ingest
```
- Speech-to-Text: faster-whisper (CTranslate2)
- Vector Database: ChromaDB
- Embeddings: BAAI/bge-small-en-v1.5 (sentence-transformers)
- Reranking: BAAI/bge-reranker-v2-m3
- Audio: sounddevice
- Keyboard: pynput
- Configuration: pydantic + pydantic-settings
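To see how these pieces could fit together at transcription time: retrieve candidate context from ChromaDB, rerank it with the BGE reranker, and bias Whisper's decoding via an initial prompt. The library calls below are real, but how RayWhisper2 actually wires them together (including the use of initial_prompt, the query text, and the collection name) is an assumption:

```python
# Illustrative RAG-assisted transcription: retrieve context, rerank,
# and bias Whisper decoding via initial_prompt. The wiring shown here
# is a guess at the approach, not the project's confirmed behavior.
import chromadb
from faster_whisper import WhisperModel
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")
client = chromadb.PersistentClient(path="./chroma")
collection = client.get_collection("docs")  # hypothetical collection name

query = "topic hint for the recording"  # hypothetical retrieval query
hits = collection.query(
    query_embeddings=embedder.encode([query]).tolist(),
    n_results=5,
)
candidates = hits["documents"][0]
scores = reranker.predict([(query, c) for c in candidates])
best = candidates[max(range(len(scores)), key=scores.__getitem__)]

model = WhisperModel(
    "Systran/faster-whisper-medium.en", device="cuda", compute_type="float16"
)
segments, _info = model.transcribe("recording.wav", initial_prompt=best)
print(" ".join(segment.text for segment in segments))
```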
MIT License - see LICENSE file for details
Contributions are welcome!
- faster-whisper for efficient Whisper inference
- ChromaDB for vector database
- BAAI for BGE embeddings and reranker models