
OpenSW

Open-source Speech-to-Text Desktop Application

OpenSW Logo

The Japanese version of this README is available here.


Overview

OpenSW is a cross-platform desktop application for quick and efficient speech-to-text conversion. It leverages OpenAI Whisper for local transcription and optionally integrates with Ollama to refine transcribed text using LLMs.

Key Features

  • 🎤 Local Speech Recognition – Uses Whisper for on-device transcription (no cloud required)
  • ⚡ GPU Acceleration – CUDA on Windows, Metal on macOS for fast inference
  • 🤖 LLM Text Refinement – Optional Ollama integration to clean up filler words and improve punctuation
  • ⌨️ Global Shortcut – Press Ctrl+Alt+Space from anywhere to start/stop recording
  • 📋 Auto Clipboard – Transcribed text is automatically copied to clipboard
  • 🔔 System Notifications – Get notified when transcription is complete
  • 📍 System Tray – Runs in background with easy access from tray icon
  • 🖥️ Compact Recording Mode – Minimal floating window during recording

Screenshots

Main Window


Recording Workflow

Recording → Transcribing → Refining → Copied

Installation

Download Pre-built Binaries

Pre-built binaries are available for Windows and macOS (Apple Silicon):

👉 Download from Releases

| Platform | File |
| --- | --- |
| Windows (exe) | OpenSW.exe |
| Windows (msi) | OpenSW_0.1.0_x64_en-US.msi |
| macOS (Apple Silicon) | OpenSW_0.1.0_aarch64.dmg |

Build from Source

Prerequisites

Platform-Specific Requirements

Windows:

  • Visual Studio Build Tools 2019+
  • CUDA Toolkit (recommended for GPU acceleration)

macOS:

  • Xcode Command Line Tools
  • Metal is used automatically for GPU acceleration

Linux:

  • Standard development tools (build-essential, etc.)
  • CUDA Toolkit (for GPU acceleration)

Build Commands

# Clone the repository
git clone https://github.com/liebe-magi/OpenSW.git
cd OpenSW

# Install dependencies
bun install

# Run in development mode
bun run tauri dev

# Build for production (sets platform-specific environment variables automatically)
bun run tauri:build

Download Whisper Model

Download a Whisper GGML model from:

👉 https://huggingface.co/ggerganov/whisper.cpp/tree/main

| Model | Size | Accuracy | Speed |
| --- | --- | --- | --- |
| ggml-tiny.bin | ~75 MB | Low | Fastest |
| ggml-base.bin | ~142 MB | Medium | Fast |
| ggml-small.bin | ~466 MB | Good | Moderate |
| ggml-medium.bin | ~1.5 GB | High | Slow |
| ggml-large-v3-turbo.bin | ~1.6 GB | High | Moderate |
| ggml-large-v3.bin | ~3 GB | Highest | Slowest |

Tip: For Japanese transcription, ggml-medium.bin or larger is recommended for best accuracy.

Usage

Quick Start

  1. Select a Whisper model – On first launch, click "Select" to choose your downloaded Whisper GGML model file (.bin).

  2. Configure audio input – Select your preferred microphone from the dropdown.

  3. Start recording – Press Ctrl+Alt+Space or click the tray icon.

  4. Stop recording – Press Ctrl+Alt+Space again. The audio will be transcribed and copied to your clipboard.
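The flow above moves through the same phases shown in the recording workflow screenshots. A minimal sketch of that state progression (illustrative only, not the actual implementation; the `Phase` names and `nextPhase` helper are assumptions):

```typescript
// Phases of one recording cycle, matching the workflow screenshots.
type Phase = "Idle" | "Recording" | "Transcribing" | "Refining" | "Copied";

// Advance to the next phase. "Refining" only occurs when Ollama
// integration is enabled; otherwise transcription goes straight to
// the clipboard ("Copied").
function nextPhase(current: Phase, ollamaEnabled: boolean): Phase {
  switch (current) {
    case "Idle":
      return "Recording";
    case "Recording":
      return "Transcribing";
    case "Transcribing":
      return ollamaEnabled ? "Refining" : "Copied";
    case "Refining":
      return "Copied";
    case "Copied":
      return "Idle";
  }
}
```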

Optional: Ollama Integration

To enable LLM-based text refinement:

  1. Install and run Ollama
  2. Pull a model (e.g., ollama pull llama3.2)
  3. In OpenSW, configure the Ollama settings:
    • URL: http://localhost:11434 (default)
    • Model: Select your installed model
    • Prompt: Customize the refinement prompt
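Under the hood, refinement amounts to one non-streaming request to Ollama's standard /api/generate endpoint. A sketch of the request body (the `{text}` placeholder convention and `buildRefineRequest` helper are illustrative assumptions, not OpenSW's actual prompt format):

```typescript
// Request body for Ollama's /api/generate endpoint.
interface GenerateRequest {
  model: string;   // e.g. "llama3.2"
  prompt: string;  // refinement prompt with the transcript substituted in
  stream: boolean; // false: return the full response as one JSON object
}

// Substitute the transcript into the prompt template. Here "{text}"
// marks the insertion point (an illustrative convention).
function buildRefineRequest(
  model: string,
  template: string,
  transcript: string,
): GenerateRequest {
  return {
    model,
    prompt: template.replace("{text}", transcript),
    stream: false,
  };
}
```

The resulting body would be POSTed to `http://localhost:11434/api/generate`, and the refined text comes back in the `response` field of the JSON reply.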

Troubleshooting

macOS: "App is damaged and can't be opened"

If you see a message saying "OpenSW.app is damaged and can't be opened" when trying to run the app, this is due to macOS Gatekeeper security settings (because the app is not notarized by Apple).

Solution:

Run the following command in Terminal to remove the quarantine attribute:

xattr -cr /Applications/OpenSW.app

(Adjust the path if you installed the app somewhere else)

Configuration

All settings are stored locally and persist across sessions:

| Setting | Description |
| --- | --- |
| Audio Device | Select input microphone |
| Whisper Model | Path to GGML model file |
| Language | Transcription language (Japanese/English) |
| Ollama URL | Ollama server address |
| Ollama Model | LLM model for text refinement |
| Prompt Template | Custom prompt for refinement |
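As a rough picture of what gets persisted, the settings above can be modeled as a single record with sensible defaults. The field names and default values below are assumptions for illustration, not the actual on-disk schema:

```typescript
// Illustrative shape of the persisted settings (field names assumed).
interface Settings {
  audioDevice: string | null;      // input microphone; null = not chosen yet
  whisperModelPath: string | null; // path to the GGML .bin model file
  language: "ja" | "en";           // transcription language
  ollamaUrl: string;               // Ollama server address
  ollamaModel: string | null;      // LLM model for refinement
  promptTemplate: string;          // custom refinement prompt
}

// Defaults on first launch: no device or model selected, Ollama at its
// standard local address. The "{text}" placeholder is illustrative.
function defaultSettings(): Settings {
  return {
    audioDevice: null,
    whisperModelPath: null,
    language: "ja",
    ollamaUrl: "http://localhost:11434",
    ollamaModel: null,
    promptTemplate: "Clean up filler words and fix punctuation: {text}",
  };
}
```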

Tech Stack

  • Frontend: React 18, TypeScript, Vite
  • Backend: Rust, Tauri 2.0
  • Speech Recognition: whisper-rs (whisper.cpp bindings)
  • Audio: cpal, hound, rodio
  • LLM Integration: Ollama API via reqwest

Signing & Auto-Update

OpenSW includes built-in auto-update functionality. The distributed binaries are signed for secure updates.

For Contributors / Self-Builders

If you build from source, signing is optional:

  • Without signing: Development builds work normally (bun run tauri dev)
  • With signing: Required only for distributing signed releases with auto-update

Setting Up Signing (Maintainers Only)

# Generate signing keys
bunx tauri signer generate -w ~/.tauri/opensw.key

# Copy .env.example to .env.local and configure
cp .env.example .env.local
# Edit .env.local with your key path and password

# Build with signing
bun run tauri:build

# Generate latest.json for release
bun run release:prepare

Note: The public key in tauri.conf.json is used to verify updates. If you fork this project and want auto-updates, you'll need to generate your own key pair and update the public key.
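For orientation, the `latest.json` that `release:prepare` generates follows the Tauri updater manifest format: a version, release notes, a publication date, and per-platform download URLs with their signatures. A sketch under that assumption (the `buildManifest` helper, notes text, and URL pattern are placeholders, not the script's actual output):

```typescript
// Sketch of a Tauri updater manifest (latest.json).
interface UpdateManifest {
  version: string;
  notes: string;
  pub_date: string; // RFC 3339 timestamp
  // One entry per target, e.g. "darwin-aarch64", "windows-x86_64".
  platforms: Record<string, { signature: string; url: string }>;
}

function buildManifest(version: string): UpdateManifest {
  return {
    version,
    notes: "See the release page for details.",
    pub_date: new Date().toISOString(),
    platforms: {
      "darwin-aarch64": {
        signature: "<contents of the generated .sig file>",
        url: `https://github.com/liebe-magi/OpenSW/releases/download/v${version}/OpenSW_${version}_aarch64.dmg`,
      },
    },
  };
}
```

The updater fetches this file, compares `version` against the installed build, and verifies each platform's `signature` against the public key in `tauri.conf.json` before installing.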

Development

Project Structure

OpenSW/
├── src/                    # React frontend
│   ├── components/         # UI components
│   └── App.tsx
├── src-tauri/              # Rust backend
│   ├── src/
│   │   ├── main.rs         # Application entry point
│   │   ├── audio.rs        # Audio recording/playback
│   │   ├── ollama.rs       # Ollama API client
│   │   ├── clipboard.rs    # Clipboard operations
│   │   └── tray.rs         # System tray setup
│   └── Cargo.toml
└── package.json

Commands

# Development
bun run dev          # Start Vite dev server
bun run tauri dev    # Run Tauri in development mode

# Build
bun run build           # Build frontend
bun run tauri:build     # Build distributable (with signing if configured)
bun run release:prepare # Generate latest.json for updater

# Code Quality
bun run lint         # Run ESLint
bun run format       # Format with Prettier

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

This project is dual-licensed under either:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT License (LICENSE-MIT)

at your option.

Acknowledgments

  • OpenAI Whisper – Speech recognition model
  • whisper.cpp – Lightweight Whisper implementation
  • Tauri – Cross-platform desktop framework
  • Ollama – Local LLM runtime
