
Commit 9d7eb76

aj47 and zackees authored

Attempt fix windows (#45)
* Update README.md

* feat: Add custom prompt support for improved transcription accuracy

  This commit implements custom prompt functionality to improve Whisper transcription accuracy for domain-specific vocabulary, names, and technical terms.

  ## New Features

  ### CLI Arguments
  - --initial_prompt: Direct prompt specification
  - --prompt_file: Load prompt from text file

  ### Python API
  - Added initial_prompt parameter to transcribe() function
  - Maintains backward compatibility

  ### Backend Support
  - Works with all Whisper backends (cpu, cuda, insane, mps)
  - Updated whisper_mac.py to support other_args parameter
  - Prompts are passed as --initial_prompt to underlying Whisper engines

  ## Examples and Documentation
  - Added example prompt files for common domains:
    - examples/prompts/technical_terms.txt (AI, programming terms)
    - examples/prompts/medical_terms.txt (medical terminology)
    - examples/prompts/business_names.txt (company names)
  - Updated README with comprehensive usage examples
  - Added demo script showing various use cases

  ## Testing
  - Added unit tests for prompt functionality
  - Verified CLI argument parsing
  - Tested API parameter integration

  ## Usage Examples

  CLI:

      transcribe-anything video.mp4 --initial_prompt "AI, machine learning, neural networks"
      transcribe-anything video.mp4 --prompt_file examples/prompts/technical_terms.txt

  Python API:

      transcribe("video.mp4", initial_prompt="Custom vocabulary terms here")

  This enhancement addresses the need for better recognition of:
  - Technical terminology
  - Proper names and company names
  - Domain-specific jargon
  - Industry-specific vocabulary

  Fixes: Improves transcription accuracy for specialized content

* refactor: Remove example files and demo scripts

  Keep only the core functionality for custom prompt support:
  - Removed example prompt files
  - Removed demo scripts
  - Updated README to remove references to example files
  - Focused on clean, minimal implementation

* feat: integrate aj47/whisper-mps fork with initial_prompt support

  - Update whisper-mps dependency to use aj47/whisper-mps fork
  - Add native --initial_prompt support for MPS backend
  - Remove filtering of --initial_prompt argument for MPS backend
  - Add test for MPS backend with initial_prompt functionality
  - Enable custom vocabulary support with Apple Silicon acceleration

  Fixes issue where --initial_prompt was not supported by whisper-mps backend. Now users can use custom vocabulary with fast MPS acceleration.

* Refactor Mac implementation from whisper-mps to lightning-whisper-mlx

  - Replace whisper-mps backend with lightning-whisper-mlx for 10x performance improvement
  - Add support for multiple languages (remove English-only restriction)
  - Add support for custom vocabulary via --initial_prompt parameter
  - Add support for both transcribe and translate tasks
  - Maintain backward compatibility with run_whisper_mac_english function
  - Update environment setup to use lightning-whisper-mlx dependencies
  - Enhance argument parsing to support new MLX features
  - Update tests to cover new functionality and maintain backward compatibility
  - Update documentation to reflect new capabilities and performance improvements
  - Bump version to 3.1.0

  This addresses the user's request for custom vocabulary support while significantly improving performance and expanding language support on Apple Silicon Macs.

* Fix MLX refactor: update lightning-whisper-mlx version requirement and improve error handling

  - Fix version requirement from >=0.1.0 to >=0.0.10 (latest available version)
  - Add better error handling with stderr/stdout capture for debugging
  - Add warnings for unsupported features (initial_prompt, word_timestamps, temperature)
  - All tests passing and transcription working correctly
  - README formatting improvements

* Rename MPS to MLX throughout codebase

  - Change Device enum from MPS to MLX in api.py
  - Update command-line argument parser to accept 'mlx' as device option
  - Maintain backward compatibility by accepting 'mps' as alias for 'mlx'
  - Update README.md to use --device mlx instead of --device mps
  - Rename test file from test_insanely_fast_whisper_mps.py to test_insanely_fast_whisper_mlx.py
  - Update all documentation and examples to use MLX terminology
  - All tests passing with new MLX naming convention

* Integrate aj47/lightning-whisper-mlx fork with initial_prompt support

  - Updated MLX environment to use aj47/lightning-whisper-mlx fork
  - Added initial_prompt parameter support to MLX transcription
  - Updated .gitignore to exclude model files and test outputs
  - Enables custom vocabulary and context prompting for better transcription accuracy

* move mlx model location

* Fix: Cache NVIDIA detection to prevent repeated torch downloads

  - Add caching to has_nvidia_smi() function to ensure consistent results
  - Cache based on system fingerprint to handle hardware changes properly
  - Add debug logging to track pyproject.toml content changes
  - Add --clear-nvidia-cache command line option for cache management
  - Prevents uv-iso-env from triggering unnecessary 2.2GB+ torch reinstalls
  - Fixes Windows development environment performance issues

  Files modified:
  - src/transcribe_anything/util.py: Enhanced NVIDIA detection with caching
  - src/transcribe_anything/whisper.py: Added debug logging
  - src/transcribe_anything/insanley_fast_whisper_reqs.py: Added debug logging
  - src/transcribe_anything/whisper_mac.py: Added debug logging
  - src/transcribe_anything/_cmd.py: Added clear cache CLI option
  - tests/test_nvidia_cache.py: Comprehensive test coverage
  - NVIDIA_CACHE_FIX.md: Detailed documentation of the fix

---------

Co-authored-by: Zachary Vorhies <[email protected]>
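
The custom-prompt feature described above reduces to one new keyword argument. A minimal usage sketch, assuming `transcribe()` is exposed from `transcribe_anything.api` (the import path is an assumption; the call shape and CLI flag come from the commit message):

```python
# Sketch of the usage described in the commit message; the import path is an
# assumption, while initial_prompt is the parameter added by this PR.
from transcribe_anything.api import transcribe

# Bias Whisper toward domain-specific vocabulary and names.
transcribe(
    "video.mp4",
    initial_prompt="AI, machine learning, neural networks",
)

# CLI equivalent from the commit message:
#   transcribe-anything video.mp4 --initial_prompt "AI, machine learning, neural networks"
```
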
1 parent fa466a8 commit 9d7eb76

16 files changed: +1172 −251 lines

.gitignore

Lines changed: 13 additions & 1 deletion
```diff
@@ -138,4 +138,16 @@ transcribe_anything/WHISPER_OPTIONS.json
 !.aider.conf.yml
 !.aiderignore
 activate
-uv.lock
+uv.lock
+
+# MLX model files (downloaded at runtime)
+mlx_models/
+src/mlx_models/
+
+# Test output files
+tests/localfile/output.json
+tests/localfile/text_video_*
+
+# Video files
+*.mp4
+*.wav
```
NVIDIA_CACHE_FIX.md

Lines changed: 111 additions & 0 deletions
# NVIDIA Detection Cache Fix

## Problem

On Windows development environments, tests were taking a very long time due to repeated downloads of the 2.2GB+ torch package. This was caused by:

1. **Inconsistent NVIDIA Detection**: The `has_nvidia_smi()` function was returning different values between runs
2. **Dynamic PyProject Generation**: Multiple modules generate different `pyproject.toml` content based on NVIDIA GPU detection
3. **uv-iso-env Behavior**: The `uv-iso-env` package performs a "nuke and pave" reinstall whenever the `pyproject.toml` fingerprint changes
4. **Repeated Downloads**: Each fingerprint change triggered a complete reinstall, including the large torch download

## Root Cause

The issue was that `has_nvidia_smi()` was being called multiple times during test runs, and on Windows systems the detection could be inconsistent due to:

- System state changes
- Process timing issues
- Environment variable changes
- Path resolution inconsistencies

This caused different `pyproject.toml` content to be generated between runs, changing the fingerprint and triggering reinstalls.

## Solution

### 1. NVIDIA Detection Caching

Enhanced `has_nvidia_smi()` in `src/transcribe_anything/util.py` (sketched after this list) to:

- Cache detection results based on system fingerprint
- Store cache in `~/.transcribe_anything_nvidia_cache.json`
- Use system information (platform, machine, version) + nvidia-smi existence as fingerprint
- Provide consistent results across runs for the same system configuration
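
A minimal sketch of this caching scheme, assuming a JSON cache keyed by a system-fingerprint string; the real `util.py` implementation may differ in detail, and the `_system_fingerprint()` helper shown here is illustrative:

```python
# Illustrative sketch only -- the real implementation lives in
# src/transcribe_anything/util.py and may differ in detail.
import json
import platform
import shutil
from pathlib import Path

# Cache location named in this document.
_CACHE_FILE = Path.home() / ".transcribe_anything_nvidia_cache.json"


def _system_fingerprint() -> str:
    """Hypothetical helper: platform info plus nvidia-smi presence."""
    has_smi = shutil.which("nvidia-smi") is not None
    return (
        f"{platform.system()}-{platform.machine()}-{platform.version()}"
        f"-nvidia_smi:{str(has_smi).lower()}"
    )


def has_nvidia_smi() -> bool:
    """Return NVIDIA detection, cached per system fingerprint for consistency."""
    fingerprint = _system_fingerprint()
    cache = {}
    if _CACHE_FILE.exists():
        try:
            cache = json.loads(_CACHE_FILE.read_text())
        except json.JSONDecodeError:
            cache = {}  # corrupt cache: fall back to fresh detection
    if fingerprint in cache:
        return bool(cache[fingerprint])  # same answer on every run
    result = shutil.which("nvidia-smi") is not None  # fresh detection
    cache[fingerprint] = result
    _CACHE_FILE.write_text(json.dumps(cache, indent=2))
    return result
```

Because the cache key only changes when the platform info or the presence of `nvidia-smi` changes, repeated runs on the same machine produce identical `pyproject.toml` content.
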
### 2. Debug Logging

Added debug logging to environment generation functions:

- `src/transcribe_anything/whisper.py`
- `src/transcribe_anything/insanley_fast_whisper_reqs.py`
- `src/transcribe_anything/whisper_mac.py`

Each now logs the MD5 hash of the generated `pyproject.toml` content to help track changes.
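
One way this logging could look; this is a sketch, and the function name and message format used by those modules are illustrative:

```python
# Hedged sketch of the debug-logging idea; names and message format are illustrative.
import hashlib


def log_pyproject_hash(module_name: str, pyproject_content: str) -> None:
    """Print the MD5 of the generated pyproject.toml so fingerprint changes are visible."""
    digest = hashlib.md5(pyproject_content.encode("utf-8")).hexdigest()
    print(f"[{module_name}] pyproject.toml md5: {digest}")


# Identical content always hashes the same, so a hash that changes between runs
# points at inconsistent environment generation.
log_pyproject_hash("whisper", "[project]\nname = 'whisper-env'\n")
```
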
### 3. Cache Management

Added a command-line option to clear the cache when needed:

```bash
transcribe-anything --clear-nvidia-cache
```
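
A sketch of how this flag might be wired up in `src/transcribe_anything/_cmd.py`; the `--clear-nvidia-cache` argument name comes from this fix, while the `clear_nvidia_cache()` helper shown here is hypothetical:

```python
# Hedged sketch; the actual CLI wiring in _cmd.py may differ.
import argparse
from pathlib import Path


def clear_nvidia_cache() -> None:
    """Hypothetical helper: delete the cache file to force fresh detection."""
    cache_file = Path.home() / ".transcribe_anything_nvidia_cache.json"
    cache_file.unlink(missing_ok=True)


parser = argparse.ArgumentParser(prog="transcribe-anything")
parser.add_argument(
    "--clear-nvidia-cache",
    action="store_true",
    help="Clear the cached NVIDIA detection result and re-detect on the next run",
)
args, _ = parser.parse_known_args()
if args.clear_nvidia_cache:
    clear_nvidia_cache()
```
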

### 4. Testing

Created comprehensive tests in `tests/test_nvidia_cache.py` (see the sketch after this list) to verify that:

- Caching behavior works correctly
- Cache clearing works as expected
- Different system fingerprints are handled properly
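
A minimal example in the spirit of those tests, assuming `has_nvidia_smi()` is importable from `transcribe_anything.util`; the real test file may be organized differently:

```python
# Hedged sketch, not the actual contents of tests/test_nvidia_cache.py.
from transcribe_anything.util import has_nvidia_smi


def test_nvidia_detection_is_consistent() -> None:
    # With fingerprint-based caching, repeated calls must agree, so the
    # generated pyproject.toml stays stable and uv-iso-env does not reinstall.
    first = has_nvidia_smi()
    assert all(has_nvidia_smi() == first for _ in range(5))
```
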
## Files Modified

- `src/transcribe_anything/util.py` - Enhanced NVIDIA detection with caching
- `src/transcribe_anything/whisper.py` - Added debug logging
- `src/transcribe_anything/insanley_fast_whisper_reqs.py` - Added debug logging
- `src/transcribe_anything/whisper_mac.py` - Added debug logging
- `src/transcribe_anything/_cmd.py` - Added clear cache command-line option
- `tests/test_nvidia_cache.py` - New test file for cache functionality

## Usage

### Normal Operation

The caching is automatic and transparent. The first run will detect NVIDIA availability and cache the result. Subsequent runs will use the cached result, ensuring consistent `pyproject.toml` generation.

### Debugging

If you suspect caching issues, you can:

1. **View debug output**: The system will print debug messages showing:
   - Cached vs. fresh NVIDIA detection results
   - `pyproject.toml` content hashes for each module

2. **Clear cache**: If hardware changes or you need to force re-detection:

   ```bash
   transcribe-anything --clear-nvidia-cache
   ```

### Expected Behavior

- **First run**: Detects NVIDIA, caches result, generates environment
- **Subsequent runs**: Uses cached result, generates identical environment
- **No more repeated downloads**: Same fingerprint = no reinstall needed

## Benefits

1. **Faster Testing**: Eliminates repeated 2.2GB+ torch downloads
2. **Consistent Behavior**: Same system configuration always produces same results
3. **Debuggable**: Clear logging shows what's happening
4. **Manageable**: Easy cache clearing when needed
5. **Backward Compatible**: No changes to existing API or behavior

## Technical Details

The cache file (`~/.transcribe_anything_nvidia_cache.json`) stores mappings from system fingerprints to detection results:

```json
{
  "Windows-AMD64-10.0.19041-nvidia_smi:true": true,
  "Linux-x86_64-5.4.0-nvidia_smi:false": false
}
```

The system fingerprint includes:

- Platform system (Windows, Linux, Darwin)
- Machine architecture (AMD64, x86_64, arm64)
- Platform version
- Whether the nvidia-smi executable exists

This ensures that hardware or driver changes are properly detected while maintaining consistency for the same configuration.

0 commit comments
