ADAPTIVE_BATCH
Adaptive Batch Processing is an intelligent job allocation system that dynamically distributes audio transcription tasks between GPU and CPU for optimal performance. The system automatically detects your hardware capabilities, learns from job performance, and makes smart decisions about where to process each audio segment.
- GPU VRAM Detection: Automatically calculates how many concurrent GPU jobs your system can handle
- RAM-Based CPU Suggestions: Suggests optimal CPU batch slots based on available system memory
- Zero Configuration: Works out-of-the-box with sensible defaults
- Historical Tracking: Records processing times for GPU and CPU jobs
- Predictive Allocation: Uses past performance to predict which device is best for each job
- Continuous Improvement: Gets smarter as it processes more segments
Priority-Based Allocation:
- Longest jobs → GPU (maximum performance benefit)
- Shortest jobs → CPU (minimal speed loss)
- Dynamic Queue Management: Fills available slots optimally
- Max CPU Time Limits: Prevents CPU from being overwhelmed by long jobs
- 80% Rule (configurable): Stops allocating to CPU near completion
- Predictable Finish Times: Ensures last jobs complete on faster GPU
- Prevents Bottlenecks: Avoids waiting for slow CPU jobs at the end
- Real-Time Analysis: Monitors system performance during processing
- Actionable Recommendations: Suggests configuration improvements
- Learn and Adapt: Helps you tune settings for your specific hardware
System Analysis:
```
├─ GPU: 12GB VRAM → Max 2 concurrent jobs (auto-detected)
├─ RAM: 32GB → Suggests 3 CPU slots (user can override)
└─ Total Capacity: 5 concurrent jobs
```
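A minimal sketch of how this detection can be derived, assuming PyTorch for VRAM queries and psutil for RAM (both listed under Dependencies below); `detect_capacity` and its defaults are illustrative, not Synthalingua's actual internals:

```python
# Illustrative sketch only; names and defaults are not Synthalingua's internals.
import psutil
import torch

def detect_capacity(model_size_gb=4.0, os_reserved_gb=0.5):
    """Estimate concurrent GPU jobs from VRAM; suggest CPU slots from RAM."""
    gpu_slots = 0
    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        gpu_slots = int((vram_gb - os_reserved_gb) // model_size_gb)

    ram_gb = psutil.virtual_memory().total / 1024**3
    cpu_slots = 1 if ram_gb < 16 else (2 if ram_gb < 32 else 3)
    return gpu_slots, cpu_slots

# e.g. 12 GB VRAM + 32 GB RAM → (2, 3): 5 concurrent jobs in total
print(detect_capacity())
```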
The first few jobs are used to learn your system's performance characteristics:
```
Job 1: 3.2s audio on GPU → took 8 seconds (ratio: 2.5x)
Job 2: 1.5s audio on CPU → took 36 seconds (ratio: 24x)
→ System learns: CPU is ~10x slower than GPU
```
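The learned ratio is just averaged division; a quick sketch of the arithmetic behind the example above:

```python
# Arithmetic behind the learning example:
gpu_ratio = 8 / 3.2    # 2.5x: one second of audio takes ~2.5s on GPU
cpu_ratio = 36 / 1.5   # 24x:  one second of audio takes ~24s on CPU
print(f"CPU is ~{cpu_ratio / gpu_ratio:.1f}x slower than GPU")  # ~9.6x → "about 10x"
```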
Jobs are sorted by "GPU benefit" (how much faster they'd be on GPU):
```
Queue sorted by GPU benefit (using the learned ratios: GPU ≈ 2.5x, CPU ≈ 24x):
1. 26.8s segment → GPU (predict: 643s on CPU, 67s on GPU, benefit: 576s saved)
2. 15.3s segment → GPU (predict: 367s on CPU, 38s on GPU, benefit: 329s saved)
3.  2.1s segment → CPU (predict:  50s on CPU,  5s on GPU, benefit:  45s saved)
4.  1.8s segment → CPU (predict:  43s on CPU,  4s on GPU, benefit:  39s saved)
5.  0.7s segment → CPU (predict:  17s on CPU,  2s on GPU, benefit:  15s saved)

Allocation:
🎮 GPU Slot 1: Segment 1 (longest, highest benefit)
🎮 GPU Slot 2: Segment 2 (2nd longest)
💻 CPU Slot 1: Segment 3
💻 CPU Slot 2: Segment 4
💻 CPU Slot 3: Segment 5 (shortest)
```
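A small sketch of the resulting allocation rule, using the learned ratios as fixed constants; `allocate` is a hypothetical helper, not the real implementation:

```python
# Hypothetical helper mirroring the allocation idea above.
GPU_RATIO, CPU_RATIO = 2.5, 24.0  # learned in the example

def allocate(durations, gpu_slots=2, cpu_slots=3):
    # GPU benefit = predicted CPU time - predicted GPU time; with fixed
    # ratios this ranks segments longest-first.
    ranked = sorted(durations, key=lambda d: d * (CPU_RATIO - GPU_RATIO),
                    reverse=True)
    return ranked[:gpu_slots], ranked[gpu_slots:gpu_slots + cpu_slots]

gpu_jobs, cpu_jobs = allocate([26.8, 15.3, 0.7, 1.8, 2.1])
print(gpu_jobs)  # [26.8, 15.3]    → GPU slots (highest benefit)
print(cpu_jobs)  # [2.1, 1.8, 0.7] → CPU slots (lowest benefit)
```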
```
Progress: 82% complete
→ Stop allocating to CPU
→ Wait for GPU slots only
→ Ensures predictable completion time
```
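A minimal sketch of the endgame check, using the fraction configured via `--stop_cpu_at`; the names are illustrative:

```python
# Illustrative endgame check; names are not the actual implementation.
def cpu_allowed(completed, total, stop_cpu_at=0.8):
    """New jobs go to CPU only while progress is below the threshold."""
    return completed / total < stop_cpu_at

print(cpu_allowed(82, 100))  # False → at 82% only GPU slots are filled
```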
Adaptive batch requires:
- Model Source: FasterWhisper (`--model_source fasterwhisper`)
- Device: GPU required (`--device cuda` or auto-detect, NOT `--device cpu`)
- Mode: Caption generation (`--makecaptions`)

Important: Adaptive batch processing is designed to intelligently distribute work between GPU and CPU. If you only have CPU available, all jobs will run on CPU anyway, making adaptive batch unnecessary. Use regular `--batchmode` instead.
Override CPU batch slots:
```
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --cpu_batches 4 --file_input video.mp4
```

Set maximum CPU time per job (5 minutes):
```
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --max_cpu_time 300 --file_input video.mp4
```

Adjust endgame threshold (stop CPU at 70% instead of 80%):
```
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --stop_cpu_at 0.7 --file_input video.mp4
```

Full example:
```
python synthalingua.py \
  --makecaptions \
  --adaptive_batch \
  --model_source fasterwhisper \
  --cpu_batches 3 \
  --max_cpu_time 300 \
  --stop_cpu_at 0.8 \
  --file_input video.mp4 \
  --ram 11gb-v3 \
  --silent_detect
```

`--adaptive_batch`
Type: Flag (no value needed)
Default: Disabled
Description: Enable intelligent adaptive batch processing
Requirements:
- Must be used with `--makecaptions`
- Requires `--model_source fasterwhisper`
- Requires GPU (cannot use `--device cpu`)
- Overrides `--batchmode` if both are specified
Example:
```
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --file_input video.mp4
```

`--cpu_batches`
Type: Integer
Default: Auto-detected based on RAM
- <16GB RAM → 1 CPU slot
- 16-32GB RAM → 2 CPU slots
- >32GB RAM → 3 CPU slots
Description: Number of concurrent CPU batch processing slots
Recommendations:
- Conservative (1-2): Safest, minimal system impact
- Balanced (3-4): Good throughput, recommended for most systems
- Aggressive (5+): Maximum speed but may cause system slowdown
Example:
```
# Use 4 CPU slots for high-RAM systems
python synthalingua.py --makecaptions --adaptive_batch --cpu_batches 4 --file_input video.mp4
```

`--max_cpu_time`
Type: Integer
Default: 300 (5 minutes)
Range: 60-600 seconds recommended
Description: Maximum time a job can run on CPU before being forced to wait for GPU
Use Cases:
- Lower values (60-120s): Prioritize GPU for more jobs
- Default (300s): Balanced approach
- Higher values (400-600s): Allow more CPU usage
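A sketch of how such a cap can gate allocation, assuming the ~24x CPU ratio from the learning example; `can_run_on_cpu` is a hypothetical helper, and the real scheduler logic may differ:

```python
# Hypothetical gate; the real scheduler logic may differ.
def can_run_on_cpu(audio_length, cpu_ratio=24.0, max_cpu_time=300.0):
    """Jobs whose predicted CPU time exceeds the cap wait for a GPU slot."""
    return audio_length * cpu_ratio <= max_cpu_time

print(can_run_on_cpu(10.0))  # True:  ~240s predicted, under the 300s cap
print(can_run_on_cpu(15.0))  # False: ~360s predicted, waits for GPU
```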
Example:
```
# Limit CPU jobs to 2 minutes max
python synthalingua.py --makecaptions --adaptive_batch --max_cpu_time 120 --file_input video.mp4
```

`--batchjobsize`
Type: Float
Default: 4.0 (GB)
Range: 0.1-12.0
Description: Model size in GB used for GPU capacity calculation
Purpose: Tells the system how much VRAM each concurrent job requires, allowing accurate calculation of how many jobs fit in available VRAM.
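The capacity math, worked with this page's example numbers (the same calculation appears under GPU Capacity Calculation below):

```python
# Worked example with this page's numbers:
total_vram, os_reserved = 12.0, 0.2          # GB
model_size = 4.0                             # --batchjobsize value
max_gpu_batches = int((total_vram - os_reserved) // model_size)
print(max_gpu_batches)  # 2 → two concurrent GPU jobs fit in 11.8 GB
```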
Model Size Guidelines:
- 0.1-0.9 GB: Tiny models in optimized modes (e.g., 3GB mode using ~800MB)
- 1-2 GB: Tiny, base models
- 3-4 GB: Small, medium models (default)
- 6-7 GB: Large models, turbo
- 10-11 GB: Large-v2, large-v3 models
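Auto-detection from `--ram` is listed under Future Enhancements, so choosing a value is currently manual; a hypothetical user-side lookup, covering only the `--ram` settings shown on this page:

```python
# Hypothetical user-side lookup; values approximate the guidelines above
# and cover only the --ram settings shown on this page.
BATCH_JOB_SIZE_GB = {
    "3gb": 3.0,       # small/medium range (~0.8 in optimized mode)
    "6gb": 6.0,       # large / turbo range
    "11gb-v3": 11.0,  # large-v3
}
print(BATCH_JOB_SIZE_GB["11gb-v3"])  # 11.0 → pass as --batchjobsize 11
```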
Example:
```
# Using 3GB model in optimized mode (~800MB actual usage)
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --ram 3gb --batchjobsize 0.8 --file_input video.mp4

# Using 11GB model (large-v3) with 12GB VRAM → allows 1 GPU slot
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --ram 11gb-v3 --batchjobsize 11 --file_input video.mp4
```

`--stop_cpu_at`
Type: Float
Default: 0.8 (80%)
Range: 0.6-0.95
Description: Progress threshold at which to stop allocating new jobs to CPU
The Endgame Strategy:
- Lower values (0.6-0.7): Finish faster, more GPU-focused
- Default (0.8): Balanced predictability
- Higher values (0.85-0.95): Maximize CPU utilization
Example:
```
# Stop CPU allocation at 70% progress
python synthalingua.py --makecaptions --adaptive_batch --stop_cpu_at 0.7 --file_input video.mp4
```

Conservative profile:
```
python synthalingua.py \
  --makecaptions \
  --adaptive_batch \
  --cpu_batches 1 \
  --max_cpu_time 180 \
  --stop_cpu_at 0.75 \
  --file_input video.mp4
```
Best for: Systems with <16GB RAM, or when minimal system impact is desired
Balanced profile:
```
python synthalingua.py \
  --makecaptions \
  --adaptive_batch \
  --cpu_batches 3 \
  --max_cpu_time 300 \
  --stop_cpu_at 0.8 \
  --file_input video.mp4
```
Best for: Most systems; a good balance of speed and stability
Aggressive profile:
```
python synthalingua.py \
  --makecaptions \
  --adaptive_batch \
  --cpu_batches 5 \
  --max_cpu_time 400 \
  --stop_cpu_at 0.85 \
  --file_input video.mp4
```
Best for: Systems with >32GB RAM and a powerful CPU
Traditional `--batchmode`:
Time: ~8 minutes
Approach: Fixed batch size, no device awareness
Bottleneck: May overflow to CPU unpredictably
Example with 15 segments:
```
├─ Batch size: 3
├─ Device: Whatever's available
└─ No optimization
```
Adaptive batch:
Time: ~5 minutes (38% faster!)
Approach: Smart GPU/CPU allocation
Benefits:
```
✓ Auto-detected capacity (2 GPU + 3 CPU = 5 concurrent)
✓ Longest jobs to GPU, shortest to CPU
✓ Endgame strategy prevents slowdowns
✓ Continuous learning improves allocation
```
Example with 15 segments:
```
├─ Max parallel: 5 jobs (2 GPU + 3 CPU)
├─ Smart sorting by duration
├─ Performance prediction
└─ Optimization suggestions
```
The system analyzes performance and provides actionable recommendations:
```
💡 OPTIMIZATION SUGGESTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CPU jobs finishing quickly, can handle more parallel work
Current: 3 CPU batches
Suggested: 4 CPU batches
Benefit: ~1-2 minutes faster completion
```
```
💡 OPTIMIZATION SUGGESTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CPU performance is good (only 8.2x slower than GPU)
Current: 300s max CPU time
Suggested: 360s max CPU time
Benefit: More efficient job distribution
```
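A hedged sketch of how such a recommendation could be derived from the recorded speed ratios; the threshold and wording here are assumptions, not the actual heuristics:

```python
# Illustrative heuristic only; thresholds and wording are assumptions.
def suggest(cpu_ratio, gpu_ratio, cpu_batches, max_cpu_time):
    tips = []
    slowdown = cpu_ratio / gpu_ratio
    if slowdown < 10:  # CPU is keeping up well
        tips.append(f"CPU only {slowdown:.1f}x slower: try "
                    f"{cpu_batches + 1} CPU batches or "
                    f"{int(max_cpu_time * 1.2)}s max CPU time")
    return tips

# 20.5 / 2.5 = 8.2x slower → mirrors the second suggestion above
print(suggest(cpu_ratio=20.5, gpu_ratio=2.5, cpu_batches=3, max_cpu_time=300))
```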
GPU Capacity Calculation:
```
Total VRAM: 12 GB
OS Reserved: ~0.2 GB (auto-detected, or 0.5 GB fallback)
Available VRAM: 11.8 GB
Model Size per Job: 4 GB
Max GPU Batches: 11.8 ÷ 4 = 2 (rounded down)
```
CPU Capacity Suggestion:
```
if RAM < 16 GB:
    suggest 1 CPU slot   # Conservative
elif RAM < 32 GB:
    suggest 2 CPU slots  # Moderate
else:
    suggest 3 CPU slots  # Balanced
```
The system maintains two performance logs:
- GPU Jobs: `[(audio_length, processing_time), ...]`
- CPU Jobs: `[(audio_length, processing_time), ...]`
Prediction Formula:
```
# After 3+ jobs, use the average observed ratio
average_ratio = sum(processing_time / audio_length) / count

# Predict a new job
predicted_time = audio_length * average_ratio

# Until 3 jobs have been recorded, use rough estimates:
#   GPU: audio_length * 2.0
#   CPU: audio_length * 24.0  (12x slower than GPU)
```
Allocation logic:
```
for each segment:
    gpu_time = predict_time(segment, "gpu")
    cpu_time = predict_time(segment, "cpu")
    gpu_benefit = cpu_time - gpu_time

# Sort by gpu_benefit (descending)
# Highest benefit = longest segments = allocated to GPU first
# Lowest benefit = shortest segments = allocated to CPU
```
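Putting the logs and the formula together, a compact, runnable sketch of the record-and-predict mechanism; the class and method names are assumptions, not the real implementation:

```python
# Assumed names; the real classes and methods differ.
class PerformanceLog:
    FALLBACK = {"gpu": 2.0, "cpu": 24.0}  # ratios used before 3 jobs exist

    def __init__(self):
        self.jobs = {"gpu": [], "cpu": []}  # [(audio_length, processing_time)]

    def record(self, device, audio_length, processing_time):
        self.jobs[device].append((audio_length, processing_time))

    def predict(self, device, audio_length):
        history = self.jobs[device]
        if len(history) < 3:                       # not enough data yet
            return audio_length * self.FALLBACK[device]
        ratio = sum(t / a for a, t in history) / len(history)
        return audio_length * ratio

log = PerformanceLog()
log.record("gpu", 3.2, 8.0)
print(log.predict("gpu", 10.0))  # 20.0 → fallback ratio until 3 jobs exist
```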
Solution: System has low GPU memory
- Try CPU-only: `--cpu_batches 3` and no GPU
- Or reduce the model size: `--ram 6gb` instead of `--ram 11gb-v3`
Solution: Too many CPU batches for your system
- Reduce CPU slots: `--cpu_batches 1` or `--cpu_batches 2`
- Lower the max CPU time: `--max_cpu_time 120`
Solution: System is too conservative
- Increase CPU batches: `--cpu_batches 4`
- Raise the stop threshold: `--stop_cpu_at 0.9`
Solution: CPU time limit too high
- Lower the max CPU time: `--max_cpu_time 180`
- Lower the stop threshold: `--stop_cpu_at 0.7`
- Start with defaults - Let the system auto-detect optimal settings first
- Monitor first run - Watch the allocation patterns and suggestions
- Adjust gradually - Make small changes based on recommendations
- Consider your use case:
  - Fast turnaround: Use aggressive settings
  - System stability: Use conservative settings
  - Batch processing: Use balanced settings
The adaptive batch system consists of four main classes, grouped here by role:
1. Configuration:
   - Manages configuration parameters
   - Auto-detects GPU capacity
   - Suggests CPU capacity based on RAM
   - Displays the formatted configuration
2. Performance tracking:
   - Records completed job metrics
   - Predicts processing times
   - Calculates CPU/GPU speed ratios
3. Job allocation:
   - Manages job queues
   - Allocates jobs to GPU/CPU slots
   - Implements the endgame strategy
   - Tracks completion progress
4. Optimization suggestions:
   - Analyzes performance data
   - Generates actionable recommendations
   - Displays suggestions to the user
- Python 3.7+
- PyTorch (for GPU detection)
- psutil (for system resource detection)
- CUDA-capable GPU (optional, falls back to CPU-only mode)
All dependencies are already included in Synthalingua.
- Only works with `--makecaptions` mode
- Requires `--silent_detect` for optimal performance (optional but recommended)
- GPU detection requires a CUDA-capable device
- First batch may not be optimally allocated (learning phase)
Potential improvements for future versions:
- Dynamic batch size adjustment during runtime
- Model size auto-detection based on the `--ram` setting
- Support for multiple GPU devices
- Persistent performance history across sessions
- Web UI integration for real-time monitoring
Implemented as part of Synthalingua 1.2.5 by the Synthalingua development team.
Need Help? Check out the main documentation or open an issue on GitHub.