
Adaptive Batch Processing

Overview

Adaptive Batch Processing is an intelligent job allocation system that dynamically distributes audio transcription tasks between GPU and CPU for optimal performance. The system automatically detects your hardware capabilities, learns from job performance, and makes smart decisions about where to process each audio segment.

Key Features

🎯 Automatic Hardware Detection

  • GPU VRAM Detection: Automatically calculates how many concurrent GPU jobs your system can handle
  • RAM-Based CPU Suggestions: Suggests optimal CPU batch slots based on available system memory
  • Zero Configuration: Works out-of-the-box with sensible defaults

📊 Performance Learning

  • Historical Tracking: Records processing times for GPU and CPU jobs
  • Predictive Allocation: Uses past performance to predict which device is best for each job
  • Continuous Improvement: Gets smarter as it processes more segments

🧠 Smart Job Sorting

  • Priority-Based Allocation:
    • Longest jobs → GPU (maximum performance benefit)
    • Shortest jobs → CPU (minimal speed loss)
  • Dynamic Queue Management: Fills available slots optimally
  • Max CPU Time Limits: Prevents CPU from being overwhelmed by long jobs

🎮 Endgame Strategy

  • 80% Rule (configurable): Stops allocating to CPU near completion
  • Predictable Finish Times: Ensures last jobs complete on faster GPU
  • Prevents Bottlenecks: Avoids waiting for slow CPU jobs at the end

💡 Optimization Suggestions

  • Real-Time Analysis: Monitors system performance during processing
  • Actionable Recommendations: Suggests configuration improvements
  • Learn and Adapt: Helps you tune settings for your specific hardware

How It Works

Phase 1: Hardware Detection

System Analysis:
├─ GPU: 12GB VRAM → Max 2 concurrent jobs (auto-detected)
├─ RAM: 32GB     → Suggests 3 CPU slots (user can override)
└─ Total Capacity: 5 concurrent jobs

Phase 2: Learning Phase

The first few jobs are used to learn your system's performance characteristics:

Job 1: 3.2s audio on GPU → took 8 seconds   (ratio: 2.5x)
Job 2: 1.5s audio on CPU → took 36 seconds  (ratio: 24x)
→ System learns: CPU is ~10x slower than GPU

Phase 3: Smart Allocation

Jobs are sorted by "GPU benefit" (how much faster they'd be on GPU):

Queue sorted by GPU benefit (descending):
1. 26.8s segment → GPU (predict: 643s on CPU, 67s on GPU, benefit: 576s saved)
2. 15.3s segment → GPU (predict: 367s on CPU, 38s on GPU, benefit: 329s saved)
3. 2.1s segment  → CPU (predict: 50s on CPU, 5s on GPU, benefit: 45s saved)
4. 1.8s segment  → CPU (predict: 43s on CPU, 4s on GPU, benefit: 39s saved)
5. 0.7s segment  → CPU (predict: 17s on CPU, 2s on GPU, benefit: 15s saved)

Allocation:
🎮 GPU Slot 1: Segment 1 (longest)
🎮 GPU Slot 2: Segment 2 (2nd longest)
💻 CPU Slot 1: Segment 3 (3rd longest)
💻 CPU Slot 2: Segment 4 (2nd shortest)
💻 CPU Slot 3: Segment 5 (shortest)

Phase 4: Endgame (80%+ Complete)

Progress: 82% complete
→ Stop allocating to CPU
→ Wait for GPU slots only
→ Ensures predictable completion time
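
In code, the endgame rule is a one-line check. A minimal sketch (the function and argument names here are illustrative, not Synthalingua's actual internals):

def should_allocate_to_cpu(completed_jobs, total_jobs, stop_cpu_at=0.8):
    """Return True while CPU slots may still receive new jobs."""
    return completed_jobs / total_jobs < stop_cpu_at

# At 82% progress the check fails, so remaining jobs wait for GPU slots:
should_allocate_to_cpu(82, 100)  # False → GPU only from here on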

Requirements

System Requirements

  • Model Source: FasterWhisper (--model_source fasterwhisper)
  • Device: GPU required (--device cuda or auto-detect, NOT --device cpu)
  • Mode: Caption generation (--makecaptions)

Important: Adaptive batch processing is designed to intelligently distribute work between GPU and CPU. If you only have CPU available, all jobs will run on CPU anyway, making adaptive batch unnecessary. Use regular --batchmode instead.

Usage

Custom Configuration

Override CPU batch slots:

python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --cpu_batches 4 --file_input video.mp4

Set maximum CPU time per job (5 minutes):

python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --max_cpu_time 300 --file_input video.mp4

Adjust endgame threshold (stop CPU at 70% instead of 80%):

python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --stop_cpu_at 0.7 --file_input video.mp4

Full Example with All Options

python synthalingua.py \
    --makecaptions \
    --adaptive_batch \
    --model_source fasterwhisper \
    --cpu_batches 3 \
    --max_cpu_time 300 \
    --stop_cpu_at 0.8 \
    --file_input video.mp4 \
    --ram 11gb-v3 \
    --silent_detect

Command-Line Arguments

--adaptive_batch

Type: Flag (no value needed)
Default: Disabled
Description: Enable intelligent adaptive batch processing

Requirements:

  • Must be used with --makecaptions
  • Requires --model_source fasterwhisper
  • Requires GPU (cannot use --device cpu)
  • Overrides --batchmode if both are specified

Example:

python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --file_input video.mp4

--cpu_batches N

Type: Integer
Default: Auto-detected based on RAM

  • <16GB RAM → 1 CPU slot
  • 16-32GB RAM → 2 CPU slots
  • ≥32GB RAM → 3 CPU slots

Description: Number of concurrent CPU batch processing slots

Recommendations:

  • Conservative (1-2): Safest, minimal system impact
  • Balanced (3-4): Good throughput, recommended for most systems
  • Aggressive (5+): Maximum speed but may cause system slowdown

Example:

# Use 4 CPU slots for high-RAM systems
python synthalingua.py --makecaptions --adaptive_batch --cpu_batches 4 --file_input video.mp4

--max_cpu_time SECONDS

Type: Integer
Default: 300 (5 minutes)
Range: 60-600 seconds recommended

Description: Maximum time a job can run on CPU before being forced to wait for GPU

Use Cases:

  • Lower values (60-120s): Prioritize GPU for more jobs
  • Default (300s): Balanced approach
  • Higher values (400-600s): Allow more CPU usage

Example:

# Limit CPU jobs to 2 minutes max
python synthalingua.py --makecaptions --adaptive_batch --max_cpu_time 120 --file_input video.mp4
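
Conceptually, the limit acts as a gate on CPU allocation: a job whose predicted CPU time exceeds the limit waits for a GPU slot instead. A hedged sketch (names are illustrative, not the real API):

def can_run_on_cpu(predicted_cpu_time, max_cpu_time=300.0):
    """Jobs predicted to exceed the limit are held for a GPU slot."""
    return predicted_cpu_time <= max_cpu_time

# A 26.8s segment at a 24x CPU ratio predicts ~643s → held for GPU:
can_run_on_cpu(26.8 * 24.0)  # False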

--batchjobsize SIZE

Type: Float
Default: 4.0 (GB)
Range: 0.1-12.0

Description: Model size in GB used for GPU capacity calculation

Purpose: Tells the system how much VRAM each concurrent job requires, allowing accurate calculation of how many jobs fit in available VRAM.

Model Size Guidelines:

  • 0.1-0.9 GB: Tiny models in optimized modes (e.g., 3GB mode using ~800MB)
  • 1-2 GB: Tiny, base models
  • 3-4 GB: Small, medium models (default)
  • 6-7 GB: Large models, turbo
  • 10-11 GB: Large-v2, large-v3 models

Example:

# Using 3GB model in optimized mode (~800MB actual usage)
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --ram 3gb --batchjobsize 0.8 --file_input video.mp4

# Using 11GB model (large-v3) with 12GB VRAM → allows 1 GPU slot
python synthalingua.py --makecaptions --adaptive_batch --model_source fasterwhisper --ram 11gb-v3 --batchjobsize 11 --file_input video.mp4

--stop_cpu_at RATIO

Type: Float
Default: 0.8 (80%)
Range: 0.6-0.95

Description: Progress threshold at which to stop allocating new jobs to CPU

The Endgame Strategy:

  • Lower values (0.6-0.7): Finish faster, more GPU-focused
  • Default (0.8): Balanced predictability
  • Higher values (0.85-0.95): Maximize CPU utilization

Example:

# Stop CPU allocation at 70% progress
python synthalingua.py --makecaptions --adaptive_batch --stop_cpu_at 0.7 --file_input video.mp4

Configuration Examples

Conservative Setup (Low RAM System)

python synthalingua.py \
    --makecaptions \
    --adaptive_batch \
    --cpu_batches 1 \
    --max_cpu_time 180 \
    --stop_cpu_at 0.75 \
    --file_input video.mp4

Best for: Systems with <16GB RAM, or when minimal system impact is desired


Balanced Setup (Recommended)

python synthalingua.py \
    --makecaptions \
    --adaptive_batch \
    --cpu_batches 3 \
    --max_cpu_time 300 \
    --stop_cpu_at 0.8 \
    --file_input video.mp4

Best for: Most systems, good balance of speed and stability


Aggressive Setup (High-End System)

python synthalingua.py \
    --makecaptions \
    --adaptive_batch \
    --cpu_batches 5 \
    --max_cpu_time 400 \
    --stop_cpu_at 0.85 \
    --file_input video.mp4

Best for: Systems with >32GB RAM and a powerful CPU

Performance Comparison

Traditional Batch Mode

Time: ~8 minutes
Approach: Fixed batch size, no device awareness
Bottleneck: May overflow to CPU unpredictably

Example with 15 segments:
├─ Batch size: 3
├─ Device: Whatever's available
└─ No optimization

Adaptive Batch Mode

Time: ~5 minutes (38% faster!)
Approach: Smart GPU/CPU allocation
Benefits: 
  ✓ Auto-detected capacity (2 GPU + 3 CPU = 5 concurrent)
  ✓ Longest jobs to GPU, shortest to CPU
  ✓ Endgame strategy prevents slowdowns
  ✓ Continuous learning improves allocation

Example with 15 segments:
├─ Max parallel: 5 jobs (2 GPU + 3 CPU)
├─ Smart sorting by duration
├─ Performance prediction
└─ Optimization suggestions

Optimization Suggestions

The system analyzes performance and provides actionable recommendations:

Example Suggestion 1: Increase CPU Batches

💡 OPTIMIZATION SUGGESTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CPU jobs finishing quickly, can handle more parallel work

Current: 3 CPU batches
Suggested: 4 CPU batches
Benefit: ~1-2 minutes faster completion

Example Suggestion 2: CPU Performance Analysis

💡 OPTIMIZATION SUGGESTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CPU performance is good (only 8.2x slower than GPU)

Current: 300s max CPU time
Suggested: 360s max CPU time
Benefit: More efficient job distribution
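
A minimal sketch of the kind of heuristic behind such suggestions (the threshold and function name are assumptions for illustration, not the actual OptimizationSuggester logic):

def suggest_cpu_batches(cpu_job_times, max_cpu_time, current_batches):
    """Suggest one more CPU slot if CPU jobs finish well under the limit."""
    if not cpu_job_times:
        return None
    average = sum(cpu_job_times) / len(cpu_job_times)
    if average < 0.5 * max_cpu_time:  # illustrative threshold
        return f"CPU jobs finishing quickly; try --cpu_batches {current_batches + 1}"
    return None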

Technical Details

Hardware Detection

GPU Capacity Calculation:

Total VRAM: 12 GB
OS Reserved: ~0.2 GB (auto-detected or 0.5 GB fallback)
Available VRAM: 11.8 GB
Model Size per Job: 4 GB
Max GPU Batches: 11.8 ÷ 4 = 2 (rounded down)

CPU Capacity Suggestion:

if ram_gb < 16:
    cpu_slots = 1   # Conservative
elif ram_gb < 32:
    cpu_slots = 2   # Moderate
else:
    cpu_slots = 3   # Balanced
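
Put together, the detection logic can be sketched with PyTorch and psutil (a simplified sketch mirroring the numbers above; the actual implementation may differ, e.g., in how the OS reserve is auto-detected):

import math
import psutil
import torch

def detect_capacity(model_size_gb=4.0, os_reserved_gb=0.5):
    """Estimate concurrent GPU and CPU batch slots from VRAM and RAM."""
    gpu_slots = 0
    if torch.cuda.is_available():
        total_vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        # 0 slots here would surface as the "Not enough VRAM" error below
        gpu_slots = math.floor((total_vram_gb - os_reserved_gb) / model_size_gb)
    ram_gb = psutil.virtual_memory().total / 1024**3
    cpu_slots = 1 if ram_gb < 16 else (2 if ram_gb < 32 else 3)
    return gpu_slots, cpu_slots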

Performance Tracking

The system maintains two performance logs:

  • GPU Jobs: [(audio_length, processing_time), ...]
  • CPU Jobs: [(audio_length, processing_time), ...]

Prediction Formula:

# After 3+ recorded jobs on a device, use its average observed ratio
average_ratio = sum(time / length for length, time in history) / len(history)

# Predict a new job on that device
predicted_time = audio_length * average_ratio

# Until 3 jobs are recorded, fall back to rough estimates:
#   GPU: audio_length * 2.0
#   CPU: audio_length * 24.0  (12x slower than GPU)
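
The same idea as a tiny tracker class (a sketch of the concept, not the actual PerformanceTracker API):

class PerformanceTracker:
    """Records (audio_length, processing_time) pairs per device and predicts."""

    DEFAULT_RATIOS = {"gpu": 2.0, "cpu": 24.0}  # rough estimates before 3 jobs

    def __init__(self):
        self.jobs = {"gpu": [], "cpu": []}

    def record(self, device, audio_length, processing_time):
        self.jobs[device].append((audio_length, processing_time))

    def predict(self, device, audio_length):
        history = self.jobs[device]
        if len(history) < 3:
            return audio_length * self.DEFAULT_RATIOS[device]
        ratio = sum(t / n for n, t in history) / len(history)
        return audio_length * ratio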

Job Sorting Algorithm

# Sort by GPU benefit (descending): highest benefit (longest segments)
# is allocated to GPU first; lowest benefit (shortest) goes to CPU
queue = sorted(
    segments,
    key=lambda segment: predict("cpu", segment) - predict("gpu", segment),
    reverse=True,
)
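
With predictions in hand, allocation reduces to filling GPU slots from the front of the sorted queue and CPU slots from the back. A simplified sketch (the real JobScheduler also applies the max-CPU-time and endgame rules):

def allocate(segments, predict, gpu_slots, cpu_slots):
    """Longest/highest-benefit segments to GPU, shortest to CPU."""
    ordered = sorted(segments,
                     key=lambda s: predict("cpu", s) - predict("gpu", s),
                     reverse=True)
    gpu_jobs = ordered[:gpu_slots]
    remaining = ordered[gpu_slots:]
    cpu_jobs = remaining[-cpu_slots:] if cpu_slots > 0 else []
    backlog = remaining[:len(remaining) - len(cpu_jobs)]
    return gpu_jobs, cpu_jobs, backlog

# With the five segments from Phase 3 and the tracker sketched above:
tracker = PerformanceTracker()
gpu, cpu, _ = allocate([26.8, 15.3, 0.7, 1.8, 2.1], tracker.predict, 2, 3)
# gpu == [26.8, 15.3]; cpu == [2.1, 1.8, 0.7]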

Troubleshooting

Issue: "Not enough VRAM for adaptive batch"

Solution: Your GPU does not have enough free VRAM for the configured model size

  • Reduce model size: --ram 6gb instead of --ram 11gb-v3
  • If the model actually uses less VRAM, lower --batchjobsize to match
  • If no usable GPU is available, use regular --batchmode instead (adaptive batch requires a GPU)

Issue: "System becomes unresponsive"

Solution: Too many CPU batches for your system

  • Reduce CPU slots: --cpu_batches 1 or --cpu_batches 2
  • Lower max CPU time: --max_cpu_time 120

Issue: "GPU slots underutilized"

Solution: System is too conservative

  • Increase CPU batches: --cpu_batches 4
  • Raise stop threshold: --stop_cpu_at 0.9

Issue: "Jobs taking too long on CPU"

Solution: CPU time limit too high

  • Lower max CPU time: --max_cpu_time 180
  • Lower stop threshold: --stop_cpu_at 0.7

Best Practices

  1. Start with defaults - Let the system auto-detect optimal settings first
  2. Monitor first run - Watch the allocation patterns and suggestions
  3. Adjust gradually - Make small changes based on recommendations
  4. Consider your use case:
    • Fast turnaround: Use aggressive settings
    • System stability: Use conservative settings
    • Batch processing: Use balanced settings

Module Architecture

The adaptive batch system consists of four main classes:

BatchConfig

  • Manages configuration parameters
  • Auto-detects GPU capacity
  • Suggests CPU capacity based on RAM
  • Displays formatted configuration

PerformanceTracker

  • Records completed job metrics
  • Predicts processing times
  • Calculates CPU/GPU speed ratios

JobScheduler

  • Manages job queues
  • Allocates jobs to GPU/CPU slots
  • Implements endgame strategy
  • Tracks completion progress

OptimizationSuggester

  • Analyzes performance data
  • Generates actionable recommendations
  • Displays suggestions to user

Dependencies

  • Python 3.7+
  • PyTorch (for GPU detection)
  • psutil (for system resource detection)
  • CUDA-capable GPU (required for adaptive batch; CPU-only systems should use regular --batchmode)

All dependencies are already included in Synthalingua.

Limitations

  • Only works with --makecaptions mode
  • --silent_detect is optional but recommended for optimal performance
  • GPU detection requires CUDA-capable device
  • First batch may not be optimally allocated (learning phase)

Future Enhancements

Potential improvements for future versions:

  • Dynamic batch size adjustment during runtime
  • Model size auto-detection based on --ram setting
  • Support for multiple GPU devices
  • Persistent performance history across sessions
  • Web UI integration for real-time monitoring

Credits

Implemented as part of Synthalingua 1.2.5 by the Synthalingua development team.


Need Help? Check out the main documentation or open an issue on GitHub.
