
This commit adds comprehensive CPU support for inference operations while
maintaining full backward compatibility with CUDA workflows. The changes
enable ChronoEdit to run on systems without GPU access by providing
graceful fallback mechanisms.

## Key Changes

### 1. New Central Device Management Module (chronoedit/utils/device_utils.py)
- Implements get_device() for automatic device detection with CUDA -> CPU fallback (see the sketch after this list)
- Provides get_device_type() for device type string resolution
- Adds get_device_map() for HuggingFace model compatibility
- Emits clear warnings when falling back to CPU
- Handles device validation and error cases gracefully
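
The helper bodies are not reproduced in this summary, so the following is a minimal sketch of the fallback behavior described above. The function names come from this PR; the implementations are assumptions, not the merged code:

```python
# Illustrative sketch of chronoedit/utils/device_utils.py -- names match the
# summary above, but these bodies are an assumption, not the merged code.
import warnings
from typing import Optional

import torch


def get_device(device: Optional[str] = None) -> torch.device:
    """Resolve the requested device, falling back from CUDA to CPU."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if device.startswith("cuda") and not torch.cuda.is_available():
        warnings.warn("CUDA requested but not available; falling back to CPU.")
        device = "cpu"
    return torch.device(device)


def get_device_type(device: Optional[str] = None) -> str:
    """Return the resolved device type string, e.g. 'cuda' or 'cpu'."""
    return get_device(device).type


def get_device_map(device: Optional[str] = None) -> str:
    """Return a value usable as HuggingFace from_pretrained(device_map=...)."""
    return get_device(device).type
```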

### 2. Prompt Enhancer Updates (scripts/prompt_enhancer.py)
- Added device parameter to load_model() function
- Updated pick_attn_implementation() to handle CPU-only mode
- Uses device_utils for consistent device management
- Automatically selects 'eager' attention on CPU, where flash-attention kernels are unavailable (sketched below)
- Supports explicit device specification or auto-detection
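
For illustration, a CPU-aware selection along these lines; the actual pick_attn_implementation() may order its preferences differently:

```python
def pick_attn_implementation(device_type: str) -> str:
    """Choose an attn_implementation string for transformers' from_pretrained()."""
    if device_type == "cpu":
        # Flash-attention kernels are CUDA-only, so CPU must fall back to eager.
        return "eager"
    try:
        import flash_attn  # noqa: F401
        return "flash_attention_2"
    except ImportError:
        return "sdpa"  # PyTorch scaled-dot-product attention as a middle ground
```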

### 3. Inference Script Enhancements (scripts/run_inference_diffusers.py)
- Integrated device_utils for robust device handling
- Updated device parameter flow throughout pipeline
- Fixed torch.Generator device placement so seeded generation works on CPU (sketched below)
- Passes device parameter to prompt enhancer
- Maintains existing --device CLI argument with enhanced fallback
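
The generator fix matters because constructing torch.Generator(device="cuda") raises on a CPU-only host. A hedged sketch of the pattern (make_generator is a hypothetical helper, not a name from this PR):

```python
import torch


def make_generator(device: torch.device, seed: int) -> torch.Generator:
    # Constructing a CUDA generator on a CPU-only machine raises immediately,
    # so the generator must be built on the already-resolved device.
    return torch.Generator(device=device).manual_seed(seed)


gen = make_generator(torch.device("cpu"), seed=42)
noise = torch.randn(4, generator=gen)  # reproducible on any host
```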

### 4. Pipeline CPU Compatibility (chronoedit_diffusers/pipeline_chronoedit.py)
- Wrapped torch.cuda.empty_cache() calls with availability checks (pattern shown below)
- All three cache clearing locations now check torch.cuda.is_available()
- Prevents crashes when running on CPU-only systems
- No functional changes for CUDA users
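
The guard itself is the one-line pattern the list describes, applied at each of the three cache-clearing sites:

```python
import torch

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # skipped entirely on CPU-only systems
```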

### 5. Device Utility Hardening (chronoedit/_ext/imaginaire/utils/device.py)
- Made pynvml import optional (try/except with PYNVML_AVAILABLE flag; sketched below)
- Updated all GPU-related functions to check CUDA availability
- get_gpu_architecture() returns None for CPU mode
- print_gpu_mem() provides informative message for CPU mode
- gpu0_has_80gb_or_less() returns conservative default on CPU
- Device class raises clear error when instantiated without CUDA
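
A sketch of the optional-import pattern: PYNVML_AVAILABLE is named in this PR, but the function bodies below are illustrative, not the merged code:

```python
import torch

try:
    import pynvml
    PYNVML_AVAILABLE = True
except ImportError:
    pynvml = None
    PYNVML_AVAILABLE = False


def print_gpu_mem() -> None:
    """Print GPU memory usage, or an informative message in CPU mode."""
    if not PYNVML_AVAILABLE or not torch.cuda.is_available():
        print("Running in CPU mode; GPU memory stats are unavailable.")
        return
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU 0 memory: {info.used / 2**30:.1f} / {info.total / 2**30:.1f} GiB")


def gpu0_has_80gb_or_less() -> bool:
    # Conservative default on CPU: assume the lower-memory configuration.
    if not torch.cuda.is_available():
        return True
    return torch.cuda.get_device_properties(0).total_memory <= 80 * 2**30
```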

### 6. Test Infrastructure
- Created test_cpu_inference.sh for automated testing
- Configures HF_HOME to avoid disk space issues
- Activates conda environment properly
- Validates PyTorch and CUDA availability before running (equivalent check shown below)
- Tests full inference pipeline with minimal steps for speed
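
The pre-flight check the script performs presumably boils down to a Python snippet along these lines (illustrative):

```python
import torch

print(f"PyTorch {torch.__version__}; CUDA available: {torch.cuda.is_available()}")
```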

## Design Principles

1. **Minimal Changes**: Centralized device management reduces scattered modifications
2. **Backward Compatible**: No breaking changes to existing CUDA workflows
3. **Graceful Degradation**: CPU fallback with clear user warnings
4. **Clean Separation**: Training remains GPU-only, inference supports both
5. **Maintainable**: Single source of truth for device detection

## Usage Examples

```bash
# Auto-detect (CUDA if available, else CPU with warning)
python scripts/run_inference_diffusers.py --input image.png --prompt "..." --output out.mp4

# Explicit CPU
python scripts/run_inference_diffusers.py --device cpu --input image.png --prompt "..." --output out.mp4

# Explicit CUDA (with auto-fallback to CPU if unavailable)
python scripts/run_inference_diffusers.py --device cuda --input image.png --prompt "..." --output out.mp4
```

## Testing

Run the test script to verify CPU inference:
```bash
bash test_cpu_inference.sh cpu
```

## Notes

- Training functionality remains GPU-only (FSDP, distributed, etc.)
- CPU inference will be significantly slower than CUDA
- Flash attention automatically disabled on CPU (uses eager mode)
- All pynvml-dependent functions gracefully handle absence of GPU

## Files Modified

- chronoedit/utils/__init__.py (new)
- chronoedit/utils/device_utils.py (new)
- chronoedit/_ext/imaginaire/utils/device.py
- chronoedit_diffusers/pipeline_chronoedit.py
- scripts/prompt_enhancer.py
- scripts/run_inference_diffusers.py
- test_cpu_inference.sh (new)

Tested on: PyTorch 2.7.1+cu126 with CUDA unavailable
Environment: chronoedit_mini conda environment