@RuixiangMa
Summary

This PR improves Multi-Instance GPU (MIG) capability detection in the NVIDIA GPU Operator: it expands the list of GPU models recognized as MIG-capable and moves the matching into a dedicated, pattern-based helper function.

Changes Made

1. Enhanced MIG Detection Logic (controllers/state_manager.go)

  • Refactored the hasMIGCapableGPU function to use a dedicated helper function isMIGCapableGPUProduct
  • Expanded MIG detection from 3 models to the Hopper, Ampere, Blackwell, and professional workstation models listed below
  • Grouped the match patterns by architecture

2. Comprehensive GPU Architecture Support

The updated detection now supports:

Hopper Architecture (Data Center)

  • H100, H800, H200, H20, GH200

Ampere Architecture

  • A100, A800, A30

Blackwell Architecture (Next Generation)

  • GB200, B200, GB300, B300

Professional Workstation GPUs

  • RTX PRO 6000
  • RTX PRO 5000
  • Both naming formats are matched: "rtx-pro-6000" and "rtx pro 6000"
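
As a rough illustration of the kind of matching described above, here is a minimal sketch of a helper named isMIGCapableGPUProduct. The PR's actual implementation is not shown on this page, so everything below is an assumption: the `migCapableModels` table, the hypothetical `normalizeProduct` helper, and the decision to match whole tokens rather than raw substrings (so that a name such as "RTX A3000" does not match "a30"). Only the model names come from the lists above.

```go
package main

import (
	"strings"
	"unicode"
)

// migCapableModels holds the model names listed in this PR description,
// written in normalized (lowercase, hyphen-separated) form.
// The table layout itself is an assumption made for this sketch.
var migCapableModels = []string{
	// Hopper
	"h100", "h800", "h200", "h20", "gh200",
	// Ampere
	"a100", "a800", "a30",
	// Blackwell
	"gb200", "b200", "gb300", "b300",
	// Professional workstation
	"rtx-pro-6000", "rtx-pro-5000",
}

// normalizeProduct (hypothetical helper) lowercases the product string,
// splits it on every non-alphanumeric rune, and rejoins the tokens with
// hyphens, adding a hyphen at each end so whole tokens can be matched.
func normalizeProduct(product string) string {
	tokens := strings.FieldsFunc(strings.ToLower(product), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	return "-" + strings.Join(tokens, "-") + "-"
}

// isMIGCapableGPUProduct reports whether the product string contains one
// of the listed model names as a complete token sequence.
func isMIGCapableGPUProduct(product string) bool {
	normalized := normalizeProduct(product)
	for _, model := range migCapableModels {
		if strings.Contains(normalized, "-"+model+"-") {
			return true
		}
	}
	return false
}
```

Because normalization collapses spaces and hyphens into the same separator, "rtx-pro-6000" and "rtx pro 6000" take the same path through the lookup. Whether the PR treats near-miss names the same way as this token-based sketch is not stated in the description.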

Verification and Testing

Test Coverage

  • All supported GPU models across architectures
  • Multiple naming format variations
  • Negative test cases for non-MIG GPUs (T4, V100)
  • Edge cases (empty strings, partial matches)
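
The coverage categories above lend themselves to a table of cases. The sketch below is an assumption about how such a table could look, not the PR's actual test file: `migCase`, `exampleCases`, and `checkCases` are hypothetical names, and the product strings are illustrative. Partial-match cases are left out because the description does not state their expected results.

```go
package main

import "fmt"

// migCase pairs a product string with the expected detection result.
type migCase struct {
	product string
	want    bool
}

// exampleCases mirrors the categories in the list above.
// The product strings are illustrative, not taken from the PR.
var exampleCases = []migCase{
	{"NVIDIA-H100-80GB-HBM3", true}, // supported model
	{"NVIDIA-A30", true},            // supported model
	{"NVIDIA-GB200", true},          // supported model
	{"NVIDIA-RTX-PRO-6000", true},   // hyphenated naming format
	{"NVIDIA RTX PRO 6000", true},   // space-separated naming format
	{"Tesla-T4", false},             // non-MIG GPU
	{"Tesla-V100-SXM2-16GB", false}, // non-MIG GPU
	{"", false},                     // empty string
}

// checkCases runs detect over every case and returns one message per
// mismatch; an empty result means every case passed.
func checkCases(detect func(string) bool, cases []migCase) []string {
	var failures []string
	for _, c := range cases {
		if got := detect(c.product); got != c.want {
			failures = append(failures,
				fmt.Sprintf("%q: got %v, want %v", c.product, got, c.want))
		}
	}
	return failures
}
```

Passing the detection function in as a parameter keeps the table independent of any one implementation; in a real `_test.go` file the same loop would typically call `t.Errorf` instead of collecting strings.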

Test Results

[Screenshot of test results, captured 2025-11-17 16:47:44]

@copy-pr-bot
copy-pr-bot bot commented Nov 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…l Workstation GPUs introduced in R525

Signed-off-by: Lancer <[email protected]>
@RuixiangMa changed the title from "Enhance MIG Support Detection for NVIDIA GPUs introduced in R525" to "Enhance MIG Support Detection for NVIDIA GPUs introduced in R580" on Nov 17, 2025