@lokielse
Contributor

Description

This PR introduces a flexible image tag configuration system for the Helm chart, allowing users to override image tags at different levels with a clear priority order. Additionally, this PR fixes an incorrect reference where .Chart.Version was being used instead of .Chart.AppVersion for image tags.

Changes Made

  1. Added tag field to all component image configurations in values.yaml:

    • operator.image.tag
    • podgrouper.image.tag
    • podgroupcontroller.image.tag
    • binder.image.tag
    • binder.resourceReservationImage.tag
    • scheduler.image.tag
    • queuecontroller.image.tag
    • admission.image.tag
    • nodescaleadjuster.image.tag
    • nodescaleadjuster.scalingPodImage.tag
    • crdupgrader.image.tag
  2. Implemented priority-based tag resolution using Helm's default function:

    Priority: <component>.image.tag > global.tag > .Chart.AppVersion
    
  3. Fixed incorrect Chart reference: Changed all occurrences of .Chart.Version to .Chart.AppVersion for image tags

    • .Chart.Version is intended for the Helm chart version
    • .Chart.AppVersion is the correct field for application/image versions
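As a rough sketch of items 1 and 2, the new fields might look like this in values.yaml. Only the field names and the "latest" default for global.tag come from this PR; the empty-string defaults for the component tags are an assumption, not copied from the chart.

```yaml
# Sketch only: component defaults of "" are assumed, not taken from the chart.
global:
  tag: "latest"        # default noted under Breaking Changes

operator:
  image:
    tag: ""            # empty -> resolves to global.tag

binder:
  image:
    tag: ""            # empty -> resolves to global.tag
  resourceReservationImage:
    tag: ""            # resolved separately from binder.image.tag
```

With every component tag left empty, all images resolve to global.tag; .Chart.AppVersion is only reached when global.tag is empty as well.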

Updated Templates

  • deployments/kai-scheduler/templates/kai-config.yaml
  • deployments/kai-scheduler/templates/services/operator.yaml
  • deployments/kai-scheduler/templates/crd-upgrader.yaml

Benefits

  • Flexibility: Image tags now resolve across three levels (component tag, global.tag, .Chart.AppVersion)
  • Global control: Set global.tag to apply a tag to all components
  • Component-specific overrides: Set individual component tags when needed (e.g., testing a new version of one component)
  • Backward compatibility: Existing deployments continue to work without changes
  • Correctness: Fixed the incorrect use of .Chart.Version for image tags

Usage Examples

# Use global.tag for all components
global:
  tag: "v1.2.0"

# Override specific component
operator:
  image:
    tag: "v1.3.0-rc1"  # operator will use v1.3.0-rc1, others use v1.2.0

# Fall back to Chart.AppVersion when neither tag is set.
# global.tag defaults to "latest", so clear it (e.g. --set global.tag="") to reach this fallback.

Related Issues

N/A

Checklist

  • Self-reviewed
  • Added/updated tests (tested with helm template command)
  • Updated CHANGELOG.md (if needed)
  • Updated documentation (if needed)

Breaking Changes

None. This change is fully backward compatible. Existing deployments will continue to use global.tag (default: "latest") or fall back to .Chart.AppVersion if global.tag is not set.

Additional Notes

Testing

The changes were validated using helm template:

  1. Default behavior (uses global.tag):

    helm template kai-scheduler deployments/kai-scheduler
    # Result: All images use tag "latest" from global.tag
  2. Component-specific override:

    helm template kai-scheduler deployments/kai-scheduler --set operator.image.tag=v1.2.3 --set global.tag=v1.0.0
    # Result: operator uses v1.2.3, other components use v1.0.0
  3. Fallback to Chart.AppVersion:

    helm template kai-scheduler deployments/kai-scheduler --set global.tag=""
    # Result: All images use Chart.AppVersion (0.0.0)

Implementation Details

All tag references now use the pattern:

tag: {{ .Values.<component>.image.tag | default .Values.global.tag | default .Chart.AppVersion }}

This ensures consistent behavior across all components while maintaining maximum flexibility for users.
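For illustration, here is the same pattern with the placeholder filled in for the operator component. This is a sketch of how the expression reads, not a copy of the merged template; the surrounding keys in the actual files may differ.

```yaml
# Illustrative instance of the pattern above for one component.
# Helm's `default` returns its fallback when the piped value is empty
# (unset, nil, or ""), so each stage only applies if the previous one is empty:
#   1. .Values.operator.image.tag
#   2. .Values.global.tag
#   3. .Chart.AppVersion
tag: {{ .Values.operator.image.tag | default .Values.global.tag | default .Chart.AppVersion }}
```

Because `default` treats an empty string as unset, passing --set global.tag="" is enough to fall through to .Chart.AppVersion, which matches test case 3 above.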

@enoodle
Collaborator

enoodle commented Nov 10, 2025

Why not just install KAI normally and then edit the config? kubectl edit config.kai.scheduler kai-config

@lokielse
Contributor Author

Why not just install KAI normally and then edit the config? kubectl edit config.kai.scheduler kai-config

For a couple of reasons:

  1. We want the setup to be GitOps-friendly so that configuration changes are tracked in Git.
  2. It lets us override a single image tag by setting .Values.<component>.image.tag instead of rebuilding all the images. For example, we want to build an image for only the binder, as in "feat(binder): specify CPU and memory requests and limits for GPU reservation pod" (#626).

@enoodle
Collaborator

enoodle commented Nov 11, 2025

In your example, wouldn't it be easier to just update only the binder's tag?

@lokielse
Contributor Author

But the chart doesn't support updating only the binder tag through values; I would have to modify the Helm chart itself.
https://github.com/NVIDIA/KAI-Scheduler/blob/main/deployments/kai-scheduler/values.yaml#L49-L50

@enoodle enoodle enabled auto-merge (squash) November 11, 2025 14:48
@enoodle enoodle merged commit 9cade65 into NVIDIA:main Nov 11, 2025
4 checks passed