
bug: Helm uninstall leaves behind scheduler pods due to "kept" SchedulingShard #661

@googs1025

Description

➜  dynamo kubectl get pods -nkai-scheduler
NAME                                     READY   STATUS    RESTARTS        AGE
kai-operator-6c7598cd96-zqwp2            1/1     Running   5 (2m14s ago)   13m
kai-scheduler-default-7b9fbfbc97-vsftq   1/1     Running   0               34m
➜  dynamo kubectl get pods -nkai-scheduler
NAME                                     READY   STATUS             RESTARTS        AGE
kai-operator-6c7598cd96-zqwp2            0/1     CrashLoopBackOff   6 (3m48s ago)   23m
kai-scheduler-default-7b9fbfbc97-vsftq   1/1     Running            0               44m
➜  dynamo helm delete -nkai-scheduler kai-scheduler
These resources were kept due to the resource policy:
[Queue] default-parent-queue
[Queue] default-queue
[SchedulingShard] default

release "kai-scheduler" uninstalled
➜  dynamo kubectl get pods -nkai-scheduler
NAME                                     READY   STATUS    RESTARTS   AGE
kai-scheduler-default-7b9fbfbc97-vsftq   1/1     Running   0          44m

After helm delete, the kai-scheduler-default-xxxxx pod continues running because:

  • The SchedulingShard/default CR is not deleted (due to helm.sh/resource-policy: keep)
  • The operator (or built-in controller) continues to reconcile it and maintain the Deployment/Pod

This leaves unexpected workloads behind after helm uninstall, which is confusing.
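
For context, helm.sh/resource-policy: keep is Helm's standard mechanism for skipping resources on uninstall, so the CRs above are left behind by the chart templates rather than by the operator itself. A rough manual-cleanup sketch, assuming the resource names Helm reported and that kubectl resolves the CRD kinds as written (scope and exact resource names may differ):

# Confirm the kept CR carries the Helm keep annotation
kubectl get schedulingshard default -n kai-scheduler -o yaml | grep 'helm.sh/resource-policy'

# Remove the kept CRs so nothing remains to back the scheduler Deployment
kubectl delete schedulingshard default -n kai-scheduler
kubectl delete queue default-queue default-parent-queue -n kai-scheduler

# If the Deployment is not garbage-collected via an ownerReference on the shard CR,
# delete it directly (name inferred from the pod name above)
kubectl delete deployment kai-scheduler-default -n kai-scheduler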

Expected Behavior:
helm delete should fully clean up all scheduler components (including the SchedulingShard and Queue custom resources), unless the user explicitly opts out.
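
One way the chart could support such an opt-out (a sketch only; the keepSchedulingResources value name is hypothetical, not an existing chart value) is to template the keep annotation conditionally on the SchedulingShard and Queue manifests:

  metadata:
    annotations:
      {{- if .Values.keepSchedulingResources }}
      helm.sh/resource-policy: keep
      {{- end }}

With the default set to false, helm uninstall would delete the CRs along with the rest of the release, and the scheduler Deployment they back should then be cleaned up as well (by the operator or by owner-reference garbage collection); users who rely on today's behavior could set the value to true.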
