➜ dynamo helm upgrade -i kai-scheduler oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler -n kai-scheduler --create-namespace --version v0.10.0
Release "kai-scheduler" does not exist. Installing it now.
Pulled: ghcr.io/nvidia/kai-scheduler/kai-scheduler:v0.10.0
Digest: sha256:d81ec1236acbe7d6cdb6c9e8f3986ce46f8c08d27cabe6b4e586fe0138d27755
NAME: kai-scheduler
LAST DEPLOYED: Wed Nov 19 19:00:30 2025
NAMESPACE: kai-scheduler
STATUS: deployed
REVISION: 1
DESCRIPTION: Install complete
TEST SUITE: None
While deploying the kai-scheduler Helm chart, I noticed that it deploys several distinct controller components in the kai-scheduler namespace:
➜ dynamo kubectl get pods -nkai-scheduler
NAME                                     READY   STATUS    RESTARTS   AGE
admission-7786b67c67-bwl8d               1/1     Running   0          19s
binder-665c5f6f7d-xf4t8                  1/1     Running   0          18s
kai-operator-6c7598cd96-5hk6v            1/1     Running   0          27s
kai-scheduler-default-7b9fbfbc97-vsftq   1/1     Running   0          19s
pod-grouper-5db6d945b7-xtpkt             1/1     Running   0          19s
podgroup-controller-5fc6cbc67c-mrtnw     1/1     Running   0          18s
queue-controller-89fd4f965-6q7x6         1/1     Running   0          18s
Question
Given that these components are all part of the same system (kai-scheduler), why is there a need to split them into multiple controllers?
Current Concerns:
- Operational Complexity: Managing multiple controllers increases operational overhead. Each component needs its own monitoring, logging, and debugging setup.
- Troubleshooting Difficulty: When an issue arises, it's more challenging to pinpoint which controller is at fault. Logs are spread across multiple pods, making it harder to correlate events.
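A partial workaround for the log-correlation concern, while the components remain separate, is to pull logs from every pod in the namespace and interleave them by timestamp. This is a minimal bash sketch, assuming kubectl is configured for the cluster and that the timestamps emitted by `kubectl logs --timestamps` are fixed-width RFC3339, so a plain byte-wise sort orders them. `merged_logs` is a hypothetical helper name, not part of KAI Scheduler or kubectl.

```shell
#!/usr/bin/env bash
# Sketch only: merge logs from every pod in a namespace into one
# time-ordered stream, tagging each line with the pod it came from.
# Assumes kubectl is configured for the cluster and that log timestamps
# are fixed-width RFC3339, so a plain byte-wise sort orders them.
# merged_logs is a hypothetical helper name, not part of KAI Scheduler.
set -euo pipefail

merged_logs() {
  local ns="$1" pod
  # Space-separated pod names, e.g. "admission-7786b67c67-bwl8d ...".
  for pod in $(kubectl get pods -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
    # --timestamps puts an RFC3339 timestamp at the start of each line;
    # sed inserts "[pod-name]" right after that timestamp.
    kubectl logs -n "$ns" "$pod" --all-containers --timestamps \
      | sed "s|^\([^ ]*\) |\1 [$pod] |"
  done | LC_ALL=C sort -s -k1,1
}
```

For the namespace in this issue, `merged_logs kai-scheduler | less` gives a single view across admission, binder, scheduler, and the other controllers. Without `--since` or `--tail`, `kubectl logs` fetches each pod's full log, so adding one of those flags is advisable on long-running clusters.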
Suggestion
Would it be possible to consolidate these controllers into a single controller pod? This could simplify deployment, reduce operational complexity, and make troubleshooting easier.