| 1 | +# BindRequest Annotations Mutation Plugin Point |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +This design document proposes a new plugin point in KAI-Scheduler that lets scheduler plugins set annotations on a BindRequest before it is created. The enhancement addresses synchronization issues between the scheduler and binder components. |
| 6 | + |
| 7 | +## Motivation |
| 8 | + |
| 9 | +The current architecture of KAI-Scheduler separates the scheduling and binding processes into distinct components, which improves scalability and error resilience. However, the communication between these components is currently limited to the fixed fields defined in the BindRequest specification. |
| 10 | + |
| 11 | +We need a more open and flexible API that allows plugins in the scheduler to communicate additional information to plugins in the binder without requiring changes to the BindRequest CRD for each new use case. This will enable more sophisticated scheduling and binding behaviors while maintaining a clean separation between components. |
| 12 | + |
| 13 | +By allowing scheduler plugins to add annotations to BindRequests that can be interpreted by corresponding binder plugins, we create an extensible communication channel that can evolve without API changes. This approach preserves backward compatibility while enabling new functionality through plugins. |
| 14 | + |
| 15 | +## Usage Stories |
| 16 | + |
| 17 | +### Node State Synchronization |
| 18 | + |
| 19 | +A scheduler plugin detects that a node is suitable for certain GPU workloads and annotates the BindRequest to indicate which binder plugin should handle the bind. If that binder plugin cannot detect the same node environment, it can either refresh the node's status in the binder cache or reject the bind request. |
| 20 | + |
| 21 | +### Topology Information Transfer |
| 22 | + |
| 23 | +For topology-aware scheduling, a plugin can inject the exact topology location information into the BindRequest annotations. This allows each pod to receive information about its position in the topology (relevant for block implementation) without adding fields to the BindRequest specification. |
| 24 | + |
| 25 | +### Device Management Plugin Communication |
| 26 | + |
| 27 | +Device management plugins can transfer parameters between the scheduler and binder without adding custom fields to the BindRequest, maintaining a clean API while enabling rich functionality. |
| 28 | + |
| 29 | +## Goals |
| 30 | + |
| 31 | +- Enable scheduler plugins to modify BindRequest annotations before creation |
| 32 | +- Create a flexible interface for transferring information from scheduler plugins to binder plugins |
| 33 | +- Maintain backward compatibility with existing scheduler and binder behavior |
| 34 | + |
| 35 | +## Design Details |
| 36 | + |
| 37 | +### Extension Point Definition |
| 38 | + |
| 39 | +Following the scheduler's plugin extension conventions, we introduce a function type for mutating BindRequest annotations: |
| 40 | + |
| 41 | +```go |
| 42 | +// In pkg/scheduler/api/types.go |
| 43 | +// BindRequestMutateFn allows plugins to mutate annotations before BindRequest creation. |
| 44 | +type BindRequestMutateFn func(pod *pod_info.PodInfo, nodeName string) map[string]string |
| 45 | +``` |
| 46 | + |
| 47 | +A slice of these functions is added to the `Session` struct, and a registration method is provided: |
| 48 | + |
| 49 | +```go |
| 50 | +// In pkg/scheduler/framework/session.go and session_plugins.go |
| 51 | +// In Session struct: |
| 52 | +BindRequestMutateFns []api.BindRequestMutateFn |
| 53 | + |
| 54 | +// Registration method: |
| 55 | +func (ssn *Session) AddBindRequestMutateFn(fn api.BindRequestMutateFn) { |
| 56 | + ssn.BindRequestMutateFns = append(ssn.BindRequestMutateFns, fn) |
| 57 | +} |
| 58 | +``` |
| 59 | + |
| 60 | +### Plugin Registration |
| 61 | + |
| 62 | +Plugins register their mutate function during `OnSessionOpen`: |
| 63 | + |
| 64 | +```go |
| 65 | +func (p *MyPlugin) OnSessionOpen(ssn *framework.Session) { |
| 66 | + ssn.AddBindRequestMutateFn(p.MyBindRequestMutateFn) |
| 67 | +} |
| 68 | + |
| 69 | +func (p *MyPlugin) MyBindRequestMutateFn(pod *pod_info.PodInfo, nodeName string) map[string]string { |
| 70 | + annotations := map[string]string{} |
| 71 | + annotations["my-plugin.kai.scheduler/some-key"] = "some-value" |
| 72 | + return annotations |
| 73 | +} |
| 74 | +``` |
| 75 | + |
| 76 | +### Usage in the Scheduler |
| 77 | + |
| 78 | +When creating a BindRequest, the scheduler calls every registered mutate function in registration order and merges the returned annotations into a single map: |
| 79 | + |
| 80 | +```go |
| 81 | +// In createBindRequest (simplified): |
| 82 | +annotations := make(map[string]string) |
| 83 | + |
| 84 | +for _, fn := range ssn.BindRequestMutateFns { |
| 85 | + maps.Copy(annotations, fn(podInfo, nodeName)) // maps.Copy(dst, src) returns nothing; on duplicate keys the later plugin wins |
| 86 | +} |
| 87 | +// ... proceed to create the BindRequest with these annotations |
| 88 | +``` |
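
The merge step above can be sketched as a standalone program. This is a minimal illustration, not the scheduler's actual code: `PodInfo` is a hypothetical stand-in for `pod_info.PodInfo`, and `mergeAnnotations` is an assumed helper name. It relies on `maps.Copy(dst, src)` from the Go standard library (Go 1.21+), which returns nothing and overwrites existing keys in `dst`.

```go
package main

import (
	"fmt"
	"maps"
)

// PodInfo is a hypothetical stand-in for pod_info.PodInfo.
type PodInfo struct {
	Name string
}

// BindRequestMutateFn mirrors the function type proposed in this document.
type BindRequestMutateFn func(pod *PodInfo, nodeName string) map[string]string

// mergeAnnotations (assumed helper name) calls each mutate function in
// registration order and copies its result into one map. On a key
// collision, the function registered later overwrites the earlier value.
func mergeAnnotations(fns []BindRequestMutateFn, pod *PodInfo, nodeName string) map[string]string {
	annotations := make(map[string]string)
	for _, fn := range fns {
		// Copying from a nil map is a no-op, so a plugin may return nil.
		maps.Copy(annotations, fn(pod, nodeName))
	}
	return annotations
}

func main() {
	fns := []BindRequestMutateFn{
		func(_ *PodInfo, _ string) map[string]string {
			return map[string]string{"topology-plugin.kai.scheduler/topology-level": "rack"}
		},
		// A plugin with nothing to add can return nil.
		func(_ *PodInfo, _ string) map[string]string { return nil },
		func(_ *PodInfo, _ string) map[string]string {
			return map[string]string{"gpu-plugin.kai.scheduler/requires-env-vars": "true"}
		},
	}

	merged := mergeAnnotations(fns, &PodInfo{Name: "pod-a"}, "node-1")
	fmt.Println(len(merged), merged["topology-plugin.kai.scheduler/topology-level"])
}
```

Because later functions overwrite earlier ones, two plugins that write the same key will silently conflict, which is one reason to namespace keys per plugin.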
| 89 | + |
| 90 | +### Binder Plugin Access |
| 91 | + |
| 92 | +Binder plugins already have access to the BindRequest object during the PreBind and PostBind phases, so they can read the annotations added by scheduler plugins: |
| 93 | + |
| 94 | +```go |
| 95 | +func (p *MyBinderPlugin) PreBind(ctx context.Context, pod *v1.Pod, node *v1.Node, |
| 96 | + bindRequest *v1alpha2.BindRequest, state *state.BindingState) error { |
| 97 | + // Read annotations added by scheduler plugins |
| 98 | + if value, exists := bindRequest.Annotations["my-plugin.kai.scheduler/some-key"]; exists { |
| 99 | + // Use the annotation value to modify binding behavior |
| 100 | + } |
| 101 | + return nil |
| 102 | +} |
| 103 | +``` |
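
Annotation values are plain strings, so a binder plugin usually has to parse them and decide what to do when a value is missing or malformed. The following is a rough sketch of one possible policy, operating on a bare `map[string]string` rather than a real BindRequest. The helper name `requiresEnvVars` and the choice to treat a missing key as `false` and a malformed value as an error are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strconv"
)

// requiresEnvVarsKey reuses an example key from this document.
const requiresEnvVarsKey = "gpu-plugin.kai.scheduler/requires-env-vars"

// requiresEnvVars (hypothetical helper) interprets the annotation as a boolean.
// A missing annotation means "not required"; a malformed value is returned as
// an error so the caller can decide whether to reject the bind request.
func requiresEnvVars(annotations map[string]string) (bool, error) {
	raw, exists := annotations[requiresEnvVarsKey]
	if !exists {
		return false, nil
	}
	value, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("annotation %q: %w", requiresEnvVarsKey, err)
	}
	return value, nil
}

func main() {
	cases := []map[string]string{
		nil, // no annotations at all
		{requiresEnvVarsKey: "true"},
		{requiresEnvVarsKey: "maybe"}, // malformed value
	}
	for _, annotations := range cases {
		value, err := requiresEnvVars(annotations)
		fmt.Println(value, err)
	}
}
```

In a real `PreBind`, the map passed in would be `bindRequest.Annotations`, and returning the parse error would fail the bind.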
| 104 | + |
| 105 | +### Annotation Naming Convention |
| 106 | + |
| 107 | +To avoid conflicts between different plugins, we recommend using a namespaced approach for annotation keys: |
| 108 | + |
| 109 | +``` |
| 110 | +<plugin-name>.kai.scheduler/<key> |
| 111 | +``` |
| 112 | + |
| 113 | +For example: |
| 114 | +``` |
| 115 | +topology-plugin.kai.scheduler/topology-level: "rack" |
| 116 | +gpu-plugin.kai.scheduler/requires-env-vars: "true" |
| 117 | +``` |
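
To keep keys consistent between the scheduler side and the binder side, a plugin could build them through a small shared helper instead of repeating string literals. This is only a sketch of the convention above; `annotationKey` and `ownedBy` are hypothetical names, not part of KAI-Scheduler.

```go
package main

import (
	"fmt"
	"strings"
)

// keyDomain is the shared suffix of the recommended annotation prefix.
const keyDomain = "kai.scheduler"

// annotationKey (hypothetical helper) builds "<plugin-name>.kai.scheduler/<key>".
func annotationKey(pluginName, key string) string {
	return fmt.Sprintf("%s.%s/%s", pluginName, keyDomain, key)
}

// ownedBy (hypothetical helper) returns only the annotations whose key
// carries the given plugin's prefix.
func ownedBy(pluginName string, annotations map[string]string) map[string]string {
	prefix := pluginName + "." + keyDomain + "/"
	owned := make(map[string]string)
	for k, v := range annotations {
		if strings.HasPrefix(k, prefix) {
			owned[k] = v
		}
	}
	return owned
}

func main() {
	annotations := map[string]string{
		annotationKey("topology-plugin", "topology-level"): "rack",
		annotationKey("gpu-plugin", "requires-env-vars"):   "true",
	}
	// prints "topology-plugin.kai.scheduler/topology-level"
	fmt.Println(annotationKey("topology-plugin", "topology-level"))
	// prints 1
	fmt.Println(len(ownedBy("gpu-plugin", annotations)))
}
```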