Commit e4c9d64

feat(operator): Add default podAntiAffinity and service-level Affinity support (#619)
Signed-off-by: Omer Yahud <[email protected]>
1 parent 30fd88c commit e4c9d64

13 files changed, +9090 -1741 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 - Option to configure reservation pods runtime class.
 - Added a tool to run time-aware fairness simulations over multiple cycles (see [Time-Aware Fairness Simulator](cmd/time-aware-simulator/README.md))
 - Added enforcement of the `nvidia` runtime class for GPU pods, with the option to enforce a custom runtime class, or disable enforcement entirely.
+- Added a preferred podAntiAffinity term by default for all services; it can be made required instead by setting `global.requireDefaultPodAntiAffinityTerm`
+- Added support for service-level affinities
 
 ### Fixed
 - Fixed a bug where the scheduler would not re-try updating podgroup status after failure

deployments/kai-scheduler/.helmignore

Lines changed: 1 addition & 0 deletions
@@ -35,3 +35,4 @@ stable-index/*
 .circleci/
 .github/
 tests/*
+

deployments/kai-scheduler/crds/kai.scheduler_configs.yaml

Lines changed: 8212 additions & 1738 deletions
Large diffs are not rendered by default.

deployments/kai-scheduler/templates/kai-config.yaml

Lines changed: 27 additions & 0 deletions
@@ -27,6 +27,9 @@ spec:
 affinity:
 {{- toYaml .Values.global.affinity | nindent 6 }}
 {{- end }}
+{{- if .Values.global.requireDefaultPodAntiAffinityTerm }}
+requireDefaultPodAntiAffinityTerm: true
+{{- end }}
 {{- if .Values.global.tolerations }}
 tolerations:
 {{- toYaml .Values.global.tolerations | nindent 6 }}
@@ -52,6 +55,10 @@ spec:
 resources:
 {{- toYaml .Values.binder.resources | nindent 8 }}
 {{- end }}
+{{- if .Values.binder.affinity }}
+affinity:
+{{- toYaml .Values.binder.affinity | nindent 8 }}
+{{- end }}
 metricsPort: {{ .Values.binder.ports.metricsPort }}
 resourceReservation:
 {{- if .Values.binder.runtimeClassName }}
@@ -72,6 +79,10 @@ spec:
 resources:
 {{- toYaml .Values.podgroupcontroller.resources | nindent 8 }}
 {{- end }}
+{{- if .Values.podgroupcontroller.affinity }}
+affinity:
+{{- toYaml .Values.podgroupcontroller.affinity | nindent 8 }}
+{{- end }}
 
 queueController:
 service:
@@ -85,6 +96,10 @@ spec:
 resources:
 {{- toYaml .Values.queuecontroller.resources | nindent 8 }}
 {{- end }}
+{{- if .Values.queuecontroller.affinity }}
+affinity:
+{{- toYaml .Values.queuecontroller.affinity | nindent 8 }}
+{{- end }}
 
 admission:
 service:
@@ -98,6 +113,10 @@ spec:
 resources:
 {{- toYaml .Values.admission.resources | nindent 8 }}
 {{- end }}
+{{- if .Values.admission.affinity }}
+affinity:
+{{- toYaml .Values.admission.affinity | nindent 8 }}
+{{- end }}
 gpuSharing: {{ .Values.global.gpuSharing | default false }}
 queueLabelSelector: false
 webhook:
@@ -119,6 +138,10 @@ spec:
 resources:
 {{- toYaml .Values.nodescaleadjuster.resources | nindent 8 }}
 {{- end }}
+{{- if .Values.nodescaleadjuster.affinity }}
+affinity:
+{{- toYaml .Values.nodescaleadjuster.affinity | nindent 8 }}
+{{- end }}
 args:
 nodeScaleNamespace: {{ .Values.nodescaleadjuster.scalingPodNamespace }}
 scalingPodImage:
@@ -138,6 +161,10 @@ spec:
 resources:
 {{- toYaml .Values.scheduler.resources | nindent 8 }}
 {{- end }}
+{{- if .Values.scheduler.affinity }}
+affinity:
+{{- toYaml .Values.scheduler.affinity | nindent 8 }}
+{{- end }}
 {{- if and .Values.scheduler.ports .Values.scheduler.ports.metricsPort }}
 schedulerService:
 port: {{ .Values.scheduler.ports.metricsPort }}
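
Note: a minimal sketch of a Helm values override that exercises the two new options. It assumes the value keys match the `.Values.*` paths referenced in the kai-config.yaml template (`global.requireDefaultPodAntiAffinityTerm` and a per-service `affinity` map such as `scheduler.affinity`). Because the template passes the affinity body through `toYaml` unchanged, it is assumed here to follow the standard Kubernetes `Affinity` schema. The file name and the node label are illustrative only.

```yaml
# my-values.yaml (hypothetical file name)
global:
  # Per the changelog, switches the default podAntiAffinity term from
  # preferred to required. The template only emits this field when the
  # value is truthy.
  requireDefaultPodAntiAffinityTerm: true

scheduler:
  # Service-level affinity. The same `affinity` key is templated for
  # binder, podgroupcontroller, queuecontroller, admission and
  # nodescaleadjuster.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/os   # illustrative label choice
                operator: In
                values:
                  - linux
```

How a service-level affinity combines with `global.affinity` and with the default anti-affinity term is decided by the operator and is not visible in this template diff.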
