Commit 0755b24

Merge pull request #13 from NVIDIA/tuned-no-upgrade
docs(tuned): Added example tuned CRD
2 parents c1ea3ae + 2daa066 commit 0755b24

1 file changed: +70 -0 lines changed

tuned/README.md

Lines changed: 70 additions & 0 deletions
@@ -178,6 +178,76 @@ Deploy the package with apply mode to install tuned with default settings.
2. Create a `tuned_profile` configmap specifying which profile to activate
3. Deploy the package with config mode to apply the custom configuration

### Complete Skyhook Configuration Example

Here's a complete example of using the tuned package with Skyhook to deploy custom AI/ML performance profiles:

```yaml
apiVersion: skyhook.nvidia.com/v1alpha1
kind: Skyhook
metadata:
  labels:
    app.kubernetes.io/part-of: skyhook-operator
    app.kubernetes.io/created-by: skyhook-operator
  name: skyhook-test
spec:
  nodeSelectors:
    matchLabels:
      eks.amazonaws.com/nodegroup: ml-nodes
  packages:
    tuned:
      image: nvcr.io/nvidian/swgpu-baseos/tuned
      version: 1.1.0
      interrupt:
        type: reboot
      configInterrupts:
        tuned_profile:
          type: reboot
        custom_profile:
          type: reboot
        custom_profile_1:
          type: reboot
      env:
        - name: INTERRUPT
          value: "true"
      configMap:
        tuned_profile: custom_profile
        custom_profile: |-
          [main]
          summary=AI/ML kernel settings
          include=custom_profile_1

          [sysctl]
          kernel.numa_balancing=1 # avoid NUMA page bouncing
          kernel.panic=10

          [bootloader]
          cmdline_myprofile=-kernel.panic +kernel.panic=20
        custom_profile_1: |-
          [main]
          summary=AI/ML performance profile

          [cpu]
          governor=performance # lock CPUs at max frequency
          energy_perf_bias=performance # disable energy saving bias
          force_latency=0 # minimize C-state latency

          [disk]
          readahead=4096 # bigger readahead for large dataset loads

          [vm]
          transparent_hugepages=always # large pages help with tensor allocations
          swappiness=10 # avoid swapping under load
```

This example demonstrates:
- **Node targeting**: Using `nodeSelectors` to target specific node groups
- **Interrupt handling**: Configuring reboot interrupts for kernel-level changes
- **Environment variables**: Setting `INTERRUPT=true` to handle verification during config changes
- **Custom profiles**: Creating hierarchical profiles with the `include` directive
- **AI/ML optimizations**: Performance settings optimized for machine learning workloads
- **Kernel parameters**: Using the `[sysctl]` and `[bootloader]` sections for low-level tuning

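To make the hierarchical `include` relationship concrete, here is a minimal Python sketch that walks the include chain starting from the profile named by `tuned_profile`. It is not part of the tuned package or Skyhook: `CONFIG_MAP` is a trimmed, hypothetical copy of the `configMap` entries above, `include_chain` is an illustrative helper, and the sketch assumes each profile includes at most one parent and that any name missing from the configmap is a built-in profile.

```python
import configparser

# Hypothetical, trimmed copy of the configMap entries from the example above.
CONFIG_MAP = {
    "tuned_profile": "custom_profile",
    "custom_profile": (
        "[main]\n"
        "summary=AI/ML kernel settings\n"
        "include=custom_profile_1\n"
        "\n"
        "[sysctl]\n"
        "kernel.panic=10\n"
    ),
    "custom_profile_1": (
        "[main]\n"
        "summary=AI/ML performance profile\n"
        "\n"
        "[cpu]\n"
        "governor=performance\n"
    ),
}


def include_chain(config_map, start):
    """Return the profile names reached from `start` by following `include`.

    Assumes a single include per profile (an assumption of this sketch).
    """
    chain = []
    name = start
    while name is not None:
        if name in chain:
            raise ValueError(f"include cycle detected at {name!r}")
        chain.append(name)
        if name not in config_map:
            # Not defined in the configmap: assume a built-in profile and stop.
            break
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(config_map[name])
        # fallback=None covers both a missing [main] section and a missing key.
        name = parser.get("main", "include", fallback=None)
    return chain


active = CONFIG_MAP["tuned_profile"]
print(" -> ".join(include_chain(CONFIG_MAP, active)))
```

With the profiles above, the chain is `custom_profile` followed by `custom_profile_1`, which is why both keys also carry their own `configInterrupts` entry in the example.
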
### Available Tuned Profiles

Common built-in profiles include:
