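What would you like to be added:
A `priority` field on each flavor under `inferenceConfig.flavors`, where a higher value means the flavor is matched first. For example: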
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-125m
  inferenceConfig:
    flavors:
      - name: h800
        priority: 5 # higher priority
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: h800
        limits:
          nvidia.com/gpu: 4
      - name: h100
        priority: 4
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: h100
        limits:
          nvidia.com/gpu: 4
      - name: a100
        priority: 3
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: a100
        limits:
          nvidia.com/gpu: 4
      - name: a20
        priority: 2
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: a20
        limits:
          nvidia.com/gpu: 4
      - name: t4
        priority: 1 # lower priority
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: t4
        limits:
          nvidia.com/gpu: 4
Why is this needed:
When multiple flavors are defined for a model, there is currently no explicit way to control their matching order during scheduling. The scheduler uses the order defined in the list, which may not reflect the intended preference.
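To make the intended behavior concrete, here is a rough sketch of how matching order could be derived from a priority value. The `Flavor` type and `OrderByPriority` function are hypothetical names used only for illustration and are not taken from the llmaz codebase. The sketch assumes that a higher value means the flavor is tried first and that flavors with equal priority fall back to their list order.

```go
package main

import (
	"fmt"
	"sort"
)

// Flavor is a hypothetical, simplified stand-in for one entry of
// spec.inferenceConfig.flavors; it is not the llmaz API type.
type Flavor struct {
	Name     string
	Priority int // assumed semantics: higher value is tried first
}

// OrderByPriority returns a copy of flavors sorted by descending priority.
// The sort is stable, so flavors with equal priority keep their list order,
// which preserves the current list-order behavior as a tie-breaker.
func OrderByPriority(flavors []Flavor) []Flavor {
	ordered := make([]Flavor, len(flavors))
	copy(ordered, flavors)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority > ordered[j].Priority
	})
	return ordered
}

func main() {
	// Deliberately listed out of order to show that list position
	// no longer decides which flavor is tried first.
	flavors := []Flavor{
		{Name: "t4", Priority: 1},
		{Name: "a100", Priority: 3},
		{Name: "h800", Priority: 5},
		{Name: "a20", Priority: 2},
		{Name: "h100", Priority: 4},
	}
	for _, f := range OrderByPriority(flavors) {
		fmt.Println(f.Name, f.Priority)
	}
}
```

A stable sort is used so that models which do not set a priority (all values equal) would behave exactly as they do with list order. Whether that is the right default is something the design doc should settle.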
Completion requirements:
This enhancement requires the following artifacts:
- Design doc
- API change
- Docs update
The artifacts should be linked in subsequent comments.