feat: Add priority field to Flavor #469

Open
@googs1025

Description

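What would you like to be added:

A priority field on each flavor under inferenceConfig.flavors, where a higher value marks a more preferred flavor. For example: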

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-125m
  inferenceConfig:
    flavors:
      - name: h800
        priority: 5  # higher priority
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: h800
        limits:
          nvidia.com/gpu: 4
      - name: h100
        priority: 4
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: h100
        limits:
          nvidia.com/gpu: 4
      - name: a100
        priority: 3
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: a100
        limits:
          nvidia.com/gpu: 4
      - name: a20
        priority: 2
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: a20
        limits:
          nvidia.com/gpu: 4
      - name: t4
        priority: 1  # lower priority
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: t4
        limits:
          nvidia.com/gpu: 4

Why is this needed:

When multiple flavors are defined for a model, there is currently no explicit way to control their matching order during scheduling. The scheduler uses the order defined in the list, which may not reflect the intended preference.

https://github.com/InftyAI/scheduler-plugins/blob/685a4d9f8a769f7f5634a6680f374a05c72823cd/pkg/plugins/resource_fungibility/resource_fungibility.go#L228-L248
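
To illustrate the requested behavior, here is a minimal sketch of priority-based ordering. It is written in Python for brevity and is not the plugin's code (the linked file is Go). The `Flavor` class, the `order_flavors` helper, and the default priority of 0 are assumptions made for this example. The only semantics taken from the YAML above is that a higher number is preferred.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Flavor:
    # Hypothetical stand-in for a flavor entry; `priority` is the proposed
    # field and is not part of the current API.
    name: str
    priority: int = 0  # assumed default when the field is omitted
    node_selector: Dict[str, str] = field(default_factory=dict)


def order_flavors(flavors: List[Flavor]) -> List[Flavor]:
    """Return a new list with the highest-priority flavor first.

    sorted() is stable, even with reverse=True, so flavors that share a
    priority keep their relative list order.
    """
    return sorted(flavors, key=lambda f: f.priority, reverse=True)


if __name__ == "__main__":
    declared = [
        Flavor("t4", 1),
        Flavor("h800", 5),
        Flavor("a100", 3),
        Flavor("h100", 4),
        Flavor("a20", 2),
    ]
    print([f.name for f in order_flavors(declared)])
    # prints ['h800', 'h100', 'a100', 'a20', 't4']
```

Using a stable sort means that a spec with no priorities set (all flavors at the assumed default) would still be matched in list order, so existing manifests would behave as they do today under this sketch.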

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

Labels: feature, needs-priority, needs-triage
