Feature: LoRA-based filtering and scoring #10

Create a LoRA affinity filter or scorer (*Note*: first confirm that one is not already available in the upstream GIE EPP packages)

Possible filter behavior:
- Separating pods into two groups: those with target model affinity and those with available capacity
- Using a probability threshold to sometimes select from the non-affinity pods, which enables load balancing
- Falling back to whichever group has pods when the other group is empty

Possible scorer behavior:
- Provide the maximum score for pods that have the required LoRA loaded and a zero score for all other pods
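
A minimal sketch of such a binary scorer, assuming pods are represented as a mapping from pod name to the set of LoRAs currently loaded (the function name, the mapping shape, and the `max_score` parameter are hypothetical):

```python
def lora_affinity_scores(pod_loras, target_lora, max_score=1.0):
    """Score each pod: max_score if target_lora is loaded, else 0.0.

    pod_loras maps a pod name to the collection of LoRAs loaded on it
    (an assumed representation for this sketch).
    """
    return {
        name: (max_score if target_lora in loras else 0.0)
        for name, loras in pod_loras.items()
    }
```

Unlike the filter, the scorer never removes a pod, so other scorers can still influence the final choice.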

Decision required for building the list of active LoRAs:
vLLM metrics contain all permutations of the LoRAs used by running and waiting requests, each with a timestamp

- Option 1: use only the latest metric, which defines the most recent LoRA state. This is problematic when vLLM load is below 100%. We need to understand how vLLM works, in particular whether a LoRA is offloaded once request processing finishes.
- Option 2: use more than the most recent metric to get the running and waiting LoRAs. When the load is below 100%, go back to less recent events and collect LoRAs up to max_loras.
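
Option 2 could look something like the sketch below. It assumes each metric sample has already been parsed into a `(timestamp, running, waiting)` tuple. That shape and the function name `collect_active_loras` are assumptions for illustration, not the actual vLLM metric format.

```python
def collect_active_loras(samples, max_loras):
    """Collect distinct LoRAs from newest to oldest sample, up to max_loras.

    samples: iterable of (timestamp, running_loras, waiting_loras) tuples,
    an assumed pre-parsed form of the vLLM metrics.
    """
    if max_loras <= 0:
        return []
    active = []
    # Walk the samples from the most recent timestamp backwards.
    for _, running, waiting in sorted(samples, key=lambda s: s[0], reverse=True):
        for lora in list(running) + list(waiting):
            if lora not in active:
                active.append(lora)
                # Stop as soon as the pod's LoRA slots are accounted for.
                if len(active) >= max_loras:
                    return active
    return active
```

Passing only the newest sample gives the Option 1 behavior, so the two options can be compared with the same function. Whether older samples still reflect what is loaded depends on the open question of when vLLM offloads a LoRA.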

