[Umbrella] advanced traffic load balancing algorithms #376

Open
@kerthcet

Description

What would you like to be added:

We will load-balance traffic based on LLM-specific characteristics, such as prefix-cache-aware, KV-cache-aware, LoRA-aware, load-aware, and request-profile-aware (e.g. summary or chat) routing.

They will be implemented as plugins built into the Envoy gateway.
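
To make the idea concrete, here is a minimal sketch of how a few of these signals (prefix-cache, load, and LoRA awareness) might be combined into a single routing score. It is only an illustration, not the proposed design: the `Backend` class, the `pick_backend` function, and the `load_weight` and `lora_bonus` parameters are all hypothetical names, and prefix matching is done on raw characters for simplicity, whereas a real router would more likely compare tokens or hashed blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical view of one model-serving replica (not a real API)."""
    name: str
    in_flight: int = 0                                  # requests currently being served
    cached_prefixes: set = field(default_factory=set)   # prompt prefixes assumed to be cached
    loaded_loras: set = field(default_factory=set)      # LoRA adapters assumed to be loaded


def prefix_match_len(prompt: str, backend: Backend) -> int:
    """Length of the longest cached prefix that the prompt starts with (0 if none)."""
    return max(
        (len(p) for p in backend.cached_prefixes if prompt.startswith(p)),
        default=0,
    )


def pick_backend(prompt, lora, backends, load_weight=10.0, lora_bonus=50.0):
    """Return the backend with the highest illustrative score.

    The weights are arbitrary assumptions chosen for this example.
    """
    def score(b: Backend) -> float:
        s = float(prefix_match_len(prompt, b))   # prefix-cache aware: reward cache reuse
        s -= load_weight * b.in_flight           # load aware: penalize busy replicas
        if lora is not None and lora in b.loaded_loras:
            s += lora_bonus                      # LoRA aware: prefer an already-loaded adapter
        return s

    return max(backends, key=score)


if __name__ == "__main__":
    backends = [
        Backend("a", in_flight=1, cached_prefixes={"You are a helpful assistant."}),
        Backend("b", in_flight=0),
    ]
    chosen = pick_backend("You are a helpful assistant. Summarize this.", None, backends)
    print(chosen.name)
```

In this sketch a replica holding a matching prefix wins while it is lightly loaded, but once enough requests are in flight the load penalty outweighs the cache benefit and traffic shifts to an idle replica. How the signals should actually be weighted is exactly what the design doc would need to settle.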

Why is this needed:

Better performance.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

Metadata

Assignees

No one assigned

    Labels

    feature, needs-priority, needs-triage
