Support conditional prefill based on actual decode kv cache state #382

@elevran

Description

Enable conditional disaggregated prefill based on the actual KV cache state of the decode worker. vLLM can return a "missing cache" response, which tells the P/D sidecar that a prefill is needed.

See the proposal in this doc and the related vLLM issue.
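
To make the intended flow concrete, here is a minimal sketch of how a sidecar might act on a missing cache response. It is illustrative only: `DecodeResult`, its `cache_missing` flag, `handle_request`, and the `decode`/`prefill` callables are hypothetical names, not vLLM or sidecar APIs, and the real response format is whatever the linked proposal defines.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class DecodeResult:
    # Hypothetical flag: True when the decode worker reports that the
    # KV cache for this prompt is not available.
    cache_missing: bool
    text: Optional[str] = None


def handle_request(
    prompt: str,
    decode: Callable[[str], DecodeResult],
    prefill: Callable[[str], None],
) -> Tuple[DecodeResult, bool]:
    """Try decode first; run prefill only if the cache is reported missing.

    Returns the final decode result and whether prefill was used.
    """
    result = decode(prompt)
    if not result.cache_missing:
        # Cache was present on the decode worker, so prefill is skipped.
        return result, False

    # Cache is missing: run the prefill step, then retry decode once.
    prefill(prompt)
    return decode(prompt), True
```

The sketch retries decode exactly once after prefill and returns whatever that second attempt reports; how to handle a repeated miss is left open here.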

cc: @nilig @kfirwolfson

Metadata

Assignees

Labels

Type

No type

Projects

Status

Backlog

Milestone

Relationships

None yet

Development

No branches or pull requests