Support conditional prefill based on actual decode KV cache state #382

Enable conditional disaggregated prefill based on the actual KV cache state of the decode instance. vLLM can return a "missing cache" response, which signals to the P/D sidecar that a prefill should be run.
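To make the intended control flow concrete, here is a minimal sketch of how the sidecar could react to such a response. It is only an illustration under assumptions: `DecodeResult`, its `cache_miss` flag, `handle_request`, and the `decode`/`prefill` callables are hypothetical names, and the actual shape of the vLLM response is defined in the linked proposal and issue, not here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DecodeResult:
    # Hypothetical flag: True when the decoder reports that it does not
    # hold the KV cache for this prompt (the "missing cache" response).
    cache_miss: bool
    text: str = ""


def handle_request(
    prompt: str,
    decode: Callable[[str], DecodeResult],
    prefill: Callable[[str], None],
) -> Tuple[DecodeResult, List[str]]:
    """Try the decoder first; run a prefill only if it reports a cache miss.

    Returns the final decode result and the ordered list of calls made,
    which makes the decision path easy to inspect.
    """
    calls = ["decode"]
    result = decode(prompt)
    if result.cache_miss:
        # Assumed fallback path: run the prefill, then retry the decode once.
        prefill(prompt)
        calls.append("prefill")
        result = decode(prompt)
        calls.append("decode")
    return result, calls
```

With this shape, a request whose KV cache is already present on the decoder costs a single decode call, and the prefill is only paid for when the decoder actually reports a miss.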

See the proposal in this [doc](https://docs.google.com/document/d/1-vMameI-rVbg5KNOOwj-QRmGpdcWy1-k0a3XOsou1Y8/edit?pli=1&tab=t.0#heading=h.lr4th2al41cy) and the related vLLM [issue](https://github.com/vllm-project/vllm/issues/24256).

cc: @nilig @kfirwolfson
