
🚀 Introducing an efficient KV cache compression framework tailored for next-scale prediction. #166

@StargazerX0

Description


Many thanks to the VAR team for their wonderful work! We introduce Scale-Aware KV Cache (ScaleKV), an efficient KV cache compression framework tailored for Visual Auto-Regressive Modeling. On Infinity-8B, it achieves a 10x memory reduction, from 85 GB to 8.5 GB, with negligible quality degradation.
arXiv: https://arxiv.org/abs/2505.19602
GitHub: https://github.com/StargazerX0/ScaleKV
Hugging Face Daily Paper: https://huggingface.co/papers/2505.19602
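
For readers less familiar with KV cache compression, below is a minimal, hypothetical sketch of the general idea: bounding a transformer's key/value cache during next-scale autoregressive decoding by keeping only the entries that receive the most attention. This is not the ScaleKV algorithm from the paper; the class, method names, and the simple top-k eviction policy (`CompressedKVCache`, `budget`, etc.) are illustrative assumptions only. See the paper and repository above for the actual scale-aware method.

```python
# Illustrative sketch (NOT the ScaleKV algorithm): cap the per-layer KV cache
# at a fixed token budget by evicting entries with low accumulated attention.
import torch

class CompressedKVCache:
    def __init__(self, budget: int):
        self.budget = budget  # max number of cached tokens kept per layer
        self.k = None         # cached keys,   shape (B, H, T, D)
        self.v = None         # cached values, shape (B, H, T, D)
        self.score = None     # accumulated attention mass per cached token

    def append(self, k_new, v_new):
        # Concatenate the new scale's keys/values onto the cache along the token dim.
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=2)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=2)
        pad = torch.zeros(k_new.shape[:3], device=k_new.device)
        self.score = pad if self.score is None else torch.cat([self.score, pad], dim=2)

    def update_scores(self, attn):
        # attn: (B, H, Q, T) attention weights from the current scale's queries;
        # sum over queries to estimate how important each cached token is.
        self.score = self.score + attn.sum(dim=2)

    def evict(self):
        # Keep only the `budget` tokens with the largest accumulated attention,
        # preserving their original temporal order.
        T = self.k.shape[2]
        if T <= self.budget:
            return
        idx = self.score.topk(self.budget, dim=-1).indices.sort(dim=-1).values
        gather = idx.unsqueeze(-1).expand(-1, -1, -1, self.k.shape[-1])
        self.k = self.k.gather(2, gather)
        self.v = self.v.gather(2, gather)
        self.score = self.score.gather(2, idx)
```

With a fixed `budget`, peak cache memory stays bounded regardless of how many scales are decoded. In practice, a scale-aware scheme like the one described in the paper allocates different budgets across layers and scales rather than using a single global top-k policy as this toy example does.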



Labels: Research (Third-party repositories or research which use VAR)
