
🚀 Introducing an efficient KV cache compression framework tailored for next-scale prediction. #166

@StargazerX0

Description


Many thanks to the VAR team for their wonderful work! We introduce Scale-Aware KV Cache (ScaleKV), an efficient KV cache compression framework tailored for Visual Auto-Regressive Modeling. On Infinity-8B, it achieves a 10x memory reduction, from 85 GB to 8.5 GB, with negligible quality degradation.
arXiv: https://arxiv.org/abs/2505.19602
GitHub: https://github.com/StargazerX0/ScaleKV
Hugging Face Daily Paper: https://huggingface.co/papers/2505.19602
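
For readers less familiar with KV cache compression, below is a minimal, hypothetical sketch of the general idea: bounding a transformer's key/value cache during next-scale autoregressive decoding by keeping only the entries that receive the most attention. This is not the ScaleKV algorithm from the paper; the class, method names, and the simple top-k eviction policy (`CompressedKVCache`, `budget`, etc.) are illustrative assumptions only. See the paper and repository above for the actual scale-aware method.

```python
# Illustrative sketch (NOT the ScaleKV algorithm): cap the per-layer KV cache
# at a fixed token budget by evicting entries with low accumulated attention.
import torch

class CompressedKVCache:
    def __init__(self, budget: int):
        self.budget = budget  # max number of cached tokens kept per layer
        self.k = None         # cached keys,   shape (B, H, T, D)
        self.v = None         # cached values, shape (B, H, T, D)
        self.score = None     # accumulated attention mass per cached token

    def append(self, k_new, v_new):
        # Concatenate the new scale's keys/values onto the cache along the token dim.
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=2)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=2)
        pad = torch.zeros(k_new.shape[:3], device=k_new.device)
        self.score = pad if self.score is None else torch.cat([self.score, pad], dim=2)

    def update_scores(self, attn):
        # attn: (B, H, Q, T) attention weights from the current scale's queries;
        # sum over queries to estimate how important each cached token is.
        self.score = self.score + attn.sum(dim=2)

    def evict(self):
        # Keep only the `budget` tokens with the largest accumulated attention,
        # preserving their original temporal order.
        T = self.k.shape[2]
        if T <= self.budget:
            return
        idx = self.score.topk(self.budget, dim=-1).indices.sort(dim=-1).values
        gather = idx.unsqueeze(-1).expand(-1, -1, -1, self.k.shape[-1])
        self.k = self.k.gather(2, gather)
        self.v = self.v.gather(2, gather)
        self.score = self.score.gather(2, idx)
```

With a fixed `budget`, peak cache memory stays bounded regardless of how many scales are decoded. In practice, a scale-aware scheme like the one described in the paper allocates different budgets across layers and scales rather than using a single global top-k policy as this toy example does.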



Labels: Research (Third-party repositories or research which use VAR)
