Cornell-RelaxML/yaqa-quantization

This repository contains code for Yet Another Quantization Algorithm (YAQA), a quantization framework that uses a Kronecker-factored approximation of the layerwise Hessian with respect to the full-model KL divergence to better preserve model outputs after quantization. Across a wide range of models and quantizers, YAQA reduces the KL divergence to the original model by roughly a third relative to LDLQ/GPTQ, translating to state-of-the-art performance on downstream tasks. For more details, see the paper (arXiv:2505.22988).
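
The idea can be illustrated with a small numerical sketch. Below is a hedged, self-contained example (not code from this repository) of the quadratic proxy objective that a Kronecker-factored Hessian H ≈ A ⊗ B induces for a single linear layer; the names W, W_hat, A, and B are illustrative placeholders, not identifiers from the YAQA code.

```python
import torch

# Hypothetical sizes and tensors for one linear layer (illustrative only).
d_in, d_out = 64, 32
W = torch.randn(d_out, d_in)               # original weights
W_hat = W + 0.01 * torch.randn_like(W)     # a candidate rounded/quantized W

# Kronecker factors standing in for the layerwise Hessian, H ≈ A ⊗ B,
# with A (d_in x d_in) over inputs and B (d_out x d_out) over outputs.
# Symmetric PSD placeholders here; the real factors come from collected Hessians.
A = torch.randn(d_in, d_in); A = A @ A.T / d_in
B = torch.randn(d_out, d_out); B = B @ B.T / d_out

# Quadratic proxy for the quantization error under H ≈ A ⊗ B:
#   vec(dW)^T (A ⊗ B) vec(dW) = trace(B @ dW @ A @ dW^T)
dW = W_hat - W
proxy_loss = torch.trace(B @ dW @ A @ dW.T)
print(proxy_loss.item())
```

The intuition is that rounding decisions which minimize this proxy, rather than a plain per-layer least-squares error, track the full-model KL divergence more closely.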

How to use this codebase

This codebase is based on the QTIP codebase, with modifications to support YAQA's quantization algorithm. To collect Hessians, see the README in hessian_llama/. To quantize models, follow the instructions in the QTIP codebase. Prequantized models and Sketch-B Hessians (see the paper) can be found here.
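
As a rough check on the end result, the metric emphasized above (the KL divergence of the quantized model to the original model) can be estimated on a prompt with standard Hugging Face / PyTorch calls. This is a hedged sketch, not this repository's evaluation code; the model paths are placeholders.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifiers -- substitute the original and quantized checkpoints.
orig = AutoModelForCausalLM.from_pretrained("path/to/original-model")
quant = AutoModelForCausalLM.from_pretrained("path/to/quantized-model")
tok = AutoTokenizer.from_pretrained("path/to/original-model")

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    log_p = F.log_softmax(orig(**inputs).logits, dim=-1)   # original model
    log_q = F.log_softmax(quant(**inputs).logits, dim=-1)  # quantized model

# Per-position KL(P || Q) over the vocabulary, averaged over token positions.
kl_per_pos = F.kl_div(log_q, log_p, log_target=True, reduction="none").sum(-1)
print(kl_per_pos.mean().item())
```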

Other

If you found this work useful, please consider citing:

@misc{tseng2025modelpreservingadaptiverounding,
      title={Model-Preserving Adaptive Rounding}, 
      author={Albert Tseng and Zhaofeng Sun and Christopher De Sa},
      year={2025},
      eprint={2505.22988},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.22988}, 
}

Use of Llama models is governed by the Llama Community License. Use of this code is governed by the GNU GPL v3 license.
