This repository contains code for Yet Another Quantization Algorithm (YAQA), a quantization framework that uses a Kronecker-factored approximation of the layerwise Hessian with respect to the full-model KL divergence to better preserve model outputs after quantization. YAQA reduces the KL divergence to the original model by a factor of 1/3 over LDLQ/GPTQ across a wide range of models and quantizers, which translates to state-of-the-art performance on downstream tasks. For more details, see the paper.
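As a rough illustration of the objective that such a Kronecker-factored approximation induces, the sketch below scores a layer's quantization error E = W - W_hat with the quadratic form tr(B E A E^T), where A and B stand for the per-layer Kronecker factors. This is a schematic only, not the repository's implementation: the names `proxy_loss`, `A`, and `B` are illustrative, and the exact estimators (the paper's Sketches A/B) and rounding procedure are described in the paper.

```python
# Illustrative sketch only, not the repository's implementation. A Kronecker-
# factored Hessian approximation H ~= A (x) B turns the layerwise objective
# into the quadratic form tr(B (W - W_hat) A (W - W_hat)^T). The names
# `proxy_loss`, `A`, and `B` are placeholders for the paper's per-layer factors.
import torch

def proxy_loss(W: torch.Tensor, W_hat: torch.Tensor,
               A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Quadratic error induced by Kronecker factors A (in_dim x in_dim,
    input side) and B (out_dim x out_dim, output side)."""
    E = W - W_hat                                      # (out_dim, in_dim) error
    return torch.einsum('pi,ij,oj,op->', E, A, E, B)   # tr(B E A E^T)
```

Setting A to the input second-moment matrix and B to the identity roughly recovers the standard LDLQ/GPTQ proxy objective, which is one way to read the comparison above.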
This codebase is based on the QTIP codebase, with modifications to support YAQA's quantization algorithm.
To collect Hessians, see the README in hessian_llama/.
To quantize models, follow the instructions in the QTIP codebase.
Prequantized models and Sketch-B Hessians (see paper) can be found here.
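If the released artifacts are hosted on the Hugging Face Hub (an assumption here), fetching them could look like the sketch below. The repository ids are placeholders rather than the actual release names, and running the quantized checkpoints may still require the QTIP codebase's own loading and eval scripts rather than vanilla transformers.

```python
# Hypothetical download sketch. The repo ids below are placeholders, not the
# actual release names; running quantized checkpoints may require the QTIP
# codebase's loading/eval scripts rather than vanilla transformers.
from huggingface_hub import snapshot_download

# Prequantized model checkpoint (placeholder id).
model_dir = snapshot_download("your-org/yaqa-quantized-model")

# Precomputed Sketch-B Hessians (placeholder id).
hessian_dir = snapshot_download("your-org/yaqa-sketch-b-hessians",
                                repo_type="dataset")
print(model_dir, hessian_dir)
```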
If you found this work useful, please consider citing:
@misc{tseng2025modelpreservingadaptiverounding,
title={Model-Preserving Adaptive Rounding},
author={Albert Tseng and Zhaofeng Sun and Christopher De Sa},
year={2025},
eprint={2505.22988},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.22988},
}
Use of Llama models is governed by the Llama Community License. Use of this code is governed by the GNU GPL v3 license.