Questions Regarding Tequila Implementation and Quantization Details #92

@tuzhijun

Hi, thanks for open-sourcing this interesting work!

I have several questions about the implementation of Tequila that I couldn't fully clarify from the paper or code:

1. Quantization Methods in models/utils_quant.py

I noticed multiple quantization methods such as ultraquant, ultraquantv2, ultraquantv3, and ultraquantv4, but only ultraquantv2 and ultraquantv3 seem to have actual implementations. Could you clarify which version is the final one used in the paper? The paper doesn't explicitly specify this.

2. Differentiable Reactivation (Bypassing STE)

The paper highlights "Bypassing STE through Differentiable Reactivation" as a key contribution, but I couldn't locate the corresponding code. Could you point me to where this is implemented?
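
For context, the conventional straight-through estimator (STE) that the paper says it bypasses can be sketched as follows. This is the generic technique with a manual backward pass in NumPy, not code from the Tequila repository; the function names and the fixed `threshold` are assumptions made for illustration.

```python
import numpy as np

def ternarize(w, threshold=0.5):
    """Map each weight to -1, 0, or +1; weights with |w| <= threshold land in the dead zone (0)."""
    return np.sign(w) * (np.abs(w) > threshold)

def linear_forward(x, w):
    """The forward pass uses the ternary weights, not the latent full-precision ones."""
    return x @ ternarize(w).T

def linear_backward_ste(x, grad_y):
    """STE: treat d ternarize(w) / d w as the identity, so the gradient computed for
    the ternary weights is handed unchanged to the latent weights."""
    grad_w_q = grad_y.T @ x
    return grad_w_q  # used directly as the gradient of the latent weights

# Hypothetical toy values, chosen only to exercise the functions.
w = np.array([[0.9, -0.1, -0.7],
              [0.2,  0.6, -0.3]])
x = np.array([[1.0, 2.0, 3.0]])

y = linear_forward(x, w)
grad_w = linear_backward_ste(x, np.ones_like(y))
```

The rounding step itself has zero gradient almost everywhere, which is why STE substitutes the identity. What replaces this substitution in Tequila is the part I could not find in the code.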

3. Repurposing Dead Weights as Biases

The mechanism for "Repurposing Dead Weights as Biases" is a bit unclear. Is the bias applied per-channel or per-group? It would be helpful to understand how the bias is structured and initialized.

4. Inference-Time Ternary Weight Conversion

Could you provide or clarify the inference procedure for converting quantized weights into ternary weights (e.g., {-1, 0, +1}) using the learned bias? Even a fake-quant implementation would be very helpful for understanding the deployment behavior.
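
To make the request concrete, this is the kind of fake-quant sketch that would help. It uses a generic per-output-channel absmean scale, which is an assumption and may not match Tequila's quantizer. The `bias` argument is a hypothetical placeholder: how the learned bias is actually derived and where it enters is exactly what this question asks.

```python
import numpy as np

def fake_quant_ternary(w, eps=1e-8):
    """Generic per-output-channel ternary fake-quant (assumed scheme, not Tequila-specific).

    Returns ternary codes in {-1, 0, +1} and one floating-point scale per row.
    """
    scale = np.abs(w).mean(axis=1, keepdims=True) + eps  # absmean scale per row
    codes = np.clip(np.round(w / scale), -1, 1)          # round, then clamp to ternary
    return codes, scale

def fake_quant_linear(x, w, bias=None):
    """Linear layer evaluated with dequantized ternary weights.

    `bias` stands in for whatever the learned bias turns out to be (hypothetical).
    """
    codes, scale = fake_quant_ternary(w)
    y = x @ (codes * scale).T
    if bias is not None:
        y = y + bias
    return y

# Hypothetical toy values, chosen only to exercise the functions.
w = np.array([[0.9, -0.1, -0.8],
              [0.1,  0.5, -0.6]])
x = np.array([[1.0, 2.0, 3.0]])

codes, scale = fake_quant_ternary(w)
y = fake_quant_linear(x, w)
```

In this sketch the small weights (`-0.1` and `0.1`) are the ones that round to zero, so they are the "dead" entries whose treatment at inference time I would like to understand.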

5. Training Configuration and Efficiency

  • What batch size was used in the experiments?
  • How does the overall training cost (in terms of time and GPU memory) compare to ParetoQ?
  • Specifically, in UltraQuant V2, each layer introduces an additional learnable parameter of the same size as the weight matrix, which seems non-trivial in terms of memory and optimization overhead. Was this mitigated in practice?
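
As a rough illustration of the overhead concern in the last bullet, here is a back-of-the-envelope estimate. It assumes fp32 training with an Adam-style optimizer (weight, gradient, and two moment buffers per trainable value), a hypothetical 1B quantized linear-layer parameters, and ignores activations; none of these numbers come from the paper.

```python
def training_memory_gib(num_params, extra_param_ratio=0.0,
                        bytes_per_value=4, states_per_param=4):
    """Rough memory for trainable values only.

    states_per_param=4 assumes weight + gradient + two Adam moment buffers.
    extra_param_ratio=1.0 models one extra learnable tensor of the same size as each weight.
    """
    total_values = num_params * (1.0 + extra_param_ratio) * states_per_param
    return total_values * bytes_per_value / 2**30

num_params = 1_000_000_000  # hypothetical model size

baseline = training_memory_gib(num_params)
with_extra = training_memory_gib(num_params, extra_param_ratio=1.0)

print(f"baseline:   {baseline:.1f} GiB")
print(f"with extra: {with_extra:.1f} GiB")
```

Under these assumptions the parameter-related memory doubles, from about 14.9 GiB to about 29.8 GiB, which is why I am asking whether this was mitigated.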

Thanks in advance for your clarification!
