Hi, thanks for open-sourcing this interesting work!
I have several questions about the implementation of Tequila that I couldn't fully clarify from the paper or code:
1. Quantization Methods in models/utils_quant.py
I noticed multiple quantization methods (ultraquant, ultraquantv2, ultraquantv3, and ultraquantv4), but only ultraquantv2 and ultraquantv3 appear to be fully implemented. Could you clarify which version produced the results reported in the paper? The paper doesn't specify this explicitly.
2. Differentiable Reactivation (Bypassing STE)
The paper highlights "Bypassing STE through Differentiable Reactivation" as a key contribution, but I couldn't locate the corresponding code. Could you point me to where this is implemented?
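For reference, this is the standard STE fake-quant pattern I was expecting to find; a minimal sketch with hypothetical names, not code from this repo (the per-channel scale and threshold are my assumptions). I couldn't identify where the differentiable reactivation replaces the `.detach()` trick below:

```python
import torch

def ste_ternary_fakequant(w: torch.Tensor, delta: float = 0.7) -> torch.Tensor:
    # Per-output-channel scale; one common choice, not necessarily the paper's.
    scale = w.abs().mean(dim=1, keepdim=True)
    # Hard ternarization to {-1, 0, +1} * scale in the forward pass.
    w_q = torch.sign(w) * (w.abs() > delta * scale) * scale
    # Straight-through estimator: the backward pass treats quantization as
    # identity, so gradients never see the zero-gradient sign/threshold ops.
    return w + (w_q - w).detach()
```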
3. Repurposing Dead Weights as Biases
The mechanism for "Repurposing Dead Weights as Biases" is a bit unclear. Is the bias applied per-channel or per-group? It would be helpful to understand how the bias is structured and initialized.
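To make the question concrete, these are the two layouts I can imagine; the shapes and names are purely my assumption:

```python
import torch
import torch.nn as nn

out_features, in_features, group_size = 4096, 4096, 128

# (a) per-channel: one learned bias per output row of the weight matrix
bias_per_channel = nn.Parameter(torch.zeros(out_features))

# (b) per-group: one learned bias per quantization group within each row
bias_per_group = nn.Parameter(torch.zeros(out_features, in_features // group_size))
```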
4. Inference-Time Ternary Weight Conversion
Could you provide or clarify the inference procedure for converting quantized weights into ternary weights (e.g., {-1, 0, +1}) using the learned bias? Even a fake-quant implementation would be very helpful for understanding the deployment behavior.
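My current guess at the export path is the sketch below, using the same per-channel scale convention as my STE sketch above. This is entirely my assumption; the role of the learned bias here is exactly what I'd like confirmed:

```python
import torch

@torch.no_grad()
def export_ternary(w: torch.Tensor, delta: float = 0.7):
    """Hypothetical conversion of a trained weight to {-1, 0, +1} * scale.

    What I can't tell from the code is where the learned bias from
    question 3 enters: does it shift w before thresholding, or is it
    folded into the layer's additive bias after ternarization?
    """
    scale = w.abs().mean(dim=1, keepdim=True)        # per-channel scale
    w_t = torch.sign(w) * (w.abs() > delta * scale)  # ternary {-1, 0, +1}
    return w_t.to(torch.int8), scale
```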
5. Training Configuration and Efficiency
- What batch size was used in the experiments?
- How does the overall training cost (in terms of time and GPU memory) compare to ParetoQ?
- Specifically, in UltraQuant V2 each layer introduces an additional learnable parameter the same size as the weight matrix, which seems non-trivial in terms of memory and optimizer overhead (see my rough estimate below). Was this mitigated in practice?
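For that last point, here is my back-of-envelope estimate, assuming bf16 parameters/gradients and fp32 Adam moments (all numbers are my assumptions):

```python
# One extra weight-sized tensor per linear layer is roughly one extra
# copy of the model's weights in total.
extra_params = 7e9                  # e.g., a 7B-parameter model
bytes_each = 2 + 2 + 4 + 4          # bf16 param + bf16 grad + fp32 Adam m + v
print(f"~{extra_params * bytes_each / 2**30:.0f} GiB of extra GPU memory")  # ~78 GiB
```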
Thanks in advance for your clarification!