Replies: 1 comment 1 reply
-
It does four columns in each thread, and each column has its own scale and offset. It's faster this way, loading four offsets as one |
Beta Was this translation helpful? Give feedback.
1 reply
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Hi, I read the code of gptq reconstruct and find this line:

It seems that 4 values of zero_point are read out from the matrix for the same group. From the theory of gptq quantization, isn't it that there is only one particular zero_point and scale value for one specific group?
Beta Was this translation helpful? Give feedback.
All reactions