quantize: Add UINT type to KV overrides #14182
Conversation
Uhm, I think that problem is specific to `llama-quantize`.
That's correct @CISC, overriding during inference works fine. The use case I had in mind was to change the parameter's value during quantisation so that it becomes the new default "baked" into the quantised model. I did consider modifying the type check instead, but not knowing how many other models fall into the same category (working assumption is MoEs), and not fully grokking the effect of relaxing the check, I thought adding the new type was the safest approach: if it were to blow up somewhere, it would only affect models specifically quantised with the new type, instead of standard functionality. Having said that, happy to go the other way if that's the preferred approach.
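For context, a minimal sketch of the "new type" approach, based on the existing `llama_model_kv_override_type` enum in `llama.h`; the UINT tag name and comment below are illustrative, not the exact patch:

```cpp
// llama.h: the existing KV-override tags, plus the proposed unsigned variant
enum llama_model_kv_override_type {
    LLAMA_KV_OVERRIDE_TYPE_INT,
    LLAMA_KV_OVERRIDE_TYPE_FLOAT,
    LLAMA_KV_OVERRIDE_TYPE_BOOL,
    LLAMA_KV_OVERRIDE_TYPE_STR,
    LLAMA_KV_OVERRIDE_TYPE_UINT, // proposed: reuses the val_i64 storage, written out as u32
};
```

With a dedicated tag, only models quantised with `uint` overrides take the new code path; the existing `int` behaviour stays untouched.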
Ah, I see what's happening, that complicates things a little. Ideally we shouldn't allow the user to change the metadata type like that, but enforcing that is not great for maintainability either.
Don't disagree, but I suppose the exception to the rule is for parameters that are meant to be tweaked, like the number of active experts, for example. Either way, happy with whatever the final decision is 😁
You know what though, all integer values written by the conversion script are UINT32, so perhaps the int override could simply be written back as u32 during quantisation?
That's a more surgical change! I've implemented the suggestion in another PR and will close this one once all tests succeed. A couple of points to keep in mind: the conversion script also generates other integer types (UINT16, UINT64, etc.), but those params are not / should not be overridable (e.g. GGUF.tensor_count, split.count, etc.); the llama-quant.cpp logic in …

As for checking negative values, I thought a bit about it and opted to just drop the sign by taking the absolute value. It keeps the code simple and, on the basis that the originally converted model does not support negative values anyway (it's all UINTxx), it won't change the original logic.
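For reference, a minimal sketch of that surgical change in llama-quant.cpp's override write-out; `gguf_set_val_u32` is ggml's existing setter, but the exact surrounding code here is illustrative:

```cpp
// llama-quant.cpp (sketch): when applying --override-kv values to the output GGUF,
// write int overrides with the unsigned type the conversion script used,
// dropping the sign via std::abs() (<cstdlib>) since the original metadata
// is all UINTxx and never negative anyway
case LLAMA_KV_OVERRIDE_TYPE_INT:
    gguf_set_val_u32(ctx_out, o.key, (uint32_t) std::abs(o.val_i64));
    break;
```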
Replaced by #14197 |
When quantising certain models (Qwen3, but I suspect others as well) and overriding some metadata parameters, like `qwen3moe.expert_used_count` for example, their expected type is UINTxx, but the current `--override-kv` functionality only supports `bool`, `int`, `float` and `str`. Attempts to use the wrong type will usually trigger an error:
```
llama-quantize --override-kv qwen3moe.expert_used_count=int:16 Qwen3-30B-A3B-BF16.gguf Qwen3-30B-A3B-Q4_K_M.gguf Q4_K_M
llama-simple -m Qwen3-30B-A3B-Q4_K_M.gguf "Hello, world!"
```
will lead to an

```
error loading model hyperparameters: key qwen3moe.expert_used_count has wrong type i32 but expected type u32
```

exception.

This PR adds `uint` as a fifth option, and

```
llama-quantize --override-kv qwen3moe.expert_used_count=uint:16 Qwen3-30B-A3B-BF16.gguf Qwen3-30B-A3B-Q4_K_M.gguf Q4_K_M
```

will generate a functioning model.
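For illustration, a self-contained sketch of how a `key=uint:value` override string could be parsed into a key/value pair, mirroring the existing `key=type:value` syntax; the function name and structure are hypothetical, not llama.cpp's actual parser:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical parser for "key=uint:value" override strings.
static bool parse_uint_override(const char * data, std::string & key, uint32_t & val) {
    const char * sep = strchr(data, '=');
    if (sep == nullptr || sep - data >= 128) { // keys are capped at 128 bytes in the override struct
        return false;
    }
    key.assign(data, sep - data);
    sep++; // skip '='
    if (strncmp(sep, "uint:", 5) != 0) {
        return false;
    }
    val = (uint32_t) strtoul(sep + 5, nullptr, 10);
    return true;
}

int main() {
    std::string key;
    uint32_t val = 0;
    if (parse_uint_override("qwen3moe.expert_used_count=uint:16", key, val)) {
        printf("%s = %u\n", key.c_str(), val); // prints: qwen3moe.expert_used_count = 16
    }
    return 0;
}
```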