quantize: Add UINT type to KV overrides #14182

Closed
wants to merge 3 commits

Conversation

EAddario
Copy link
Contributor

When quantising certain models (Qwen3, though I suspect others as well) and overriding metadata parameters such as qwen3moe.expert_used_count, the expected type is UINTxx, but the current --override-kv functionality only supports bool, int, float and str.

Attempts to use the wrong type will usually trigger an error:

  1. Quantise & override: llama-quantize --override-kv qwen3moe.expert_used_count=int:16 Qwen3-30B-A3B-BF16.gguf Qwen3-30B-A3B-Q4_K_M.gguf Q4_K_M
  2. Load model: llama-simple -m Qwen3-30B-A3B-Q4_K_M.gguf "Hello, world!"

Step 2 will fail with an error loading model hyperparameters: key qwen3moe.expert_used_count has wrong type i32 but expected type u32 exception.

This PR adds uint as a fifth option, so that llama-quantize --override-kv qwen3moe.expert_used_count=uint:16 Qwen3-30B-A3B-BF16.gguf Qwen3-30B-A3B-Q4_K_M.gguf Q4_K_M generates a functioning model.
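To illustrate the shape of the change, here is a minimal, self-contained sketch of parsing a `key=type:value` override spec with `uint` accepted as a fifth type. The names (`kv_type`, `parse_kv_override`) are hypothetical and simplified; they are not llama.cpp's actual parsing code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified mirror of the --override-kv spec format:
// "key=type:value", where type is one of bool/int/uint/float/str.
enum class kv_type { BOOL, INT, UINT, FLOAT, STR };

struct kv_override {
    std::string key;
    kv_type     type;
    std::string raw_value;
};

static bool parse_kv_override(const std::string & spec, kv_override & out) {
    const size_t eq = spec.find('=');
    if (eq == std::string::npos) return false;
    const size_t colon = spec.find(':', eq + 1);
    if (colon == std::string::npos) return false;

    out.key       = spec.substr(0, eq);
    out.raw_value = spec.substr(colon + 1);

    const std::string type = spec.substr(eq + 1, colon - eq - 1);
    if      (type == "bool")  out.type = kv_type::BOOL;
    else if (type == "int")   out.type = kv_type::INT;
    else if (type == "uint")  out.type = kv_type::UINT;  // the new fifth option
    else if (type == "float") out.type = kv_type::FLOAT;
    else if (type == "str")   out.type = kv_type::STR;
    else return false;

    return true;
}
```

With this, a spec like `qwen3moe.expert_used_count=uint:16` parses to key `qwen3moe.expert_used_count`, type `UINT`, value `16`.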

@CISC
Copy link
Collaborator

CISC commented Jun 14, 2025

Uhm, I think that problem is specific to quantize, as it works for inferencing. So unless there's a scenario where you need to set a value higher than INT_MAX (the input is 64-bit), it's perhaps better to just relax the check a little?
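A minimal sketch of what "relaxing the check a little" could look like: where the loader expects a u32 key but finds an i32, accept it as long as the stored value is representable as unsigned. The names here (`gguf_type`, `type_check_relaxed`) are hypothetical, not llama.cpp's actual loader code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of a relaxed metadata type check: an i32 value
// is accepted where u32 is expected iff it is non-negative (and therefore
// representable as u32 without changing its meaning).
enum class gguf_type { I32, U32 };

static bool type_check_relaxed(gguf_type expected, gguf_type found, int64_t value) {
    if (expected == found) {
        return true;
    }
    if (expected == gguf_type::U32 && found == gguf_type::I32) {
        return value >= 0;  // non-negative i32 fits in u32
    }
    return false;
}
```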

@EAddario
Copy link
Contributor Author

That's correct @CISC, overriding during inference works fine. The use case I had in mind was to change the parameter's value during quantisation so that it becomes the new default "baked" into the quantised model.

I did consider modifying the type check instead, but not knowing how many other models fall into the same category (working assumption is MoEs), and not fully groking the effect of relaxing the check, I thought adding the new type was the safest approach since, if it were to blow up somewhere, it will be only for models specifically quantised with the new type, instead of affecting standard functionality. Having said that, happy to go the other way if that's the preferred approach.

@CISC
Copy link
Collaborator

CISC commented Jun 14, 2025

Ah, I see what's happening, that complicates things a little.

Ideally we shouldn't allow the user to change the metadata type like that, but enforcing that is not great for maintainability either.

@EAddario
Copy link
Contributor Author

Don't disagree, but I suppose the exception to the rule is for parameters that are meant to be tweaked, like number of active experts for example. Either way, happy with whatever the final decision is 😁

@CISC
Copy link
Collaborator

CISC commented Jun 14, 2025

You know what, though: all integer values written by the conversion script are UINT32, so I think the best solution is to just change gguf_set_val_i32 to gguf_set_val_u32 (and maybe check for negative values)...

@EAddario
Copy link
Contributor Author

EAddario commented Jun 15, 2025

That's a more surgical change! I've implemented the suggestion in another PR and will close this once all tests succeed.

A couple of points to keep in mind: the conversion script also generates other types (UINT16, UINT64, etc.), but those params are not/should not be overridable (e.g. GGUF.tensor_count, split.count); and the llama-quant.cpp logic in if (params->kv_overrides) will seem somewhat counterintuitive (if type is INT, then generate UINT), so I've added a comment linking back here to explain.

As for checking negative values, I thought about this a bit and opted to just drop the sign by taking the absolute value. It keeps the code simple and, given that the originally converted model does not support negative values anyway (it's all UINTxx), it won't change the original logic.
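The sign-dropping approach described above can be sketched in a few lines; `override_to_u32` is a hypothetical helper name, not the actual code in the follow-up PR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch of storing an int-typed override as UINT32 (matching
// what the conversion script writes), with the sign simply dropped by
// taking the absolute value: -16 and 16 both map to 16u.
static uint32_t override_to_u32(int64_t v) {
    return (uint32_t) std::llabs((long long) v);
}
```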

@EAddario
Copy link
Contributor Author

Replaced by #14197

@EAddario EAddario closed this Jun 15, 2025
@EAddario EAddario deleted the kv_override branch June 15, 2025 20:29