llama-quantize error #14922
lakatosgabor asked this question in Q&A (unanswered)
Hello!
I am trying to quantize a model down to a smaller size, but I am getting the following error.
What could be the problem?
PS E:\PythonProjects\gguf_new\llama.cpp\build\bin\Release> ./llama-quantize.exe "E:\PythonProjects\gguf\OllamaToGGUF\Output\OpenEuroLLM-Hungarian\OpenEuroLLM-Hungarian-12.2B-Q4_K_M.gguf" "E:\PythonProjects\gguf\OllamaToGGUF\Output\hunLLMQ4_1.gguf" Q4_1
main: build = 6014 (00fa15fe)
main: built with MSVC 19.44.35213.0 for x64
main: quantizing 'E:\PythonProjects\gguf\OllamaToGGUF\Output\OpenEuroLLM-Hungarian\OpenEuroLLM-Hungarian-12.2B-Q4_K_M.gguf' to 'E:\PythonProjects\gguf\OllamaToGGUF\Output\hunLLMQ4_1.gguf' as Q4_1
llama_model_loader: loaded meta data with 36 key-value pairs and 1065 tensors from E:\PythonProjects\gguf\OllamaToGGUF\Output\OpenEuroLLM-Hungarian\OpenEuroLLM-Hungarian-12.2B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv  0: gemma3.attention.head_count u32 = 16
llama_model_loader: - kv  1: gemma3.attention.head_count_kv u32 = 8
llama_model_loader: - kv  2: gemma3.attention.key_length u32 = 256
llama_model_loader: - kv  3: gemma3.attention.sliding_window u32 = 1024
llama_model_loader: - kv  4: gemma3.attention.value_length u32 = 256
llama_model_loader: - kv  5: gemma3.block_count u32 = 48
llama_model_loader: - kv  6: gemma3.context_length u32 = 131072
llama_model_loader: - kv  7: gemma3.embedding_length u32 = 3840
llama_model_loader: - kv  8: gemma3.feed_forward_length u32 = 15360
llama_model_loader: - kv  9: gemma3.mm.tokens_per_image u32 = 256
llama_model_loader: - kv 10: gemma3.vision.attention.head_count u32 = 16
llama_model_loader: - kv 11: gemma3.vision.attention.layer_norm_epsilon f32 = 0.000001
llama_model_loader: - kv 12: gemma3.vision.block_count u32 = 27
llama_model_loader: - kv 13: gemma3.vision.embedding_length u32 = 1152
llama_model_loader: - kv 14: gemma3.vision.feed_forward_length u32 = 4304
llama_model_loader: - kv 15: gemma3.vision.image_size u32 = 896
llama_model_loader: - kv 16: gemma3.vision.num_channels u32 = 3
llama_model_loader: - kv 17: gemma3.vision.patch_size u32 = 14
llama_model_loader: - kv 18: general.architecture str = gemma3
llama_model_loader: - kv 19: tokenizer.chat_template str = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 22: tokenizer.ggml.add_padding_token bool = false
llama_model_loader: - kv 23: tokenizer.ggml.add_unknown_token bool = false
llama_model_loader: - kv 24: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,514906] = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n", ...
llama_model_loader: - kv 27: tokenizer.ggml.model str = llama
llama_model_loader: - kv 28: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 29: tokenizer.ggml.pre str = default
llama_model_loader: - kv 30: tokenizer.ggml.scores arr[f32,262145] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,262145] = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 32: tokenizer.ggml.tokens arr[str,262145] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 33: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 34: general.quantization_version u32 = 2
llama_model_loader: - kv 35: general.file_type u32 = 15
llama_model_loader: - type f32: 563 tensors
llama_model_loader: - type f16: 165 tensors
llama_model_loader: - type q4_K: 290 tensors
llama_model_loader: - type q6_K: 47 tensors
llama_model_quantize: failed to quantize: key not found in model: gemma3.attention.layer_norm_rms_epsilon
main: failed to quantize model from 'E:\PythonProjects\gguf\OllamaToGGUF\Output\OpenEuroLLM-Hungarian\OpenEuroLLM-Hungarian-12.2B-Q4_K_M.gguf'
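For reference, here is a minimal sketch of how the metadata keys in the source GGUF could be listed, to confirm whether `gemma3.attention.layer_norm_rms_epsilon` is really absent from the file. It assumes the `gguf` Python package that ships with llama.cpp (gguf-py, installable via `pip install gguf`); the file path is the one from the log above.

```python
# Sketch: dump the metadata key names stored in a GGUF file and check
# whether the key llama-quantize complains about is present.
# Assumes the `gguf` package from llama.cpp's gguf-py is installed.
from gguf import GGUFReader

MODEL_PATH = r"E:\PythonProjects\gguf\OllamaToGGUF\Output\OpenEuroLLM-Hungarian\OpenEuroLLM-Hungarian-12.2B-Q4_K_M.gguf"

reader = GGUFReader(MODEL_PATH)

# reader.fields maps each metadata key name to its parsed field data.
for name in reader.fields:
    print(name)

missing_key = "gemma3.attention.layer_norm_rms_epsilon"
print(missing_key, "is present" if missing_key in reader.fields else "is MISSING")
```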