Hunyuan tokenizer #26
Conversation
@coderabbitai pause
✅ Actions performed: Reviews paused.
if (arch == LLM_ARCH_HUNYUAN_MOE) {
    weights = ggml_reshape_2d(ctx0, weights, n_expert_used, n_tokens);    // [n_expert_used, n_tokens]
    weights = ggml_div(ctx0, weights, ggml_sum_rows(ctx0, weights));      // normalize per token: sum_rows is [1, n_tokens], result stays [n_expert_used, n_tokens]
    weights = ggml_reshape_3d(ctx0, weights, 1, n_expert_used, n_tokens); // [1, n_expert_used, n_tokens]
    cb(weights, "ffn_moe_weights_scaled", il);
}
haha good catch, I didn't notice that I reinvented the norm_w code block
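
For readers following along: the branch above simply renormalizes each token's selected expert weights so they sum to 1, which is what the existing norm_w option of the MoE FFN builder already does. A minimal NumPy sketch of the same idea (illustrative names and shapes, not the ggml API):

import numpy as np

def normalize_expert_weights(weights: np.ndarray) -> np.ndarray:
    # Rescale each token's expert weights so they sum to 1,
    # mirroring ggml_div(weights, ggml_sum_rows(weights)) above.
    return weights / weights.sum(axis=-1, keepdims=True)

# Example: 2 tokens, 3 selected experts each (made-up numbers).
w = np.array([[0.5, 0.3, 0.2],
              [2.0, 1.0, 1.0]])
print(normalize_expert_weights(w))  # each row now sums to 1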
gguf-py/gguf/gguf_writer.py
Outdated
def add_qk_norm(self, value: bool) -> None:
    self.add_bool(Keys.Attention.QK_NORM.format(arch=self.arch), value)
This is redundant because we can just check the existence of the k_norm.weight tensor. I will remove this after the PR is merged.
removed
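
For context on why the key is redundant: the presence of the norm tensors already signals QK-norm, so a loader can check for them instead of reading metadata. A rough sketch with the gguf-py reader (the tensor name suffix is an assumption based on the usual attn_k_norm naming; the real check in llama.cpp happens in C++ at load time):

from gguf import GGUFReader  # gguf-py package from this repo

def model_uses_qk_norm(gguf_path: str) -> bool:
    # Heuristic: QK-norm is enabled iff the k_norm tensors exist in the file.
    reader = GGUFReader(gguf_path)
    tensor_names = {t.name for t in reader.tensors}
    # assumed llama.cpp-style names, e.g. "blk.0.attn_k_norm.weight"
    return any(name.endswith("attn_k_norm.weight") for name in tensor_names)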
gguf-py/gguf/constants.py
Outdated
@@ -148,6 +148,7 @@ class Attention:
    VALUE_LENGTH_MLA       = "{arch}.attention.value_length_mla"
    SHARED_KV_LAYERS       = "{arch}.attention.shared_kv_layers"
    SLIDING_WINDOW_PATTERN = "{arch}.attention.sliding_window_pattern"
    QK_NORM                = "{arch}.attention.qk_norm"
rm this
removed
convert_hf_to_gguf.py
Outdated
assert all(n == moe_shared_expert[0] for n in moe_shared_expert)
self.gguf_writer.add_expert_shared_count(moe_shared_expert[0])

self.gguf_writer.add_qk_norm(hparams.get("use_qk_norm", True))
rm this
removed
convert_hf_to_gguf.py
Outdated
rope_scaling = hparams.get("rope_scaling", {})
if rope_scaling.get("type") == "dynamic":
    self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
    self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])
maybe add an else: raise Error here
This one I'm not really sure about. It effectively does nothing right now because the factor is 1 in the config. I would need to dig deeper to see if their "dynamic" really is YARN, and I really want to extend it to their claimed 256k. That will require more research after the logits line up.
As for the raise Error, it seems other implementations just let it continue if rope is not defined, so I'll leave it as is for now.
I did more research and refined this to generically handle their NTK-aware alpha scaling.
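
For reference, NTK-aware alpha scaling is usually folded into the RoPE base frequency rather than applied as a runtime scaling factor; the commonly used rule of thumb is base' = base * alpha^(d / (d - 2)), where d is the head dimension. A hedged sketch of that conversion (whether the converter uses exactly this form is an assumption; the numbers are illustrative, not taken from the Hunyuan config):

def ntk_alpha_to_rope_base(rope_theta: float, alpha: float, head_dim: int) -> float:
    # Community NTK-aware rule of thumb: fold alpha into the RoPE base.
    return rope_theta * alpha ** (head_dim / (head_dim - 2))

print(ntk_alpha_to_rope_base(10000.0, 1000.0, 128))  # illustrative values only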
special_vocab = gguf.SpecialVocab(self.dir_model, load_merges=False)
special_vocab.add_to_gguf(self.gguf_writer)
# FIX for BOS token: manually set the correct BOS token ID.
self.gguf_writer.add_bos_token_id(127959)  # <|bos|>
We can overwrite this in hparams["bos_token_id"]
I'm not sure if setting it in hparams would override the id that gguf.SpecialVocab reads from the config. I've left this as is for now, but that can be tested later.
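
If someone does test the hparams route later, the two options look roughly like this in the conversion code (option 2 is the untested one, and whether SpecialVocab actually consults hparams is exactly the open question; the id 127959 comes from the snippet above):

# Option 1 (what the PR does): force the id on the writer after SpecialVocab runs.
special_vocab = gguf.SpecialVocab(self.dir_model, load_merges=False)
special_vocab.add_to_gguf(self.gguf_writer)
self.gguf_writer.add_bos_token_id(127959)  # <|bos|>

# Option 2 (untested assumption): patch hparams before vocab handling and let the
# normal code path pick it up -- needs verification that the vocab code reads
# hparams rather than only the tokenizer config.
self.hparams["bos_token_id"] = 127959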
Alright, I cleaned it up a bit, and it can be merged. Future considerations:
I just tested this PR and re-converted the safetensors to bf16, and it seems to be working now in very limited testing. Commands and logs:

## convert
python \
convert_hf_to_gguf.py \
--outtype bf16 \
--split-max-size 50G \
--outfile /mnt/raid/models/ubergarm/Hunyuan-A13B-Instruct-GGUF/ \
/mnt/raid/models/tencent/Hunyuan-A13B-Instruct/
## run server
./build/bin/llama-server \
--model "$model" \
--alias ubergarm/Hunyuan-A13B-Instruct-bf16 \
-fa \
-ctk q8_0 -ctv q8_0 \
-c 8192 \
--jinja \
--temp 0.6 \
--presence-penalty 0.7 \
--min-p 0.1 \
-b 1024 \
-ts 48,48 \
-ngl 18 \
--threads 24 \
--host 127.0.0.1 \
--port 8080