Hunyuan tokenizer #26

Merged: 13 commits into ngxson:xsn/hunyuan-moe on Jun 30, 2025

Conversation

ngxson (Owner) commented Jun 29, 2025

Make sure to read the contributing guidelines before submitting a PR

ngxson (Owner, Author) commented Jun 29, 2025

@coderabbitai pause

coderabbitai (bot) commented Jun 29, 2025

✅ Actions performed

Reviews paused.

Comment on lines -708 to -714
if (arch == LLM_ARCH_HUNYUAN_MOE) {
    weights = ggml_reshape_2d(ctx0, weights, n_expert_used, n_tokens);    // [n_expert_used, n_tokens]
    weights = ggml_div(ctx0, weights, ggml_sum_rows(ctx0, weights));      // divide by per-token sum (broadcast): [n_expert_used, n_tokens]
    weights = ggml_reshape_3d(ctx0, weights, 1, n_expert_used, n_tokens); // [1, n_expert_used, n_tokens]
    cb(weights, "ffn_moe_weights_scaled", il);
}

ngxson (Owner, Author):

haha good catch, I didn't notice that I reinvented the norm_w code block
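
For reference, both code paths compute the same thing: each token's selected expert weights are divided by their sum so they sum to 1. A minimal numpy sketch of that renormalization (illustrative only, not the llama.cpp/ggml API):

    import numpy as np

    def renormalize_expert_weights(weights: np.ndarray) -> np.ndarray:
        # weights: [n_expert_used, n_tokens]
        # divide each token's column by its sum so the selected experts sum to 1
        return weights / weights.sum(axis=0, keepdims=True)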

Comment on lines 795 to 797
def add_qk_norm(self, value: bool) -> None:
    self.add_bool(Keys.Attention.QK_NORM.format(arch=self.arch), value)

ngxson (Owner, Author):

This is redundant because we can just check for the existence of the k_norm.weight tensor. I will remove this after the PR is merged.
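
A hypothetical sketch of that check on the loading side; the tensor name assumes the usual blk.{i}.attn_k_norm naming and is an assumption, not code from this PR:

    # hypothetical: infer QK-norm from tensor presence instead of a GGUF flag
    def model_uses_qk_norm(tensor_names: set[str]) -> bool:
        # assumed tensor name; if layer 0 has a k_norm weight, QK-norm is in use
        return "blk.0.attn_k_norm.weight" in tensor_names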

kooshi:

removed

@@ -148,6 +148,7 @@ class Attention:
     VALUE_LENGTH_MLA = "{arch}.attention.value_length_mla"
     SHARED_KV_LAYERS = "{arch}.attention.shared_kv_layers"
     SLIDING_WINDOW_PATTERN = "{arch}.attention.sliding_window_pattern"
+    QK_NORM = "{arch}.attention.qk_norm"
ngxson (Owner, Author):

rm this

kooshi:

removed

assert all(n == moe_shared_expert[0] for n in moe_shared_expert)
self.gguf_writer.add_expert_shared_count(moe_shared_expert[0])

self.gguf_writer.add_qk_norm(hparams.get("use_qk_norm", True))
ngxson (Owner, Author):

rm this

kooshi:

removed

rope_scaling = hparams.get("rope_scaling", {})
if rope_scaling.get("type") == "dynamic":
    self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
    self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])
ngxson (Owner, Author):

maybe add an else: raise Error here
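
i.e., something like this sketch (the exception type and message are illustrative):

    if rope_scaling.get("type") == "dynamic":
        self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
        self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])
    else:
        raise ValueError(f"unsupported rope_scaling type: {rope_scaling.get('type')}")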

kooshi:

This one I'm not really sure about. It effectively does nothing right now because the factor is 1 in the config. I would need to dig deeper to see if their "dynamic" scaling really is YaRN, and I also want to extend it to their claimed 256k context. That will require more research after the logits line up.
As for the raise Error suggestion: other implementations seem to just continue when rope scaling is not defined, so I'll leave it as is for now.

kooshi:

I did more research and refined this to generically handle their NTK-aware alpha scaling.
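
For reference, NTK-aware alpha scaling is commonly implemented by stretching the RoPE base frequency; a minimal sketch under that assumption (the formula below is the widely used one, not quoted from this PR):

    # NTK-aware "alpha" scaling: scale the RoPE base instead of the positions
    # base' = base * alpha^(head_dim / (head_dim - 2))
    def ntk_alpha_scaled_base(rope_base: float, alpha: float, head_dim: int) -> float:
        return rope_base * alpha ** (head_dim / (head_dim - 2))

    # e.g. with rope_base=10000.0 and alpha taken from hparams["rope_scaling"]["alpha"]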

special_vocab = gguf.SpecialVocab(self.dir_model, load_merges=False)
special_vocab.add_to_gguf(self.gguf_writer)
# FIX for BOS token: Manually set the correct BOS token ID.
self.gguf_writer.add_bos_token_id(127959) # <|bos|>
ngxson (Owner, Author):

We can overwrite this in hparams["bos_token_id"]

kooshi:

I'm not sure if setting it in hparams would override the id that gguf.SpecialVocab reads from the config. I've left this as is for now; it can be tested later.
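
A hypothetical sketch of what the suggested alternative would look like (untested, and whether gguf.SpecialVocab consults hparams at all is exactly the open question):

    # hypothetical: patch the id up front instead of writing it after the fact
    self.hparams["bos_token_id"] = 127959  # <|bos|>
    special_vocab = gguf.SpecialVocab(self.dir_model, load_merges=False)
    special_vocab.add_to_gguf(self.gguf_writer)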

kooshi commented Jun 29, 2025

Alright, I cleaned it up a bit, and it can be merged.

Future considerations:

  • more research on the rope type (what is "dynamic" really? can it actually scale to 256k?) Done
  • maybe move the bos token override somewhere else
  • try to fix the warning "load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect"

ngxson merged commit 443ec9b into ngxson:xsn/hunyuan-moe on Jun 30, 2025. 2 checks passed.
ubergarm commented:

I just tested this PR, re-converted the safetensors to bf16, and it seems to be working now in very limited testing.

Commands and logs:
## convert
python \
    convert_hf_to_gguf.py \
    --outtype bf16 \
    --split-max-size 50G \
    --outfile /mnt/raid/models/ubergarm/Hunyuan-A13B-Instruct-GGUF/ \
    /mnt/raid/models/tencent/Hunyuan-A13B-Instruct/

## run server
./build/bin/llama-server \
  --model "$model" \
  --alias ubergarm/Hunyuan-A13B-Instruct-bf16 \
  -fa \
  -ctk q8_0 -ctv q8_0 \
  -c 8192 \
  --jinja \
  --temp 0.6 \
  --presence-penalty 0.7 \
  --min-p 0.1 \
  -b 1024 \
  -ts 48,48 \
  -ngl 18 \
  --threads 24 \
  --host 127.0.0.1 \
  --port 8080

[Screenshot: mainline-hunyuan-a13b-test]
