Conversation

@kallewoof (Author) commented Jul 19, 2025

This adds a tests/ folder with a single test_autoguess.py script that returns a zero exit code iff

  • every AutoGuess.json entry has a mapped real model that uses the suggested template
  • every mapping maps to the corresponding AutoGuess adapter

and exit code 1 otherwise. It also adds a GitHub workflow that runs the script on any pull request that touches AutoGuess.json. A rough sketch of the kind of check involved follows below.
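For illustration, a minimal sketch of this kind of check, not the actual script: the name-to-model mapping and the all-search-strings-must-match rule are simplified assumptions here.

```python
# Minimal sketch, not the actual test script. Assumes AutoGuess.json
# is a list of entries with "name" and "search" keys, and that an
# entry matches when every one of its search strings appears in the
# model's chat template. The name-to-model mapping is illustrative.
import json
import sys

from transformers import AutoTokenizer

TEST_MODELS = {  # hypothetical mapping: AutoGuess name -> HF repo
    "ChatML (Qwen 2.5 based)": "Qwen/Qwen2.5-0.5B-Instruct",
    "Mistral V3": "mistralai/Mistral-7B-Instruct-v0.3",
}

failures = 0
with open("kcpp_adapters/AutoGuess.json") as f:
    entries = json.load(f)

for entry in entries:
    repo = TEST_MODELS.get(entry["name"])
    if repo is None:
        print(f"{entry['name']:32}: MISSING MAPPING")
        failures += 1
        continue
    template = AutoTokenizer.from_pretrained(repo).chat_template or ""
    ok = all(s in template for s in entry["search"])
    print(f"{entry['name']:32}: {'OK' if ok else 'FAILURE'} {repo}")
    failures += 0 if ok else 1

sys.exit(1 if failures else 0)
```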

The test currently fails only for the RWKV model, which is tested against fla-hub/rwkv7-1.5B-world. I started out with the ambition of finding open, ungated models for every template (which meant some odd choices for templates whose official releases are gated), but gave up in favor of a best-effort approach where gated models' tokenizer configs are saved as-is in an external GitHub repository that is cloned in the GitHub workflow. You could even argue that the repository should include all the templates: especially if we put this into CI, where the tests will repeatedly download them from HF, it may be better to simply push them to git instead.

Results so far:

ChatML (Phi 4)                  = ChatML (Phi 4)                  : OK      microsoft/phi-4
ChatML (Qwen 2.5 based)         = ChatML (Qwen 2.5 based)         : OK      Qwen/Qwen2.5-0.5B-Instruct
ChatML (Kimi)                   = ChatML (Kimi)                   : OK      moonshotai/Kimi-K2-Instruct
Google Gemma 2                  = Google Gemma 2                  : OK      Efficient-Large-Model/gemma-2-2b-it
Google Gemma 3                  = Google Gemma 3                  : OK      scb10x/typhoon2.1-gemma3-12b
Google Gemma 3n                 = Google Gemma 3n                 : OK      lmstudio-community/gemma-3n-E4B-it-MLX-bf16
Llama 3.x                       = Llama 3.x                       : OK      Steelskull/L3.3-Shakudo-70b
Llama 4                         = Llama 4                         : OK      meta-llama/Llama-4-Scout-17B-16E-Instruct
Mistral V7 (with system prompt) = Mistral V7 (with system prompt) : OK      Doctor-Shotgun/MS3.2-24B-Magnum-Diamond
Mistral V3                      = Mistral V3                      : OK      mistralai/Mistral-7B-Instruct-v0.3
GLM-4                           = GLM-4                           : OK      THUDM/glm-4-9b-chat-hf
Phi 3.5                         = Phi 3.5                         : OK      microsoft/Phi-3.5-mini-instruct
Phi 4 (mini)                    = Phi 4 (mini)                    : OK      microsoft/Phi-4-mini-instruct
Cohere (Aya Expanse 32B based)  = Cohere (Aya Expanse 32B based)  : OK      CohereLabs/aya-expanse-32b                  [default]
DeepSeek V2.5                   = DeepSeek V2.5                   : OK      deepseek-ai/DeepSeek-V2.5
Jamba                           = Jamba                           : OK      ai21labs/Jamba-tiny-dev
Dots                            = Dots                            : OK      rednote-hilab/dots.llm1.inst
RWKV World                      = MISSING MAPPING                 : FAILURE fla-hub/rwkv7-1.5B-world
Mistral (Generic)               = Mistral (Generic)               : OK      mistralai/Mistral-Nemo-Instruct-2407
ChatML (Generic)                = ChatML (Generic)                : OK      NewEden/Gemma-27B-chatml
There were 1 failure(s)!

@kallewoof (Author)

Some of the tokenizer configs are pretty big, so it may be worth pulling out only the chat template and saving just that.
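As a sketch (the file names here are illustrative), assuming the stored files keep the tokenizer_config.json shape:

```python
# Shrink a stored config to just its chat template. Entries without
# a template are kept as None so the test can still detect them.
import json

with open("tokenizer_config.json") as f:
    config = json.load(f)

with open("tokenizer_config.stripped.json", "w") as f:
    json.dump({"chat_template": config.get("chat_template")}, f, indent=1)
```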

@kallewoof (Author)

As for RWKV, the search strings require that "rwkv-world" is present, but it is not in the example above. Maybe "rwkv_tokenizer_end_of_text" would be better? I'm not sure how RWKV World differs from other RWKVs, if at all.

@kallewoof (Author)

I took a stab at a GitHub workflow that triggers in PRs when someone modifies AutoGuess.json, but I can't test it. I can drop that commit if it seems broken.

@kallewoof force-pushed the 202507-autugoess-tests branch from 5f04351 to 01c57d6 on July 19, 2025 07:43
@kallewoof (Author) commented Jul 19, 2025

OK, I was able to get this running at kallewoof#1 -- it's broken at the moment, but at least I can test it now.

Got it working; force-pushed the complete solution.

@kallewoof force-pushed the 202507-autugoess-tests branch from 01c57d6 to 038328e on July 19, 2025 09:28
@LostRuins (Owner)

I don't think we should bundle those tokenizer configs into the repo.

If you're making a workflow for it, we can simply download them on demand in the Python test script (in fact, they don't even need to be stored on disk, just temporarily kept in memory).
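A sketch of that approach, using Hugging Face's standard resolve endpoint (error handling and auth for gated repos omitted):

```python
# Fetch a model's tokenizer config straight into memory; nothing is
# written to disk. Gated repos would additionally need an auth header.
import json
import urllib.request

def load_tokenizer_config(repo: str) -> dict:
    url = f"https://huggingface.co/{repo}/resolve/main/tokenizer_config.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

template = load_tokenizer_config("Qwen/Qwen2.5-0.5B-Instruct")["chat_template"]
```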

@LostRuins force-pushed the concedo_experimental branch from 1dbae35 to 15b1034 on July 20, 2025 03:10
@kallewoof (Author)

I guess someone could make a Hugging Face account and put all the gated tokenizers there? Is that what you were envisioning?

@kallewoof (Author) commented Jul 20, 2025

I made a GitHub repository and put the gated tokenizer configs there. The workflow now clones and uses that repo instead. The one drawback is that people can't just run the test without first cloning the gated-tokenizers repo.
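The test can then read the saved configs from the clone; a sketch, assuming one tokenizer_config.json per model directory (the layout and directory name here are assumptions):

```python
# Read a gated model's saved config from a local clone of the
# gated-tokenizers repo. Layout assumed: <clone>/<repo>/tokenizer_config.json
import json
from pathlib import Path

def load_gated_config(repo: str, clone_dir: str = "gated-tokenizers") -> dict:
    path = Path(clone_dir) / repo / "tokenizer_config.json"
    return json.loads(path.read_text())
```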

@kallewoof (Author) commented Jul 20, 2025

By the way, I know this may seem like a lot of work for a relatively minor feature, but (1) I am hoping to also add a check where the apply_chat_template results are compared to the adapter config (edit: see #1654), which will catch mis-configured adapters, and (2) I want to make this a de facto standard for use elsewhere, e.g. as a basis for the SillyTavern chat template derivation.
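Roughly what that comparison could look like (the adapter key names below follow a generic start/end-marker layout and are assumptions, not necessarily the exact adapter schema):

```python
# Sketch: render one user turn through both the HF tokenizer and a
# koboldcpp-style adapter, then compare. The adapter keys are assumed
# names for illustration.
from transformers import AutoTokenizer

adapter = {
    "user_start": "<|im_start|>user\n",
    "user_end": "<|im_end|>\n",
    "assistant_start": "<|im_start|>assistant\n",
}
messages = [{"role": "user", "content": "Hello"}]

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
reference = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
rendered = (
    adapter["user_start"] + "Hello" + adapter["user_end"]
    + adapter["assistant_start"]
)

# Misplaced spaces/newlines show up here; note that models which
# inject a default system prompt will also differ and need handling.
if rendered != reference:
    print("adapter:", repr(rendered))
    print("HF     :", repr(reference))
```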

@kallewoof (Author)

kallewoof#1 has been updated for reference.

@kallewoof (Author) commented Jul 20, 2025

I now have a working follow-up to this in #1654. It is all demonstrated in kallewoof#1:

  • Commit 1: fixes the search string for RWKV World, currently the only adapter whose search string is failing.
  • Commit 2: fixes several adapter issues (mostly revolving around whether a space or newline goes before or after a token). The exception is RWKV World, which required heavy modification. Should I perhaps redirect the test to some other model?

With these two commits, the Transformers tokenizer apply_chat_template and the koboldcpp adapters are synced up, with the exception of Mistral V3 and Mistral (Generic), which have wonky system prompt handling that I don't think is worth special-casing.

@LostRuins (Owner)

Alright, thanks. Give me some time and I'll take a look.

@LostRuins (Owner)

Alright, it seems to be working fine. As for the RWKV World template, I am honestly unsure -- it was provided to me by someone else and I did not try it.

I'm fine changing it to rwkv_ or even rwkv (rare enough in normal models), or removing it entirely if there is no clear answer. It does seem like many model makers just stuff a ChatML template there anyway.

Also #1627

But this is good enough, so I will merge this first.

@LostRuins merged commit ff8f156 into LostRuins:concedo_experimental on July 25, 2025 (1 check failed)
@kallewoof deleted the 202507-autugoess-tests branch on July 25, 2025 13:17