Conversation

@kallewoof (Author) commented Jul 19, 2025

This adds a tests/ folder with a single test_autoguess.py script that returns a zero exit code iff

  • every AutoGuess.json entry has a mapped real model that uses the suggested template
  • every mapping maps to the corresponding AutoGuess adapter

and exit code 1 otherwise. It also adds a GitHub workflow that runs the script on any pull request that touches AutoGuess.json. A rough sketch of the kind of check involved follows below.
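For illustration, a minimal sketch of this kind of check, not the actual script: the name-to-model mapping and the all-search-strings-must-match rule are simplified assumptions here.

```python
# Minimal sketch, not the actual test script. Assumes AutoGuess.json
# is a list of entries with "name" and "search" keys, and that an
# entry matches when every one of its search strings appears in the
# model's chat template. The name-to-model mapping is illustrative.
import json
import sys

from transformers import AutoTokenizer

TEST_MODELS = {  # hypothetical mapping: AutoGuess name -> HF repo
    "ChatML (Qwen 2.5 based)": "Qwen/Qwen2.5-0.5B-Instruct",
    "Mistral V3": "mistralai/Mistral-7B-Instruct-v0.3",
}

failures = 0
with open("kcpp_adapters/AutoGuess.json") as f:
    entries = json.load(f)

for entry in entries:
    repo = TEST_MODELS.get(entry["name"])
    if repo is None:
        print(f"{entry['name']:32}: MISSING MAPPING")
        failures += 1
        continue
    template = AutoTokenizer.from_pretrained(repo).chat_template or ""
    ok = all(s in template for s in entry["search"])
    print(f"{entry['name']:32}: {'OK' if ok else 'FAILURE'} {repo}")
    failures += 0 if ok else 1

sys.exit(1 if failures else 0)
```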

The test currently fails only for the RWKV model, which is tested against fla-hub/rwkv7-1.5B-world. I started out with the ambition of finding open, ungated models for every template (which meant some odd choices for templates whose official releases are gated), but gave up in favor of a best-effort approach where gated models' tokenizer configs are saved as-is in an external GitHub repository that is cloned in the GitHub workflow. You could even argue that the repository should include all the templates: especially if we put this into CI, where the tests will repeatedly download them from HF, it may be better to simply push them to git instead.

Results so far:

ChatML (Phi 4)                  = ChatML (Phi 4)                  : OK      microsoft/phi-4
ChatML (Qwen 2.5 based)         = ChatML (Qwen 2.5 based)         : OK      Qwen/Qwen2.5-0.5B-Instruct
ChatML (Kimi)                   = ChatML (Kimi)                   : OK      moonshotai/Kimi-K2-Instruct
Google Gemma 2                  = Google Gemma 2                  : OK      Efficient-Large-Model/gemma-2-2b-it
Google Gemma 3                  = Google Gemma 3                  : OK      scb10x/typhoon2.1-gemma3-12b
Google Gemma 3n                 = Google Gemma 3n                 : OK      lmstudio-community/gemma-3n-E4B-it-MLX-bf16
Llama 3.x                       = Llama 3.x                       : OK      Steelskull/L3.3-Shakudo-70b
Llama 4                         = Llama 4                         : OK      meta-llama/Llama-4-Scout-17B-16E-Instruct
Mistral V7 (with system prompt) = Mistral V7 (with system prompt) : OK      Doctor-Shotgun/MS3.2-24B-Magnum-Diamond
Mistral V3                      = Mistral V3                      : OK      mistralai/Mistral-7B-Instruct-v0.3
GLM-4                           = GLM-4                           : OK      THUDM/glm-4-9b-chat-hf
Phi 3.5                         = Phi 3.5                         : OK      microsoft/Phi-3.5-mini-instruct
Phi 4 (mini)                    = Phi 4 (mini)                    : OK      microsoft/Phi-4-mini-instruct
Cohere (Aya Expanse 32B based)  = Cohere (Aya Expanse 32B based)  : OK      CohereLabs/aya-expanse-32b                  [default]
DeepSeek V2.5                   = DeepSeek V2.5                   : OK      deepseek-ai/DeepSeek-V2.5
Jamba                           = Jamba                           : OK      ai21labs/Jamba-tiny-dev
Dots                            = Dots                            : OK      rednote-hilab/dots.llm1.inst
RWKV World                      = MISSING MAPPING                 : FAILURE fla-hub/rwkv7-1.5B-world
Mistral (Generic)               = Mistral (Generic)               : OK      mistralai/Mistral-Nemo-Instruct-2407
ChatML (Generic)                = ChatML (Generic)                : OK      NewEden/Gemma-27B-chatml
There were 1 failure(s)!

@kallewoof (Author)

Some of the tokenizer configs are pretty big, so it may be worth pulling out only the chat template and saving just that.
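As a sketch (the file names here are illustrative), assuming the stored files keep the tokenizer_config.json shape:

```python
# Shrink a stored config to just its chat template. Entries without
# a template are kept as None so the test can still detect them.
import json

with open("tokenizer_config.json") as f:
    config = json.load(f)

with open("tokenizer_config.stripped.json", "w") as f:
    json.dump({"chat_template": config.get("chat_template")}, f, indent=1)
```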

@kallewoof (Author)

As for RWKV, the search strings require that "rwkv-world" is present, but it is not in the example above. Maybe "rwkv_tokenizer_end_of_text" would be better? I'm not sure how RWKV World differs from other RWKVs, if at all.

@kallewoof (Author)

I took a stab at a GitHub workflow that triggers in PRs when someone modifies AutoGuess.json, but I can't test it. I can drop that commit if it seems broken.

@kallewoof force-pushed the 202507-autugoess-tests branch from 5f04351 to 01c57d6 on July 19, 2025 07:43
@kallewoof (Author) commented Jul 19, 2025

OK, I was able to get this running at kallewoof#1 -- it's broken at the moment, but at least I can test it now.

Got it working; force-pushed the complete solution.

@kallewoof force-pushed the 202507-autugoess-tests branch from 01c57d6 to 038328e on July 19, 2025 09:28
@LostRuins (Owner)

I don't think we should bundle those tokenizer configs into the repo.

If you're making a workflow for it, we can simply download them on demand in the Python test script (in fact, they don't even need to be stored on disk, just temporarily kept in memory).
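A sketch of that approach, using Hugging Face's standard resolve endpoint (error handling and auth for gated repos omitted):

```python
# Fetch a model's tokenizer config straight into memory; nothing is
# written to disk. Gated repos would additionally need an auth header.
import json
import urllib.request

def load_tokenizer_config(repo: str) -> dict:
    url = f"https://huggingface.co/{repo}/resolve/main/tokenizer_config.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

template = load_tokenizer_config("Qwen/Qwen2.5-0.5B-Instruct")["chat_template"]
```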

@LostRuins force-pushed the concedo_experimental branch from 1dbae35 to 15b1034 on July 20, 2025 03:10
@kallewoof (Author)

I guess someone could make a Hugging Face account and put all the gated tokenizers there? Is that what you were envisioning?

@kallewoof (Author) commented Jul 20, 2025

I made a GitHub repository and put the gated tokenizer configs there. The workflow now clones and uses that repo instead. The one drawback is that people can't just run the test without first cloning the gated-tokenizers repo.
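The test can then read the saved configs from the clone; a sketch, assuming one tokenizer_config.json per model directory (the layout and directory name here are assumptions):

```python
# Read a gated model's saved config from a local clone of the
# gated-tokenizers repo. Layout assumed: <clone>/<repo>/tokenizer_config.json
import json
from pathlib import Path

def load_gated_config(repo: str, clone_dir: str = "gated-tokenizers") -> dict:
    path = Path(clone_dir) / repo / "tokenizer_config.json"
    return json.loads(path.read_text())
```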

@kallewoof (Author) commented Jul 20, 2025

By the way, I know this may seem like a lot of work for a relatively minor feature, but (1) I am hoping to also add a check where the apply_chat_template results are compared to the adapter config (edit: see #1654), which will catch mis-configured adapters, and (2) I want to make this a de facto standard for use elsewhere, e.g. as a basis for the SillyTavern chat template derivation.
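Roughly what that comparison could look like (the adapter key names below follow a generic start/end-marker layout and are assumptions, not necessarily the exact adapter schema):

```python
# Sketch: render one user turn through both the HF tokenizer and a
# koboldcpp-style adapter, then compare. The adapter keys are assumed
# names for illustration.
from transformers import AutoTokenizer

adapter = {
    "user_start": "<|im_start|>user\n",
    "user_end": "<|im_end|>\n",
    "assistant_start": "<|im_start|>assistant\n",
}
messages = [{"role": "user", "content": "Hello"}]

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
reference = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
rendered = (
    adapter["user_start"] + "Hello" + adapter["user_end"]
    + adapter["assistant_start"]
)

# Misplaced spaces/newlines show up here; note that models which
# inject a default system prompt will also differ and need handling.
if rendered != reference:
    print("adapter:", repr(rendered))
    print("HF     :", repr(reference))
```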

@kallewoof (Author)

kallewoof#1 has been updated for reference.

@kallewoof (Author) commented Jul 20, 2025

I now have a working follow-up to this in #1654. It is all demonstrated in kallewoof#1:

  • Commit 1: fixes the search string for RWKV World, currently the only adapter whose search string is failing.
  • Commit 2: fixes several adapter issues (mostly revolving around whether a space or newline goes before or after a token). The exception is RWKV World, which required heavy modification. Should I perhaps redirect the test to some other model?

With these two commits, the Transformers tokenizer apply_chat_template and the koboldcpp adapters are synced up, with the exception of Mistral V3 and Mistral (Generic), which have wonky system prompt handling that I don't think is worth special-casing.

@LostRuins (Owner)

Alright, thanks. Give me some time and I'll take a look.

@LostRuins (Owner)

Alright, it seems to be working fine. As for the RWKV World template, I am honestly unsure -- it was provided to me by someone else and I did not try it.

I'm fine changing it to rwkv_ or even rwkv (rare enough in normal models), or removing it entirely if there is no clear answer. It does seem like many model makers just stuff a ChatML template there anyway.

Also #1627

But this is good enough, so I will merge this first.

@LostRuins merged commit ff8f156 into LostRuins:concedo_experimental on July 25, 2025 (1 check failed)
@kallewoof deleted the 202507-autugoess-tests branch on July 25, 2025 13:17