
models/templates: add mistralai/Mistral-Small-3.1-24B-Instruct-2503 template with tool calling support #14148

Open · wants to merge 2 commits into master from add-mistral-small-chat-template

Conversation

@bretello commented on Jun 12, 2025

Summary

This PR adds a tool-calling chat template for Mistral-Small-3.1-24B-Instruct-2503 and fixes a bug that leaves Mistral Small models with a broken chat template.

Details

Trying to run Mistral AI's Mistral-Small-3.1-24B-Instruct-2503 with no chat template currently results in failure when using tool calling.

Starting llama-server like so:

./build/bin/llama-server -m mistral-small-3.1-24b-instruct-2503.gguf \
    --n-gpu-layers -1 \
    --host 0.0.0.0 --port 8000 \
    --ctx-size 0 --temp 0.15 \
    --jinja \
    --verbose

Executing a query with a tool call (for example, a request along the lines sketched below) then results in the prompt being set to the literal string mistral-v7-tekken, due to how the tool-calling template is currently prepared when it is not present in the gguf.
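For reference, the failing request is an ordinary OpenAI-style chat completion carrying a tools array; the sketch below shows the shape of such a request against the server started above (the endpoint path follows llama-server's OpenAI-compatible API, while the get_weather tool name and schema are made up for illustration):

# Hypothetical tool-calling request; the tool name/schema are illustrative,
# any request carrying a "tools" array exercises the same code path.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
      ],
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }
      ]
    }'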

Looking at verbose logs, one can see that the prompt is broken:

Log from the above command, with the prompt section emphasized:
slot launch_slot_: id  0 | task 1 | launching slot : {"id":0,"id_task":1,"n_ctx":32768,"speculative":false,"is_processing":false,"non_causal":false,"params":{"n_predict":-1,"seed":4294967295,"temperature":0.15000000596046448,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"top_n_sigma"
:-1.0,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":32768,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.1
0000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"alternative-0 ::= \"{\" space alternative-0-tool-call-kv \"}\" space\nalternative-0-tool-call ::= \"{\" space alternative-0-tool-call-name-kv \",\" space alternative-0-tool-call-arguments-kv \"}
\" space\nalternative-0-tool-call-arguments ::= \"{\" space alternative-0-tool-call-arguments-name-kv \"}\" space\nalternative-0-tool-call-arguments-kv ::= \"\\\"arguments\\\"\" space \":\" space alternative-0-tool-call-arguments\nalternative-0-tool-call-arguments-name-kv ::= \"\\\"name\\\"\" space \":\" space string\nalternative-0-tool-call-kv :
:= \"\\\"tool_call\\\"\" space \":\" space alternative-0-tool-call\nalternative-0-tool-call-name ::= \"\\\"Man\\\"\" space\nalternative-0-tool-call-name-kv ::= \"\\\"name\\\"\" space \":\" space alternative-0-tool-call-name\nalternative-1 ::= \"{\" space alternative-1-response-kv \"}\" space\nalternative-1-response-kv ::= \"\\\"response\\\"\" spa
ce \":\" space string\nchar ::= [^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})\nroot ::= alternative-0 | alternative-1\nspace ::= | \" \" | \"\\n\"{1,2} [ \\t]{0,20}\nstring ::= \"\\\"\" char* \"\\\"\" space\n","grammar_lazy":false,"grammar_triggers":[],"preserved_tokens":[],"chat_format":"Generic","reasoning_format":"d
eepseek","reasoning_in_content":false,"thinking_forced_open":false,"samplers":["penalties","dry","top_n_sigma","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":0,"speculative.p_min":0.75,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},

"prompt":"<s>mistral-v7-tekken",

"next_token":{"has_next_token":true,"has_new_line":false,"n_remain":-1,"n_decoded":0,"stopping_word":""}}

Providing a template with --chat-template-file solves the issue:

./build/bin/llama-server -m mistral-small-3.1-24b-instruct-2503.gguf \
    --n-gpu-layers -1 \
    --host 0.0.0.0 --port 8000 \
    --ctx-size 0 --temp 0.15 \
    --jinja --chat-template-file models/templates/mistralai-Mistral-Small-3.1-24B-Instruct-2503.jinja
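As a quick sanity check that the file was actually picked up (instead of the mistral-v7-tekken fallback name), the resolved template shows up in the "main: chat template, ..." startup line quoted below; assuming the build also exposes it through the server's /props endpoint (the endpoint and field name are an assumption here, not something this PR touches), it can be queried at runtime as well:

# Sanity check (sketch): ask the running server which chat template it resolved.
# The /props endpoint and its "chat_template" field are assumed; if unavailable,
# the "main: chat template, ..." line in the startup log carries the same information.
curl -s http://localhost:8000/props | python3 -m json.tool | grep -i chat_template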

Related: #13398

cc @ngxson

bretello force-pushed the add-mistral-small-chat-template branch from 25c463c to e539831 on June 12, 2025 at 14:52
// Mistral-Small-2503 does not have built-in chat template
llama_vocab_pre_type pre_type = model->vocab.get_pre_type();
if (pre_type == LLAMA_VOCAB_PRE_TYPE_TEKKEN && model->layers.size() == 40) {
return "mistral-v7-tekken";
@bretello (Author)

The problem with this one-off fix is that there's no logic to expand this string into a template. For example, when using llama-server, this will always cause the prompt to be set to <s>mistral-v7-tekken if the gguf doesn't have a chat template.

In my specific case (tool calling), I had a chat template but not a tool-calling chat template, resulting in this line always executing and breaking generation.

@ngxson (Collaborator) commented on Jun 12, 2025

I don't see why this should be removed. Many users run Mistral Small without --chat-template, and this will now break most use cases.

Even with this removed, you still need --jinja --chat-template-file to make it work correctly.

And worst of all, someone will do --jinja --chat-template mistral-v7-tekken, which brings back exactly the same issue.

In short, I'm against this removal, as it makes the UX even worse.

@bretello (Author) commented on Jun 13, 2025

Thanks @ngxson, perhaps I'm missing something, but with this patch (the gguf I'm using does have a chat template):

diff --git a/src/llama-model.cpp b/src/llama-model.cpp
index c64bf9de..a3b6c41b 100644
--- a/src/llama-model.cpp
+++ b/src/llama-model.cpp
@@ -13788,13 +13788,15 @@ const char * llama_model_chat_template(const llama_model * model, const char * n
         // Mistral-Small-2503 does not have built-in chat template
         llama_vocab_pre_type pre_type = model->vocab.get_pre_type();
         if (pre_type == LLAMA_VOCAB_PRE_TYPE_TEKKEN && model->layers.size() == 40) {
+            LLAMA_LOG_WARN("FORCING mistral-v7-tekken because the vocab matches, key=%s\n", key.c_str());
             return "mistral-v7-tekken";
         }
 
         return nullptr;
     }
-
-    return it->second.c_str();
+    LLAMA_LOG_WARN("FORCING mistral-v7-tekken because I'm debugging, but key=%s was found\n", key.c_str());
+    return "mistral-v7-tekken";
+    // return it->second.c_str();
 }
 
 uint64_t llama_model_n_params(const llama_model * model) {
diff --git a/tools/server/server.cpp b/tools/server/server.cpp
index 1b1cf439..e1e74db6 100644
--- a/tools/server/server.cpp
+++ b/tools/server/server.cpp
@@ -4191,7 +4191,7 @@ int main(int argc, char ** argv) {
 
             const auto & prompt = data.at("prompt");
             // TODO: this log can become very long, put it behind a flag or think about a more compact format
-            //SRV_DBG("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
+            SRV_INF("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
 
             // process files
             mtmd::bitmaps bitmaps;

I get the following logs:

...
FORCING mistral-v7-tekken because I'm debugging, but key=tokenizer.chat_template was found
FORCING mistral-v7-tekken because the vocab matches, key=tokenizer.chat_template.tool_use
Failed to infer a tool call example (possible template bug)
Failed to infer a tool call example (possible template bug)
srv          init: initializing slots, n_slots = 1
slot         init: id  0 | task -1 | new slot n_ctx_slot = 32768
main: model loaded
main: chat template, chat_template: mistral-v7-tekken, example_format: 'mistral-v7-tekken'
...

Note that the chat template is set to mistral-v7-tekken, which is wrong.

And if I query the model, I get nonsensical output about the Tekken game:

> What is 2+2?

    Joined: Fri Apr 26, 2019 10:28 am

### Re: [WIP] Tekken 7 Modding Tools

> *Ryochan7 wrote: ↑ Mon May 06, 2019 12:07 pm* I'm not sure if this is the right place to ask this, but I was wondering if there is a way^C
Aborted!

From the logs, since I force-enabled prompt logging:

...
main: model loaded
main: chat template, chat_template: mistral-v7-tekken, example_format: 'mistral-v7-tekken'
main: server is listening on http://0.0.0.0:8000 - starting the main loop
srv  update_slots: all slots are idle
srv  update_slots: all slots are idle
srv    operator(): Prompt: mistral-v7-tekken
srv  params_from_: Chat format: Content-only
slot launch_slot_: id  0 | task 1 | processing task
slot update_slots: id  0 | task 1 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 8
slot update_slots: id  0 | task 1 | kv cache rm [0, end)
slot update_slots: id  0 | task 1 | prompt processing progress, n_past = 8, n_tokens = 8, progress = 1.000000
slot update_slots: id  0 | task 1 | prompt done, n_past = 8, n_tokens = 8
srv    operator(): Prompt: mistral-v7-tekken  <---- the prompt should be "What is 2+2?"
...

You can see that after evaluating the (wrong) template, the prompt is set to mistral-v7-tekken.
