models/templates: add mistralai/Mistral-Small-3.1-24B-Instruct-2503 template with tool calling support #14148

Open · wants to merge 2 commits into master

132 changes: 132 additions & 0 deletions models/templates/mistralai-Mistral-Small-3.1-24B-Instruct-2503.jinja
@@ -0,0 +1,132 @@
{%- set today = strftime_now("%Y-%m-%d") %}
{%- set default_system_message = "You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.\nYour knowledge base was last updated on 2023-10-01. The current date is " + today + ".\n\nWhen you're not sure about some information, you say that you don't have the information and don't make up anything.\nIf the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. \"What are some good restaurants around me?\" => \"Where are you?\" or \"When is the next flight to Tokyo\" => \"Where do you travel from?\")" %}

{{- bos_token }}

{%- if messages[0]['role'] == 'system' %}
    {%- if messages[0]['content'] is string %}
        {%- set system_message = messages[0]['content'] %}
        {%- set loop_messages = messages[1:] %}
    {%- else %}
        {%- set system_message = messages[0]['content'][0]['text'] %}
        {%- set loop_messages = messages[1:] %}
    {%- endif %}
{%- else %}
    {%- set system_message = default_system_message %}
    {%- set loop_messages = messages %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- elif tools is not none %}
    {%- set parallel_tool_prompt = "You are a helpful assistant that can call tools. If you call one or more tools, format them in a single JSON array of objects, where each object is a tool call, not as separate objects outside of an array or multiple arrays. Use the format [{\"name\": tool call name, \"arguments\": tool call arguments}, additional tool calls] if you call more than one tool. If you call tools, do not attempt to interpret them or otherwise provide a response until you receive a tool call result that you can interpret for the user." %}
    {%- if system_message is defined %}
        {%- set system_message = parallel_tool_prompt + "\n\n" + system_message %}
    {%- else %}
        {%- set system_message = parallel_tool_prompt %}
    {%- endif %}
{%- endif %}
{{- '[SYSTEM_PROMPT]' + system_message + '[/SYSTEM_PROMPT]' }}

{%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}

{#- Plain `set` does not persist across for-loop iterations in Jinja, so collect the filtered messages in a namespace. #}
{%- set ns = namespace(filtered_messages=[]) %}
{%- for message in loop_messages %}
    {%- if message["role"] not in ["tool", "tool_results"] and not message.get("tool_calls") %}
        {%- set ns.filtered_messages = ns.filtered_messages + [message] %}
    {%- endif %}
{%- endfor %}

{%- for message in ns.filtered_messages %}
    {%- if (message["role"] == "user") != (loop.index0 % 2 == 0) %}
        {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif %}
{%- endfor %}

{%- for message in loop_messages %}
    {%- if message["role"] == "user" %}
        {%- if tools is not none and (message == user_messages[-1]) %}
            {{- "[AVAILABLE_TOOLS] [" }}
            {%- for tool in tools %}
                {%- set tool = tool.function %}
                {{- '{"type": "function", "function": {' }}
                {%- for key, val in tool.items() if key != "return" %}
                    {%- if val is string %}
                        {{- '"' + key + '": "' + val + '"' }}
                    {%- else %}
                        {{- '"' + key + '": ' + val|tojson }}
                    {%- endif %}
                    {%- if not loop.last %}
                        {{- ", " }}
                    {%- endif %}
                {%- endfor %}
                {{- "}}" }}
                {%- if not loop.last %}
                    {{- ", " }}
                {%- else %}
                    {{- "]" }}
                {%- endif %}
            {%- endfor %}
            {{- "[/AVAILABLE_TOOLS]" }}
        {%- endif %}
        {%- if message['content'] is string %}
            {{- '[INST]' + message['content'] + '[/INST]' }}
        {%- else %}
            {{- '[INST]' }}
            {%- for block in message['content'] %}
                {%- if block['type'] == 'text' %}
                    {{- block['text'] }}
                {%- elif block['type'] == 'image' or block['type'] == 'image_url' %}
                    {{- '[IMG]' }}
                {%- else %}
                    {{- raise_exception('Only text and image blocks are supported in message content!') }}
                {%- endif %}
            {%- endfor %}
            {{- '[/INST]' }}
        {%- endif %}
    {%- elif message["role"] == "tool_calls" or message.tool_calls is defined %}
        {%- if message.tool_calls is defined %}
            {%- set tool_calls = message.tool_calls %}
        {%- else %}
            {%- set tool_calls = message.content %}
        {%- endif %}
        {{- "[TOOL_CALLS] [" }}
        {%- for tool_call in tool_calls %}
            {%- set out = tool_call.function|tojson %}
            {{- out[:-1] }}
            {%- if not tool_call.id is defined or tool_call.id|length < 9 %}
                {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (1)" + tool_call.id) }}
            {%- endif %}
            {{- ', "id": "' + tool_call.id[-9:] + '"}' }}
            {%- if not loop.last %}
                {{- ", " }}
            {%- else %}
                {{- "]" + eos_token }}
            {%- endif %}
        {%- endfor %}
    {%- elif message['role'] == 'assistant' %}
        {%- if message['content'] is string %}
            {{- message['content'] + eos_token }}
        {%- else %}
            {{- message['content'][0]['text'] + eos_token }}
        {%- endif %}
    {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
        {%- if message.content is defined and message.content.content is defined %}
            {%- set content = message.content.content %}
        {%- else %}
            {%- set content = message.content %}
        {%- endif %}
        {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
        {%- if not message.tool_call_id is defined or message.tool_call_id|length < 9 %}
            {{- raise_exception("Tool call IDs should be alphanumeric strings with length >= 9! (2)" + message.tool_call_id) }}
        {%- endif %}
        {{- '"call_id": "' + message.tool_call_id[-9:] + '"}[/TOOL_RESULTS]' }}
    {%- else %}
        {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
    {%- endif %}
{%- endfor %}
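
To sanity-check what this template produces, here is a minimal rendering sketch. It is illustrative only, not part of the PR: it uses Python's jinja2 package as a stand-in for llama.cpp's own Jinja engine, stubs out the two non-standard helpers the template calls (strftime_now and raise_exception), and the tool definition and messages are made-up examples.

```python
from datetime import datetime
import jinja2

env = jinja2.Environment(loader=jinja2.FileSystemLoader("models/templates"))
# The template calls two helpers that are not part of standard Jinja;
# provide minimal stand-ins for them.
env.globals["strftime_now"] = lambda fmt: datetime.now().strftime(fmt)
def raise_exception(msg):
    raise ValueError(msg)
env.globals["raise_exception"] = raise_exception

template = env.get_template("mistralai-Mistral-Small-3.1-24B-Instruct-2503.jinja")

# A made-up tool and a short tool-calling exchange (note the >= 9 char call id).
tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}}]
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "tool_calls": [
        {"id": "call123456789", "type": "function",
         "function": {"name": "get_weather", "arguments": {"city": "Paris"}}}]},
    {"role": "tool", "tool_call_id": "call123456789", "content": '{"temp_c": 21}'},
]

print(template.render(messages=messages, tools=tools,
                      bos_token="<s>", eos_token="</s>"))
```

The output should show the [SYSTEM_PROMPT]...[/SYSTEM_PROMPT] block with the parallel-tool-call instructions prepended, the tool schema inside [AVAILABLE_TOOLS]...[/AVAILABLE_TOOLS] before the last user message, the [TOOL_CALLS] array with the id truncated to its last 9 characters, and the [TOOL_RESULTS] block.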
9 changes: 1 addition & 8 deletions src/llama-model.cpp
@@ -13783,14 +13783,7 @@ const char * llama_model_chat_template(const llama_model * model, const char * n
         : LLM_KV(model->arch)(LLM_KV_TOKENIZER_CHAT_TEMPLATE);
     const auto & it = model->gguf_kv.find(key);
     if (it == model->gguf_kv.end()) {
-        // one-off fix for very popular models (so we are not flooded with issues)
-        // do not extend this list unless absolutely necessary
-        // Mistral-Small-2503 does not have built-in chat template
-        llama_vocab_pre_type pre_type = model->vocab.get_pre_type();
-        if (pre_type == LLAMA_VOCAB_PRE_TYPE_TEKKEN && model->layers.size() == 40) {
-            return "mistral-v7-tekken";
bretello (Author) commented:

The problem with this one-off fix is that there's no logic to expand this string into a template. For example, when using llama-server, this will always cause the prompt to be set to </s>mistral-v7-tekken if the gguf doesn't have a chat template.

In my specific case (tool calling), I had a chat template but not a tool-calling chat template, resulting in this line always executing and breaking generation.
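
To illustrate the failure mode: a Jinja template whose source contains no Jinja syntax renders to itself, so if the alias string is fed to the engine as template source, the rendered prompt is just the alias and the conversation is ignored. A minimal sketch, using Python's jinja2 as a stand-in for llama.cpp's template engine (this is not llama.cpp code):

```python
import jinja2

# What llama_model_chat_template() returns from the one-off fix:
alias = "mistral-v7-tekken"

# If a caller treats the alias as Jinja *source* rather than as the name of a
# built-in template, rendering ignores the messages entirely.
template = jinja2.Environment().from_string(alias)
prompt = template.render(messages=[{"role": "user", "content": "What is 2+2?"}])
print(prompt)  # -> mistral-v7-tekken
```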

ngxson (Collaborator) commented on Jun 12, 2025:

I don't see why this should be removed. Many users run Mistral Small without --chat-template, and this will now break most use cases.

Even with this removed, you still need --jinja --chat-template-file to make it work correctly.

And worst of all, someone will do --jinja --chat-template mistral-v7-tekken, which brings back exactly the same issue.

In short, I am against this removal, as it makes the UX even worse.

bretello (Author) replied on Jun 13, 2025:

Thanks @ngxson, perhaps I'm missing something, but with this patch (the gguf I'm using does have a chat template):

diff --git a/src/llama-model.cpp b/src/llama-model.cpp
index c64bf9de..a3b6c41b 100644
--- a/src/llama-model.cpp
+++ b/src/llama-model.cpp
@@ -13788,13 +13788,15 @@ const char * llama_model_chat_template(const llama_model * model, const char * n
         // Mistral-Small-2503 does not have built-in chat template
         llama_vocab_pre_type pre_type = model->vocab.get_pre_type();
         if (pre_type == LLAMA_VOCAB_PRE_TYPE_TEKKEN && model->layers.size() == 40) {
+            LLAMA_LOG_WARN("FORCING mistral-v7-tekken because the vocab matches, key=%s\n", key.c_str());
             return "mistral-v7-tekken";
         }
 
         return nullptr;
     }
-
-    return it->second.c_str();
+    LLAMA_LOG_WARN("FORCING mistral-v7-tekken because I'm debugging, but key=%s was found\n", key.c_str());
+    return "mistral-v7-tekken";
+    // return it->second.c_str();
 }
 
 uint64_t llama_model_n_params(const llama_model * model) {
diff --git a/tools/server/server.cpp b/tools/server/server.cpp
index 1b1cf439..e1e74db6 100644
--- a/tools/server/server.cpp
+++ b/tools/server/server.cpp
@@ -4191,7 +4191,7 @@ int main(int argc, char ** argv) {
 
             const auto & prompt = data.at("prompt");
             // TODO: this log can become very long, put it behind a flag or think about a more compact format
-            //SRV_DBG("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
+            SRV_INF("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
 
             // process files
             mtmd::bitmaps bitmaps;

I get the following logs:

...
FORCING mistral-v7-tekken because I'm debugging, but key=tokenizer.chat_template was found
FORCING mistral-v7-tekken because the vocab matches, key=tokenizer.chat_template.tool_use
Failed to infer a tool call example (possible template bug)
Failed to infer a tool call example (possible template bug)
srv          init: initializing slots, n_slots = 1
slot         init: id  0 | task -1 | new slot n_ctx_slot = 32768
main: model loaded
main: chat template, chat_template: mistral-v7-tekken, example_format: 'mistral-v7-tekken'
...

Note that the chat template is set to mistral-v7-tekken, which is wrong.

And if I query the model, I get nonsensical outputs about the Tekken game:

> What is 2+2?

    Joined: Fri Apr 26, 2019 10:28 am

### Re: [WIP] Tekken 7 Modding Tools

> *Ryochan7 wrote: ↑ Mon May 06, 2019 12:07 pm* I'm not sure if this is the right place to ask this, but I was wondering if there is a way^C
Aborted!

From the logs, since I force-enabled prompt logging:

...
main: model loaded
main: chat template, chat_template: mistral-v7-tekken, example_format: 'mistral-v7-tekken'
main: server is listening on http://0.0.0.0:8000 - starting the main loop
srv  update_slots: all slots are idle
srv  update_slots: all slots are idle
srv    operator(): Prompt: mistral-v7-tekken
srv  params_from_: Chat format: Content-only
slot launch_slot_: id  0 | task 1 | processing task
slot update_slots: id  0 | task 1 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 8
slot update_slots: id  0 | task 1 | kv cache rm [0, end)
slot update_slots: id  0 | task 1 | prompt processing progress, n_past = 8, n_tokens = 8, progress = 1.000000
slot update_slots: id  0 | task 1 | prompt done, n_past = 8, n_tokens = 8
srv    operator(): Prompt: mistral-v7-tekken  <---- the prompt should be "What is 2+2?"
...

You can see that after evaluating the (wrong) template, the prompt is set to mistral-v7-tekken.

-        }
-
+        LLAMA_LOG_WARN("llama_model_chat_template: Couldn't find chat template (tried key: %s).\n", key.c_str());
         return nullptr;
     }
