models/templates: add mistralai/Mistral-Small-3.1-24B-Instruct-2503 template with tool calling support #14148
Conversation
```cpp
// Mistral-Small-2503 does not have built-in chat template
llama_vocab_pre_type pre_type = model->vocab.get_pre_type();
if (pre_type == LLAMA_VOCAB_PRE_TYPE_TEKKEN && model->layers.size() == 40) {
    return "mistral-v7-tekken";
}
```
The problem with this one-off fix is that there's no logic to expand this string into a template. For example, when using `llama-server`, this will always cause the prompt to be set to `</s>mistral-v7-tekken` if the gguf doesn't have a chat template.

In my specific case (tool calling), I had a chat template but not a tool calling chat template, resulting in this line always executing and breaking generation.
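To make the failure mode concrete, here is a minimal, self-contained sketch (a toy stand-in, not llama.cpp's actual Jinja engine): a "template" string that contains no placeholders renders to itself, so the internal name ends up verbatim in the prompt.

```cpp
#include <iostream>
#include <string>

// Toy stand-in for a Jinja-style renderer: substitute every occurrence of
// "{{content}}" with the user message. The real engine is far more capable,
// but the failure mode is the same: a "template" with no placeholders
// renders to itself.
static std::string render(const std::string & tmpl, const std::string & content) {
    std::string out = tmpl;
    const std::string ph = "{{content}}";
    for (size_t pos; (pos = out.find(ph)) != std::string::npos; ) {
        out.replace(pos, ph.size(), content);
    }
    return out;
}

int main() {
    // A real (simplified) template produces a usable prompt ...
    std::cout << render("[INST] {{content}} [/INST]", "What is 2+2?") << "\n";
    // ... but the internal template *name* renders to itself, so the model
    // is prompted with the literal string "mistral-v7-tekken".
    std::cout << render("mistral-v7-tekken", "What is 2+2?") << "\n";
}
```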
I don't see why this should be removed. Many users run Mistral Small without `--chat-template`, and removing this will break most of their use cases.

Even with this removed, you still need `--jinja --chat-template-file` to make it work correctly.

And worst of all, someone will do `--jinja --chat-template mistral-v7-tekken`, which brings back exactly the same issue (a sketch of why is below).

In short, I'm against this removal, as it makes the UX even worse.
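For illustration, a hypothetical sketch of the distinction at play (not llama.cpp's actual code): the non-Jinja path can map a known template name to a built-in format, while the `--jinja` path treats whatever string it receives as raw template source, which is why passing the name there reproduces the breakage.

```cpp
#include <iostream>
#include <optional>
#include <string>

// Stand-in for the name -> built-in format lookup used by the non-Jinja path.
// The returned template here is a simplified placeholder, not the real format.
static std::optional<std::string> lookup_builtin(const std::string & name) {
    if (name == "mistral-v7-tekken") {
        return std::string("[INST] {{content}} [/INST]");
    }
    return std::nullopt;
}

int main() {
    const std::string arg = "mistral-v7-tekken";

    // Without --jinja, the name can resolve to a built-in format:
    if (auto tmpl = lookup_builtin(arg)) {
        std::cout << "resolved built-in: " << *tmpl << "\n";
    }

    // With --jinja, the same string is handed to the Jinja engine as raw
    // template source; it has no placeholders, so the rendered prompt is
    // once again the literal string "mistral-v7-tekken".
    std::cout << "jinja source rendered as-is: " << arg << "\n";
}
```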
Thanks @ngxson, perhaps I'm missing something, but with this patch (the gguf I'm using does have a chat template):
```diff
diff --git a/src/llama-model.cpp b/src/llama-model.cpp
index c64bf9de..a3b6c41b 100644
--- a/src/llama-model.cpp
+++ b/src/llama-model.cpp
@@ -13788,13 +13788,15 @@ const char * llama_model_chat_template(const llama_model * model, const char * n
         // Mistral-Small-2503 does not have built-in chat template
         llama_vocab_pre_type pre_type = model->vocab.get_pre_type();
         if (pre_type == LLAMA_VOCAB_PRE_TYPE_TEKKEN && model->layers.size() == 40) {
+            LLAMA_LOG_WARN("FORCING mistral-v7-tekken because the vocab matches, key=%s\n", key.c_str());
             return "mistral-v7-tekken";
         }
         return nullptr;
     }
-
-    return it->second.c_str();
+    LLAMA_LOG_WARN("FORCING mistral-v7-tekken because I'm debugging, but key=%s was found\n", key.c_str());
+    return "mistral-v7-tekken";
+    // return it->second.c_str();
 }

 uint64_t llama_model_n_params(const llama_model * model) {
diff --git a/tools/server/server.cpp b/tools/server/server.cpp
index 1b1cf439..e1e74db6 100644
--- a/tools/server/server.cpp
+++ b/tools/server/server.cpp
@@ -4191,7 +4191,7 @@ int main(int argc, char ** argv) {
             const auto & prompt = data.at("prompt");
             // TODO: this log can become very long, put it behind a flag or think about a more compact format
-            //SRV_DBG("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
+            SRV_INF("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());

             // process files
             mtmd::bitmaps bitmaps;
```
I get the following logs:
```
...
FORCING mistral-v7-tekken because I'm debugging, but key=tokenizer.chat_template was found
FORCING mistral-v7-tekken because the vocab matches, key=tokenizer.chat_template.tool_use
Failed to infer a tool call example (possible template bug)
Failed to infer a tool call example (possible template bug)
srv init: initializing slots, n_slots = 1
slot init: id 0 | task -1 | new slot n_ctx_slot = 32768
main: model loaded
main: chat template, chat_template: mistral-v7-tekken, example_format: 'mistral-v7-tekken'
...
```
Note that the chat template is set to `mistral-v7-tekken`, which is wrong.

And if I query the model, I get nonsensical output about the Tekken game:
```
> What is 2+2?
Joined: Fri Apr 26, 2019 10:28 am
### Re: [WIP] Tekken 7 Modding Tools
> *Ryochan7 wrote: ↑ Mon May 06, 2019 12:07 pm* I'm not sure if this is the right place to ask this, but I was wondering if there is a way^C
Aborted!
```
From the logs, since I force-enabled prompt logging:
```
...
main: model loaded
main: chat template, chat_template: mistral-v7-tekken, example_format: 'mistral-v7-tekken'
main: server is listening on http://0.0.0.0:8000 - starting the main loop
srv update_slots: all slots are idle
srv update_slots: all slots are idle
srv operator(): Prompt: mistral-v7-tekken
srv params_from_: Chat format: Content-only
slot launch_slot_: id 0 | task 1 | processing task
slot update_slots: id 0 | task 1 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 8
slot update_slots: id 0 | task 1 | kv cache rm [0, end)
slot update_slots: id 0 | task 1 | prompt processing progress, n_past = 8, n_tokens = 8, progress = 1.000000
slot update_slots: id 0 | task 1 | prompt done, n_past = 8, n_tokens = 8
srv operator(): Prompt: mistral-v7-tekken <---- the prompt should be "What is 2+2?"
...
```
You can see that after evaluating the (wrong) template, the prompt is set to `mistral-v7-tekken`.
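One way to reconcile both positions might be to keep the heuristic but gate it to the default template, so that a request for a named variant such as `tool_use` falls through to `nullptr` and the caller can reuse the default template. Below is a standalone sketch under that assumption: a mock, not the code in `src/llama-model.cpp`, and not necessarily what this PR should do.

```cpp
#include <map>
#include <string>

// Hypothetical, standalone mock of the template lookup. All names here are
// stand-ins for illustration only.
struct mock_model {
    std::map<std::string, std::string> gguf_kv;  // metadata key -> template source
    bool looks_like_tekken_24b = true;           // stand-in for the vocab/layer check
};

static const char * chat_template(const mock_model & model, const char * name) {
    const std::string key = name
        ? std::string("tokenizer.chat_template.") + name  // e.g. ".tool_use", as in the logs above
        : "tokenizer.chat_template";

    const auto it = model.gguf_kv.find(key);
    if (it == model.gguf_kv.end()) {
        // Gate the heuristic to the *default* template: a missing named
        // variant (e.g. tool_use) returns nullptr, letting the caller fall
        // back to the default template instead of the literal string.
        if (name == nullptr && model.looks_like_tekken_24b) {
            return "mistral-v7-tekken";
        }
        return nullptr;
    }
    return it->second.c_str();
}

int main() {
    mock_model m;
    m.gguf_kv["tokenizer.chat_template"] = "[INST] {{content}} [/INST]";

    // The tool_use variant is absent: with the gate in place this yields
    // nullptr rather than "mistral-v7-tekken", so the default template is reused.
    return chat_template(m, "tool_use") == nullptr ? 0 : 1;
}
```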
Summary
This PR adds a tool calling chat template for Mistral-Small-3.1-24B-Instruct-2503 and fixes a bug where a broken chat template is selected for Mistral Small models.
Details
Trying to run Mistral AI's Mistral-Small-3.1-24B-Instruct-2503 with no chat template currently results in failure when using tool calling.

Starting `llama-server` and executing a query with a tool call results in the prompt being set to `mistral-v7-tekken` (see here), due to how the tool calling template is currently prepared when it is not present in the gguf.

Looking at the verbose logs, one can see that the prompt is broken:
Log from the above command with prompt section emphasized
Providing a template with `--chat-template-file` solves the issue; a hypothetical invocation is sketched below.

Related: #13398
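For reference, an invocation along those lines (the model filename and template filename are placeholders, not paths from this PR):

```sh
# Placeholders: substitute your actual gguf and the template file added by this PR.
llama-server -m Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf \
    --jinja --chat-template-file models/templates/Mistral-Small-3.1-24B-Instruct-2503.jinja
```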
cc @ngxson