Conversation


@Rose22 Rose22 commented Nov 7, 2025

As we've discussed in Discord,

this change alters the way koboldcpp determines which tool to use in some pretty drastic ways that vastly improve its accuracy, especially with small LLMs. Instead of making one request to the LLM that asks whether a tool should be used, forcing it down to 5 tokens with a grammar that only allows a simple "yes/no" answer, it now gives the LLM full freedom to write out its decision and the reasoning behind it, with its final decision always stated at the end of the response. We then take that response and apply the yes/no grammar to it instead!
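For readers following along, here is a minimal sketch of the two-pass flow described above. The `generate` helper, the function name, and the prompt wording are hypothetical stand-ins for illustration, not koboldcpp's actual API:

```python
# Minimal sketch of the two-pass tool decision, assuming a generic
# generate(prompt, max_tokens, grammar=None) text-generation helper.
# All names and prompt wording here are illustrative, not koboldcpp's API.
def should_use_tool(generate, conversation, tool_list):
    # Pass 1: unconstrained reasoning -- the model explains its decision
    # in free text and states the final decision at the end.
    reasoning = generate(
        prompt=(conversation
                + "\nTool List:\n" + tool_list
                + "\nIs a tool call needed? Explain your reasoning,"
                  " then state your final decision at the end."),
        max_tokens=512,
    )
    # Pass 2: apply the yes/no grammar to the model's own reasoning,
    # instead of forcing the answer in a single constrained request.
    verdict = generate(
        prompt=(reasoning
                + "\nSo final decision, is a tool call required?"
                  " (one word answer: yes or no):"),
        max_tokens=2,
        grammar='root ::= "yes" | "no"',
    )
    return verdict.strip().lower() == "yes"
```

The key point is that the grammar constraint is only applied in the second pass, after the model has already committed to a decision in free text.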

@LostRuins LostRuins added enhancement New feature or request good first issue Good for newcomers labels Nov 8, 2025
@LostRuins
Owner

Thanks. It will take me some time to review, as I am waiting to get my laptop back.

@LostRuins LostRuins force-pushed the concedo_experimental branch from b0f20ae to 055fdce Compare November 8, 2025 13:53
@henk717
Collaborator

henk717 commented Nov 9, 2025

Function-wise it performs better than our current implementation. Speed-wise, for me the difference is very minor; it may affect people with slow PP / gen times more. But those people will have an escape in the upcoming Jinja implementation.

I'd say we can merge this if you review the code.

@Rose22
Author

Rose22 commented Nov 9, 2025

> Function-wise it performs better than our current implementation. Speed-wise, for me the difference is very minor; it may affect people with slow PP / gen times more. But those people will have an escape in the upcoming Jinja implementation.
>
> I'd say we can merge this if you review the code.

Yup, I can definitely confirm that it's affected by slow PP. But it's a lot better than the old way of doing it! With the old way, my 4B Qwen3 model was all too eager to run tools. Even just saying a simple "hi" to it triggered a tool call!

I'm glad you guys are implementing Jinja as well though! In my opinion that's superior for tool calling, because it just works as it should, the way the llama.cpp server already does. But at least with this change, the native tool calling in Kobold isn't as eager to randomly run tools, and it more accurately chooses which tools to run.

Looking forward to the code review!

@LostRuins LostRuins force-pushed the concedo_experimental branch from 994427d to cdc18f0 Compare November 10, 2025 12:54
for name in toolnames:
    pollgrammar += ("" if pollgrammar=="" else " | ")
    pollgrammar += "\"" + name + "\""
pollgrammar += " | \"no_tool\""
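For illustration, here is a self-contained version of the loop under review; the `toolnames` list is a made-up example, not taken from the PR. It builds a grammar alternation over the tool names plus the `no_tool` escape:

```python
# Standalone sketch of the grammar-building loop above.
# "toolnames" is a hypothetical example list for demonstration.
toolnames = ["GenerateImage", "WebSearch"]

pollgrammar = ""
for name in toolnames:
    # Prepend " | " between alternatives, but not before the first one.
    pollgrammar += ("" if pollgrammar == "" else " | ")
    pollgrammar += "\"" + name + "\""
# The escape alternative the review comment is asking about.
pollgrammar += " | \"no_tool\""

print(pollgrammar)  # "GenerateImage" | "WebSearch" | "no_tool"
```

The final line is the null-tool escape whose removal is being discussed below: without it, the grammar forces the model to pick one of the listed tools.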
Owner


Why remove the null tool? It seems like it's still a good idea to have even if the LLM knows that a tool is needed; it gives it a second chance to change its mind.

Author


Because in testing, this often resulted in tool calls that I explicitly asked for being cancelled. Best to just not interfere with its final reasoning.

Collaborator


Formerly, the null tool was there to give the LLM an out it would go for; Rosie's PR instead does this with a reasoning step. Same concept, different implementation.

@LostRuins
Owner

Consolidated testing results:

  • Two step Summarization pass before using yes/no:

    • Agreed, your new method seems like a better approach, we can try it
  • Removal of escape clause tool:

    • Agreed, you're right we can probably remove it after improving the tool determination logic
  • Reducing tool complexity, pass only the essential tool call information

    • Agreed, we can try this
  • My own mistake: Not actually passing all the function call history per turn, only the first.

    • Need to pass the full history of function calls, along with their call IDs so the AI can cross reference the correct results
  • Mandating the `messages` field via `messages = genparams.get("messages")` (using the newest 6 messages)

    • Not ideal, can't be used in non chat completions mode
    • Loss of contextual information when referencing older content (e.g. memory, older turns)
    • Only focuses on the most recent user message (which may not contain enough information about the tool to use, e.g. "yes please find it for me")
    • Breaks context shifting
      Solution: Retain current chat history in curr_ctx as a prefix first. Still perform the 2 passes of reasoning -> final answer
  • Reworded tool prompt: If user's request was to generate any kind of non-text media, no further action is needed and the answer should be no, regardless of what the tool call response was.

    • Not an ideal solution as this breaks all multi-pass tools that deal with media, and adds clutter if images are not involved
    • After a deeper dive into some of your original prompts, I know why the original payload was generating images infinitely. Right now it's simply not recognizing when an image was returned as a result of a tool call. Here's a broken down example, the request->context was lost. I can probably refine the tool response prompts.
[
		{
			"role": "system",
			"content": "Write AI's next reply in a chat on an instant messaging app between AI and Rosa. Keep replies to one or two sentences!"
		},
		{
			"role": "system",
			"content": "[Start a new Chat]"
		},
		{
			"role": "user",
			"content": "hi!"
		},
		{
			"role": "assistant",
			"tool_calls": [
				{
					"id": "call_58784",
					"type": "function",
					"function": {
						"arguments": "{\"prompt\": \"A vibrant sunset over a coastal city, with tall skyscrapers reflecting the golden light on the water, people walking on the pier, and warm hues of orange, pink, and purple blending into the sky. The scene is detailed, dynamic, and feels peaceful and alive, with soft wind-blown grass and a few sailboats gently moving in the distance.\"}",
						"name": "GenerateImage"
					}
				}
			]
		},
		{
			"role": "tool",
			"content": "/user/images/AI/[email protected]",
			"tool_call_id": "call_58784"
		}
	]

your current prompt unwrapping:

'User\'s request: hi!

Tool call responses: [\'/user/images/AI/[email protected]\']

Tool List:
[
{
"name": "GenerateImage",
"description": "Generate an image from a given text prompt. Use when a user asks to generate an image, imagine a concept or an item, send a picture of a scene, a selfie, etc.",
"properties": {
"prompt": "string"
}
}
]

If user\'s request was to generate any kind of non-text media, no further action is needed and the answer should be no, regardless of what the tool call response was. Otherwise, given the tool call response to the user\'s request, is another tool call needed to further answer user\'s message? State your final decision at the end. Don\'t use emojis.
### Response:
'

reworked prompt unwrap:

'<|im_start|>system
Write AI\'s next reply in a chat on an instant messaging app between AI and Rosa. Keep replies to one or two sentences!<|im_end|>
<|im_start|>system
[Start a new Chat]<|im_end|>
<|im_start|>user
hi!<|im_end|>
<|im_start|>assistant

(Made a function call call_58784 to GenerateImage with arguments={"prompt": "A vibrant sunset over a coastal city, with tall skyscrapers reflecting the golden light on the water, people walking on the pier, and warm hues of orange, pink, and purple blending into the sky. The scene is detailed, dynamic, and feels peaceful and alive, with soft wind-blown grass and a few sailboats gently moving in the distance."})
<|im_end|>

Received results of function call call_58784:
/user/images/AI/[email protected]<|im_start|>assistant

(Made a function call call_27009 to GenerateImage with arguments={"prompt": "A cozy, sunlit living room with a plush sofa, a coffee table filled with books and a plant, a window looking out to a garden, and soft warm lighting from lamps and a ceiling fixture."})
<|im_end|>

Received results of function call call_27009:
/user/images/AI/[email protected]

Tool List:
[
{
"name": "GenerateImage",
"description": "Generate an image from a given text prompt. Use when a user asks to generate an image, imagine a concept or an item, send a picture of a scene, a selfie, etc.",
"properties": {
"prompt": "string"
}
}
]

AI reasoning: No, no further tool calls are needed

So final decision, did the AI decide that a tool call is required? (one word answer: yes or no):'

and the AI replied:
Hey Rosa! I generated two images for you — one of a sunset over a coastal city and another of a cozy living room. Let me know if you'd like to see more! 🌅🛋️

@LostRuins
Owner

Really sorry for the huge delay, I was caught up with laptop repairs and troubleshooting. I've made some tweaks to the PR and added some comments; do you think you could give it a try and let me know if it works well for you?

@Rose22
Copy link
Author

Rose22 commented Nov 21, 2025

It's okay. Sure, I'll try. I'm sorry for messing up the git branch; hope all the effort will be worth it! Please check out the new (hopefully fixed) pull request and add your changes to it.

@Rose22 Rose22 closed this Nov 21, 2025
@Rose22 Rose22 deleted the concedo_experimental branch November 21, 2025 13:58
@Rose22 Rose22 restored the concedo_experimental branch November 21, 2025 14:02
@Rose22 Rose22 deleted the concedo_experimental branch November 21, 2025 14:04
