Conversation


@Rose22 Rose22 commented Nov 7, 2025

As we've discussed in Discord,

this change alters the way koboldcpp determines which tool to use in some pretty drastic ways that vastly improve its accuracy, especially with small LLMs. Instead of making one request to the LLM that asks whether a tool should be used, forcing it down to 5 tokens with a grammar that only allows a simple "yes/no" answer, it now gives the LLM full freedom to write out its decision and the reasoning behind it, with its final decision always stated at the end of the response. We then take that response and apply the yes/no grammar to it instead!
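For readers following along, here is a minimal sketch of the two-pass flow described above. The `generate` helper, the function name, and the prompt wording are hypothetical stand-ins for illustration, not koboldcpp's actual API:

```python
# Minimal sketch of the two-pass tool decision, assuming a generic
# generate(prompt, max_tokens, grammar=None) text-generation helper.
# All names and prompt wording here are illustrative, not koboldcpp's API.
def should_use_tool(generate, conversation, tool_list):
    # Pass 1: unconstrained reasoning -- the model explains its decision
    # in free text and states the final decision at the end.
    reasoning = generate(
        prompt=(conversation
                + "\nTool List:\n" + tool_list
                + "\nIs a tool call needed? Explain your reasoning,"
                  " then state your final decision at the end."),
        max_tokens=512,
    )
    # Pass 2: apply the yes/no grammar to the model's own reasoning,
    # instead of forcing the answer in a single constrained request.
    verdict = generate(
        prompt=(reasoning
                + "\nSo final decision, is a tool call required?"
                  " (one word answer: yes or no):"),
        max_tokens=2,
        grammar='root ::= "yes" | "no"',
    )
    return verdict.strip().lower() == "yes"
```

The key point is that the grammar constraint is only applied in the second pass, after the model has already committed to a decision in free text.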

@LostRuins LostRuins added enhancement New feature or request good first issue Good for newcomers labels Nov 8, 2025
@LostRuins
Owner

Thanks. It will take me some time to review, as I am waiting to get my laptop back.

@LostRuins LostRuins force-pushed the concedo_experimental branch from b0f20ae to 055fdce Compare November 8, 2025 13:53
@henk717
Collaborator

henk717 commented Nov 9, 2025

Function-wise it performs better than our current implementation. Speed-wise, for me the difference is very minor; it may affect people with slow PP / gen times more. But those people will have an escape in the upcoming Jinja implementation.

I'd say we can merge this if you review the code.

@Rose22
Author

Rose22 commented Nov 9, 2025

> Function-wise it performs better than our current implementation. Speed-wise, for me the difference is very minor; it may affect people with slow PP / gen times more. But those people will have an escape in the upcoming Jinja implementation.
>
> I'd say we can merge this if you review the code.

Yup, I can definitely confirm that it's affected by slow PP. But it's a lot better than the old way of doing it! With the old way, my 4B Qwen3 model was all too eager to run tools. Even just saying a simple "hi" to it triggered a tool call!

I'm glad you guys are implementing Jinja as well though! In my opinion that's superior for tool calling, because it just works as it should, the way the llama.cpp server already does. But at least with this change, the native tool calling in Kobold isn't as eager to randomly run tools, and it more accurately chooses which tools to run.

Looking forward to the code review!

@LostRuins LostRuins force-pushed the concedo_experimental branch from 994427d to cdc18f0 Compare November 10, 2025 12:54
for name in toolnames:
    pollgrammar += ("" if pollgrammar=="" else " | ")
    pollgrammar += "\"" + name + "\""
pollgrammar += " | \"no_tool\""
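For illustration, here is a self-contained version of the loop under review; the `toolnames` list is a made-up example, not taken from the PR. It builds a grammar alternation over the tool names plus the `no_tool` escape:

```python
# Standalone sketch of the grammar-building loop above.
# "toolnames" is a hypothetical example list for demonstration.
toolnames = ["GenerateImage", "WebSearch"]

pollgrammar = ""
for name in toolnames:
    # Prepend " | " between alternatives, but not before the first one.
    pollgrammar += ("" if pollgrammar == "" else " | ")
    pollgrammar += "\"" + name + "\""
# The escape alternative the review comment is asking about.
pollgrammar += " | \"no_tool\""

print(pollgrammar)  # "GenerateImage" | "WebSearch" | "no_tool"
```

The final line is the null-tool escape whose removal is being discussed below: without it, the grammar forces the model to pick one of the listed tools.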
Owner


Why remove the null tool? It seems like it's still a good idea to have even if the LLM knows that a tool is needed; it gives it a second chance to change its mind.

Author


Because in testing, this often resulted in tool calls that I explicitly asked for being cancelled. Best to just not interfere with its final reasoning.

Collaborator


Formerly, the null tool was there to give the LLM an out it would go for; Rosie's PR instead does this with a reasoning step. Same concept, different implementation.

@LostRuins
Owner

Consolidated testing results:

  • Two step Summarization pass before using yes/no:

    • Agreed, your new method seems like a better approach, we can try it
  • Removal of escape clause tool:

    • Agreed, you're right we can probably remove it after improving the tool determination logic
  • Reducing tool complexity, pass only the essential tool call information

    • Agreed, we can try this
  • My own mistake: Not actually passing all the function call history per turn, only the first.

    • Need to pass the full history of function calls, along with their call IDs so the AI can cross reference the correct results
  • Mandating the `messages` field via `messages = genparams.get("messages")` (using the newest 6 messages)

    • Not ideal, can't be used in non chat completions mode
    • Loss of contextual information when referencing older content (e.g. memory, older turns)
    • Only focuses on the most recent user message (which may not contain enough information about the tool to use, e.g. "yes please find it for me")
    • Breaks context shifting
      Solution: Retain current chat history in curr_ctx as a prefix first. Still perform the 2 passes of reasoning -> final answer
  • Reworded tool prompt: If user's request was to generate any kind of non-text media, no further action is needed and the answer should be no, regardless of what the tool call response was.

    • Not an ideal solution as this breaks all multi-pass tools that deal with media, and adds clutter if images are not involved
    • After a deeper dive into some of your original prompts, I know why the original payload was generating images infinitely. Right now it's simply not recognizing when an image was returned as a result of a tool call. Here's a broken down example, the request->context was lost. I can probably refine the tool response prompts.
[
		{
			"role": "system",
			"content": "Write AI's next reply in a chat on an instant messaging app between AI and Rosa. Keep replies to one or two sentences!"
		},
		{
			"role": "system",
			"content": "[Start a new Chat]"
		},
		{
			"role": "user",
			"content": "hi!"
		},
		{
			"role": "assistant",
			"tool_calls": [
				{
					"id": "call_58784",
					"type": "function",
					"function": {
						"arguments": "{\"prompt\": \"A vibrant sunset over a coastal city, with tall skyscrapers reflecting the golden light on the water, people walking on the pier, and warm hues of orange, pink, and purple blending into the sky. The scene is detailed, dynamic, and feels peaceful and alive, with soft wind-blown grass and a few sailboats gently moving in the distance.\"}",
						"name": "GenerateImage"
					}
				}
			]
		},
		{
			"role": "tool",
			"content": "/user/images/AI/[email protected]",
			"tool_call_id": "call_58784"
		}
	]

your current prompt unwrapping:

'User\'s request: hi!

Tool call responses: [\'/user/images/AI/[email protected]\']

Tool List:
[
{
"name": "GenerateImage",
"description": "Generate an image from a given text prompt. Use when a user asks to generate an image, imagine a concept or an item, send a picture of a scene, a selfie, etc.",
"properties": {
"prompt": "string"
}
}
]

If user\'s request was to generate any kind of non-text media, no further action is needed and the answer should be no, regardless of what the tool call response was. Otherwise, given the tool call response to the user\'s request, is another tool call needed to further answer user\'s message? State your final decision at the end. Don\'t use emojis.
### Response:
'

reworked prompt unwrap:

'<|im_start|>system
Write AI\'s next reply in a chat on an instant messaging app between AI and Rosa. Keep replies to one or two sentences!<|im_end|>
<|im_start|>system
[Start a new Chat]<|im_end|>
<|im_start|>user
hi!<|im_end|>
<|im_start|>assistant

(Made a function call call_58784 to GenerateImage with arguments={"prompt": "A vibrant sunset over a coastal city, with tall skyscrapers reflecting the golden light on the water, people walking on the pier, and warm hues of orange, pink, and purple blending into the sky. The scene is detailed, dynamic, and feels peaceful and alive, with soft wind-blown grass and a few sailboats gently moving in the distance."})
<|im_end|>

Received results of function call call_58784:
/user/images/AI/[email protected]<|im_start|>assistant

(Made a function call call_27009 to GenerateImage with arguments={"prompt": "A cozy, sunlit living room with a plush sofa, a coffee table filled with books and a plant, a window looking out to a garden, and soft warm lighting from lamps and a ceiling fixture."})
<|im_end|>

Received results of function call call_27009:
/user/images/AI/[email protected]

Tool List:
[
{
"name": "GenerateImage",
"description": "Generate an image from a given text prompt. Use when a user asks to generate an image, imagine a concept or an item, send a picture of a scene, a selfie, etc.",
"properties": {
"prompt": "string"
}
}
]

AI reasoning: No, no further tool calls are needed

So final decision, did the AI decide that a tool call is required? (one word answer: yes or no):'

and the AI replied:
Hey Rosa! I generated two images for you — one of a sunset over a coastal city and another of a cozy living room. Let me know if you'd like to see more! 🌅🛋️

@LostRuins
Owner

Really sorry for the huge delay, I was caught up with laptop repairs and troubleshooting. I've made some tweaks to the PR and added some comments; do you think you could give it a try and let me know if it works well for you?

@Rose22
Copy link
Author

Rose22 commented Nov 21, 2025

It's okay. Sure, I'll try. I'm sorry for messing up the git branch; hope all the effort will be worth it! Please check out the new (hopefully fixed) pull request and add your changes to it.

@Rose22 Rose22 closed this Nov 21, 2025
@Rose22 Rose22 deleted the concedo_experimental branch November 21, 2025 13:58
@Rose22 Rose22 restored the concedo_experimental branch November 21, 2025 14:02
@Rose22 Rose22 deleted the concedo_experimental branch November 21, 2025 14:04
