Skip to content

Added max_images to Agents & AgentMemory. #1439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

mkrauklis
Copy link

The max_images parameter solves the problem that images "build up" in step memory and get sent to the LLM, but many LLMs have a (fairly low) limit on the number of images they will accept.

e.g. Claude Sonnet 3.5 v2 only accepts 20 images. If I pass in 5 images to an agent run that has a max of 10 steps it's likely that each generated step will include image observations and the run will quickly exceed the 20-image limit in the step->message conversion.

mkrauklis and others added 2 commits June 13, 2025 11:39
The max_images parameter solves the problem that images "build up" in step memory and get sent to the LLM, but many LLMs have a (fairly low) limit on the number of images they will accept.

e.g. Claude Sonnet 3.5 v2 only accepts 20 images. If I pass in 5 images to an agent run that has a max of 10 steps it's likely that each generated step will include image observations and the run will quickly exceed the 20-image limit in the step->message conversion.
@@ -255,6 +255,7 @@ class MultiStepAgent(ABC):
Each function should:
- Take the final answer and the agent's memory as arguments.
- Return a boolean indicating whether the final answer is valid.
max_images (`int`, *optional*): Maximum number of images to retain in the agent's memory. If `-1`, there is no limit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making the default None would be clearer when there's no limit! So the type would be int | None


def reset(self):
"""Reset the agent's memory, clearing all steps and keeping the system prompt."""
self.steps = []

def append(self, step: TaskStep | ActionStep | PlanningStep):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of not adding images to memory (with the risk to lose precious information), better remove this custom append method.

To handle the max count of images, you could instead change agent.write_memory_to_messages to just not add an image to the list of input messages being created if we're over the image threshold, or to remove it in postprocessing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I was under the misunderstanding that each Agent implementation had its own method of converting the memory state to messages. I'll re-write this as soon as I get the chance.

Copy link
Collaborator

@aymeric-roucher aymeric-roucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mkrauklis, thank you for proposing this change, it makes a lot of sense!

Re: the implementation, I think it's better to keep all images in memory, and to modify write_memory_to_messages instead to cap image count.

Then we can merge!

@mkrauklis
Copy link
Author

Re-implemented in MultiStepAgent.write_memory_to_messages leaving memory alone.

@@ -734,10 +738,62 @@ def write_memory_to_messages(
Reads past llm_outputs, actions, and observations or errors from the memory into a series of messages
that can be used as input to the LLM. Adds a number of keywords (such as PLAN, error, etc) to help
the LLM.
If max_images is set, it will limit the number of images in the messages keeping the newest images in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If max_images is set, it will limit the number of images in the messages keeping the newest images in
If max_images is set, only the last n images in memory will be written to the message list.

@@ -734,10 +738,62 @@ def write_memory_to_messages(
Reads past llm_outputs, actions, and observations or errors from the memory into a series of messages
that can be used as input to the LLM. Adds a number of keywords (such as PLAN, error, etc) to help
the LLM.
If max_images is set, it will limit the number of images in the messages keeping the newest images in
the message chain.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the message chain.

"""
messages = self.memory.system_prompt.to_messages(summary_mode=summary_mode)
for memory_step in self.memory.steps:
messages.extend(memory_step.to_messages(summary_mode=summary_mode))

if self.max_images is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to just do it in one loop: remove the images_to_remove variables, and just process the message list backwards starting from the end and keeping a counter, once this counter goes over self.max_images, you remove all the images that remain on your way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial implementation was a single backwards loop (as was the original implementation in memory), and it worked but was much more complicated and harder to conceptualize and therefore harder to maintain. I switched to this approach expecting the low cardinality of messages to mitigate any performance impact and placing precedence on maintainability.

If you really think it's worth reverting I'll do it, but know this was a conscious decision on my part. Let me know what you think.

@aaron-ang
Copy link

hi @aymeric-roucher, I created a similar PR #1601 that is a working implementation for my specific use case, but I do believe it makes sense in the broader context too. Please have a look if you have some time. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants