Skip to content

feat(tools): add use_browser tool to Strands tools repository #88

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

jimbrub
Copy link

@jimbrub jimbrub commented Jun 16, 2025

Description

Adding a use_browser tool, that gives an agent actions, so they interact with browsers (just chromium right now). This tool relies on Playwright to perform automated actions on a browser. Playwright allows for a browser to be launched with different launch options. One of these launch options is persistent_context, which allows users to launch a browser and load in the saved data from whatever the specified user_data_dir is. This feature is used to save cookie settings, local storage, sessions storage, browser history, saved passwords, etc. Right now you can ask an agent to use the use_browser tool to launch a browser with persistent_context set to true and specify a file path for user_data_dir. Then, any usernames, bookmarks you create, or search history should be stored in the user_data_dir and then the browser can be relaunched at a later date.

Related Issues

(https://github.com/strands-agents/private-sdk-python-staging/issues/72)

Documentation PR

Will need to add documentation about use_browser tool.

Type of Change

  • New Tool
  • Other (please describe):

Testing

[How have you tested the change?]

  • hatch fmt --linter
  • hatch fmt --formatter
  • hatch test --all

Checklist

  • I have read the CONTRIBUTING document

  • I have added tests that prove my fix is effective or my feature works

  • I have updated the documentation accordingly

  • I have added an appropriate example to the documentation to outline the feature

  • My changes generate no new warnings

  • Any dependent changes have been merged and published

  • By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jimbrub jimbrub requested a review from a team as a code owner June 16, 2025 13:38
@mehtarac mehtarac requested a review from a team June 17, 2025 22:22
Adding functionality so use_browser tool has some retry functionality and supports multiple tabs and switching between them.

elif action == "click":
result += await self._handle_click_action(page, args)
async def retry_action(self, action_func, max_retries=3, delay=1.0, action_name=None, args=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this configurable by environment variables. Along with the delay parameter.

Copy link
Member

@mehtarac mehtarac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level comment:

Let's add this tool to the README as well

@jimbrub
Copy link
Author

jimbrub commented Jun 20, 2025

High level comment:

Let's add this tool to the README as well

@jimbrub jimbrub closed this Jun 20, 2025
@jimbrub jimbrub reopened this Jun 20, 2025
def setup_test_environment():
"""Fixture to set up common test environment"""
original_value = os.environ.get("BYPASS_TOOL_CONSENT", None)
os.environ["BYPASS_TOOL_CONSENT"] = "true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using mocking of the environment. See

def os_environment():
for example

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thank you for this. Autouse works well for the setup for the tests

@@ -38,6 +38,8 @@ dependencies = [
"tenacity>=9.1.2,<10.0.0",
"watchdog>=6.0.0,<7.0.0",
"slack_bolt>=1.23.0,<2.0.0",
"nest-asyncio>=1.5.0,<2.0.0",
"playwright>=1.42.0,<2.0.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How heavy is this dependency?

I'm leaning towards this being an optional dependency as I don't want everyone to need to download playwright just to use memory for instance

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't too sure on the acceptable way to add these dependencies. As long as adding them as optional dependencies keeps the same functionality with the tool, I would rather do it that way.

from strands_tools.utils.user_input import get_user_input

# Only configure this module's logger, not the root logger
enable_debug = os.getenv("ENABLE_DEBUG_BROWSER_LOGS", "false").lower() == "true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's our naming convention? I think use a STRANDS prefix elsewhere and I wonder if we should add that prefix

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a better convention for this variable be "STRANDS_USE_BROWSER_ENABLE_DEBUG_LOGS"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Mackenzie's point. It's a small thing but it'll be good to follow the naming convention that we have throughout the repo.

enable_debug = os.getenv("ENABLE_DEBUG_BROWSER_LOGS", "false").lower() == "true"

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG if enable_debug else logging.INFO)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a client-level concern, not a tool level concern. Any reason we shouldn't let the application/caller set up logging properly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having an issue with the logs in the past printing out everything, I will re-look at this, logs should be configured on client-level, that makes sense.

_playwright_manager = None

# Apply nested event loop support
nest_asyncio.apply()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be done lazily at all? I'm a little concerned that simply importing the tool changes the behavior

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can move this into the initialization code for the BrowserManager class

console = Console()

# Global browser manager instance
_playwright_manager = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason to have a global instance? If so, can that reason be added as a comment?

Are there any limitations around multiple agents interacting with this tool?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason that I have found to why it needs a global instance, let me test the tool with that line in the tool function

self._loop = asyncio.new_event_loop()
asyncio.set_event_loop(self._loop)
self.action_configs = {
"navigate": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than config as code, would it make sense to go to a defined object?

class ApiMethods:
    def navigate(page: Page, url: str):
        page.wait_for_load_state("networkidle"))
        return "Navigated to {url}"
    ....

I think you'd be able to use reflection at runtime to build up the same data here, but it'd be a more declarative way of exposing actions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach as well. This way we get access to all the actions of the class in one go.

"method": lambda page, selector: page.text_content(selector),
"required_args": ["page", "selector"],
"required_params": [("selector", str)],
"post_process": lambda result: result,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see let me test it without that and see

logger.debug("Ensuring browser is running...")

# Ensure required directories exist
user_data_dir = os.getenv("BROWSER_USER_DATA_DIR", os.path.join(os.path.expanduser("~"), ".browser_automation"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my earlier question - what's our convention with naming? Should these be prefixed with STRANDS_?

Are these parameters only required because we don't provide a way to configure tools otherwise? What if you implemented it such as:

class BrowserTool:
    def __init__(... params...):


    @tool
    def use_browser(...)
           ... 

Usage would then be:

use_browser = BrowserTool(... params ...).use_browser

which would also allow you to isolate state.

(this may or may not work today; but after strands-agents/sdk-python#258 it should work). We may also want to ping the team to get thoughts on this class based approach for this case

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like the idea of a class based approach here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants