-
Notifications
You must be signed in to change notification settings - Fork 53
feat(tools): add use_browser tool to Strands tools repository #88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Adding functionality so use_browser tool has some retry functionality and supports multiple tabs and switching between them.
src/strands_tools/use_browser.py
Outdated
|
||
elif action == "click": | ||
result += await self._handle_click_action(page, args) | ||
async def retry_action(self, action_func, max_retries=3, delay=1.0, action_name=None, args=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this configurable by environment variables. Along with the delay
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level comment:
Let's add this tool to the README as well
|
tests/test_use_browser.py
Outdated
def setup_test_environment(): | ||
"""Fixture to set up common test environment""" | ||
original_value = os.environ.get("BYPASS_TOOL_CONSENT", None) | ||
os.environ["BYPASS_TOOL_CONSENT"] = "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using mocking of the environment. See
tools/tests/test_environment.py
Line 29 in 44f5d23
def os_environment(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thank you for this. Autouse works well for the setup for the tests
…errors for better error handling
…actions are only called with their required arguments
@@ -38,6 +38,8 @@ dependencies = [ | |||
"tenacity>=9.1.2,<10.0.0", | |||
"watchdog>=6.0.0,<7.0.0", | |||
"slack_bolt>=1.23.0,<2.0.0", | |||
"nest-asyncio>=1.5.0,<2.0.0", | |||
"playwright>=1.42.0,<2.0.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How heavy is this dependency?
I'm leaning towards this being an optional dependency as I don't want everyone to need to download playwright just to use memory
for instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't too sure on the acceptable way to add these dependencies. As long as adding them as optional dependencies keeps the same functionality with the tool, I would rather do it that way.
src/strands_tools/use_browser.py
Outdated
from strands_tools.utils.user_input import get_user_input | ||
|
||
# Only configure this module's logger, not the root logger | ||
enable_debug = os.getenv("ENABLE_DEBUG_BROWSER_LOGS", "false").lower() == "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's our naming convention? I think use a STRANDS
prefix elsewhere and I wonder if we should add that prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a better convention for this variable be "STRANDS_USE_BROWSER_ENABLE_DEBUG_LOGS"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to Mackenzie's point. It's a small thing but it'll be good to follow the naming convention that we have throughout the repo.
src/strands_tools/use_browser.py
Outdated
enable_debug = os.getenv("ENABLE_DEBUG_BROWSER_LOGS", "false").lower() == "true" | ||
|
||
logger = logging.getLogger(__name__) | ||
logger.setLevel(logging.DEBUG if enable_debug else logging.INFO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a client-level concern, not a tool level concern. Any reason we shouldn't let the application/caller set up logging properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having an issue with the logs in the past printing out everything, I will re-look at this, logs should be configured on client-level, that makes sense.
src/strands_tools/use_browser.py
Outdated
_playwright_manager = None | ||
|
||
# Apply nested event loop support | ||
nest_asyncio.apply() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be done lazily at all? I'm a little concerned that simply importing the tool changes the behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can move this into the initialization code for the BrowserManager class
console = Console() | ||
|
||
# Global browser manager instance | ||
_playwright_manager = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason to have a global instance? If so, can that reason be added as a comment?
Are there any limitations around multiple agents interacting with this tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reason that I have found to why it needs a global instance, let me test the tool with that line in the tool function
src/strands_tools/use_browser.py
Outdated
self._loop = asyncio.new_event_loop() | ||
asyncio.set_event_loop(self._loop) | ||
self.action_configs = { | ||
"navigate": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than config as code, would it make sense to go to a defined object?
class ApiMethods:
def navigate(page: Page, url: str):
page.wait_for_load_state("networkidle"))
return "Navigated to {url}"
....
I think you'd be able to use reflection at runtime to build up the same data here, but it'd be a more declarative way of exposing actions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach as well. This way we get access to all the actions of the class in one go.
src/strands_tools/use_browser.py
Outdated
"method": lambda page, selector: page.text_content(selector), | ||
"required_args": ["page", "selector"], | ||
"required_params": [("selector", str)], | ||
"post_process": lambda result: result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see let me test it without that and see
src/strands_tools/use_browser.py
Outdated
logger.debug("Ensuring browser is running...") | ||
|
||
# Ensure required directories exist | ||
user_data_dir = os.getenv("BROWSER_USER_DATA_DIR", os.path.join(os.path.expanduser("~"), ".browser_automation")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my earlier question - what's our convention with naming? Should these be prefixed with STRANDS_
?
Are these parameters only required because we don't provide a way to configure tools otherwise? What if you implemented it such as:
class BrowserTool:
def __init__(... params...):
@tool
def use_browser(...)
...
Usage would then be:
use_browser = BrowserTool(... params ...).use_browser
which would also allow you to isolate state.
(this may or may not work today; but after strands-agents/sdk-python#258 it should work). We may also want to ping the team to get thoughts on this class based approach for this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like the idea of a class based approach here
Description
Adding a use_browser tool, that gives an agent actions, so they interact with browsers (just chromium right now). This tool relies on Playwright to perform automated actions on a browser. Playwright allows for a browser to be launched with different launch options. One of these launch options is persistent_context, which allows users to launch a browser and load in the saved data from whatever the specified user_data_dir is. This feature is used to save cookie settings, local storage, sessions storage, browser history, saved passwords, etc. Right now you can ask an agent to use the use_browser tool to launch a browser with persistent_context set to true and specify a file path for user_data_dir. Then, any usernames, bookmarks you create, or search history should be stored in the user_data_dir and then the browser can be relaunched at a later date.
Related Issues
(https://github.com/strands-agents/private-sdk-python-staging/issues/72)
Documentation PR
Will need to add documentation about use_browser tool.
Type of Change
Testing
[How have you tested the change?]
hatch fmt --linter
hatch fmt --formatter
hatch test --all
Checklist
I have read the CONTRIBUTING document
I have added tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature
My changes generate no new warnings
Any dependent changes have been merged and published
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.