OpenAI Computer Use Agent #270
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
Issue | Status |
---|---|
Incomplete Multi-key Processing | ✅ Fix detected |
Unsecured OpenAI Client Initialization | ✅ Fix detected |
Disabled Safety Checks in Production Configuration | |
Silent Failure in Action Parsing | |
Unbounded Input History Growth | ✅ Fix detected |
Incorrect Safety Check Assertion Logic | ✅ Fix detected |
Files scanned
File Path | Reviewed |
---|---|
src/agentlab/agents/openai_cua/agent_configs.py | ✅ |
src/agentlab/agents/openai_cua/agent.py | ✅ |
        subsets=("chat", "coord"),
        demo_mode=None,
    ),
    enable_safety_checks=False,
Disabled Safety Checks in Production Configuration 
What is the issue?
Safety checks are disabled by default, which could allow potentially harmful actions to be executed without validation.
Why this matters
Without safety checks, the agent could perform unintended or dangerous operations in the browser environment, potentially compromising system security or causing undesired side effects.
Suggested change ∙ Feature Preview
Enable safety checks by default unless explicitly required otherwise for testing:
enable_safety_checks=True
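A minimal sketch of the suggestion above, assuming the agent's arguments live in a dataclass. `AgentArgsSketch`, `agent_name`, `safety_checks_enabled` and `allow_unsafe` are hypothetical names for illustration only; just the `enable_safety_checks` field comes from the PR. The idea is that checks default to on, and turning them off needs a second, explicit opt-in.

```python
from dataclasses import dataclass


@dataclass
class AgentArgsSketch:
    """Hypothetical stand-in for the agent's argument class (not the PR's code)."""

    agent_name: str = "cua-sketch"     # placeholder name, an assumption
    enable_safety_checks: bool = True  # safe default, as the review suggests


def safety_checks_enabled(args: AgentArgsSketch, allow_unsafe: bool = False) -> bool:
    """Return the effective setting, refusing to disable checks silently."""
    if not args.enable_safety_checks and not allow_unsafe:
        raise ValueError(
            "Safety checks are disabled; pass allow_unsafe=True to confirm."
        )
    return args.enable_safety_checks
```

A test configuration could then construct `AgentArgsSketch(enable_safety_checks=False)` and pass `allow_unsafe=True` where the setting is read, so the unsafe mode is visible at the call site.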
@amanjaiswal73892 did you get the chance to check if the issue on solving WorkArena-L1 is coming from bgym coordinates interpretation or from the
Hi @imenelydiaker,
Thank you for the PR. It looks good! The coordinates interpretation in bgym seems to be working well. I was able to get ~25% for these tasks (66% for order-ipad-mini and 33% for chart retrieval). I changed the mapping of CUA
Okay, great, thank you. It may be an issue with WorkArena on my end: I'm not able to reproduce many of my previous results with other non-GUI agents. I'll update the code to add some logging info :)
Requested changes to support chat message rendering in xray.
Co-authored-by: Aman Jaiswal
Hi all,
I added an OpenAI computer use agent for evaluation. I mostly used this documentation and sample code. The agent uses the coordinates action space.
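As a rough illustration of what using the coordinates action space involves, here is a sketch that turns a computer-use action (represented as a plain dict) into a coordinate-based action string. This is not the PR's implementation: the function name `cua_to_bgym_action` is hypothetical, and the action types and target names (`mouse_click`, `keyboard_type`, `scroll`) are assumptions about the two sides that should be checked against the actual APIs. Unknown actions raise instead of being dropped, so parsing failures stay visible.

```python
def cua_to_bgym_action(action: dict) -> str:
    """Map a computer-use action dict to a coordinate-style action string.

    Sketch only: the action types and target function names are assumptions.
    """
    kind = action.get("type")
    if kind == "click":
        button = action.get("button", "left")
        return f"mouse_click({action['x']}, {action['y']}, button={button!r})"
    if kind == "type":
        return f"keyboard_type({action['text']!r})"
    if kind == "scroll":
        return f"scroll({action['scroll_x']}, {action['scroll_y']})"
    # Fail loudly rather than silently ignoring an action we cannot map.
    raise ValueError(f"Unsupported action type: {kind!r}")
```

For example, `cua_to_bgym_action({"type": "click", "x": 10, "y": 20})` yields the string `mouse_click(10, 20, button='left')`.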
I ran a few tests with WorkArena L1, but the agent is really bad at achieving tasks (it gets 0 almost all the time, except for information extraction tasks such as charts). From my analysis, it seems to be more a problem with the model than with the environment. Could you help verify whether that is the case?
This is the sample code I use to test the agent:
Description by Korbit AI
What change is being made?
Add a new OpenAI Computer Use Agent with classes to define the agent's arguments and operations for interacting with a high-level action interface.
Why are these changes being made?
This agent is being introduced to enable automated interactions and reasoning within a browser environment using the OpenAI framework. The design includes options for configuring actions, executing tasks without explicit confirmations, and managing agent operations effectively in various scenarios, which aids in the development of more sophisticated AI-driven user interface automation.