OpenAI Computer Use Agent #270

Open: wants to merge 5 commits into main

Conversation

@imenelydiaker (Collaborator) commented on Jul 28, 2025:

Hi all,

I added an OpenAI computer-use agent for evaluation. I mostly followed this documentation and sample code. The agent uses the coordinates action space.

I ran a few tests with WorkArena L1, but the agent is really bad at achieving tasks (it gets 0 almost all the time, except for information extraction tasks such as chart retrieval). From my analysis, it seems to be a problem with the model rather than the environment. Could you help verify whether this is true?

This is the sample code I use to test the agent:

from agentlab.experiments.study import make_study, Study
from browsergym.experiments.benchmark import Benchmark
from browsergym.experiments.benchmark.utils import make_env_args_list_from_repeat_tasks
from browsergym.experiments.benchmark.metadata.utils import task_metadata
from browsergym.experiments.benchmark.configs import DEFAULT_HIGHLEVEL_ACTION_SET_ARGS

import numpy as np
import logging

from agentlab.agents.openai_cua.agent_configs import OPENAI_CUA_AGENT_ARGS

agent_args = [
    OPENAI_CUA_AGENT_ARGS,
]

benchmark = Benchmark(
    name="workarena_l1_tiny",
    high_level_action_set_args=DEFAULT_HIGHLEVEL_ACTION_SET_ARGS["workarena"],
    is_multi_tab=False,
    supports_parallel_seeds=False,
    backends=["workarena"],
    env_args_list=make_env_args_list_from_repeat_tasks(
        task_list=[
            "workarena.servicenow.all-menu",
            # "workarena.servicenow.create-problem",
            "workarena.servicenow.create-user",
            # "workarena.servicenow.create-hardware-asset",
            # "workarena.servicenow.order-development-laptop-p-c",
            # "workarena.servicenow.order-developer-laptop",
            "workarena.servicenow.order-ipad-mini",
            # "workarena.servicenow.order-loaner-laptop",
            # "workarena.servicenow.single-chart-value-retrieval",
            "workarena.servicenow.multi-chart-value-retrieval",
            # "workarena.servicenow.filter-asset-list",
            # "workarena.servicenow.filter-change-request-list",
            # "workarena.servicenow.sort-asset-list",
            # "workarena.servicenow.sort-user-list",
            # "workarena.servicenow.knowledge-base-search",
        ],
        max_steps=15,
        n_repeats=3,
        seeds_rng=np.random.RandomState(42),
    ),
    task_metadata=task_metadata("workarena"),
)

# benchmark = "workarena_l1" # Uncomment this line to use the full WorkArena L1 benchmark

relaunch = False

if relaunch:
    study = Study.load_most_recent(contains=None)
    study.find_incomplete(include_errors=True)
else:
    study = make_study(
        agent_args=agent_args,
        benchmark=benchmark,
        logging_level_stdout=logging.INFO,
        ignore_dependencies=True,
    )

study.run(
    n_jobs=10,
    parallel_backend="ray",
    strict_reproducibility=False,
    n_relaunch=3,
)

Description by Korbit AI

What change is being made?

Add a new OpenAI Computer Use Agent with classes to define the agent's arguments and operations for interacting with a high-level action interface.

Why are these changes being made?

This agent is being introduced to enable automated interactions and reasoning within a browser environment using the OpenAI framework. The design includes options for configuring actions, executing tasks without explicit confirmations, and managing agent operations effectively in various scenarios, which aids in the development of more sophisticated AI-driven user interface automation.


@korbit-ai (bot) left a comment:

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category       | Issue                                              | Status
Functionality  | Incomplete Multi-key Processing                    | ✅ Fix detected
Security       | Unsecured OpenAI Client Initialization             | ✅ Fix detected
Security       | Disabled Safety Checks in Production Configuration |
Error Handling | Silent Failure in Action Parsing                   |
Performance    | Unbounded Input History Growth                     | ✅ Fix detected
Functionality  | Incorrect Safety Check Assertion Logic             | ✅ Fix detected

Files scanned:
- src/agentlab/agents/openai_cua/agent_configs.py
- src/agentlab/agents/openai_cua/agent.py


subsets=("chat", "coord"),
demo_mode=None,
),
enable_safety_checks=False,
Disabled Safety Checks in Production Configuration (Security)

What is the issue?

Safety checks are disabled by default, which could allow potentially harmful actions to be executed without validation.

Why this matters

Without safety checks, the agent could perform unintended or dangerous operations in the browser environment, potentially compromising system security or causing undesired side effects.

Suggested change:

Enable safety checks by default unless explicitly required otherwise for testing:

enable_safety_checks=True
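For context, the computer-use API can return `pending_safety_checks` on a `computer_call`, which the caller must echo back as `acknowledged_safety_checks` before the action runs. A hypothetical gate for the flag above might look like this (illustrative names, not the PR's actual code):

```python
def gate_safety_checks(call: dict, enable_safety_checks: bool) -> list[dict]:
    """Return the safety checks to acknowledge, or raise to defer to a human."""
    pending = call.get("pending_safety_checks", [])
    if not pending:
        return []
    if not enable_safety_checks:
        # enable_safety_checks=False: auto-acknowledge everything.
        # Convenient for unattended benchmark runs, risky outside a sandbox.
        return list(pending)
    # enable_safety_checks=True: stop and surface the checks instead of proceeding.
    raise RuntimeError(
        "unacknowledged safety checks: "
        + ", ".join(check.get("code", "?") for check in pending)
    )
```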

@amanjaiswal73892 self-assigned this on Jul 30, 2025
@amanjaiswal73892 self-requested a review on July 30, 2025 22:26
@imenelydiaker (Collaborator, Author):

@amanjaiswal73892 did you get a chance to check whether the issue with solving WorkArena-L1 comes from bgym's coordinate interpretation or from the computer-use-preview model?

@amanjaiswal73892 (Collaborator):

Hi @imenelydiaker,
Thank you for the PR. It looks good! The coordinate interpretation in bgym seems to be working well: I was able to get ~25% on these tasks (66% for order-ipad-mini and 33% for chart retrieval). I changed the mapping of the CUA type action to the keyboard_type bgym function and used the latest bgym (v0.14.2). The other tasks are difficult for the agent and may need more steps. It would be great if you could update the code to render chat messages in agentlab-xray. Let me know if you need any help with this.
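The mapping change mentioned here can be sketched as follows. This is a hypothetical translator from CUA action dicts to browsergym coordinate-action strings; the bgym function names follow the coord action set, but this is not the PR's actual implementation.

```python
def cua_to_bgym(action: dict) -> str:
    """Translate an OpenAI CUA action dict into a browsergym action string."""
    kind = action["type"]
    if kind == "click":
        button = action.get("button", "left")
        return f'mouse_click({action["x"]}, {action["y"]}, button="{button}")'
    if kind == "double_click":
        return f'mouse_dblclick({action["x"]}, {action["y"]})'
    if kind == "type":
        # Mapped to keyboard_type rather than fill(): CUA types at the
        # current focus and provides no target element id.
        text = action["text"].replace("\\", "\\\\").replace('"', '\\"')
        return f'keyboard_type("{text}")'
    if kind == "keypress":
        combo = "+".join(action["keys"])
        return f'keyboard_press("{combo}")'
    if kind == "scroll":
        return f'scroll({action.get("scroll_x", 0)}, {action.get("scroll_y", 0)})'
    raise ValueError(f"unsupported CUA action type: {kind}")
```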

@imenelydiaker (Collaborator, Author):

Okay, great, thank you. It may be an issue with WorkArena on my end; I'm not able to reproduce many of my previous results with other non-GUI agents either.

I'll update the code to add some logging info :)

@amanjaiswal73892 removed their request for review on August 6, 2025 19:24
@amanjaiswal73892 (Collaborator) left a comment:

Requested changes to support chat message rendering in xray.
