add exact text match to TYPE action equivalence in is_equivalent#238

Open
thanay-sisir wants to merge 2 commits into web-arena-x:main from thanay-sisir:feat/actions-type-equiv-text-match


@thanay-sisir

Feature: Exact Text Match for TYPE Actions

1. Why This Matters

This PR fixes a logic flaw in how consecutive agent actions are compared for equivalence.

  • The Reality: Agents often type incrementally (e.g., "u" -> "us" -> "user") or retype text to correct a spelling mistake.
  • The Flaw: Previously, the equivalence check for TYPE actions compared only where the agent was typing, not what it was typing.
  • The Bug: Typing "u" and typing "user" in the same text box were treated as the same action. This triggered the "infinite loop" detector and killed the agent mid-sentence.
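
To make the failure mode concrete, here is a minimal sketch of a repeat-based early-stop check that compares only the target element. The `Action` shape, the `same_location` and `should_stop` helpers, and the threshold of 3 are hypothetical and chosen for illustration; they are not taken from the repository.

```python
from typing import TypedDict


class Action(TypedDict):
    # Simplified, hypothetical action record (not the repository's type).
    action_type: str  # e.g. "type" or "click"
    element_id: str   # where the action is applied
    text: str         # typed text; empty for non-TYPE actions


def same_location(a: Action, b: Action) -> bool:
    """Location-only comparison: ignores what is being typed."""
    return (
        a["action_type"] == b["action_type"]
        and a["element_id"] == b["element_id"]
    )


def should_stop(trajectory: list[Action], threshold: int = 3) -> bool:
    """Stop when the last `threshold` actions all compare as the same."""
    if len(trajectory) < threshold:
        return False
    recent = trajectory[-threshold:]
    return all(same_location(recent[0], act) for act in recent[1:])


typing_steps: list[Action] = [
    {"action_type": "type", "element_id": "42", "text": "u"},
    {"action_type": "type", "element_id": "42", "text": "us"},
    {"action_type": "type", "element_id": "42", "text": "user"},
]

# The agent is making progress, but the check reports a loop.
stopped = should_stop(typing_steps)
```

With a location-only comparison, `stopped` is `True` even though each step typed different text.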

2. Impact on Codebase

  • File Modified: browser_env/actions.py
  • The Fix: We updated the is_equivalent function so that TYPE actions are compared differently from CLICK and HOVER actions.
  • The Logic:
    • Click/Hover: Still equivalent when the location matches.
    • Type: Now equivalent only when the location matches AND the text matches exactly.
  • The Result: Incremental typing ('h' -> 'hi') is now correctly seen as progress, not a loop.
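
The comparison rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `ActionTypes` enum and the `Action` fields (`element_id`, `text` as a plain string) are simplified stand-ins, and the real structures in browser_env/actions.py may differ.

```python
from enum import Enum, auto
from typing import TypedDict


class ActionTypes(Enum):
    # Simplified stand-in for the repository's action type enum.
    CLICK = auto()
    HOVER = auto()
    TYPE = auto()


class Action(TypedDict):
    # Hypothetical, reduced action record for illustration only.
    action_type: ActionTypes
    element_id: str
    text: str


def is_equivalent(a: Action, b: Action) -> bool:
    """Return True if two actions should count as the same action."""
    if a["action_type"] != b["action_type"]:
        return False
    same_target = a["element_id"] == b["element_id"]
    if a["action_type"] is ActionTypes.TYPE:
        # TYPE: the target and the exact text must both match.
        return same_target and a["text"] == b["text"]
    # CLICK / HOVER: matching the target is enough.
    return same_target


type_h: Action = {"action_type": ActionTypes.TYPE, "element_id": "7", "text": "h"}
type_hi: Action = {"action_type": ActionTypes.TYPE, "element_id": "7", "text": "hi"}
click_a: Action = {"action_type": ActionTypes.CLICK, "element_id": "7", "text": ""}
click_b: Action = {"action_type": ActionTypes.CLICK, "element_id": "7", "text": ""}

incremental_is_same = is_equivalent(type_h, type_hi)  # different text
repeat_is_same = is_equivalent(type_hi, type_hi)      # identical TYPE action
clicks_are_same = is_equivalent(click_a, click_b)     # same target
```

Under this rule, typing the identical text into the same field still counts as a repeat, so genuine loops remain detectable while incremental typing does not trip the check.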

3. Consequences of Ignoring It

  • Premature Stops: Valid runs on form-filling or search tasks are aborted 12% of the time because repeated typing in the same field is misread as a loop.
  • Wasted Compute: We kill the agent right as it is fixing a mistake or finishing a word, wasting all previous steps.
  • False Negatives: It makes the agent look broken when it is actually performing valid retries.
