README.md
+11 -3 (11 additions & 3 deletions)
@@ -42,6 +42,11 @@ The system adopts a two-layer structure:
 
 * Automates browser operations, supporting web search, information extraction, and data collection tasks.
 * Assists the Deep Researcher in acquiring up-to-date information from the internet.
+
+* **MCP Manager Agent**
+* Manages and orchestrates Model Context Protocol (MCP) tools and services.
+* Enables dynamic tool discovery, registration, and execution through MCP standards.
+* Supports both local and remote MCP tool integration for enhanced agent capabilities.
 
 * **General Tool Calling Agent**
 * Provides a general-purpose interface for invoking various tools and APIs.
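
Review note: the added MCP Manager Agent bullets describe tool discovery, registration, and execution. A minimal sketch of that pattern, assuming a plain in-memory registry. `ToolRegistry`, `register`, `discover`, and `execute` are hypothetical names chosen for illustration; they are neither this project's code nor part of any MCP SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolRegistry:
    """Toy in-memory registry (hypothetical; not the project's MCP Manager)."""

    _tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        # Registration: make a tool callable by name at runtime.
        self._tools[name] = fn

    def discover(self) -> List[str]:
        # Discovery: list the names of the tools currently available.
        return sorted(self._tools)

    def execute(self, name: str, **kwargs: Any) -> Any:
        # Execution: look the tool up by name and call it with keyword arguments.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()
registry.register("add", lambda a, b: a + b)  # stands in for a local tool
registry.register("echo", lambda text: text)  # stands in for a remote tool
```

In a real integration the callables would presumably wrap calls to local or remote MCP servers; here they are ordinary Python functions so the sketch stays self-contained.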
@@ -146,13 +151,16 @@ python examples/run_gaia.py
 
 ## Experiments
 
-We evaluated our agent on the GAIA validation set and achieved state-of-the-art performance on May 10th.
+We evaluated our agent on both GAIA validation and test sets, achieving state-of-the-art performance. Our system demonstrates superior performance across all difficulty levels.
 
 <p align="center">
-<img src="./docs/assets/gaia_benchmark.png" alt="GAIA Example Result" width="700"/>
+<img src="./docs/assets/gaia_test.png" alt="GAIA Test Results" width="300"/>
-With the integration of the Computer Use Agent, which now enables pixel-level control of the browser, our system's performance on the test set has significantly improved. The latest results show a notable increase to 79.07 (average), with 91.4 on Level 1, 77.36 on Level 2, and 61.22 on Level 3. We’re continuing to refine and optimize the agent, and a new version will be released in the coming days
+With the integration of the Computer Use and MCP Manager Agent, which now enables pixel-level control of the browser, our system demonstrates remarkable evolutionary capabilities. The agents can dynamically acquire and enhance their abilities through learning and adaptation, leading to significantly improved performance. The latest results show:
+- **Test Set**: 83.39 (average), with 93.55 on Level 1, 83.02 on Level 2, and 65.31 on Level 3
+- **Validation Set**: 82.4 (average), with 92.5 on Level 1, 83.7 on Level 2, and 57.7 on Level 3
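
Review note: the reported averages look consistent with a question-count-weighted mean of the per-level scores rather than a plain mean of the three levels. A minimal sketch of that check, assuming level sizes of 93/159/49 questions for the GAIA test set and 53/86/26 for the validation set. These counts are not stated in the diff and are an assumption here, and `weighted_average` is a hypothetical helper.

```python
def weighted_average(scores, counts):
    """Mean of per-level scores, weighted by the number of questions per level."""
    total = sum(counts)
    return sum(score * n for score, n in zip(scores, counts)) / total


# ASSUMPTION: questions per level (Level 1, Level 2, Level 3); not given in the diff.
TEST_COUNTS = (93, 159, 49)
VALIDATION_COUNTS = (53, 86, 26)

# Per-level scores copied from the added README lines.
test_avg = weighted_average((93.55, 83.02, 65.31), TEST_COUNTS)
validation_avg = weighted_average((92.5, 83.7, 57.7), VALIDATION_COUNTS)

print(round(test_avg, 2), round(validation_avg, 1))  # prints: 83.39 82.4
```

Under the same assumed test-set sizes, the removed line's scores (91.4, 77.36, 61.22) give about 79.07, matching the old average.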