
Commit 5595f04

committed
update readme
1 parent 61c1226 commit 5595f04


3 files changed, +11 -3 lines changed


README.md

Lines changed: 11 additions & 3 deletions
@@ -42,6 +42,11 @@ The system adopts a two-layer structure:
 
 * Automates browser operations, supporting web search, information extraction, and data collection tasks.
 * Assists the Deep Researcher in acquiring up-to-date information from the internet.
+
+* **MCP Manager Agent**
+* Manages and orchestrates Model Context Protocol (MCP) tools and services.
+* Enables dynamic tool discovery, registration, and execution through MCP standards.
+* Supports both local and remote MCP tool integration for enhanced agent capabilities.
 
 * **General Tool Calling Agent**
 * Provides a general-purpose interface for invoking various tools and APIs.
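
The MCP Manager Agent bullets added in this hunk describe three operations: tool discovery, registration, and execution. A minimal sketch of that pattern follows, assuming a plain in-memory registry. The `ToolRegistry` class and its `register`, `discover`, and `execute` methods are hypothetical names chosen for illustration; they are not the repository's implementation and do not use any MCP SDK.

```python
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical in-memory registry (illustration only, not the repo's agent)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._descriptions: Dict[str, str] = {}

    def register(self, name: str, func: Callable[..., Any], description: str = "") -> None:
        # Registration: reject duplicates so a tool name maps to one callable.
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = func
        self._descriptions[name] = description

    def discover(self) -> List[str]:
        # Discovery: list the names of all currently registered tools.
        return sorted(self._tools)

    def execute(self, name: str, **kwargs: Any) -> Any:
        # Execution: look the tool up by name and call it with keyword arguments.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()
registry.register("add", lambda a, b: a + b, "Add two numbers")
registry.register("echo", lambda text: text, "Return the input text")

print(registry.discover())
print(registry.execute("add", a=2, b=3))
```

A real MCP integration would presumably replace the local callables with calls to local or remote MCP servers; that part is left out here because the diff does not show how the agent connects to them.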
@@ -146,13 +151,16 @@ python examples/run_gaia.py
 
 ## Experiments
 
-We evaluated our agent on the GAIA validation set and achieved state-of-the-art performance on May 10th.
+We evaluated our agent on both GAIA validation and test sets, achieving state-of-the-art performance. Our system demonstrates superior performance across all difficulty levels.
 
 <p align="center">
-<img src="./docs/assets/gaia_benchmark.png" alt="GAIA Example Result" width="700"/>
+<img src="./docs/assets/gaia_test.png" alt="GAIA Test Results" width="300"/>
+<img src="./docs/assets/gaia_validation.png" alt="GAIA Validation Results" width="300"/>
 </p>
 
-With the integration of the Computer Use Agent, which now enables pixel-level control of the browser, our system's performance on the test set has significantly improved. The latest results show a notable increase to 79.07 (average), with 91.4 on Level 1, 77.36 on Level 2, and 61.22 on Level 3. We’re continuing to refine and optimize the agent, and a new version will be released in the coming days
+With the integration of the Computer Use and MCP Manager Agent, which now enables pixel-level control of the browser, our system demonstrates remarkable evolutionary capabilities. The agents can dynamically acquire and enhance their abilities through learning and adaptation, leading to significantly improved performance. The latest results show:
+- **Test Set**: 83.39 (average), with 93.55 on Level 1, 83.02 on Level 2, and 65.31 on Level 3
+- **Validation Set**: 82.4 (average), with 92.5 on Level 1, 83.7 on Level 2, and 57.7 on Level 3
 
 ## Questions
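
The averages added in this hunk can be sanity-checked against the per-level scores by weighting each level by its number of questions. The sketch below does that arithmetic. The per-level question counts (53/86/26 for validation, 93/159/49 for test) are an assumption taken from the public GAIA benchmark split sizes and do not appear in this commit; `weighted_average` is a hypothetical helper.

```python
def weighted_average(scores, counts):
    """Average per-level scores, weighting each level by its question count."""
    total = sum(counts[level] for level in scores)
    return sum(scores[level] * counts[level] for level in scores) / total


# Assumed GAIA question counts per level (not stated in the commit).
VALIDATION_COUNTS = {1: 53, 2: 86, 3: 26}
TEST_COUNTS = {1: 93, 2: 159, 3: 49}

# Per-level scores as reported in the updated README.
validation_scores = {1: 92.5, 2: 83.7, 3: 57.7}
test_scores = {1: 93.55, 2: 83.02, 3: 65.31}

validation_avg = weighted_average(validation_scores, VALIDATION_COUNTS)
test_avg = weighted_average(test_scores, TEST_COUNTS)

print(f"validation: {validation_avg:.2f}")
print(f"test: {test_avg:.2f}")
```

Under those assumed counts, the weighted averages agree with the reported 82.4 and 83.39 to within rounding, which suggests the headline figures are question-weighted rather than simple means of the three levels.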

docs/assets/gaia_test.png

298 KB
File renamed without changes.
