@@ -68,16 +68,22 @@ Run an eval comparing all mcp.task runs for `my-task`:
 mcpx-eval test --task my-task --task-run all
 ```

-Run an mcp.run task locally with a different set of models:
+Only evaluate the latest task run:

 ```bash
-mcpx-eval test --model .. --model .. --task my-task --iter 10
+mcpx-eval test --task my-task --task-run latest
+```
+
+Or trigger a new task run:
+
+```bash
+mcpx-eval test --task my-task --task-run new
 ```

-Run the `my-test.toml` eval for 10 iterations:
+Run an mcp.run task locally with a different set of models:

 ```bash
-mcpx-eval test --model ... --model ... --config my-test.toml --iter 10
+mcpx-eval test --model .. --model .. --task my-task --iter 10
 ```

Generate an HTML scoreboard for all evals:
@@ -92,7 +98,7 @@ A test file is a TOML file containing the following fields:

 - `name` - name of the test
 - `task` - optional, the name of the mcp.run task to use
-- `task-run` - optional, the name or index of the task run to analyze
+- `task-run` - optional, one of `latest`, `new`, `all`, or the name/index of the task run to analyze
 - `prompt` - prompt to test; this is passed to the LLM under test and can be left blank if `task` is set
 - `check` - prompt for the judge, used to determine the quality of the test output
 - `expected-tools` - list of tool names that might be used
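
For reference, the fields listed above might combine into a test file roughly like this sketch (the task name, check prompt, and tool name are illustrative placeholders, not values from this diff):

```toml
# Hypothetical my-test.toml based on the fields described above
name = "my-test"
task = "my-task"          # optional mcp.run task to use
task-run = "latest"       # or "new", "all", or a specific run name/index
prompt = ""               # may be left blank when `task` is set
check = "The output should answer the question accurately and cite the tool results."
expected-tools = ["example-tool"]
```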