@zihaolin96 commented on Dec 25, 2025

Note

Integrates Klavis sandbox lifecycle into EP pytest utilities and provides a working Gmail sandbox eval.

  • New KlavisSandboxRolloutProcessor manages the full sandbox flow: create and initialize the sandbox (initialize_*_sandbox), generate a temporary MCP config, run the Agent, export/dump the sandbox state, attach the results to row.execution_metadata.extra, then clean up and delete the sandbox
  • Exposes KlavisSandboxRolloutProcessor via eval_protocol.pytest.__init__
  • Adds optional klavis dependency group (klavis>=2.18.0) and lockfile updates
  • New test test_pytest_klavis_sandbox.py and dataset klavis_gmail_sandbox_test.jsonl: adapts JSONL to EP rows, runs against server_name="gmail", and scores via Fireworks LLM by comparing dumped sandbox state to provided ground_truth
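
The lifecycle described in the first bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `FakeSandbox`, `run_in_sandbox`, the placeholder URL, and the `mcpServers` config shape are all hypothetical stand-ins, and no real Klavis API is called. The point it shows is the ordering: create the sandbox, write a temporary MCP config, run the agent, dump the state, and always clean up in a `finally` block.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class FakeSandbox:
    """Hypothetical stand-in for a remote sandbox; not the Klavis client."""
    server_name: str
    url: str = "http://localhost:0/mcp"  # placeholder URL, assumption
    state: Dict[str, Any] = field(default_factory=dict)
    deleted: bool = False

    def dump(self) -> Dict[str, Any]:
        # Export a copy of the current sandbox state.
        return dict(self.state)

    def delete(self) -> None:
        self.deleted = True


def run_in_sandbox(
    server_name: str,
    initial_state: Dict[str, Any],
    agent: Callable[[str, FakeSandbox], None],
) -> Dict[str, Any]:
    # 1. Create and initialize the sandbox.
    sandbox = FakeSandbox(server_name=server_name, state=dict(initial_state))
    config_path: Optional[str] = None
    try:
        # 2. Write a temporary MCP config pointing at the sandbox (assumed shape).
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
            json.dump({"mcpServers": {server_name: {"url": sandbox.url}}}, f)
            config_path = f.name
        # 3. Run the agent against the sandbox.
        agent(config_path, sandbox)
        # 4. Dump the state; a real processor would attach this to
        #    row.execution_metadata.extra.
        return {"sandbox_dump": sandbox.dump()}
    finally:
        # 5. Always remove the temp config and delete the sandbox.
        if config_path and os.path.exists(config_path):
            os.remove(config_path)
        sandbox.delete()


seen: Dict[str, Any] = {}


def toy_agent(config_path: str, sandbox: FakeSandbox) -> None:
    # Toy agent: records what it was given and "sends" one message.
    seen["config_path"] = config_path
    seen["sandbox"] = sandbox
    sandbox.state["sent"] = ["hello"]


extra = run_in_sandbox("gmail", {"sent": []}, toy_agent)
```

Putting the cleanup in `finally` means the sandbox is deleted even when the agent raises, which matters when sandboxes are a billed or rate-limited remote resource.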

Written by Cursor Bugbot for commit 7f35e24. This will update automatically on new commits.

See our Slack channel for a demo.

@@ -1,5 +1,6 @@
 from .default_agent_rollout_processor import AgentRolloutProcessor
 from .default_dataset_adapter import default_dataset_adapter
+from .default_klavis_sandbox_rollout_processor import KlavisSandboxRolloutProcessor

Missing try/except for optional klavis dependency import

The KlavisSandboxRolloutProcessor is imported unconditionally at line 3, but it depends on the optional klavis package. This will cause an ImportError for any user who imports from eval_protocol.pytest without having klavis installed. Other optional dependency imports like PydanticAgentRolloutProcessor and LangGraphRolloutProcessor are correctly wrapped in try/except blocks (lines 16-22 and 25-31), but KlavisSandboxRolloutProcessor lacks this protection. The import and __all__ export at line 35 both need to be made conditional like the other optional dependencies.
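
A guarded import of the kind the comment asks for might look like the sketch below. Inside the package `__init__` this is normally written as a plain `try: from .module import Name` / `except ImportError:` block; the helper form here is used only so the example runs on its own. `export_if_available` and the module name `klavis_not_installed_example` are hypothetical, and the stdlib `json` module stands in for a dependency that is installed.

```python
import importlib
from typing import Any, Dict, List


def export_if_available(
    module_name: str, attr: str, namespace: Dict[str, Any], exports: List[str]
) -> bool:
    """Bind `attr` from `module_name` and list it in `exports` only if importable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional dependency is missing: skip the name instead of failing.
        return False
    namespace[attr] = getattr(module, attr)
    exports.append(attr)
    return True


namespace: Dict[str, Any] = {}
exports: List[str] = []  # plays the role of __all__

# Stand-in for an installed dependency.
export_if_available("json", "JSONDecoder", namespace, exports)
# Stand-in for a missing optional dependency (hypothetical module name).
export_if_available(
    "klavis_not_installed_example", "KlavisSandboxRolloutProcessor", namespace, exports
)

print(exports)
```

With this shape, importing the package never fails because an extra is absent; the name is simply not exported, and callers who need it get an error only when they ask for it.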

@xzrderek merged commit 60b0fce into eval-protocol:main on Dec 29, 2025
4 of 17 checks passed