@shreymodi1
Contributor

Integrates SWE-bench into eval-protocol: SWE-bench runs locally, and our RemoteRolloutProcessor interacts with server.py.

…we-1-mtp#accounts__pyroworks__deployments__r5dfiiwp.eval-run.json
Contributor

@xzrderek xzrderek left a comment

Same as the other PR: let Dylan / Benny take a look before merging.

remote_base_url="http://127.0.0.1:3000",
model_base_url="https://tracing.fireworks.ai",
timeout_seconds=1800,
output_data_loader=default_fireworks_output_data_loader,
Collaborator

@dphuang2 dphuang2 Oct 17, 2025

Why do you need to specify this? I thought there was a default one.
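For context on this question, here is a minimal sketch of how a constructor default makes an explicit `output_data_loader` argument redundant. `ProcessorSketch` and `default_loader` are hypothetical names invented for illustration, and the default timeout is an assumed value. Whether `RemoteRolloutProcessor` actually falls back to `default_fireworks_output_data_loader` this way is not confirmed in this thread.

```python
from typing import Callable, List, Optional


def default_loader(rollout_id: str) -> List[str]:
    """Hypothetical default loader; returns placeholder data."""
    return [f"trace-for-{rollout_id}"]


class ProcessorSketch:
    """Hypothetical processor; NOT the RemoteRolloutProcessor signature."""

    def __init__(
        self,
        remote_base_url: str,
        timeout_seconds: int = 600,  # assumed default, for illustration only
        output_data_loader: Optional[Callable[[str], List[str]]] = None,
    ) -> None:
        self.remote_base_url = remote_base_url
        self.timeout_seconds = timeout_seconds
        # Fall back to the default loader when the caller passes nothing.
        self.output_data_loader = output_data_loader or default_loader


# Both constructions end up with the same loader, so passing it is optional.
implicit = ProcessorSketch("http://127.0.0.1:3000")
explicit = ProcessorSketch("http://127.0.0.1:3000", output_data_loader=default_loader)
```

If the real class behaves like this sketch, the explicit argument in the diff could be dropped without changing behavior.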

@@ -0,0 +1,159 @@
"""
TracingFireworksModel - Routes through tracing using OpenAI SDK.
Collaborator

Can you give a more comprehensive explanation for this file? Currently it is confusing for somebody who doesn't know SWE-bench.

  1. Why do we need it?
  2. What is it doing?

Collaborator

@dphuang2 dphuang2 left a comment

LGTM besides some other nits! Do you mind sharing a screenshot of the local UI running in this PR, if it's easy to collect? I am just curious what the output looks like.

max_dataset_rows=2,
rollout_processor=RemoteRolloutProcessor(
    remote_base_url="http://127.0.0.1:3000",
    model_base_url="https://tracing.fireworks.ai",
Contributor

I don't think you need this.

if message.startswith("EVAL_RESULT:"):
    result_json = message.replace("EVAL_RESULT:", "")
    row.evaluation_result = EvaluateResult.model_validate_json(result_json)
    break
Contributor

Hmm, I don't quite get this logic. I thought we should be reading from tracing.fireworks.ai; check out default_fireworks_output_data_loader.

Contributor

Maybe I am misunderstanding?
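For readers following this thread, here is a minimal standard-library sketch of the prefix-parsing pattern in the quoted `EVAL_RESULT:` snippet. `EvalResultStub` and `parse_eval_result` are hypothetical stand-ins, and the `score` and `reason` fields are assumptions. The PR's own code validates into `EvaluateResult` with `model_validate_json` instead.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional

PREFIX = "EVAL_RESULT:"


@dataclass
class EvalResultStub:
    """Hypothetical stand-in for EvaluateResult; fields are assumptions."""
    score: float
    reason: str = ""


def parse_eval_result(messages: Iterable[str]) -> Optional[EvalResultStub]:
    """Return the first result embedded in a prefixed message, else None."""
    for message in messages:
        if message.startswith(PREFIX):
            # Slice off only the leading prefix; str.replace would also
            # rewrite any later "EVAL_RESULT:" inside the JSON payload.
            payload = json.loads(message[len(PREFIX):])
            return EvalResultStub(
                score=float(payload["score"]),
                reason=payload.get("reason", ""),
            )
    return None


logs = ["starting run", 'EVAL_RESULT:{"score": 1.0, "reason": "tests passed"}']
result = parse_eval_result(logs)
```

This approach scrapes the result out of message text. The alternative raised in the review is to read results from the tracing service through the default output data loader, which this sketch does not attempt to model.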

5 participants