Skip to content

Conversation

@shreymodi1
Copy link
Contributor

@shreymodi1 shreymodi1 commented Dec 29, 2025

Note

Adds a new Lilac integration adapter enabling round-trip conversion between Eval Protocol rows and DataFrames.

  • New eval_protocol/adapters/lilac.py with evaluation_rows_to_dataframe and dataframe_to_evaluation_rows
  • Serializes fields to JSON (messages_json, tools_json, ground_truth_json, input_metadata_json, execution_metadata_json, evaluation_result_json) plus scalar columns like score, message_count, has_tools
  • Robust parsing with validation and warning logs to handle malformed JSON during reconstruction

Written by Cursor Bugbot for commit 74cca5d. This will update automatically on new commits. Configure here.

return [Message.model_validate(m) for m in messages_data]
except (json.JSONDecodeError, ValueError) as e:
logger.warning(f"Failed to deserialize messages: {e}")
return []
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncaught TypeError when messages JSON is non-list

In _deserialize_messages, if messages_json contains valid JSON that is not a list (e.g., a number like "123" or "null"), json.loads succeeds but iterating over the result with for m in messages_data raises a TypeError. This exception is not caught since only json.JSONDecodeError and ValueError are handled, causing the function to propagate an error instead of gracefully returning an empty list like it does for other parse failures.

Fix in Cursor Fix in Web

try:
input_metadata = InputMetadata.model_validate_json(data["input_metadata_json"])
except (json.JSONDecodeError, ValueError):
input_metadata = InputMetadata(row_id=data.get("row_id"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pydantic ValidationError not caught, fallback logic unreachable

The except clauses catch (json.JSONDecodeError, ValueError) for Pydantic's model_validate_json() calls, but in Pydantic v2, this method raises pydantic.ValidationError, which is not a subclass of either exception. This means the fallback logic (like InputMetadata(row_id=data.get("row_id"))) is unreachable dead code. Any validation error causes the entire row to be skipped instead of gracefully recovering. Other parts of the codebase correctly catch ValidationError explicitly (e.g., in config.py).

Additional Locations (2)

Fix in Cursor Fix in Web

@dphuang2
Copy link
Collaborator

this PR isn't specific to lilac in any way—its an adapter for pandas dataframe. Can we rename the module to reflect that?

Comment on lines +74 to +94
def _evaluation_row_to_dict(row: EvaluationRow) -> dict[str, Any]:
"""Convert a single EvaluationRow to a dictionary.

The output contains JSON-serialized fields that can be reconstructed back
to EvaluationRow. Users can add their own text columns for clustering.
"""
return {
# Identifiers
"row_id": row.input_metadata.row_id if row.input_metadata else None,
# Full data as JSON (for reconstruction)
"messages_json": json.dumps([_serialize_message(m) for m in row.messages]),
"tools_json": json.dumps(row.tools) if row.tools else None,
"ground_truth_json": json.dumps(row.ground_truth) if row.ground_truth else None,
"input_metadata_json": row.input_metadata.model_dump_json() if row.input_metadata else None,
"execution_metadata_json": row.execution_metadata.model_dump_json() if row.execution_metadata else None,
"evaluation_result_json": row.evaluation_result.model_dump_json() if row.evaluation_result else None,
# Scalar fields for filtering
"score": row.evaluation_result.score if row.evaluation_result else None,
"message_count": len(row.messages),
"has_tools": bool(row.tools),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this helper just live on the EvaluationRow for discovery? Also, how was the final set of keys decided? Why don't we just generically serialize the entire EvaluationRow?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants