lilac adapter #389

shreymodi1 · 2025-12-29T18:11:58Z

Note

Adds a new Lilac integration adapter enabling round-trip conversion between Eval Protocol rows and DataFrames.

New eval_protocol/adapters/lilac.py with evaluation_rows_to_dataframe and dataframe_to_evaluation_rows
Serializes fields to JSON (messages_json, tools_json, ground_truth_json, input_metadata_json, execution_metadata_json, evaluation_result_json) plus scalar columns like score, message_count, has_tools
Robust parsing with validation and warning logs to handle malformed JSON during reconstruction

^{Written by Cursor Bugbot for commit 74cca5d. This will update automatically on new commits. Configure here.}

eval_protocol/adapters/lilac.py

cursor · 2025-12-29T18:16:01Z

eval_protocol/adapters/lilac.py

+        return [Message.model_validate(m) for m in messages_data]
+    except (json.JSONDecodeError, ValueError) as e:
+        logger.warning(f"Failed to deserialize messages: {e}")
+        return []


Uncaught TypeError when messages JSON is non-list

In _deserialize_messages, if messages_json contains valid JSON that is not a list (e.g., a number like "123" or "null"), json.loads succeeds but iterating over the result with for m in messages_data raises a TypeError. This exception is not caught since only json.JSONDecodeError and ValueError are handled, causing the function to propagate an error instead of gracefully returning an empty list like it does for other parse failures.

cursor · 2025-12-29T18:37:46Z

eval_protocol/adapters/lilac.py

+        try:
+            input_metadata = InputMetadata.model_validate_json(data["input_metadata_json"])
+        except (json.JSONDecodeError, ValueError):
+            input_metadata = InputMetadata(row_id=data.get("row_id"))


Pydantic ValidationError not caught, fallback logic unreachable

The except clauses catch (json.JSONDecodeError, ValueError) for Pydantic's model_validate_json() calls, but in Pydantic v2, this method raises pydantic.ValidationError, which is not a subclass of either exception. This means the fallback logic (like InputMetadata(row_id=data.get("row_id"))) is unreachable dead code. Any validation error causes the entire row to be skipped instead of gracefully recovering. Other parts of the codebase correctly catch ValidationError explicitly (e.g., in config.py).

Additional Locations (2)

eval_protocol/adapters/lilac.py#L127-L132

eval_protocol/adapters/lilac.py#L135-L140

dphuang2 · 2025-12-29T19:22:55Z

this PR isn't specific to lilac in any way—its an adapter for pandas dataframe. Can we rename the module to reflect that?

dphuang2 · 2025-12-29T19:24:10Z

eval_protocol/adapters/lilac.py

+def _evaluation_row_to_dict(row: EvaluationRow) -> dict[str, Any]:
+    """Convert a single EvaluationRow to a dictionary.
+
+    The output contains JSON-serialized fields that can be reconstructed back
+    to EvaluationRow. Users can add their own text columns for clustering.
+    """
+    return {
+        # Identifiers
+        "row_id": row.input_metadata.row_id if row.input_metadata else None,
+        # Full data as JSON (for reconstruction)
+        "messages_json": json.dumps([_serialize_message(m) for m in row.messages]),
+        "tools_json": json.dumps(row.tools) if row.tools else None,
+        "ground_truth_json": json.dumps(row.ground_truth) if row.ground_truth else None,
+        "input_metadata_json": row.input_metadata.model_dump_json() if row.input_metadata else None,
+        "execution_metadata_json": row.execution_metadata.model_dump_json() if row.execution_metadata else None,
+        "evaluation_result_json": row.evaluation_result.model_dump_json() if row.evaluation_result else None,
+        # Scalar fields for filtering
+        "score": row.evaluation_result.score if row.evaluation_result else None,
+        "message_count": len(row.messages),
+        "has_tools": bool(row.tools),
+    }


should this helper just live on the EvaluationRow for discovery? Also, how was the final set of keys decided? Why don't we just generically serialize the entire EvaluationRow?

lilac adapter

c3203f5

cursor bot reviewed Dec 29, 2025

View reviewed changes

updated adapter

74cca5d

cursor bot reviewed Dec 29, 2025

View reviewed changes

dphuang2 reviewed Dec 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

lilac adapter #389

lilac adapter #389

Uh oh!

shreymodi1 commented Dec 29, 2025 •

edited by cursor bot

Loading

Uh oh!

Uh oh!

cursor bot Dec 29, 2025

Uh oh!

cursor bot Dec 29, 2025

Uh oh!

dphuang2 commented Dec 29, 2025

Uh oh!

dphuang2 Dec 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lilac adapter #389

Are you sure you want to change the base?

lilac adapter #389

Uh oh!

Conversation

shreymodi1 commented Dec 29, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

cursor bot Dec 29, 2025

Choose a reason for hiding this comment

Uncaught TypeError when messages JSON is non-list

Uh oh!

cursor bot Dec 29, 2025

Choose a reason for hiding this comment

Pydantic ValidationError not caught, fallback logic unreachable

Uh oh!

dphuang2 commented Dec 29, 2025

Uh oh!

dphuang2 Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

shreymodi1 commented Dec 29, 2025 •

edited by cursor bot

Loading