
add trajectory JSONL streaming for persistence/debug/replay #240

Open
thanay-sisir wants to merge 5 commits into web-arena-x:main from thanay-sisir:traj-jsonl-streaming

@thanay-sisir

Feature: Crash-Safe Trajectory Streaming

1. Why This Matters

This addresses a critical data loss risk during evaluations.

  • The Problem: Previously, trajectory data (observations, actions, state) was stored entirely in RAM until the task finished.
  • The Risk: If the run crashed 90% of the way through (due to an API error, an out-of-memory (OOM) condition, or a timeout), all data was lost. You couldn't see why it failed because the logs died with the process.
  • The Fix: We now stream data to a file on disk (.jsonl) in real time. Each step is written and flushed as soon as it is recorded.

2. Impact on Codebase

  • File Modified: run.py
  • The Mechanism: Added logic to open a file (result_dir/{task_id}.traj.jsonl) at the start of a task.
  • The Flow: After every trajectory.append(), the system immediately writes the new item to the file and flushes the buffer.
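  • Result: Even if the process crashes at step 29 of 30, you still have a file on disk containing steps 1-29. Note that flush() protects against a process crash; surviving a power outage would additionally require os.fsync() after each flush.
  • Performance: Negligible overhead. One small write and flush per step is cheap compared to LLM inference time.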
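The mechanism above can be sketched as a small writer object. This is a minimal sketch, not the code in this PR: the class name TrajectoryJSONLWriter, the optional fsync flag, and the item shape in the demo are hypothetical. Only the result_dir/{task_id}.traj.jsonl path and the "append, then write and flush" pattern come from the description.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


class TrajectoryJSONLWriter:
    """Hypothetical helper (not the PR's actual code): writes one JSON value
    per line and flushes after every write so completed steps survive a crash."""

    def __init__(self, result_dir: str, task_id: str, fsync: bool = False) -> None:
        self.path = Path(result_dir) / f"{task_id}.traj.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.fsync = fsync
        # "w" starts a fresh file for this run, discarding leftovers from an earlier attempt.
        self._fh = open(self.path, "w", encoding="utf-8")

    def write(self, item: Any) -> None:
        # default=str stringifies values json cannot encode (bytes, custom objects, ...).
        self._fh.write(json.dumps(item, default=str) + "\n")
        # flush() hands the data to the OS, so it survives the Python process dying.
        self._fh.flush()
        if self.fsync:
            # fsync() asks the OS to commit the data to disk, so it also survives power loss.
            os.fsync(self._fh.fileno())

    def close(self) -> None:
        self._fh.close()

    def __enter__(self) -> "TrajectoryJSONLWriter":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()


if __name__ == "__main__":
    trajectory: list = []
    with tempfile.TemporaryDirectory() as result_dir:
        with TrajectoryJSONLWriter(result_dir, "task_0") as traj_file:
            for step in range(3):
                item = {"step": step, "action": f"click [{step}]"}  # illustrative shape only
                trajectory.append(item)  # in-memory copy, as before
                traj_file.write(item)  # streamed to disk right after the append
        print(traj_file.path.read_text(encoding="utf-8"))
```

Passing fsync=True trades some per-step latency for durability against power loss; for protection against crashes of the evaluation process itself, flush() alone is enough.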

3. Consequences of Ignoring It

  • Lost Hours: A crash in a long-running task means hours of compute time wasted with zero artifacts to show for it.
  • Memory Bloat: Storing massive HTML observations in RAM can cause out-of-memory (OOM) crashes on long tasks.
  • Blind Debugging: Without a partial trajectory file, debugging a crash requires re-running the entire task and hoping it crashes the same way again.
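For the debug/replay side, a partial file can be loaded back line by line. This is a minimal sketch, assuming each line holds one JSON value as written above; the function names parse_trajectory_lines and load_trajectory are hypothetical. Because a crash can interrupt a write (a large HTML observation may already be partly on disk), the parser tolerates an incomplete final line but still raises on corruption anywhere else.

```python
import json
from typing import Any, List


def parse_trajectory_lines(text: str) -> List[Any]:
    """Parse JSONL text from a possibly crashed run (hypothetical helper)."""
    items: List[Any] = []
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # skip blank lines, including the one after the final "\n"
        try:
            items.append(json.loads(line))
        except json.JSONDecodeError:
            # Only the last line can be legitimately incomplete (crash mid-write):
            # it is the one not terminated by "\n".
            if i == len(lines) - 1:
                break
            raise  # corruption in the middle of the file is a real error
    return items


def load_trajectory(path: str) -> List[Any]:
    """Load every completed step from a .traj.jsonl file."""
    with open(path, encoding="utf-8") as fh:
        return parse_trajectory_lines(fh.read())
```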
