Benchmark: Motrix record-video speed for G1 dance eval on H100 #568

Description

@wlgys8

Work type

benchmark

Area

motrix / benchmark / G1

Problem

When evaluating the G1 dance / motion-tracking task on an H100 machine with the Motrix backend, --render-mode record appears visually much slower than expected. We need to measure whether the slowdown comes from Motrix rendering/capture, Python frame collection, video encoding, task stepping, or H100/headless runtime setup.

Current repo evidence:

  • g1_motion_tracking is the documented dance / motion-tracking eval path.
  • Motrix owner configs exist at conf/ppo/task/g1_motion_tracking/motrix.yaml and conf/appo/task/g1_motion_tracking/motrix.yaml.
  • The PPO Motrix owner currently sets training.play_env_num: 128 and training.play_steps: 1000.
  • The eval route supports record mode through:
uv run eval --algo ppo --task g1_motion_tracking --sim motrix --load-run -1 \
  --render-mode record

  • src/unilab/base/backend/motrix/playback.py currently records by initializing the Motrix renderer at 1280x720, capturing one frame per playback step, appending copied frames in Python, then writing play_video.mp4 via mediapy.write_video.
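Because the current path keeps every copied frame in a Python list until the final write, the raw frame buffer size is worth estimating before profiling. The sketch below is back-of-envelope arithmetic only. It assumes 8-bit RGB frames (3 bytes per pixel) and that the number of recorded frames equals training.play_steps (1000 in the PPO Motrix config); neither assumption is confirmed by the playback code quoted here.

```python
def raw_frame_buffer_bytes(width, height, num_frames, channels=3, bytes_per_channel=1):
    """Bytes needed to hold uncompressed frames in host memory.

    channels=3 and bytes_per_channel=1 assume 8-bit RGB frames (an assumption).
    """
    return width * height * channels * bytes_per_channel * num_frames


# 1280x720 is the renderer size named in the issue.
per_frame = raw_frame_buffer_bytes(1280, 720, num_frames=1)

# 1000 frames assumes one frame per step with play_steps = 1000 (an assumption).
total = raw_frame_buffer_bytes(1280, 720, num_frames=1000)

print(f"{per_frame / 1e6:.2f} MB per frame")   # 2.76 MB per frame
print(f"{total / 1e9:.2f} GB for 1000 frames")  # 2.76 GB for 1000 frames
```

Under those assumptions the list holds roughly 2.76 GB of frames before encoding starts, so frame copy and accumulation are plausible contributors alongside capture and encoding.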

Deliverable

Produce a small benchmark report for the H100 target that answers:

  • Actual wall-clock time and effective capture FPS for g1_motion_tracking Motrix eval with --render-mode record.
  • Breakdown, if possible, between simulation step, Motrix frame capture, frame copy/list accumulation, and final mediapy.write_video encoding.
  • Comparison against at least one non-record baseline, for example --render-mode none or an equivalent eval path without capture.
  • Whether the observed slowness is expected for the current capture path or indicates a renderer/capture efficiency issue.
  • Recommended next action if there is a bottleneck, such as streaming video encoding, lower capture resolution, batching/async capture, or a Motrix-side renderer profiling task.
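The headline numbers in the list above can be derived from two wall-clock measurements. The following is a minimal sketch of that arithmetic; RunTiming, effective_fps, and record_overhead are hypothetical helper names, and the values in the example are placeholders chosen for easy arithmetic, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Wall-clock result of one eval run (hypothetical container)."""
    wall_seconds: float
    steps: int


def effective_fps(frames, wall_seconds):
    """Frames captured per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return frames / wall_seconds


def record_overhead(record, baseline):
    """Compare a record run against a non-record baseline, per step."""
    rec = record.wall_seconds / record.steps
    base = baseline.wall_seconds / baseline.steps
    return {
        "record_s_per_step": rec,
        "baseline_s_per_step": base,
        "overhead_s_per_step": rec - base,
        # Cross-multiplied so unequal step counts are handled correctly.
        "slowdown": (record.wall_seconds * baseline.steps)
        / (baseline.wall_seconds * record.steps),
    }


# Placeholder values only, NOT benchmark results.
record_run = RunTiming(wall_seconds=100.0, steps=1000)
baseline_run = RunTiming(wall_seconds=25.0, steps=1000)

fps = effective_fps(frames=record_run.steps, wall_seconds=record_run.wall_seconds)
summary = record_overhead(record_run, baseline_run)
print(f"effective capture FPS: {fps:.1f}")
print(f"slowdown vs baseline: {summary['slowdown']:.2f}x")
```

Reporting the per-step overhead as well as the ratio keeps the comparison meaningful if the record and baseline runs end up with different step counts.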

Definition of done

  • H100 environment is recorded: OS, GPU, driver/CUDA, Python, Motrix package/build, UniLab commit, and exact checkpoint/run id.
  • The exact eval command and Hydra overrides are included.
  • Benchmark numbers are attached, including total playback time, number of frames, output video duration/FPS, output file path, and output file size.
  • A record-vs-non-record comparison is included so we can separate policy/task stepping cost from video capture/encoding cost.
  • Conclusion states whether this should become a UniLab optimization, a Motrix renderer issue, or no action.

Validation plan

Run on the H100 target:

uv run eval --algo ppo --task g1_motion_tracking --sim motrix --load-run <run_id> \
  --render-mode record

Then compare with a non-record baseline:

uv run eval --algo ppo --task g1_motion_tracking --sim motrix --load-run <run_id> \
  --render-mode none
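To get comparable end-to-end numbers for the two commands above, each can be wrapped in the same wall-clock timer. The helper below is a minimal sketch; time_command is a hypothetical name, and the total it reports includes process startup and checkpoint loading, not just the playback loop.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (wall_seconds, returncode)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    return time.perf_counter() - start, completed.returncode


# Intended use on the H100 target, with <run_id> filled in:
#   time_command(["uv", "run", "eval", "--algo", "ppo",
#                 "--task", "g1_motion_tracking", "--sim", "motrix",
#                 "--load-run", "<run_id>", "--render-mode", "record"])
#
# Self-contained demonstration with a trivial command:
elapsed, returncode = time_command([sys.executable, "-c", "pass"])
print(f"exit code {returncode}, {elapsed:.3f} s")
```

Running each mode a few times and reporting the spread would help show whether a difference exceeds run-to-run noise.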

If the record path is much slower, add timing around the Motrix playback loop boundaries:

  • step(obs)
  • backend.capture_video_frame()
  • frame.copy() / Python frame accumulation
  • mediapy.write_video(...)
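One way to add timing at the four boundaries listed above is a small accumulating timer. The sketch below uses stand-in functions in place of the real step, capture, and write calls, since the actual playback loop is not reproduced here; StageTimer and the fake_* functions are hypothetical names. If stepping or capture launches GPU work asynchronously, host-side timers may attribute that time to whichever later stage waits for the result, so the breakdown should be read with that caveat.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time and call counts per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        grand_total = sum(self.totals.values())
        rows = []
        for name, total in sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True):
            share = 100.0 * total / grand_total if grand_total > 0 else 0.0
            mean_ms = 1000.0 * total / self.counts[name]
            rows.append(
                f"{name:<12} total={total:.4f}s calls={self.counts[name]} "
                f"mean={mean_ms:.3f}ms share={share:.1f}%"
            )
        return "\n".join(rows)


# Stand-ins for the real calls (hypothetical, for illustration only).
def fake_step():
    time.sleep(0.001)


def fake_capture():
    return bytearray(64)


def fake_write(frames):
    return sum(len(f) for f in frames)


timer = StageTimer()
frames = []
for _ in range(5):
    with timer.stage("step"):
        fake_step()
    with timer.stage("capture"):
        frame = fake_capture()
    with timer.stage("copy_append"):
        frames.append(bytes(frame))  # mirrors frame.copy() plus list append
with timer.stage("write_video"):
    fake_write(frames)

print(timer.report())
```

Keeping copy/append as its own stage separates Python-side accumulation cost from the renderer's capture cost, which is the distinction the conclusion needs.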

Attach the generated play_video.mp4 path or artifact if available.

Notes

This issue is scoped to evaluation/video recording performance. It should not change backend selection semantics: use --sim motrix and the task owner YAML rather than overriding training.sim_backend directly.

Labels

  • area:benchmark: Benchmark recording and evaluation workflow
  • area:g1: G1 robot tasks and infrastructure
  • area:motrix: MotrixSim related work
  • type:benchmark: Benchmark or evaluation tracking
