Work type
benchmark
Area
motrix / benchmark / G1
Problem
When evaluating the G1 dance / motion-tracking task on an H100 machine with the Motrix backend, --render-mode record appears visually much slower than expected. We need to measure whether the slowdown comes from Motrix rendering/capture, Python frame collection, video encoding, task stepping, or H100/headless runtime setup.
Current repo evidence:
- g1_motion_tracking is the documented dance / motion-tracking eval path.
- Motrix owner configs exist at conf/ppo/task/g1_motion_tracking/motrix.yaml and conf/appo/task/g1_motion_tracking/motrix.yaml.
- The PPO Motrix owner currently sets training.play_env_num: 128 and training.play_steps: 1000.
- The eval route supports record mode through:
  uv run eval --algo ppo --task g1_motion_tracking --sim motrix --load-run -1 \
    --render-mode record
- src/unilab/base/backend/motrix/playback.py currently records by initializing the Motrix renderer at 1280x720, capturing one frame per playback step, appending copied frames in Python, then writing play_video.mp4 via mediapy.write_video.
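Because this path keeps every copied frame in a Python list until the final write, host memory grows linearly with the number of playback steps. A rough estimate of that buffer, assuming uint8 RGB frames (3 bytes per pixel; the actual Motrix capture format is not confirmed here):

```python
# Rough host-memory estimate for accumulating raw frames before encoding.
# Assumption: each frame is uint8 RGB (3 bytes per pixel). The real Motrix
# capture format may differ (for example RGBA), which would scale the result.
WIDTH, HEIGHT = 1280, 720   # renderer size used by playback.py
PLAY_STEPS = 1000           # training.play_steps in the PPO Motrix owner
BYTES_PER_PIXEL = 3         # assumed, not confirmed


def frame_buffer_bytes(width, height, frames, bytes_per_pixel=3):
    """Total bytes held if every frame stays in memory until encoding."""
    return width * height * bytes_per_pixel * frames


total = frame_buffer_bytes(WIDTH, HEIGHT, PLAY_STEPS, BYTES_PER_PIXEL)
print(f"{total / 2**30:.2f} GiB across {PLAY_STEPS} frames")  # 2.57 GiB
```

If the buffer turns out to be this large, allocation and copy cost alone may be worth measuring separately from rendering.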
Deliverable
Produce a small benchmark report for the H100 target that answers:
- Actual wall-clock time and effective capture FPS for g1_motion_tracking Motrix eval with --render-mode record.
- Breakdown, if possible, between simulation step, Motrix frame capture, frame copy/list accumulation, and final mediapy.write_video encoding.
- Comparison against at least one non-record baseline, for example --render-mode none or an equivalent eval path without capture.
- Whether the observed slowness is expected for the current capture path or indicates a renderer/capture efficiency issue.
- Recommended next action if there is a bottleneck, such as streaming video encoding, lower capture resolution, batching/async capture, or a Motrix-side renderer profiling task.
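For the streaming-encoding option, a minimal sketch of what the change could look like is below. It is not the current playback.py implementation: capture_frame is a hypothetical stand-in for the backend capture call, the function names are illustrative, and fps must be supplied by the caller. mediapy.VideoWriter requires ffmpeg on PATH.

```python
# Sketch: stream frames to the encoder instead of accumulating a list.
# `capture_frame` is a hypothetical stand-in for the backend capture call.

def record_streaming(capture_frame, num_steps, writer):
    """Push each captured frame to `writer` as soon as it is produced."""
    written = 0
    for _ in range(num_steps):
        frame = capture_frame()   # expected: (H, W, 3) uint8 array
        writer.add_image(frame)   # encoder consumes the frame immediately
        written += 1
    return written


def record_to_file(capture_frame, num_steps, path, fps, shape=(720, 1280)):
    """Open a mediapy.VideoWriter and stream `num_steps` frames into it.

    `shape` is (height, width), matching the 1280x720 renderer size.
    """
    import mediapy  # lazy import: record_streaming itself has no dependency

    with mediapy.VideoWriter(path, shape=shape, fps=fps) as writer:
        return record_streaming(capture_frame, num_steps, writer)
```

With this shape, peak memory stays near one frame rather than the full frame list, and encoding time overlaps with playback instead of arriving as one block at the end. Whether that actually helps depends on where the benchmark shows the time going.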
Definition of done
- H100 environment is recorded: OS, GPU, driver/CUDA, Python, Motrix package/build, UniLab commit, and exact checkpoint/run id.
- The exact eval command and Hydra overrides are included.
- Benchmark numbers are attached, including total playback time, number of frames, output video duration/FPS, output file path, and output file size.
- A record-vs-non-record comparison is included so we can separate policy/task stepping cost from video capture/encoding cost.
- Conclusion states whether this should become a UniLab optimization, a Motrix renderer issue, or no action.
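The headline numbers for the report can be derived from three measurements: frame count, record-mode wall-clock time, and baseline wall-clock time. A small helper, with illustrative field names and placeholder inputs that are not measurements:

```python
def summarize(frames, record_seconds, baseline_seconds):
    """Derive report metrics from a record run and a non-record baseline."""
    overhead = record_seconds - baseline_seconds
    return {
        # Frames written per second of wall-clock time in record mode.
        "capture_fps": frames / record_seconds,
        # Extra time attributable to capture, copy, and encoding.
        "record_overhead_s": overhead,
        "overhead_per_frame_ms": 1000.0 * overhead / frames,
        # How many times slower record mode is than the baseline.
        "slowdown_x": record_seconds / baseline_seconds,
    }


# Placeholder inputs for illustration only; replace with measured values.
example = summarize(frames=1000, record_seconds=40.0, baseline_seconds=10.0)
for key, value in example.items():
    print(f"{key}: {value:.2f}")
```

The overhead figure only isolates capture and encoding cost if both runs use the same checkpoint, play_env_num, and play_steps.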
Validation plan
Run on the H100 target:
uv run eval --algo ppo --task g1_motion_tracking --sim motrix --load-run <run_id> \
--render-mode record
Then compare with a non-record baseline:
uv run eval --algo ppo --task g1_motion_tracking --sim motrix --load-run <run_id> \
--render-mode none
If the record path is much slower, add timing around the Motrix playback loop boundaries:
- step(obs)
- backend.capture_video_frame()
- frame.copy() / Python frame accumulation
- mediapy.write_video(...)
Attach the generated play_video.mp4 path or artifact if available.
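One way to add that timing is a small accumulator around each boundary. The sketch below uses stand-in callables (step, capture, write_video) rather than the real playback.py objects, and the stage names are illustrative:

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds and call counts per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        return {n: (self.totals[n], self.counts[n]) for n in self.totals}


def timed_playback(step, capture, write_video, num_steps, timer):
    """Mirror the record loop with one timed stage per boundary."""
    frames = []
    for _ in range(num_steps):
        with timer.stage("step"):
            step()
        with timer.stage("capture"):
            frame = capture()
        with timer.stage("copy_append"):
            frames.append(frame.copy())
    with timer.stage("encode"):
        write_video(frames)
    return frames
```

If the renderer or simulator returns before GPU work has finished, wall-clock timing around a call can shift that cost into a later stage, so the per-stage split should be read alongside the total.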
Notes
This issue is scoped to evaluation/video recording performance. It should not change backend selection semantics: use --sim motrix and the task owner YAML rather than overriding training.sim_backend directly.