Question Regarding Evaluation Results with Gr00t-n1 Checkpoint on Simpler Benchmark #6

Hello,

I hope you're doing well. I recently had the chance to explore your excellent work and really appreciate the efforts you've put into releasing the code and models.

I'm currently trying to evaluate the pretrained checkpoint InternRobotics/Gr00t-n1_BridgeDataV2 on the Simpler benchmark. For this, I used the following command:
`python scripts/eval/start_evaluator.py --config run_configs/eval/gr00t_on_simpler_widowx.py`

https://github.com/user-attachments/assets/5a225344-5d69-4614-80b7-cec95919a787

However, I observed that all episodes ended in failure, resulting in a success rate of 0%. I'm wondering if I might have missed some important settings or preprocessing steps during the evaluation. Could you kindly let me know what success rate you observed when evaluating the same model on this benchmark?

Any guidance or suggestions would be greatly appreciated. Thank you very much for your time and for sharing your work with the community.

put_carrot_on_plate.json (excerpt):
```json
    "widowx_carrot_on_plate": [
        {
            "policy_setup": "widowx_bridge",
            "task_name": "widowx_carrot_on_plate",
            "env_name": "PutCarrotOnPlateInScene-v0",
            "scene_name": "bridge_table_1_v1",
            "robot": "widowx",
            "control_freq": 20,
            "sim_freq": 500,
            "max_episode_steps": 240,
```


```python
eval_cfg = EvalCfg(
    eval_type="simpler",
    agent=AgentCfg(
        agent_type="gr00t_n1",
        model_name_or_path="./InternRobotics/Gr00t-n1_BridgeDataV2",
        agent_settings={
            "policy_setup": "bridgedata_v2",
            "action_scale": 1.0,
            "exec_horizon": 1,
            "action_ensemble_temp": -0.8,
            "embodiment_tag": "new_embodiment",
            "denoising_steps": 16,
        },
        server_cfg=ServerCfg(
            server_host="localhost",
            server_port=5000,
        ),
    ),
    env=EnvCfg(
        env_type="simpler",
        device_id=0,
        episodes_config_path=[
            f"{Path(__file__).absolute().parents[2]}/internmanip/benchmarks/utils/SimplerEnv/widowx_bridge/visual_matching/put_carrot_on_plate.json",
            f"{Path(__file__).absolute().parents[2]}/internmanip/benchmarks/utils/SimplerEnv/widowx_bridge/visual_matching/put_eggplant_in_basket.json",
            f"{Path(__file__).absolute().parents[2]}/internmanip/benchmarks/utils/SimplerEnv/widowx_bridge/visual_matching/put_spoon_on_towel.json",
            f"{Path(__file__).absolute().parents[2]}/internmanip/benchmarks/utils/SimplerEnv/widowx_bridge/visual_matching/stack_cube.json",
        ],
    ),
```
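
One thing I noticed while comparing the two snippets above: the agent settings use `"policy_setup": "bridgedata_v2"`, while the episode file uses `"policy_setup": "widowx_bridge"`. I don't know whether these two values are supposed to match. In case it helps narrow things down, here is a minimal sketch of a sanity check for the episode config files. The helper name `collect_policy_setups` is hypothetical and not part of the repository, and the sketch assumes each file is a JSON object that maps a task name to a list of episode entries, as the excerpt suggests.

```python
import json
from pathlib import Path


def collect_policy_setups(config_paths):
    """Return {file name: sorted distinct "policy_setup" values}.

    Hypothetical helper, not part of the repository. Assumes each file is a
    JSON object mapping a task name to a list of episode dicts.
    """
    summary = {}
    for raw_path in config_paths:
        path = Path(raw_path)
        # Fail early if a path in episodes_config_path does not resolve.
        if not path.is_file():
            raise FileNotFoundError(f"episodes config not found: {path}")
        with path.open() as f:
            tasks = json.load(f)
        # Gather every policy_setup value that appears in the file.
        setups = {
            episode.get("policy_setup")
            for episodes in tasks.values()
            for episode in episodes
        }
        summary[path.name] = sorted(s for s in setups if s is not None)
    return summary
```

Calling it with the same list that is passed as `episodes_config_path` would confirm that all four files resolve and parse, and would show which `policy_setup` values they contain for comparison with the agent settings.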
