Replace MuJoCo environments with Roboschool. #5

Open · wants to merge 2 commits into master
16 changes: 7 additions & 9 deletions hw1/README.md
@@ -1,15 +1,13 @@
 # CS294-112 HW 1: Imitation Learning
 
-Dependencies: TensorFlow, MuJoCo version 1.31, OpenAI Gym
+Dependencies: TensorFlow, OpenAI Gym, Roboschool v1.1
 
 The only file that you need to look at is `run_expert.py`, which is code to load up an expert policy, run a specified number of roll-outs, and save out data.
 
 In `experts/`, the provided expert policies are:
-* Ant-v1.pkl
-* HalfCheetah-v1.pkl
-* Hopper-v1.pkl
-* Humanoid-v1.pkl
-* Reacher-v1.pkl
-* Walker2d-v1.pkl
-
-The name of the pickle file corresponds to the name of the gym environment.
+* RoboschoolAnt-v1.py
+* RoboschoolHalfCheetah-v1.py
+* RoboschoolHopper-v1.py
+* RoboschoolHumanoid-v1.py
+* RoboschoolReacher-v1.py
+* RoboschoolWalker2d-v1.py
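
As a quick sanity check outside this diff, the six environments named by the new expert files should all be constructible once Roboschool v1.1 is installed; importing roboschool is what registers the Roboschool* ids with Gym. A minimal sketch (not part of the PR):

# Quick sanity check (not part of this PR): confirm Roboschool is importable and
# that every environment named by the new expert files can be created.
import gym
import roboschool  # importing roboschool registers the Roboschool* env ids with gym

for name in ["RoboschoolAnt-v1", "RoboschoolHalfCheetah-v1", "RoboschoolHopper-v1",
             "RoboschoolHumanoid-v1", "RoboschoolReacher-v1", "RoboschoolWalker2d-v1"]:
    env = gym.make(name)  # raises if the id is not registered
    print(name, env.observation_space.shape, env.action_space.shape)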
Binary file removed hw1/experts/Ant-v1.pkl
Binary file removed hw1/experts/HalfCheetah-v1.pkl
Binary file removed hw1/experts/Hopper-v1.pkl
Binary file removed hw1/experts/Humanoid-v1.pkl
Binary file removed hw1/experts/Reacher-v1.pkl
260 changes: 260 additions & 0 deletions hw1/experts/RoboschoolAnt-v1.py

Large diffs are not rendered by default.

261 changes: 261 additions & 0 deletions hw1/experts/RoboschoolHalfCheetah-v1.py

Large diffs are not rendered by default.

250 changes: 250 additions & 0 deletions hw1/experts/RoboschoolHopper-v1.py

Large diffs are not rendered by default.

472 changes: 472 additions & 0 deletions hw1/experts/RoboschoolHumanoid-v1.py

Large diffs are not rendered by default.

244 changes: 244 additions & 0 deletions hw1/experts/RoboschoolReacher-v1.py

Large diffs are not rendered by default.

257 changes: 257 additions & 0 deletions hw1/experts/RoboschoolWalker2d-v1.py

Large diffs are not rendered by default.

Binary file removed hw1/experts/Walker2d-v1.pkl
58 changes: 0 additions & 58 deletions hw1/load_policy.py

This file was deleted.

81 changes: 38 additions & 43 deletions hw1/run_expert.py
@@ -3,70 +3,65 @@
 """
 Code to load an expert policy and generate roll-out data for behavioral cloning.
 Example usage:
-    python run_expert.py experts/Humanoid-v1.pkl Humanoid-v1 --render \
+    python run_expert.py experts/RoboschoolHumanoid-v1.py --render \
             --num_rollouts 20
 
 Author of this script and included expert policies: Jonathan Ho ([email protected])
 """
 
+import argparse
-import pickle
-import tensorflow as tf
 import numpy as np
-import tf_util
 import gym
-import load_policy
+import importlib
 
 def main():
-    import argparse
     parser = argparse.ArgumentParser()
     parser.add_argument('expert_policy_file', type=str)
-    parser.add_argument('envname', type=str)
     parser.add_argument('--render', action='store_true')
     parser.add_argument("--max_timesteps", type=int)
     parser.add_argument('--num_rollouts', type=int, default=20,
                         help='Number of expert roll outs')
     args = parser.parse_args()
 
-    print('loading and building expert policy')
-    policy_fn = load_policy.load_policy(args.expert_policy_file)
-    print('loaded and built')
-
-    with tf.Session():
-        tf_util.initialize()
+    print('loading expert policy')
+    module_name = args.expert_policy_file.replace('/', '.')
+    if module_name.endswith('.py'):
+        module_name = module_name[:-3]
+    policy_module = importlib.import_module(module_name)
+    print('loaded')
 
-        import gym
-        env = gym.make(args.envname)
-        max_steps = args.max_timesteps or env.spec.timestep_limit
+    env, policy = policy_module.get_env_and_policy()
+    max_steps = args.max_timesteps or env.spec.timestep_limit
 
-        returns = []
-        observations = []
-        actions = []
-        for i in range(args.num_rollouts):
-            print('iter', i)
-            obs = env.reset()
-            done = False
-            totalr = 0.
-            steps = 0
-            while not done:
-                action = policy_fn(obs[None,:])
-                observations.append(obs)
-                actions.append(action)
-                obs, r, done, _ = env.step(action)
-                totalr += r
-                steps += 1
-                if args.render:
-                    env.render()
-                if steps % 100 == 0: print("%i/%i"%(steps, max_steps))
-                if steps >= max_steps:
-                    break
-            returns.append(totalr)
+    returns = []
+    observations = []
+    actions = []
+    for i in range(args.num_rollouts):
+        print('iter', i)
+        obs = env.reset()
+        done = False
+        totalr = 0.
+        steps = 0
+        while not done:
+            action = policy.act(obs)
+            observations.append(obs)
+            actions.append(action)
+            obs, r, done, _ = env.step(action)
+            totalr += r
+            steps += 1
+            if args.render:
+                env.render()
+            if steps % 100 == 0: print("%i/%i"%(steps, max_steps))
+            if steps >= max_steps:
+                break
+        returns.append(totalr)
 
-        print('returns', returns)
-        print('mean return', np.mean(returns))
-        print('std of return', np.std(returns))
+    print('returns', returns)
+    print('mean return', np.mean(returns))
+    print('std of return', np.std(returns))
 
-        expert_data = {'observations': np.array(observations),
-                       'actions': np.array(actions)}
+    expert_data = {'observations': np.array(observations),
+                   'actions': np.array(actions)}
 
 if __name__ == '__main__':
     main()
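
The new expert files are not rendered above, so their interface has to be inferred from how run_expert.py uses them: each experts/Roboschool*-v1.py module is expected to expose get_env_and_policy(), returning a Gym environment plus a policy object whose act(obs) returns an action. A minimal, hypothetical skeleton under that assumption (the real files in this PR embed trained policy weights rather than a random policy):

# Hypothetical skeleton of an experts/Roboschool<Env>-v1.py module, inferred from
# the way run_expert.py calls it; the actual files in this PR ship trained weights.
import gym
import roboschool  # importing roboschool registers the Roboschool* env ids with gym

class RandomPolicy:
    """Stand-in with the interface run_expert.py expects: act(obs) -> action."""
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, obs):
        # A real expert would map obs through its trained network here.
        return self.action_space.sample()

def get_env_and_policy():
    env = gym.make("RoboschoolHopper-v1")
    return env, RandomPolicy(env.action_space)

run_expert.py then loads such a module with importlib, calls get_env_and_policy(), and drives the returned env and policy through the rollout loop shown in the diff.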