drl_practice

Practice Deep Reinforcement Learning (DRL) with Gymnasium.

  • Easy to run hands-on on a laptop (macOS, Windows, or Linux).
  • No long training runs.

How to practice

Check the Command Guide for the step-by-step commands:

  • Create the conda env with pip.
  • Exercise
    1. For each exercise, implement every function that raises NotImplementedError in the *_exercise.py file.
    2. Then train it with the provided command.
    3. [Optional] Generate a video and push the video and results to Hugging Face.
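
As an illustration of step 1, here is a minimal sketch of what filling in a `NotImplementedError` stub could look like for the tabular Q-learning exercise. The function names, signatures, and the 4-cell corridor environment are hypothetical and chosen only for this example; the actual `*_exercise.py` files may be organized differently.

```python
import random


def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new value.

    In an exercise file, a body like this would replace
    `raise NotImplementedError` (hypothetical signature).
    """
    # Bootstrap from the best next action unless the episode has ended.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


def epsilon_greedy(q_table, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    n_actions = len(q_table[state])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_table[state][a])


def toy_step(state, action):
    """Hypothetical 4-cell corridor: action 1 moves right, action 0 moves left.

    Reaching cell 3 gives reward 1 and ends the episode.
    """
    next_state = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    """Train a Q table on the toy corridor and return it."""
    rng = random.Random(seed)
    q_table = [[0.0, 0.0] for _ in range(4)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q_table, state, epsilon=0.2, rng=rng)
            next_state, reward, done = toy_step(state, action)
            q_learning_update(q_table, state, action, reward, next_state, done)
            state = next_state
    return q_table


if __name__ == "__main__":
    for s, row in enumerate(train()):
        print(s, [round(v, 3) for v in row])
```

The toy environment keeps the sketch free of dependencies. In the real exercise the transitions would come from a Gymnasium environment such as FrozenLake instead of `toy_step`.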

Exercises

Avoid games that are too hard and neural networks that are too big for a laptop. You are still free to try them on your own.

Exercise | Algorithm | Verification Game | For Challenge | State | Action
1. q_learning | Q Table | FrozenLake | Taxi | 📊 | 📊
2. dqn | Deep Q Network -> Rainbow | 1D LunarLander-v3 | img LunarLander-v3 | 🌊 | 📊
3. reinforce | Reinforce (Monte Carlo) | CartPole-v1 | - | 🌊 | 📊
4. curiosity | Curiosity (Reinforce, baseline, shaping reward) | - | MountainCar-v0 | 🌊 | 📊
5. A2C | A2C+GAE (or A2C+TD-n) | CartPole-v1 | LunarLander-v3 | 🌊 | 📊
6. A3C | A3C (using A2C+GAE) | CartPole-v1 | LunarLander-v3 | 🌊 | 📊
7. PPO | PPO | CartPole-v1 | LunarLander-v3 | 🌊 | 📊
8. TD3 | Twin Delayed DDPG (TD3) | Pendulum-v1 | Walker2d-v5 | 🌊 | 🌊
9. SAC | SAC (Soft Actor-Critic) | Pendulum-v1 | Walker2d-v5 | 🌊 | 🌊
10. PPO+DDP | PPO+Curiosity | Reacher-v5 | Pusher-v5 | 🌊 | 🌊
11. SAC+DDP | SAC+PER | Reacher-v5 | Pusher-v5 | 🌊 | 🌊
12. MBPO | Model-based Policy Optim. | Pusher-v5 | Walker2d-v5 | 🌊 | 🌊

Legend: 🌊 = continuous, 📊 = discrete.
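
To make the discrete versus continuous distinction concrete for the Action column, the sketch below shows one common way a policy could produce each kind of action: sampling an index from a softmax over logits for a discrete space, and sampling a clipped Gaussian value for a continuous one. The helper names and parameters are hypothetical and are not taken from the repository's code.

```python
import math
import random


def sample_discrete_action(logits, rng):
    """Discrete action space: sample an action index from softmax(logits)."""
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits)
    weights = [math.exp(x - max_logit) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_continuous_action(mean, std, low, high, rng):
    """Continuous action space: sample a real value from a Gaussian, then clip."""
    return min(max(rng.gauss(mean, std), low), high)


if __name__ == "__main__":
    rng = random.Random(0)
    # Four logits, e.g. one per action in a 4-action discrete environment.
    print(sample_discrete_action([0.1, 2.0, -1.0, 0.3], rng))
    # One real-valued action bounded to [-2, 2] (illustrative bounds).
    print(sample_continuous_action(0.0, 1.0, -2.0, 2.0, rng))
```

A discrete policy returns an integer index, while a continuous policy returns a bounded real number, which is why the later exercises (TD3, SAC) move to algorithms designed for continuous actions.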

Motivation

After studying Hugging Face's DRL course and Pieter Abbeel's The Foundations of Deep RL in 6 Lectures, I want to gain a deeper and broader understanding through coding.

Other

  1. RL Algorithms
  2. OpenAI's Spinning Up
  3. Stable-Baselines3