Practice Deep Reinforcement Learning (DRL) with Gymnasium.
- Easy hands-on exercises on your own laptop (macOS/Windows/Linux).
- No long training runs.
Check the Command Guide for the step-by-step commands:
- Create the conda environment and install the dependencies with pip.
- Exercise
  - For each exercise, implement all `NotImplementedError`s in the `*_exercise.py` file, then train it with the provided command (a rough stub sketch follows this list).
  - [Optional] Generate a video and push the video/results to the Hugging Face Hub (see the `RecordVideo` sketch below).
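As a rough illustration of the workflow (class and function names here are hypothetical, not this repo's actual API), an `*_exercise.py` file marks the parts you need to fill in with `NotImplementedError`:

```python
# Hypothetical sketch of an *_exercise.py stub; names are illustrative only.
import numpy as np


class EpsilonGreedyPolicy:
    """Picks a random action with probability epsilon, otherwise the greedy one."""

    def __init__(self, n_actions: int, epsilon: float = 0.1):
        self.n_actions = n_actions
        self.epsilon = epsilon

    def select_action(self, q_values: np.ndarray) -> int:
        # The unsolved exercise would contain only:
        #     raise NotImplementedError
        # A possible solution:
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(q_values))
```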
Don't pick a game that is too hard or a neural network that is too big, but feel free to try on your own.
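For the optional video step, Gymnasium ships a `RecordVideo` wrapper. A minimal sketch (the environment, folder name, and random policy are just placeholders for your trained agent):

```python
import gymnasium as gym
from gymnasium.wrappers import RecordVideo

# render_mode="rgb_array" is required so frames can be captured.
env = gym.make("CartPole-v1", render_mode="rgb_array")
env = RecordVideo(env, video_folder="videos", name_prefix="eval",
                  episode_trigger=lambda ep: True)  # record every episode

obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # replace with the trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()  # flushes the video file(s) to disk
```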
| Exercise | Algorithm | Verification Game | Challenge Game | State | Action |
|---|---|---|---|---|---|
| 1. q_learning | Q-table | FrozenLake | Taxi | 📊 | 📊 |
| 2. dqn | Deep Q-Network -> Rainbow | LunarLander-v3 (1D state) | LunarLander-v3 (image) | 🌊 | 📊 |
| 3. reinforce | REINFORCE (Monte Carlo) | CartPole-v1 | - | 🌊 | 📊 |
| 4. curiosity | Curiosity (REINFORCE, baseline, reward shaping) | - | MountainCar-v0 | 🌊 | 📊 |
| 5. A2C | A2C + GAE (or A2C + n-step TD) | CartPole-v1 | LunarLander-v3 | 🌊 | 📊 |
| 6. A3C | A3C (using A2C + GAE) | CartPole-v1 | LunarLander-v3 | 🌊 | 📊 |
| 7. PPO | PPO | CartPole-v1 | LunarLander-v3 | 🌊 | 📊 |
| 8. TD3 | Twin Delayed DDPG (TD3) | Pendulum-v1 | Walker2d-v5 | 🌊 | 🌊 |
| 9. SAC | Soft Actor-Critic (SAC) | Pendulum-v1 | Walker2d-v5 | 🌊 | 🌊 |
| 10. PPO+DDP | PPO + Curiosity | Reacher-v5 | Pusher-v5 | 🌊 | 🌊 |
| 11. SAC+DDP | SAC + PER | Reacher-v5 | Pusher-v5 | 🌊 | 🌊 |
| 12. MBPO | Model-Based Policy Optimization | Pusher-v5 | Walker2d-v5 | 🌊 | 🌊 |
where 🌊 = continuous and 📊 = discrete.
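For reference, a minimal tabular Q-learning loop for the FrozenLake verification game of exercise 1 might look like the sketch below (hyperparameters and episode count are illustrative, not the ones used in the exercise):

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", is_slippery=True)
n_states, n_actions = env.observation_space.n, env.action_space.n
q_table = np.zeros((n_states, n_actions))

alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative hyperparameters

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection from the Q table
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # one-step Q-learning (TD) update; bootstrap only if not terminal
        td_target = reward + gamma * np.max(q_table[next_state]) * (not terminated)
        q_table[state, action] += alpha * (td_target - q_table[state, action])
        state = next_state

env.close()
```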
After studying Hugging Face's Deep RL Course and Pieter Abbeel's Foundations of Deep RL in 6 Lectures, I wanted to build a deeper and broader understanding by coding the algorithms myself.
- RL Algorithms
- OpenAI's Spinning Up
- Stable Baselines3