Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo (TD3 + PyTorch). The project uses a simulated Velodyne-like LiDAR and trains a TD3 agent to navigate a Pioneer P3DX robot to random goals while avoiding obstacles. Trained/tested with ROS Noetic on Ubuntu 20.04, Python 3.8.10 and PyTorch 1.10.
Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning — Reinis Cimurs, Il Hong Suh, Jin Han Lee. IEEE RA-L, 2022.
- Clone the repository

  ```bash
  cd ~
  # replace <your-repo-url> with your repo
  git clone https://github.com/infinityengi/goal-driven-td3-nav.git
  cd DRL-robot-navigation
  ```

- Build and run the Docker image

  ```bash
  # Either use the provided helpers
  docker build -t drl-robot-nav .  # builds the image (script documents details)
  ./run.sh                         # runs the container and starts helper entrypoints
  ```
- Compile the ROS workspace (inside the container, or on a host where ROS Noetic is installed)

  ```bash
  cd ~/DRL-robot-navigation/catkin_ws
  catkin_make_isolated
  source devel_isolated/setup.bash
  ```

- Environment variables (these are also set by the `run.sh` script; repeat if needed)
  ```bash
  export ROS_HOSTNAME=localhost
  export ROS_MASTER_URI=http://localhost:11311
  export ROS_PORT_SIM=11311
  export GAZEBO_RESOURCE_PATH=~/DRL-robot-navigation/catkin_ws/src/multi_robot_scenario/launch
  source ~/.bashrc
  cd ~/DRL-robot-navigation/catkin_ws
  source devel_isolated/setup.bash
  ```

- Run training (start from the repository root or the TD3 folder)
  ```bash
  cd ~/DRL-robot-navigation/TD3
  python3 train_velodyne_td3.py
  ```

- Monitor training with TensorBoard
  ```bash
  cd ~/DRL-robot-navigation/TD3
  tensorboard --logdir runs
  # open the address TensorBoard prints (usually http://localhost:6006)
  ```

- Stop training
  - Preferred: press `Ctrl+C` in the terminal running the training script.
  - If training processes hang or you need a forced kill:

    ```bash
    killall -9 rosout roslaunch rosmaster gzserver nodelet robot_state_publisher gzclient python python3
    ```

- Test a trained model
  ```bash
  cd ~/DRL-robot-navigation/TD3
  python3 test_velodyne_td3.py
  ```

Adjust `<real_time_update_rate>` in your Gazebo world file to run the simulation faster than real time. Example file location in this project:

`catkin_ws/src/multi_robot_scenario/launch/TD3.world`
- Increasing the `real_time_update_rate` speeds up the simulation but may destabilize sensors/plugins if set too high. Try conservative values and test.
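For reference, the relevant block in a Gazebo world file looks roughly like this (the values shown are illustrative real-time defaults, not necessarily those shipped in `TD3.world`):

```xml
<physics type="ode">
  <!-- physics iterations attempted per wall-clock second -->
  <real_time_update_rate>1000</real_time_update_rate>
  <!-- simulated seconds advanced per iteration -->
  <max_step_size>0.001</max_step_size>
</physics>
```

The achievable speed-up is roughly `real_time_update_rate × max_step_size`: 1000 × 0.001 runs at real time, 2000 × 0.001 targets 2× real time, and a rate of 0 removes the cap entirely (run as fast as the hardware allows).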
- Training launches RViz by default (lightweight). The Gazebo GUI (`gzclient`) is not launched by default, to save GPU resources.
- To open the Gazebo GUI for a running simulation, run `gzclient` in a new terminal.
- To launch the GUI automatically, edit `empty_world.launch` in `catkin_ws/src/multi_robot_scenario/launch` and enable the GUI node (the launch file contains comments showing where to change this behavior).
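The stock `gazebo_ros` launch files gate the GUI behind a launch argument; the pattern to look for in `empty_world.launch` (names may differ slightly in this repo's copy) is:

```xml
<arg name="gui" default="false"/>
<!-- start gzclient only when gui:=true -->
<group if="$(arg gui)">
  <node name="gazebo_gui" pkg="gazebo_ros" type="gzclient" respawn="false" output="screen"/>
</group>
```

Flipping the `default` to `true`, or passing `gui:=true` on the `roslaunch` command line, then brings up the Gazebo client automatically.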
Sensor configuration lives in the Velodyne xacro/URDF and the robot xacro where the plugin is called.
Files to check and tune:
- `catkin_ws/src/velodyne_simulator/velodyne_description/urdf/VLP-16.urdf.xacro` — change sample count, min/max angle, etc.
- `catkin_ws/src/multi_robot_scenario/xacro/p3dx/pioneer3dx.xacro` — where the Velodyne plugin is included; set FOV, frequency, and origin here.
Notes:
- Field of View (FOV) is given in radians (left to right). If you need rear sensing, expand the FOV.
- Increase frequency or sample count to get denser scans, but be mindful of CPU/GPU and simulation stability.
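In the `velodyne_simulator` package these knobs are parameters of the VLP-16 xacro macro; a typical include looks like the following (parameter names follow the upstream `velodyne_description` package — verify against your local copy):

```xml
<xacro:include filename="$(find velodyne_description)/urdf/VLP-16.urdf.xacro"/>
<xacro:VLP-16 parent="base_link" name="velodyne" topic="/velodyne_points"
              hz="10" samples="360" gpu="false"
              min_angle="-${pi}" max_angle="${pi}">
  <!-- sensor pose relative to the parent link -->
  <origin xyz="0 0 0.2" rpy="0 0 0"/>
</xacro:VLP-16>
```

Here `min_angle`/`max_angle` of ±π give full 360° coverage; shrink them for a forward-facing FOV, and raise `hz` or `samples` for denser scans at higher compute cost.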
- TD3 (Twin Delayed DDPG) is an actor-critic method for continuous control. It uses two critic networks to reduce Q-value overestimation and delays policy updates.
- In this robotics context: the actor outputs continuous linear and angular velocity commands; the critics estimate the Q-value of state-action pairs.
- Observations are laser/LiDAR readings (Velodyne), optionally concatenated with goal polar coordinates.
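The goal's polar coordinates mentioned above can be derived from the robot pose and the goal position; a minimal sketch (function and variable names are illustrative, not taken from the repo):

```python
import math

def goal_polar(robot_x, robot_y, robot_yaw, goal_x, goal_y):
    """Return (distance, heading_error) of the goal in the robot frame.

    heading_error is the angle the robot must turn to face the goal,
    wrapped to [-pi, pi].
    """
    dx, dy = goal_x - robot_x, goal_y - robot_y
    distance = math.hypot(dx, dy)
    heading_error = math.atan2(dy, dx) - robot_yaw
    # wrap to [-pi, pi]
    heading_error = (heading_error + math.pi) % (2 * math.pi) - math.pi
    return distance, heading_error

# Example: goal 1 m ahead and 1 m to the left of a robot facing +x
d, a = goal_polar(0.0, 0.0, 0.0, 1.0, 1.0)
```

These two scalars are simply concatenated onto the flattened LiDAR vector to form the state the networks see.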
Overall, TD3 trains an actor network that outputs continuous commands (linear and angular velocities) and two critic networks that estimate the expected return. Using a replay buffer and delayed policy updates, the agent learns to reach randomly placed goals while avoiding obstacles, using LiDAR observations.
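As a concrete illustration of the clipped double-Q idea, here is a scalar sketch of TD3's critic target and target-policy smoothing (illustrative names; this is not code from `train_velodyne_td3.py`):

```python
import random

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller target-critic estimate."""
    q_next = min(q1_next, q2_next)  # pessimistic estimate counters overestimation
    return reward + gamma * (1.0 - done) * q_next

def smoothed_target_action(mu, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Target-policy smoothing: add clipped Gaussian noise to the target action."""
    noise = max(-noise_clip, min(noise_clip, random.gauss(0.0, noise_std)))
    return max(-max_action, min(max_action, mu + noise))

# Non-terminal transition: bootstrap from min(Q1', Q2') = 4.0
y = td3_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=4.0)   # 1.0 + 0.99 * 4.0
# Terminal transition: target collapses to the reward
y_term = td3_target(reward=1.0, done=1.0, q1_next=5.0, q2_next=4.0)
```

In the full algorithm these scalars are per-transition tensors sampled from the replay buffer, and the actor and target networks are updated only every few critic steps — the "delayed" part of TD3.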
- `assets/training/loss_plot.png` — add your loss plot here and reference it in the README near the training section.
- `networks/td3_architecture.png` or `td3_architecture.svg` — visual diagram of your actor/critic networks.
- `training.gif` — keep at the repo root so it appears at the top of the README (as requested).
Use the issue template under `.github/ISSUE_TEMPLATE/bug_report.md` to collect environment details and reproduction steps. Example fields to request from reporters:
- OS, ROS distro, Python and PyTorch versions
- Exact command that failed
- TensorBoard screenshot or the `runs/` folder
- Small reproduction (launch file, small bag, or minimal steps)
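A minimal skeleton for `.github/ISSUE_TEMPLATE/bug_report.md` covering those fields could look like this (front-matter keys follow GitHub's issue-template convention; section names are illustrative):

```markdown
---
name: Bug report
about: Report a training or simulation problem
labels: bug
---

**Environment** (OS, ROS distro, Python and PyTorch versions):

**Exact command that failed**:

**Logs / TensorBoard screenshot or `runs/` folder**:

**Minimal reproduction** (launch file, small bag, or steps):
```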
- Tutorial series by the original author (installation, environment, training):
  - Part 1 (installation): https://medium.com/@reinis_86651/deep-reinforcement-learning-in-mobile-robot-navigation-tutorial-part1-installation-d62715722303
  - Part 3 (training): https://medium.com/@reinis_86651/deep-reinforcement-learning-in-mobile-robot-navigation-tutorial-part3-training-13b2875c7b51
  - Part 4 (environment): https://medium.com/@reinis_86651/deep-reinforcement-learning-in-mobile-robot-navigation-tutorial-part4-environment-7e4bc672f590
  - Part 5 (extra): https://medium.com/@reinis_86651/deep-reinforcement-learning-in-mobile-robot-navigation-tutorial-part5-some-extra-stuff-b744852345ac
- TD3 algorithm overview and background: see the original TD3 paper (Fujimoto et al., "Addressing Function Approximation Error in Actor-Critic Methods") and the OpenAI Spinning Up resources for algorithm intuition.
- Velodyne simulator used: https://github.com/lmark1/velodyne_simulator
- IEEE paper and citation on IEEE Xplore: https://ieeexplore.ieee.org/document/9645287

