TD3-based deep RL for goal-driven mobile robot navigation in ROS Noetic + Gazebo. Trains policies from Velodyne LiDAR using PyTorch, Docker-ready, TensorBoard logging.

goal-driven-td3-nav

(training.gif: demo of the trained agent navigating to randomly placed goals)

Summary

Deep reinforcement learning for mobile robot navigation in ROS/Gazebo (TD3 + PyTorch). The project simulates a Velodyne-like LiDAR and trains a TD3 agent to drive a Pioneer P3DX robot to randomly placed goals while avoiding obstacles. Developed and tested with ROS Noetic on Ubuntu 20.04, Python 3.8.10, and PyTorch 1.10.

Paper

Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning — Reinis Cimurs, Il Hong Suh, Jin Han Lee. IEEE RA-L, 2022.


Installation (commands)

  1. Clone the repository
cd ~
# clone into DRL-robot-navigation so the paths below match
git clone https://github.com/infinityengi/goal-driven-td3-nav.git DRL-robot-navigation
cd DRL-robot-navigation
  2. Build and run the Docker image
docker build -t drl-robot-nav .  # builds the image (the script documents details)
./run.sh                         # runs the container and starts the helper entrypoints
  3. Compile the ROS workspace (inside the container, or on a host with ROS Noetic installed)
cd ~/DRL-robot-navigation/catkin_ws
catkin_make_isolated
source devel_isolated/setup.bash
  4. Set environment variables (these are also set by the run.sh script; repeat if needed)
export ROS_HOSTNAME=localhost
export ROS_MASTER_URI=http://localhost:11311
export ROS_PORT_SIM=11311
export GAZEBO_RESOURCE_PATH=~/DRL-robot-navigation/catkin_ws/src/multi_robot_scenario/launch
source ~/.bashrc
cd ~/DRL-robot-navigation/catkin_ws
source devel_isolated/setup.bash

Training & testing (commands)

  1. Run training (start from the repository root or the TD3 folder)
cd ~/DRL-robot-navigation/TD3
python3 train_velodyne_td3.py
  2. Monitor training with TensorBoard
cd ~/DRL-robot-navigation/TD3
tensorboard --logdir runs
# Open the browser at the address printed by tensorboard (usually http://localhost:6006)
  3. Stop training
  • Preferred: press Ctrl+C in the terminal running the training script.
  • If training processes hang or you need a forced-kill:
killall -9 rosout roslaunch rosmaster gzserver nodelet robot_state_publisher gzclient python python3
  4. Test a trained model
cd ~/DRL-robot-navigation/TD3
python3 test_velodyne_td3.py

Simulation speed (how to increase)

Adjust <real_time_update_rate> in your Gazebo world file to run simulation faster than real time. Example file location in this project:

catkin_ws/src/multi_robot_scenario/launch/TD3.world
  • Increasing the real_time_update_rate speeds simulation but may destabilize sensors/plugins if set too high. Try conservative values and test.
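As a hedged illustration (the actual contents of TD3.world may differ), the relevant physics block in a Gazebo world file looks roughly like this; with a fixed step size, raising real_time_update_rate (steps attempted per wall-clock second) makes the simulation run faster than real time:

```xml
<physics type="ode">
  <!-- simulation step size in seconds (Gazebo default: 0.001) -->
  <max_step_size>0.001</max_step_size>
  <!-- steps per wall-clock second: 1000 * 0.001 = 1x real time,
       2000 * 0.001 targets roughly 2x; 0 means "as fast as possible" -->
  <real_time_update_rate>2000</real_time_update_rate>
</physics>
```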

Gazebo GUI / Visualization

  • Training launches RViz by default (lightweight). The Gazebo GUI (gzclient) is not launched by default, to save GPU resources.
  • To open Gazebo GUI for a running simulation: open a new terminal and run gzclient.
  • To launch GUI automatically, edit the empty_world.launch in catkin_ws/src/multi_robot_scenario/launch and enable the GUI node (the launch file contains comments where to change this behavior).

Velodyne sensor configuration pointers

Sensor configuration lives in the Velodyne xacro/URDF and the robot xacro where the plugin is called.

Files to check and tune:

  • catkin_ws/src/velodyne_simulator/velodyne_description/urdf/VLP-16.urdf.xacro — change sample count, min/max angle, etc.
  • catkin_ws/src/multi_robot_scenario/xacro/p3dx/pioneer3dx.xacro — where the Velodyne plugin is included; there you can set FOV, frequency and origin.

Notes:

  • Field of View (FOV) is given in radians (left to right). If you need rear sensing, expand the FOV.
  • Increase frequency or sample count to get denser scans, but be mindful of CPU/GPU and simulation stability.
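A hedged example of how the sensor is typically parameterized (attribute names follow the velodyne_simulator package's VLP-16 xacro macro; check VLP-16.urdf.xacro and pioneer3dx.xacro in this repo for the exact interface and values):

```xml
<!-- Illustrative only: a full 360-degree horizontal FOV (-pi to pi radians)
     at 10 Hz with 360 samples per revolution. Narrow min_angle/max_angle
     to restrict the FOV; raise samples/hz for denser scans at higher cost. -->
<xacro:VLP-16 parent="base_link" name="velodyne" topic="/velodyne_points"
              hz="10" samples="360" gpu="false"
              min_angle="-${pi}" max_angle="${pi}">
  <origin xyz="0 0 0.4" rpy="0 0 0"/>
</xacro:VLP-16>
```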

TD3

  • TD3 (Twin Delayed DDPG) is an actor-critic method for continuous control. It uses two critic networks to reduce Q-value overestimation and delays policy updates.
  • In this robotics context: the actor outputs continuous linear and angular velocity commands; the critics estimate the Q-value of state-action pairs.
  • Observations are laser/LiDAR readings (Velodyne), optionally concatenated with goal polar coordinates.
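As a hedged sketch (the function name and state layout are illustrative, not the repo's actual code), the goal's polar coordinates in the robot frame can be computed from the robot pose and the goal position like this:

```python
import math

def goal_polar(robot_x, robot_y, robot_yaw, goal_x, goal_y):
    """Return (distance, heading error) of the goal in the robot frame.

    Hypothetical helper for illustration; the repo's own state
    construction may differ.
    """
    dx = goal_x - robot_x
    dy = goal_y - robot_y
    distance = math.hypot(dx, dy)
    # angle to the goal relative to the robot's heading, wrapped to [-pi, pi]
    theta = math.atan2(dy, dx) - robot_yaw
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return distance, theta
```

The pair (distance, theta) would then be appended to the LiDAR readings to form the full observation vector.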

Overall

TD3 trains an actor network that outputs continuous commands (linear and angular velocities) and two critic networks that estimate the expected return. Using a replay buffer and delayed policy updates, the agent learns to reach randomly placed goals while avoiding obstacles using LiDAR observations.
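The update loop described above can be sketched in PyTorch. This is a minimal illustration of the TD3 mechanics (twin critics, clipped target-policy noise, delayed actor and target-network updates), not the repo's train_velodyne_td3.py implementation; network sizes and hyperparameters are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Maps a state to a continuous action in [-max_action, max_action]."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh())
        self.max_action = max_action

    def forward(self, s):
        return self.max_action * self.net(s)

class Critic(nn.Module):
    """Twin Q-networks; the minimum of the two reduces overestimation."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        def q_net():
            return nn.Sequential(
                nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                nn.Linear(64, 1))
        self.q1, self.q2 = q_net(), q_net()

    def forward(self, s, a):
        sa = torch.cat([s, a], dim=1)
        return self.q1(sa), self.q2(sa)

def td3_update(actor, actor_t, critic, critic_t, batch, step,
               a_opt, c_opt, gamma=0.99, tau=0.005,
               policy_noise=0.2, noise_clip=0.5, policy_delay=2):
    """One TD3 gradient step on a replay-buffer batch (illustrative)."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        # target-policy smoothing: add clipped noise to the target action
        noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip, noise_clip)
        a2 = (actor_t(s2) + noise).clamp(-1.0, 1.0)
        q1_t, q2_t = critic_t(s2, a2)
        target = r + gamma * (1.0 - done) * torch.min(q1_t, q2_t)
    q1, q2 = critic(s, a)
    critic_loss = F.mse_loss(q1, target) + F.mse_loss(q2, target)
    c_opt.zero_grad(); critic_loss.backward(); c_opt.step()
    if step % policy_delay == 0:
        # delayed policy update: maximize Q1 under the current actor
        actor_loss = -critic(s, actor(s))[0].mean()
        a_opt.zero_grad(); actor_loss.backward(); a_opt.step()
        # Polyak-average the target networks
        for p, pt in zip(critic.parameters(), critic_t.parameters()):
            pt.data.mul_(1.0 - tau).add_(tau * p.data)
        for p, pt in zip(actor.parameters(), actor_t.parameters()):
            pt.data.mul_(1.0 - tau).add_(tau * p.data)
    return critic_loss.item()
```

In the full agent, `batch` would be sampled from the replay buffer each step, with states built from LiDAR readings plus goal polar coordinates.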


Training artifacts and recommended files to keep in repo

  • assets/training/loss_plot.png — add your loss plot here and reference it in the README near the training section.
  • networks/td3_architecture.png or td3_architecture.svg — visual diagram of your actor/critic networks.
  • training.gif — keep at the repo root so it appears at the top of the README.

Issue reporting

Use the issue template under .github/ISSUE_TEMPLATE/bug_report.md to collect environment details and reproduction steps. Example fields to request from reporters:

  • OS, ROS distro, Python and PyTorch versions
  • Exact command that failed
  • TensorBoard screenshot or runs/ folder
  • Small reproduction (launch file, small bag, or minimal steps)

Useful references and tutorials
