The goal of this PR is to extend the RL-CBF method to work on continuous control environments.

Environments:
- Gymnasium MuJoCo locomotion envs

### Milestones
- [ ] Try RL-CBF directly by discretizing the environment.
- [ ] Work out theory for continuous envs
- [ ] Implement CBF training & verification
- [ ] Evaluate RL-CBF vs vanilla RL on safety performance
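
For the first milestone, one possible way to discretize a continuous action space is a uniform grid over the per-dimension bounds. The sketch below is only an illustration under assumptions, not the approach this PR commits to: `make_action_table`, `bins_per_dim`, and the example bounds are hypothetical names and values, and it covers actions only (observations would need their own discretization).

```python
from itertools import product


def make_action_table(low, high, bins_per_dim):
    """Enumerate a uniform grid of continuous actions.

    low, high: per-dimension bounds of a box-shaped action space (assumed finite).
    bins_per_dim: number of grid points per dimension (hypothetical knob, >= 2).
    Returns a list of action tuples; the list index serves as the discrete action id.
    """
    if len(low) != len(high):
        raise ValueError("low and high must have the same length")
    if bins_per_dim < 2:
        raise ValueError("bins_per_dim must be at least 2")

    axes = []
    for lo, hi in zip(low, high):
        step = (hi - lo) / (bins_per_dim - 1)
        # Grid points include both endpoints of the interval.
        axes.append([lo + i * step for i in range(bins_per_dim)])

    # Cartesian product over dimensions: bins_per_dim ** len(low) actions in total.
    return [tuple(action) for action in product(*axes)]


# Example: a 2-D action space bounded in [-1, 1] with 3 bins per dimension.
table = make_action_table([-1.0, -1.0], [1.0, 1.0], bins_per_dim=3)
discrete_action = 4                  # what a discrete policy would output
continuous_action = table[discrete_action]
```

The table size grows as `bins_per_dim ** n` for `n` action dimensions, so this kind of grid is likely practical only for coarse bins or low-dimensional action spaces.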