Add continuous control environments

The goal of this PR is to extend the RL-CBF method to work on continuous control environments. 

Environments:
- Gymnasium MuJoCo locomotion environments

### Milestones
- [ ] Try RL-CBF directly by discretizing the environment
- [ ] Work out theory for continuous envs
- [ ] Implement CBF training & verification
- [ ] Evaluate RL-CBF vs vanilla RL on safety performance
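
For the first milestone, one possible reading of "discretizing the environment" is to replace the continuous action space with a finite grid of actions, so that a discrete-action method can be applied unchanged. The sketch below is only an illustration under that assumption. The helper name `make_action_grid`, the `bins_per_dim` parameter, and the 2-D bounds are hypothetical and not part of this PR. It uses the standard library only, so the resulting list would still need to be wired into an environment wrapper.

```python
from itertools import product


def make_action_grid(low, high, bins_per_dim):
    """Build a uniform grid of continuous actions between low and high.

    low, high: per-dimension action bounds (sequences of equal length).
    bins_per_dim: number of grid points per action dimension (assumed >= 2).
    Returns a list of actions; the list index acts as the discrete action id.
    """
    if bins_per_dim < 2:
        raise ValueError("bins_per_dim must be at least 2")
    axes = []
    for lo, hi in zip(low, high):
        step = (hi - lo) / (bins_per_dim - 1)
        axes.append([lo + i * step for i in range(bins_per_dim)])
    # Cartesian product over dimensions: bins_per_dim ** len(low) actions.
    return [list(action) for action in product(*axes)]


# Hypothetical 2-D action space bounded by [-1, 1] in each dimension.
grid = make_action_grid([-1.0, -1.0], [1.0, 1.0], bins_per_dim=3)
print(len(grid))  # number of discrete actions
print(grid[4])    # continuous action for discrete id 4
```

The grid size grows exponentially with the action dimension, so this approach is likely only practical for low-dimensional action spaces or a coarse grid.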