ReinforcementLearning

This repo will contain scripts and notebooks in which I explore and implement common RL algorithms on agents operating in Gym environments.

Tabular_SARSA: got a tabular SARSA implementation working (kind of) on the Mountain Car environment.
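A minimal sketch of the tabular SARSA update, assuming Mountain Car's continuous observation (position, velocity) is first discretized onto a uniform grid. The helper names (`discretize`, `epsilon_greedy`, `sarsa_update`) and the dictionary-based Q-table are illustrative choices, not code taken from the notebook.

```python
import random


def discretize(obs, low, high, n_bins):
    """Map a continuous observation onto a tuple of grid-cell indices (hypothetical helper)."""
    cell = []
    for x, lo, hi in zip(obs, low, high):
        frac = (x - lo) / (hi - lo)
        # Clamp so values on (or past) the bounds land in the edge cells.
        cell.append(min(max(int(frac * n_bins), 0), n_bins - 1))
    return tuple(cell)


def epsilon_greedy(Q, state, n_actions, epsilon):
    """Random action with probability epsilon, otherwise the greedy one (unseen pairs count as 0)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    values = [Q.get((state, a), 0.0) for a in range(n_actions)]
    return values.index(max(values))


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done):
    """One on-policy TD(0) update of the dict Q-table; returns the TD error."""
    # Bootstrap from the action the policy actually chose in s_next (no bootstrap at a terminal step).
    target = r if done else r + gamma * Q.get((s_next, a_next), 0.0)
    td_error = target - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return td_error
```

SARSA is on-policy: the bootstrap term uses `a_next`, the action the epsilon-greedy policy actually picks in `s_next`, so the exploration it does is reflected in the learned values.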

Tabular_QLearning: implemented a vanilla tabular version of Q-Learning on the Acrobot environment; it works relatively well.
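A minimal sketch of the Q-learning update, assuming a dictionary Q-table keyed by (state, action), with Acrobot's continuous observation already discretized into hashable states. The function name is hypothetical, not taken from the notebook.

```python
def q_learning_update(Q, s, a, r, s_next, n_actions, alpha, gamma, done):
    """One off-policy TD(0) update of the dict Q-table; returns the TD error."""
    # Bootstrap from the best action in s_next, regardless of which action is taken next.
    best_next = 0.0 if done else max(Q.get((s_next, b), 0.0) for b in range(n_actions))
    target = r + gamma * best_next
    td_error = target - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return td_error
```

The only difference from SARSA is the bootstrap term: Q-learning uses the maximum over next actions, which is what makes it off-policy.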

Function_Approximation_SARSA: implemented SARSA with function approximation on the Mountain Car environment. The approximator I used was a linear function, and the state-action pairs were transformed into features using tile coding.
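A sketch of grid-based tile coding with a linear action-value function and a semi-gradient SARSA step. The bounds are Mountain Car's observation limits (position in [-1.2, 0.6], velocity in [-0.07, 0.07]); the number of tilings, tiles per dimension, the uniform offset scheme, and dividing the step size by the number of tilings are assumptions of this sketch and not necessarily what the notebook does.

```python
# Mountain Car observation bounds: (position, velocity).
LOW = (-1.2, -0.07)
HIGH = (0.6, 0.07)


class TileCoder:
    """Uniformly offset grid tilings over the 2-D state box, one block of weights per action."""

    def __init__(self, n_tilings=8, tiles_per_dim=8, n_actions=3):
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.n_actions = n_actions
        # One extra tile per dimension so shifted tilings still cover the whole box.
        self.side = tiles_per_dim + 1
        self.n_features = n_actions * n_tilings * self.side ** 2

    def active_features(self, state, action):
        """Indices of the n_tilings binary features that are 1 for (state, action)."""
        features = []
        for t in range(self.n_tilings):
            coords = []
            for x, lo, hi in zip(state, LOW, HIGH):
                width = (hi - lo) / self.tiles_per_dim
                offset = t * width / self.n_tilings  # tiling t is shifted by t/n_tilings of a tile
                c = int((x - lo + offset) / width)
                coords.append(min(max(c, 0), self.side - 1))
            tile = coords[0] * self.side + coords[1]
            features.append((action * self.n_tilings + t) * self.side ** 2 + tile)
        return features


def q_value(w, features):
    """Linear q(s, a) = w . x(s, a) with binary x, i.e. the sum of the active weights."""
    return sum(w[i] for i in features)


def sarsa_step(w, coder, s, a, r, s_next, a_next, alpha, gamma, done):
    """Semi-gradient SARSA update of the weight list w; returns the TD error."""
    feats = coder.active_features(s, a)
    target = r if done else r + gamma * q_value(w, coder.active_features(s_next, a_next))
    delta = target - q_value(w, feats)
    # The gradient of a linear q is the binary feature vector, so only active weights change.
    step = alpha / coder.n_tilings
    for i in feats:
        w[i] += step * delta
    return delta
```

Because q is linear in binary features, its gradient with respect to each weight is 1 for an active feature and 0 otherwise, which is why the update touches only the `n_tilings` active weights.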

REINFORCE_with_Function_Approximated_Baseline: implemented the REINFORCE algorithm with a baseline. The baseline estimates the state-value function using a linear function, and the policy is parameterized by a neural network. I tested the algorithm on the CartPole environment; after training, the average reward over 50 episodes was 488 out of a maximum of 500.
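A sketch of the per-episode quantities REINFORCE with a baseline needs: discounted returns G_t, and advantages G_t - v(s_t) from a linear baseline v(s) = w · x(s) whose weights take a gradient step on the squared error. It assumes x(s) is simply the raw CartPole observation vector, which is a hypothetical choice; the neural-network policy itself is left out, and the function names are illustrative.

```python
def discounted_returns(rewards, gamma):
    """G_t = r_{t+1} + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def baseline_value(w, x):
    """Linear state-value estimate v(s) = w . x(s)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def advantages_with_linear_baseline(features, rewards, w, gamma, alpha_w):
    """Return per-step advantages G_t - v(s_t) and update the baseline weights w in place."""
    returns = discounted_returns(rewards, gamma)
    advantages = []
    for x, g in zip(features, returns):
        delta = g - baseline_value(w, x)
        advantages.append(delta)
        # Gradient step on 0.5 * (G_t - w . x)^2 with respect to w.
        for i in range(len(w)):
            w[i] += alpha_w * delta * x[i]
    return advantages
```

With a neural-network policy, the policy loss for the episode would then be the negated sum of each advantage times the log-probability of the action taken; Sutton and Barto's version additionally scales step t by gamma^t.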

PPO: implemented the Proximal Policy Optimization algorithm on the Blackjack environment.
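A sketch of PPO's clipped surrogate objective, written with plain floats so it runs without a deep-learning framework. The clip range of 0.2 is a common default used here as an assumption; the notebook's hyperparameters and advantage estimator may differ, and the function name is illustrative.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t) over a batch,
    where r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t). Maximize it (negate for a loss)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio from log-probabilities
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # The pessimistic minimum removes any incentive to push the ratio outside the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For a positive advantage, the gain stops growing once the ratio exceeds 1 + eps; for a negative advantage, the minimum keeps the larger penalty, so the update stays close to the old policy.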

Unsloth_Qwen2_5_(3B)_GRPO: In this notebook, I followed Unsloth's tutorial on fine-tuning a base LLM with GRPO using the GSM8K dataset. One custom tweak I made was adding a reward function that gives a reward of 0.001 for each character between the reasoning tags, counting at most 100 characters (so the reward is capped at 0.1). The rationale was to incentivize longer chains of thought. In hindsight, I could have increased the maximum quite a bit, because 100 characters is not much. It was interesting to see that after the default 250 training steps the model had been pushed to reason, even though the answer in the example that was run is still wrong. Also, I don't know why the notebook does not render properly on GitHub.
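A sketch of the extra reward function described above, assuming the tutorial's `<reasoning>...</reasoning>` tag format and the conversational completion format used with TRL's GRPO trainer, where each completion is a list of messages and the generated text is in `completion[0]["content"]`. The function name is hypothetical, and this is not the notebook's exact code.

```python
import re

# Non-greedy match so only the first reasoning block is counted; DOTALL lets it span newlines.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)


def reasoning_length_reward_func(completions, **kwargs) -> list[float]:
    """0.001 reward per character inside the reasoning tags, counting at most 100 characters."""
    rewards = []
    for completion in completions:
        text = completion[0]["content"]
        match = REASONING_RE.search(text)
        n_chars = len(match.group(1)) if match else 0
        rewards.append(0.001 * min(n_chars, 100))
    return rewards
```

A function like this would be passed to the trainer alongside the tutorial's other reward functions in its `reward_funcs` list; raising the 100-character cap (or the 0.001 rate) changes how strongly longer reasoning is favored relative to the correctness rewards.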
