This repository implements an environment for the Sum of Cubes problem and tests it under various conditions. The PPO (Proximal Policy Optimization) implementation is based on the Arena 2.3 PPO lecture's Jupyter notebook (https://arena3-chapter2-rl.streamlit.app/[2.3]_PPO).
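For context, the Sum of Cubes problem asks for integers x, y, z satisfying x³ + y³ + z³ = k for a given target k. A brute-force search over a small range illustrates the problem (this is a sketch for illustration only, not code from the repository):

```python
def find_cubes(k, bound=20):
    """Search for integers x, y, z in [-bound, bound] with x**3 + y**3 + z**3 == k."""
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            for z in range(-bound, bound + 1):
                if x ** 3 + y ** 3 + z ** 3 == k:
                    return (x, y, z)
    return None  # no solution in this search range

# k = 29 has small solutions, e.g. 1**3 + 1**3 + 3**3 = 29
x, y, z = find_cubes(29)
```

Exhaustive search like this blows up quickly, which is what motivates treating the problem as an RL environment instead.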
- Install dependencies: `pip install -r requirements.txt`
- Configure the experiment using `PPOArgs`
- Select your target environment
- Start the training process: `python run.py`

The `PPOArgs` class supports the following parameters:
- `total_timesteps`: Total number of actions the experiment will take
- `num_envs`: Number of parallel environments to run
- `max_k`: Maximum value for k (where k = x³ + y³ + z³)
- `learning_rate`: Learning rate for the optimizer
- `gamma`: Discount factor for future rewards
- `gae_lambda`: Smoothing parameter for Generalized Advantage Estimation
- `clip_coef`: Clipping coefficient for the PPO objective
- `ent_coef`: Weight of the entropy bonus
- `vf_coef`: Weight of the value function loss
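A configuration built from these fields might look like the following sketch (the field names come from the list above; the default values shown here are illustrative, not the repository's actual defaults):

```python
from dataclasses import dataclass

@dataclass
class PPOArgs:
    # Illustrative defaults -- the repository's actual values may differ
    total_timesteps: int = 100_000  # total actions taken across training
    num_envs: int = 4               # parallel environments
    max_k: int = 100                # upper bound for k in k = x^3 + y^3 + z^3
    learning_rate: float = 2.5e-4
    gamma: float = 0.99             # discount factor
    gae_lambda: float = 0.95       # GAE smoothing parameter
    clip_coef: float = 0.2          # PPO clipping epsilon
    ent_coef: float = 0.01          # entropy bonus weight
    vf_coef: float = 0.5            # value function loss weight

# Override only the fields you want to experiment with
args = PPOArgs(total_timesteps=50_000, max_k=50)
```

Using a dataclass keeps every hyperparameter in one place, so an experiment is fully described by a single `PPOArgs` instance.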
Note: Different environments may require different parameter settings. You can adjust these in the `PPOArgs` class within `PPO.py`. You can also develop your own environment.