Clone the repository to your local machine:
```bash
git clone https://github.com/yourusername/yourrepository.git
```

Install dependencies:

```bash
pip install -r requirements.txt
```

The system can be run in two modes: training and evaluation (`-t` or `-e`). By default the system trains the MDN layer and the VAE layer on CUDA if available.
```bash
python main.py -t <save_model_path> [--only_vae true|false]
```

- `-t` or `--train`: Activates training mode.
- `--save_model_path`: Optional; path to save the trained model. Defaults to `model.pt`.
- `--only_vae`: Optional; set to `true` to train only the VAE model (default is `false`).

Example:

```bash
python main.py -t /home/model_output.pt --only_vae true
```

Evaluation mode:
```bash
python main.py -e --save_model_path <save_model_path> --model_path <model_file> --model_type <vae|mdn> [--render]
```

- `-e` or `--evaluate`: Activates evaluation mode.
- `--save_model_path`: Required by the parser, but used only as a placeholder in evaluation mode.
- `--model_path`: Required; path to the model file to load.
- `--model_type`: Required; type of model to evaluate, either `vae` or `mdn`.
- `--render`: Optional flag to render the environment during evaluation.
You must provide the model type in evaluation mode: `vae` if your model was trained with the VAE only, `mdn` if it was trained with the MDN.
Example:

```bash
python main.py -e dummy_path --model_path /home/valerio/Desktop/WorldModel_CarRacing/model_vae.pt --model_type vae --render
```

When training starts, the system automatically generates a dataset of 300 episodes and trains both the VAE and the MDN. These 300 episodes have proven sufficient to obtain a reasonably well-trained agent. Note: these values are hardcoded (specifically in `policy.py`) and can be modified if needed (e.g. for preliminary tests).
| Only VAE | Full model |
|---|---|
| ![]() | ![]() |
- **Dataset Generation**
  - The CarRacing dataset was generated through episodic rollouts using random action sampling, with lazy loading to manage memory efficiently.
  - The original dataset (`CarRacingDataset`) was then transformed into a latent dataset (`LatentDataset`) for training the MDN-RNN.
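The lazy-loading idea can be sketched as below. This is a minimal illustration, not the repository's actual `CarRacingDataset`: the class name and the assumed file layout (one `.npz` per episode with `observations` and `actions` arrays) are hypothetical.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class LazyEpisodeDataset(Dataset):
    """Reads an episode file from disk only when it is indexed,
    so the full rollout collection never has to fit in memory."""

    def __init__(self, episode_paths):
        self.episode_paths = episode_paths

    def __len__(self):
        return len(self.episode_paths)

    def __getitem__(self, idx):
        # The .npz file is opened on demand, not at construction time
        data = np.load(self.episode_paths[idx])
        obs = torch.from_numpy(data["observations"]).float() / 255.0
        actions = torch.from_numpy(data["actions"]).float()
        return obs, actions
```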
- **VAE Training**
  - A Variational Autoencoder (VAE) was trained to compress environment observations into a latent space using the reparameterization trick.
  - The loss function combined a reconstruction loss and a KL-divergence term.
  - Key parameters: 10,000 episodes, 1,000 frames per episode, 10 epochs, and a learning rate of 1e-4.
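The reparameterization trick and the combined loss can be sketched as follows. This is a minimal PyTorch illustration; the function names and the choice of MSE for the reconstruction term are assumptions, not necessarily what this repository uses.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps the sample differentiable w.r.t. mu and logvar
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the unit-Gaussian prior
    recon_loss = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kld
```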
- **MDN-RNN Training**
  - Using the latent vectors produced by the VAE, an MDN-RNN (an LSTM combined with a Mixture Density Network) was trained to model the temporal dynamics of the latent space.
  - Training utilized teacher forcing, with 25 epochs, a batch size of 32, and a learning rate of 1e-4.
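A minimal MDN-RNN skeleton in PyTorch is sketched below; the layer sizes and class name are illustrative, not the repository's. Under teacher forcing, the inputs at each step are the ground-truth latents z_t (concatenated with actions a_t), and the mixture parameters are trained to predict z_{t+1}.

```python
import torch
import torch.nn as nn

class MDNRNN(nn.Module):
    """LSTM over latent+action sequences with a Mixture Density head.
    Sketch only; dimensions here are assumptions."""

    def __init__(self, z_dim=32, action_dim=3, hidden=256, n_gauss=5):
        super().__init__()
        self.n_gauss, self.z_dim = n_gauss, z_dim
        self.lstm = nn.LSTM(z_dim + action_dim, hidden, batch_first=True)
        # Per mixture component: one weight logit, plus a mean and a
        # log-sigma for each latent dimension
        self.head = nn.Linear(hidden, n_gauss * (1 + 2 * z_dim))

    def forward(self, z, a):
        h, _ = self.lstm(torch.cat([z, a], dim=-1))
        out = self.head(h)
        pi, mu, log_sigma = torch.split(
            out,
            [self.n_gauss, self.n_gauss * self.z_dim, self.n_gauss * self.z_dim],
            dim=-1)
        return pi, mu, log_sigma
```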
- **Controller Training**
  - A simple linear layer was implemented as the controller, making decisions based on the latent state.
  - Two approaches were tested:
    - A VAE-only approach, using parallelized CMA-ES, which proved more stable and faster.
    - A full-model approach (VAE + MDN) trained sequentially, which showed higher performance variability but occasional high rewards.
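The population-based search over controller weights can be illustrated with the simplified evolution strategy below. This is a stand-in for CMA-ES (which additionally adapts a full covariance matrix, typically via a library such as `pycma`), and `evaluate` here is a dummy placeholder for a real CarRacing rollout returning the episode reward.

```python
import numpy as np

def evaluate(params, z_dim=32, action_dim=3):
    """Placeholder fitness: in the real setup this would roll out one
    CarRacing episode with the linear controller and return its reward."""
    W = params.reshape(action_dim, z_dim + 1)
    z = np.random.randn(z_dim)                 # stand-in latent state
    action = np.tanh(W @ np.append(z, 1.0))    # linear layer + bias, squashed
    return -np.sum(action ** 2)                # dummy objective for the sketch

def simple_es(n_params, popsize=16, sigma=0.5, generations=10):
    """Simplified (mu, lambda) evolution strategy: sample a population
    around the mean, keep the best half, re-average."""
    mean = np.zeros(n_params)
    for _ in range(generations):
        pop = mean + sigma * np.random.randn(popsize, n_params)
        fitness = np.array([evaluate(p) for p in pop])
        elite = pop[np.argsort(fitness)[-popsize // 2:]]  # maximize reward
        mean = elite.mean(axis=0)
    return mean
```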
- **Final Notes**
  - A more complete report on the work can be found in the repository: `world_modl_report`.

