GenAI codebase practice from Sheng Wang ([email protected])
Implementations of various GenAI models. Most of them have only been trained for one epoch.
Currently includes:
- Diffusion models (DDPM, DDIM, latent diffusion, CFG, DiT)
- Reinforcement learning (REINFORCE, A2C, GAE, PPO, DPO, GRPO)
- Multi-modal models (LLaVA, BLIP, BLIP-2, CLIP)
- Pre-training (ViT, MAE, BYOL, BEiT, iBOT, SimMIM, Transformer, MoE, Swin Transformer)
- Generative models (VAE, UNet, beta-VAE, GAN, VQ-VAE)
- Refactor trainer.py
- Prepare tiny shakespeare: /scripts/data/prepare_shakespeare.py. GPT2 tiktoken, 50257 vocab size, 334646 train tokens, 3380 val tokens (see the data-preparation sketch after this list)
- Implement text_datasets.py
- Implement GPT2 and transformer block
- Add Wandb
- Add early stop
- Test GPT2 implementation (check char-level BPE)
- Add unified generator
- Add cosine scheduler (sketch after this list)
- Wrap up GPT2 on tiny shakespeare
- Build LLaVA model
- Build COCO data loader
- Re-organize LLaVA codebase
- Re-split COCO validation data (the full dataset is too large, so we only use the validation set)
- Finish COCO data loader
- Merge LLaVA and GPT2 so they share the same trainer
- Start LLaVA stage 1 training
- Checkpoint only saves the projector and skips the optimizer state (sketch after this list)
- Debug and test LLaVA stage 1 (loss and generation are both reasonable)
- Manually check stage 1 generation for checkpoints with different val losses
- LLaVA stage 2: only unfreeze the last 8 layers of the LLM (sketch after this list)
- LLaVA stage 2 data loader and loss
- Train and wrap up stage 2 on COCO
- Download TinyImageNet for ViT and SwinTransformer
- LoRA (sketch after this list)
- Build TinyImageNet dataloader (dict instead of tuple/list)
- Build ViT pipeline
- Take label/loss/metrics out of all forward functions
- Use torchmetrics instead of sklearn (compute epoch-level accuracy instead of batch-level; sketch after this list)
- Set up pre-commit to avoid uploading very large files
- Add strong augmentation for ViT
- Add mixup collate and loss (in the dataloader, not the trainer; sketch after this list)
- Accelerate data augmentation (Kornia TODO)
- Train and debug ViT on TinyImageNet
- Wrap up ViT on TinyImageNet
- Implement CutMix (sketch after this list)
- Refactor and test LLaVA codebase
- Refactor and test GPT2 codebase
- Prepare Imagenet100
- Prepare MAE codebase
- Debug and test MAE on TinyImageNet
- Kick off MAE training on ImageNet100
- Add KV cache (sketch after this list)
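
The sketches below correspond to the items marked "(sketch after this list)" above. They are minimal, unverified illustrations, not the repo's exact code.

A possible shape for /scripts/data/prepare_shakespeare.py. The GPT2 tiktoken encoding and 50257 vocab size come from the log above; the download URL, output paths, and ~99/1 split ratio are assumptions:

```python
# Sketch of /scripts/data/prepare_shakespeare.py (URL, paths, and split ratio are assumptions)
import os
import numpy as np
import requests
import tiktoken

DATA_DIR = "data/shakespeare"                    # assumed output directory
URL = ("https://raw.githubusercontent.com/karpathy/char-rnn/"
       "master/data/tinyshakespeare/input.txt")  # commonly used mirror, assumed

os.makedirs(DATA_DIR, exist_ok=True)
text = requests.get(URL, timeout=30).text

enc = tiktoken.get_encoding("gpt2")              # GPT-2 BPE, vocab size 50257
ids = enc.encode_ordinary(text)                  # encode without special tokens

split = int(0.99 * len(ids))                     # assumed ~99/1 train/val split
train_ids = np.array(ids[:split], dtype=np.uint16)   # 50257 fits in uint16
val_ids = np.array(ids[split:], dtype=np.uint16)

train_ids.tofile(os.path.join(DATA_DIR, "train.bin"))
val_ids.tofile(os.path.join(DATA_DIR, "val.bin"))
print(f"train tokens: {len(train_ids):,}, val tokens: {len(val_ids):,}")
```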
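
For the cosine scheduler item, a generic warmup-plus-cosine-decay schedule; the warmup length, LR bounds, and how it is wired into the trainer are assumptions:

```python
import math

def cosine_lr(step, *, max_lr, min_lr, warmup_steps, total_steps):
    """Linear warmup followed by cosine decay from max_lr to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))   # decays from 1 to 0
    return min_lr + coeff * (max_lr - min_lr)

# Hypothetical usage inside the training loop:
# lr = cosine_lr(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=200, total_steps=5000)
# for group in optimizer.param_groups:
#     group["lr"] = lr
```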
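
For "checkpoint only saves the projector, skip optimizer", a sketch of the save/load logic; the `projector.` key prefix and checkpoint layout are assumptions:

```python
import torch

def save_stage1_checkpoint(model, path, step):
    # Keep only the multimodal projector weights; the frozen vision tower and LLM,
    # and the optimizer state, are intentionally skipped to keep checkpoints small.
    projector_state = {
        k: v.cpu() for k, v in model.state_dict().items() if k.startswith("projector.")
    }
    torch.save({"step": step, "projector": projector_state}, path)

def load_stage1_checkpoint(model, path):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["projector"], strict=False)  # only projector keys present
    return ckpt["step"]
```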
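
For stage 2, a sketch of unfreezing only the last 8 LLM layers; the attribute path `model.llm.transformer.h` (GPT-2-style block list) and keeping the projector trainable are assumptions:

```python
def configure_stage2_trainable(model, num_layers=8):
    # Freeze the whole LLM first.
    for p in model.llm.parameters():
        p.requires_grad = False
    # Unfreeze only the last `num_layers` transformer blocks.
    for block in model.llm.transformer.h[-num_layers:]:
        for p in block.parameters():
            p.requires_grad = True
    # Projector stays trainable in stage 2 (assumption).
    for p in model.projector.parameters():
        p.requires_grad = True
```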
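
For the LoRA item, a sketch of a LoRA-wrapped linear layer (frozen base weight plus a trainable low-rank update); the rank, alpha, and which layers it wraps are assumptions:

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha / r) * B A x, with W frozen and A, B trainable."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no update at start
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```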
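
For the torchmetrics item, a sketch of epoch-level accuracy: the metric accumulates over all batches and is computed once per epoch, instead of averaging per-batch accuracies. The dict batch format and 200 TinyImageNet classes follow the items above; the rest is assumed:

```python
import torch
import torchmetrics

@torch.no_grad()
def evaluate(model, val_loader, device, num_classes=200):
    acc = torchmetrics.Accuracy(task="multiclass", num_classes=num_classes).to(device)
    model.eval()
    for batch in val_loader:                           # batches are dicts (see dataloader item)
        logits = model(batch["image"].to(device))
        acc.update(logits, batch["label"].to(device))  # accumulate counts across batches
    return acc.compute().item()                        # one epoch-level accuracy
```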
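
For "mixup collate and loss (in the dataloader, not the trainer)", a sketch of a collate_fn that mixes the batch plus the matching loss; the dict keys and the Beta(alpha, alpha) parameter are assumptions:

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_collate(batch, alpha=0.2):
    # batch: list of dicts {"image": Tensor[C, H, W], "label": int} (assumed format)
    images = torch.stack([b["image"] for b in batch])
    labels = torch.tensor([b["label"] for b in batch])
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]
    return {"image": mixed, "label_a": labels, "label_b": labels[perm], "lam": lam}

def mixup_loss(logits, batch):
    # Convex combination of the two cross-entropies, weighted by the mixing coefficient.
    return (batch["lam"] * F.cross_entropy(logits, batch["label_a"])
            + (1.0 - batch["lam"]) * F.cross_entropy(logits, batch["label_b"]))
```

The same mixup_loss can be reused for the CutMix sketch below, since both produce (label_a, label_b, lam) batches.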
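
For the CutMix item, a matching collate sketch that pastes a random box from a shuffled copy of the batch and rescales lambda to the actual box area; same assumed dict format as above:

```python
import numpy as np
import torch

def cutmix_collate(batch, alpha=1.0):
    images = torch.stack([b["image"] for b in batch])
    labels = torch.tensor([b["label"] for b in batch])
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(images.size(0))
    _, _, h, w = images.shape
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)   # fraction of pixels kept from the original
    return {"image": images, "label_a": labels, "label_b": labels[perm], "lam": lam}
```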
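
For the KV-cache item, a sketch of how cached keys/values can be threaded through one attention call during autoregressive decoding; the tensor layout and the one-new-token-per-step decoding assumption are not from the repo:

```python
import torch
import torch.nn.functional as F

def attend_with_cache(q, k, v, past_kv=None):
    """q, k, v: [batch, heads, new_tokens, head_dim]; past_kv: (k_cache, v_cache) or None."""
    if past_kv is not None:
        k = torch.cat([past_kv[0], k], dim=2)   # append new keys to the cache
        v = torch.cat([past_kv[1], v], dim=2)   # append new values to the cache
    # Prefill (no cache) needs a causal mask; incremental decoding feeds one new
    # query token at a time, so attending over all cached keys is already causal.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=past_kv is None)
    return out, (k, v)                          # updated cache for the next step
```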