GenAI code base practice from Sheng Wang ([email protected])

Implementations of various GenAI models, for genAI interview practice. Most have been trained for one epoch.

Currently included:

Diffusion models (DDPM, DDIM, Latent diffusion, CFG, DiT); a DDPM forward-process sketch follows this list

Reinforcement learning (REINFORCE, A2C, GAE, PPO, DPO, GRPO)

Multi-modal models (LLaVA, BLIP, BLIP2, CLIP)

Pre-training (ViT, MAE, BYOL, BEiT, iBOT, SimMIM, Transformer, MoE, Swin Transformer)

Generative models (VAE, UNet, beta-VAE, GAN, VQ-VAE)
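As a reference point for the diffusion entries, here is a minimal sketch of the DDPM forward (noising) process, assuming a linear beta schedule; the schedule values and the `q_sample` name are illustrative, not taken from this codebase.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # assumed linear schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)    # broadcast over (B, C, H, W)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
```

The denoiser is then trained to predict `noise` from `x_t` and `t` with an MSE loss.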

Progress & TODOs (x for finished, o for ongoing)

2025-11-11

  • Refactor trainer.py
  • Prepare Tiny Shakespeare (/scripts/data/prepare_shakespeare.py): GPT2 tiktoken tokenizer (50257 vocab size), 334646 train tokens, 3380 val tokens; see the sketch after this list
  • Implement text_datasets.py
  • Implement GPT2 and transformer block
  • Add Wandb
  • Add early stop
  • Test GPT2 implementation (check char-level BPE)
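A minimal sketch of the token preparation step above, assuming the raw text is already at input.txt; the file paths and the split point are assumptions, not necessarily what prepare_shakespeare.py does (the actual split is roughly 99/1 given the token counts).

```python
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")          # 50257-token BPE vocab
text = open("input.txt", encoding="utf-8").read()
ids = enc.encode_ordinary(text)              # encode without special tokens

split = int(0.99 * len(ids))                 # assumed split point
np.array(ids[:split], dtype=np.uint16).tofile("train.bin")  # 50257 < 2**16
np.array(ids[split:], dtype=np.uint16).tofile("val.bin")
```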

2025-11-12

2025-11-14

  • Build LLaVA model (a projector sketch follows this list)
  • Build coco data loader
  • Re-organize LLaVA codebase
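A hypothetical sketch of the vision-to-language projector at the core of the LLaVA build: an MLP that maps frozen vision-encoder patch features into the LLM's token-embedding space (LLaVA-1.5-style two-layer MLP; the dimensions are illustrative).

```python
import torch.nn as nn

class VisionProjector(nn.Module):
    """Map vision patch features into the LLM embedding space."""
    def __init__(self, vision_dim=768, llm_dim=768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats):          # (B, num_patches, vision_dim)
        return self.proj(patch_feats)        # (B, num_patches, llm_dim)
```

The projected tokens are concatenated with the text-token embeddings before the LLM forward pass.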

2025-11-15

  • Re-split the COCO validation data (the original training set is too large; we only use the validation set)
  • Finish COCO data loader
  • Merge llava and gpt2 using the same trainer
  • LLaVA stage 1 training start
  • Checkpoints save only the projector and skip the optimizer state (see the sketch after this list)
  • Debug and test LLaVA stage 1 (loss and generations are reasonable)
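A sketch of the projector-only checkpointing idea, assuming the projector lives at `model.projector` (an illustrative attribute name); only the small trainable module is persisted and the optimizer state is skipped.

```python
import torch
import torch.nn as nn

class TinyLLaVA(nn.Module):                  # stand-in model for illustration
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(768, 768)   # frozen in stage 1
        self.projector = nn.Linear(768, 768)        # the only trained part

model = TinyLLaVA()
torch.save({"projector": model.projector.state_dict()}, "ckpt_stage1.pt")

# Resuming loads only the projector into a freshly built model.
state = torch.load("ckpt_stage1.pt", map_location="cpu")
model.projector.load_state_dict(state["projector"])
```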

2025-11-16

  • Manually check stage 1 generations from checkpoints with different val losses
  • LLaVA stage 2 unfreezes only the last 8 layers of the LLM
  • LLaVA stage 2 data loader and loss
  • Train and wrap up stage 2 on coco
  • Download TinyImageNet for ViT and SwinTransformer
  • LoRA (a minimal sketch follows this list)
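A minimal LoRA sketch under the usual formulation y = Wx + (alpha/r) * B(Ax), with the base weights frozen; r and alpha are illustrative defaults, and stage 2's partial unfreezing uses the same requires_grad pattern.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank trainable update."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```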

2025-11-17

  • Build TinyImageNet dataloader (dict instead of tuple/list)
  • Build ViT pipeline
  • Take label/loss/metrics out of all forward functions
  • Use torchmetrics instead of sklearn (compute epoch-level accuracy instead of batch-level)
  • Set up pre-commit to avoid uploading very large files
  • Add strong augmentation for ViT
  • Add mixup collate and loss (in the dataloader, not the trainer; see the sketch after this list)
  • Accelerate data augmentation (Kornia TODO)
  • Train and debug ViT on TinyImageNet
  • Wrap up ViT on TinyImageNet
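A sketch of the dataloader-side mixup noted above, operating on dict batches like the TinyImageNet loader returns; alpha and the dict keys are assumptions. The matching loss is cross-entropy with soft targets.

```python
import torch
import torch.nn.functional as F

def mixup_collate(samples, alpha=0.2, num_classes=200):
    imgs = torch.stack([s["image"] for s in samples])
    labels = torch.tensor([s["label"] for s in samples])
    one_hot = F.one_hot(labels, num_classes).float()

    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(imgs.size(0))
    mixed = lam * imgs + (1.0 - lam) * imgs[perm]
    targets = lam * one_hot + (1.0 - lam) * one_hot[perm]   # soft labels
    return {"image": mixed, "label": targets}
```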

2025-11-19

  • Implement CutMix
  • Refactor and test LLaVA codebase
  • Refactor and test GPT2 codebase
  • Prepare Imagenet100
  • Prepare MAE codebase (a masking sketch follows this list)
  • Debug and test MAE on TinyImageNet
  • Kick off MAE training on ImageNet100
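A sketch of MAE's per-sample random masking, the core trick the MAE codebase needs: keep a random subset of patch tokens for the encoder and record the inverse shuffle so the decoder can restore patch order (mask_ratio=0.75 follows the MAE paper; names are illustrative).

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    """tokens: (B, N, D) patch embeddings; returns visible tokens, mask, restore."""
    B, N, D = tokens.shape
    keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                 # per-sample random scores
    shuffle = noise.argsort(dim=1)           # lowest scores are kept
    restore = shuffle.argsort(dim=1)         # inverse permutation for decoder
    kept_idx = shuffle[:, :keep]
    visible = torch.gather(tokens, 1, kept_idx.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N)
    mask.scatter_(1, kept_idx, 0.0)          # 0 = visible, 1 = masked
    return visible, mask, restore
```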

TODO

  • Add KV cache (a minimal sketch follows)
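A minimal sketch of the planned KV cache: per-layer buffers that grow by one position per decode step, so attention computes only the new query against all cached keys and values (shapes (B, heads, seq, head_dim); the class and names are illustrative).

```python
import torch

class KVCache:
    """Append-only cache of attention keys/values for one layer."""
    def __init__(self):
        self.k = None
        self.v = None

    def update(self, k_new, v_new):          # each (B, heads, 1, head_dim)
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v                # full (B, heads, seq, head_dim)
```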
