Pull requests: KellerJordan/modded-nanogpt
Pull requests list
fix: use per-rank device for dummy backward to avoid wasted GPU memory on rank 0
#161 opened Nov 23, 2025 by staghado
New Medium WR - Remove a redundant op while creating block masks, -220 ms
#157 opened Nov 13, 2025 by manikbhandari
New WR: Preconditioned orthogonalization for faster Muon optimizer
#155 opened Nov 11, 2025 by thib-s
New Medium WR: -8.5s; re-do of #138; includes changes from #137
#139 opened Oct 4, 2025 by snimu
New medium track WR: 1404s. Snoo Optimizer. Includes #124 and #119
#128 opened Sep 16, 2025 by dominikkallusky
New medium track WR: Second input embedding (1412 seconds); includes #119
#124 opened Sep 11, 2025 by snimu
record 2025-08-28; medium track; two more value-embeddings
#119 opened Aug 28, 2025 by snimu