Conversation

@leloykun (Contributor)

I'm building an RL env for NanoGPT speedrunning, and this was one of the better patches found so far. It reduces wallclock time by 1-2 seconds on my 8xH100s.

I'm still working on the env, but I'm dropping this here in case others wanna add it to their records.

@leloykun leloykun marked this pull request as draft September 11, 2025 17:27
@leloykun (Contributor, Author) commented Sep 11, 2025

Weird, switching to another cloud platform for the 8xH100s caused a regression... currently double-checking that I copy-pasted the right patch...

@Gusarich (Contributor)

It gets about 3.35 val loss in the end when I run it.

@leloykun (Contributor, Author)

Yeah, for some reason this works perfectly well on Modal Sandboxes, but not on PrimeIntellect machines (both SXM 8xH100s). This has been driving me crazy, tbh.

@ClassicLarry (Collaborator)

This direction looks promising to me, but it might require an Nsight profiler deep dive to fully understand when these streams get scheduled. My concern is that if the hardware is deciding when to prioritize this CPU-to-GPU data transfer stream, it could block the main GPU stream in hard-to-detect ways. I'm not sure of the best way to interleave this with the forward and backward passes, but ideally we can do it in a way that executes consistently across different GPU providers.
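
For reference, here is a minimal sketch of the kind of prioritized side-stream prefetch being discussed; it is not the actual patch, and the `prefetch`/`consume` helpers, the stream priority, and the event handling are illustrative assumptions only:

```python
# Hypothetical sketch (not the actual patch): stage the next batch's CPU-to-GPU
# copy on a dedicated high-priority stream, and have the main stream wait on a
# single event instead of a full device synchronization.
import torch

copy_stream = torch.cuda.Stream(priority=-1)  # -1 = higher priority than the default stream
copy_done = torch.cuda.Event()

def prefetch(batch_cpu: torch.Tensor) -> torch.Tensor:
    """Launch the host-to-device copy asynchronously; the returned GPU tensor
    is only valid after the main stream has waited on `copy_done`."""
    pinned = batch_cpu.pin_memory()  # pinned memory is required for a truly async copy
    # Don't start copying until the main stream is done reading the previous batch.
    copy_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(copy_stream):
        batch_gpu = pinned.to("cuda", non_blocking=True)
        copy_done.record(copy_stream)
    return batch_gpu

def consume(batch_gpu: torch.Tensor) -> torch.Tensor:
    # Make the main stream wait only on the copy event, not on the whole device.
    torch.cuda.current_stream().wait_event(copy_done)
    # Tell the caching allocator that the main stream now uses this memory.
    batch_gpu.record_stream(torch.cuda.current_stream())
    return batch_gpu
```

Whether the copy actually overlaps the forward/backward pass (rather than stalling it) depends on how the driver schedules the two streams, which is exactly what an Nsight Systems trace would need to confirm.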
