Commit 862c632
Context Parallel Squashed MR for models dir (#1337)
Adds context parallelism support to ESM2 models inside
`bionemo-recipes/models/esm2`
- Adds a comprehensive unit test that checks (1) Gradients (2) Loss and
(3) Logits for NON-CP vs CP execution
- Adds a unit test that confirms that padded THD and unpadded THD return
the same logits after the models forward pass
- Adds a Collator capable of computing CP_SHARDs for a given CP rank,
which is amenable for downstream scattering to other GPUs. All in
pre-fetch.
<!-- Provide a detailed description of the changes in this PR -->
#### Usage
<!--- How does a user interact with the changed code -->
```python
TODO: Add code snippet
```
### Type of changes
<!-- Mark the relevant option with an [x] -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):
### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels. By default, only
basic unit tests are run.
-
[ciflow:skip](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:skip)
- Skip all CI tests for this PR
-
[ciflow:notebooks](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:notebooks)
- Run Jupyter notebooks execution tests for bionemo2
-
[ciflow:slow](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:slow)
- Run slow single GPU integration tests marked as @pytest.mark.slow for
bionemo2
-
[ciflow:all](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all)
- Run all tests (unit tests, slow tests, and notebooks) for bionemo2.
This label can be used to enforce running tests for all bionemo2.
-
[ciflow:all-recipes](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all-recipes)
- Run tests for all recipes (under bionemo-recipes). This label can be
used to enforce running tests for all recipes.
Unit tests marked as `@pytest.mark.multi_gpu` or
`@pytest.mark.distributed` are not run in the PR pipeline.
For more details, see [CONTRIBUTING](CONTRIBUTING.md)
> [!NOTE]
> By default, only basic unit tests are run. Add appropriate labels to
enable an additional test coverage.
#### Authorizing CI Runs
We use
[copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation)
to manage authorization of CI
runs on NVIDIA's compute resources.
- If a pull request is opened by a trusted user and contains only
trusted changes, the pull request's code will
automatically be copied to a pull-request/ prefixed branch in the source
repository (e.g. pull-request/123)
- If a pull request is opened by an untrusted user or contains untrusted
changes, an NVIDIA org member must leave an
`/ok to test` comment on the pull request to trigger CI. This will need
to be done for each new commit.
### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->
- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully
---------
Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Jonathan Mitchell <[email protected]>
Co-authored-by: Jonathan Mitchell <[email protected]>
Co-authored-by: Jonathan Mitchell <[email protected]>
Co-authored-by: Jonathan Mitchell <[email protected]>1 parent 7c5e7ad commit 862c632
File tree
10 files changed
+880
-8
lines changed- bionemo-recipes
- models/esm2
- src/esm
- tests
- recipes
- esm2_accelerate_te/example_8m_checkpoint
- esm2_native_te
- example_8m_checkpoint
- esm2_peft_te/example_8m_checkpoint
10 files changed
+880
-8
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| 27 | + | |
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
| |||
38 | 39 | | |
39 | 40 | | |
40 | 41 | | |
| 42 | + | |
41 | 43 | | |
42 | 44 | | |
43 | 45 | | |
| |||
62 | 64 | | |
63 | 65 | | |
64 | 66 | | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
65 | 70 | | |
66 | 71 | | |
67 | 72 | | |
| |||
111 | 116 | | |
112 | 117 | | |
113 | 118 | | |
| 119 | + | |
114 | 120 | | |
115 | 121 | | |
116 | 122 | | |
| |||
129 | 135 | | |
130 | 136 | | |
131 | 137 | | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
132 | 141 | | |
133 | 142 | | |
134 | 143 | | |
| |||
145 | 154 | | |
146 | 155 | | |
147 | 156 | | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
148 | 161 | | |
149 | 162 | | |
150 | 163 | | |
| |||
227 | 240 | | |
228 | 241 | | |
229 | 242 | | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
230 | 257 | | |
231 | 258 | | |
232 | 259 | | |
| |||
269 | 296 | | |
270 | 297 | | |
271 | 298 | | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
| 325 | + | |
| 326 | + | |
| 327 | + | |
| 328 | + | |
| 329 | + | |
| 330 | + | |
| 331 | + | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
| 339 | + | |
| 340 | + | |
| 341 | + | |
| 342 | + | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
272 | 346 | | |
273 | 347 | | |
274 | 348 | | |
| |||
441 | 515 | | |
442 | 516 | | |
443 | 517 | | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
| 550 | + | |
| 551 | + | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
| 568 | + | |
| 569 | + | |
| 570 | + | |
| 571 | + | |
| 572 | + | |
| 573 | + | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
| 578 | + | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
| 594 | + | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
180 | 180 | | |
181 | 181 | | |
182 | 182 | | |
183 | | - | |
184 | 183 | | |
185 | 184 | | |
186 | 185 | | |
| |||
213 | 212 | | |
214 | 213 | | |
215 | 214 | | |
216 | | - | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
217 | 220 | | |
218 | 221 | | |
219 | 222 | | |
| |||
226 | 229 | | |
227 | 230 | | |
228 | 231 | | |
| 232 | + | |
| 233 | + | |
229 | 234 | | |
230 | 235 | | |
| 236 | + | |
231 | 237 | | |
232 | 238 | | |
233 | 239 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
367 | 367 | | |
368 | 368 | | |
369 | 369 | | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
| 373 | + | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
| 386 | + | |
| 387 | + | |
| 388 | + | |
370 | 389 | | |
371 | 390 | | |
372 | 391 | | |
| |||
0 commit comments