Commit 5a54fe7

Merge branch 'main' into symmetric-reg-new-interface
2 parents d85aa69 + 2dac593 commit 5a54fe7

59 files changed, +10841 -1374 lines changed

.github/pull_request_template.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+# What does this PR do ?
+<!-- Add a one line overview of what this PR aims to accomplish. -->
+
+:warning: For major changes (either in lines of code or in its impact), please make sure to first share and discuss a design doc with the team.
+
+## Contribution process
+
+```mermaid
+flowchart LR
+    A[Pre-checks] --> B[PR Tests]
+    subgraph Code Review/Approval
+        C1[Expert Review] --> C2[Final Review]
+    end
+    B --> C1
+    C2 --> D[Merge]
+```
+
+### Pre-checks
+
+- [ ] I want this PR in a versioned release and have added the appropriate Milestone (e.g., `Core 0.8`)
+- [ ] I have added relevant unit tests
+- [ ] I have added relevant functional tests
+- [ ] I have added proper typing to my code [Typing guidelines](https://docs.python.org/3/library/typing.html)
+- [ ] I have added relevant documentation
+- [ ] I have run the [autoformatter.sh](https://github.com/NVIDIA/Megatron-LM/blob/main/tools/autoformat.sh) on my PR
+
+### Code review
+
+The following process is enforced via the CODEOWNERS file for changes into `megatron/core`. For changes outside of `megatron/core`, it is up to the PR author whether or not to tag the Final Reviewer team.
+
+<details>
+<summary>For MRs into `main` branch</summary>
+
+#### (Step 1): Add PR label `Expert Review`
+
+#### (Step 2): Collect the expert reviewers' reviews
+
+1. Attach the `Expert Review` label when your PR is ready for review.
+2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.
+
+:warning: Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
+Final Review might get declined if these requirements are not fulfilled.
+
+#### (Step 3): Final Review
+
+1. Add `Final Review` label
+2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.
+
+#### (Optional Step 4): Cherry-pick into release branch
+
+If this PR also needs to be merged into `core_r*` release branches, after this PR has been merged, select `Cherry-pick` to open a new PR into the release branch.
+
+</details>
+
+<details>
+<summary>For MRs into `dev` branch</summary>
+The proposed review process for `dev` branch is under active discussion.
+
+MRs are mergeable after one approval by either `[email protected]` or `[email protected]`.
+</details>
+
+### Merging your PR
+
+Any member of [core-adlr](https://github.com/orgs/teams/NVIDIA/core-adlr) and [`core-nemo`](https://github.com/orgs/teams/NVIDIA/core-nemo) will be able to merge your PR.
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+name: Auto-assign Milestone to PR
+
+on:
+  push:
+    branches:
+      - "pull-request/[0-9]+"
+
+permissions:
+  contents: read
+  pull-requests: write
+  issues: write
+
+jobs:
+  assign-milestone:
+    runs-on: ubuntu-latest
+    environment: nemo-ci
+    steps:
+      - name: Get PR info
+        id: get-pr-info
+        if: startsWith(github.ref, 'refs/heads/pull-request/')
+        uses: nv-gha-runners/get-pr-info@main
+
+      - name: Check if PR has milestone
+        id: check_milestone
+        env:
+          GH_TOKEN: ${{ secrets.PAT }}
+        run: |
+          MILESTONE=$(gh pr view ${{ fromJSON(steps.get-pr-info.outputs.pr-info || '{}').number }} \
+            --repo ${{ github.repository }} \
+            --json milestone \
+            --jq '.milestone.title')
+
+          if [ "$MILESTONE" = "null" ] || [ -z "$MILESTONE" ]; then
+            echo "has_milestone=false" >> $GITHUB_OUTPUT
+          else
+            echo "has_milestone=true" >> $GITHUB_OUTPUT
+            echo "PR already has milestone: $MILESTONE"
+          fi
+
+      - name: Get most recent open milestone
+        if: steps.check_milestone.outputs.has_milestone == 'false'
+        id: get_milestone
+        env:
+          GH_TOKEN: ${{ secrets.PAT }}
+        run: |
+          # Get the most recent open milestone (sorted by due date, then by creation date)
+          MILESTONE_NUMBER=$(gh api \
+            "repos/${{ github.repository }}/milestones?state=open&sort=due_on&direction=desc" \
+            --jq '.[0].number')
+
+          MILESTONE_TITLE=$(gh api \
+            "repos/${{ github.repository }}/milestones?state=open&sort=due_on&direction=desc" \
+            --jq '.[0].title')
+
+          if [ -z "$MILESTONE_NUMBER" ] || [ "$MILESTONE_NUMBER" = "null" ]; then
+            echo "No open milestones found"
+            echo "milestone_found=false" >> $GITHUB_OUTPUT
+          else
+            echo "milestone_found=true" >> $GITHUB_OUTPUT
+            echo "milestone_number=$MILESTONE_NUMBER" >> $GITHUB_OUTPUT
+            echo "milestone_title=$MILESTONE_TITLE" >> $GITHUB_OUTPUT
+            echo "Found milestone: $MILESTONE_TITLE (number: $MILESTONE_NUMBER)"
+          fi
+
+      - name: Assign milestone to PR
+        if: steps.check_milestone.outputs.has_milestone == 'false' && steps.get_milestone.outputs.milestone_found == 'true'
+        env:
+          GH_TOKEN: ${{ secrets.PAT }}
+        run: |
+          gh pr edit ${{ fromJSON(steps.get-pr-info.outputs.pr-info || '{}').number }} \
+            --repo ${{ github.repository }} \
+            --milestone "${{ steps.get_milestone.outputs.milestone_title }}"
+
+          echo "✅ Assigned milestone '${{ steps.get_milestone.outputs.milestone_title }}' to PR #${{ fromJSON(steps.get-pr-info.outputs.pr-info || '{}').number }}"
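
The milestone workflow above makes two decisions in shell: whether the PR already has a milestone, and which open milestone to assign if it does not. The following Python sketch restates that decision logic for readers who want to reason about it or unit-test it. It is not part of the commit. The function names `has_milestone` and `pick_milestone` are hypothetical, and the sketch assumes the milestone list has already been fetched in the order the API query returns it (`sort=due_on&direction=desc`).

```python
from typing import Dict, List, Optional


def has_milestone(title: Optional[str]) -> bool:
    """Mirror the shell test: an empty string or the literal "null" means no milestone."""
    return title not in (None, "", "null")


def pick_milestone(milestones: List[Dict]) -> Optional[Dict]:
    """Return number and title of the first listed milestone, or None when the list is empty.

    Assumption: `milestones` is already ordered as the API returned it, so the
    first element plays the role of `.[0]` in the workflow's jq filters.
    """
    if not milestones:
        return None
    first = milestones[0]
    return {"number": first["number"], "title": first["title"]}


# Hypothetical sample data, for illustration only.
sample = [
    {"number": 7, "title": "Core 0.9", "due_on": "2025-12-01T00:00:00Z"},
    {"number": 6, "title": "Core 0.8", "due_on": "2025-09-01T00:00:00Z"},
]
chosen = pick_milestone(sample)
```

Reading both fields from a single response, as the sketch does, avoids the case where two separate API calls observe different milestone lists.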
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+name: Auto Reminder Bot
+
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: "0 12 * * *"
+
+jobs:
+  run-script:
+    environment: main
+    name: Run Auto Reminder Bot
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+
+      - name: Install dependencies
+        run: |
+          pip install --no-cache-dir PyGithub slack-sdk
+
+      - name: Run Auto Reminder Bot
+        run: |
+          export SLACK_TOKEN=${{ secrets.SLACK_TOKEN }}
+          export SLACK_WEBHOOK_URL=${{ secrets.SLACK_WEBHOOK_URL }}
+          export GH_TOKEN=${{ secrets.PAT }}
+          python tests/test_utils/python_scripts/auto_reminder_github.py
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+name: Auto Swap Labels
+on:
+  pull_request_review:
+    types: [submitted]
+
+jobs:
+  check-approval:
+    runs-on: ubuntu-latest
+    if: github.event.review.state == 'approved'
+    environment: nemo-ci
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+
+      - name: Install dependencies
+        run: |
+          pip install --no-cache-dir PyGithub slack-sdk
+
+      - name: Run Auto Reminder Bot
+        run: |
+          export GH_TOKEN=${{ secrets.PAT }}
+          export PR_NUMBER=${{ github.event.pull_request.number }}
+          python tests/test_utils/python_scripts/swap_pr_labels.py

.github/workflows/cicd-approve-test-queue.yml

Lines changed: 6 additions & 2 deletions
@@ -25,7 +25,7 @@ jobs:
     environment: main
     strategy:
       matrix:
-        branch: [main, dev]
+        branch: [main, dev, others]
     steps:
       - name: Checkout repository
         uses: actions/checkout@v4
@@ -44,6 +44,7 @@ jobs:
         env:
           GITHUB_TOKEN: ${{ secrets.PAT }}
           MAX_CONCURRENCY: ${{ vars.MAX_CONCURRENCY || 1 }}
+          PYTHONUNBUFFERED: 1
         shell: python
         run: |
           import os
@@ -99,7 +100,10 @@ jobs:
                   return False

               base_branch = pr_info.get("base", {}).get("ref")
-              if base_branch == target_branch:
+              if (
+                  (base_branch == target_branch) or
+                  (base_branch != "main" and base_branch != "dev" and target_branch == "others")
+              ):
                   print(f"PR #{pr_number} targets {target_branch}")
                   return True

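
The new condition turns `others` into a catch-all matrix entry: a PR whose base branch is neither `main` nor `dev` is handled by the `others` queue. As a minimal standalone sketch of that predicate (the function name `pr_matches_bucket` is hypothetical and does not appear in the workflow):

```python
from typing import Optional


def pr_matches_bucket(base_branch: Optional[str], target_branch: str) -> bool:
    """Return True when a PR with `base_branch` belongs to the `target_branch` queue.

    Exact matches always count. The special queue name "others" additionally
    collects every PR whose base is neither "main" nor "dev".
    """
    if base_branch == target_branch:
        return True
    return target_branch == "others" and base_branch not in ("main", "dev")
```

Under this logic a PR against a release branch such as `core_r0.14.0` falls into `others`, while PRs against `main` and `dev` stay in their own queues. A PR with no resolvable base branch (`None`) also lands in `others`, which matches the behaviour of the condition in the diff.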

.github/workflows/cicd-main.yml

Lines changed: 13 additions & 4 deletions
@@ -14,7 +14,7 @@
 name: CICD Megatron-LM
 on:
   schedule:
-    - cron: "0 */2 * * *"
+    - cron: 0 0 * * *
   push:
     branches:
       - dev
@@ -23,6 +23,7 @@ on:
       - "deploy-release/*"
   merge_group:
     types: [checks_requested]
+  workflow_dispatch:

 concurrency:
   group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-${{ github.event.label.name || 'main' }}-${{ github.event_name }}
@@ -148,7 +149,7 @@ jobs:

   pre-flight:
     needs: [is-not-external-contributor]
-    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/[email protected].5
+    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/[email protected].10

   linting:
     runs-on: ubuntu-latest
@@ -318,7 +319,7 @@ jobs:
       - name: Parse unit tests
         id: parse-unit-tests
         run: |
-          cat tests/test_utils/recipes/unit-tests.yaml | yq -o json '[.products[].test_case[] | { "bucket": .}]' | jq -c > unit-tests.json
+          cat tests/test_utils/recipes/unit-tests.yaml | yq -o json '[.products[].test_case[] | { "bucket": .}] | sort_by(.model, .test_case)' | jq -c > unit-tests.json
           echo "unit-tests=$(cat unit-tests.json)" | tee -a $GITHUB_OUTPUT

   cicd-unit-tests-latest:
@@ -366,6 +367,14 @@ jobs:
       - cicd-wait-in-queue
       - cicd-container-build
       - cicd-unit-tests-latest
+    if: |
+      (
+        success()
+        || needs.pre-flight.outputs.is_ci_workload == 'true'
+        || needs.pre-flight.outputs.force_run_all == 'true'
+      )
+      && needs.pre-flight.outputs.is_merge_group == 'false'
+      && !cancelled()
     outputs:
       integration-tests: ${{ steps.main.outputs.integration-tests }}
     steps:
@@ -490,7 +499,7 @@ jobs:
         env:
           GH_TOKEN: ${{ github.token }}
           RUN_ID: ${{ github.run_id }}
-          SKIPPING_IS_ALLOWED: ${{ needs.pre-flight.outputs.docs_only == 'true' || needs.pre-flight.outputs.is_deployment_workflow == 'true' || needs.pre-flight.outputs.is_merge_group == 'true' }}
+          SKIPPING_IS_ALLOWED: ${{ needs.pre-flight.outputs.docs_only == 'true' || needs.pre-flight.outputs.is_deployment_workflow == 'true' || needs.pre-flight.outputs.is_merge_group == 'true' || needs.pre-flight.outputs.is_ci_workload == 'true' }}
         run: |
           FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion == "failure")] | length') || echo 0
           SKIPPED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion == "skipped")] | length') || echo 0
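
Two expressions in this file combine several `pre-flight` outputs: the new `if:` gate on the job that emits `integration-tests`, and the extended `SKIPPING_IS_ALLOWED` value. The Python sketch below models both as plain functions so the truth table is easier to inspect. It is an illustration only. The function names are hypothetical, `succeeded` and `cancelled` stand in for the `success()` and `cancelled()` status functions, and outputs are modelled as strings because the workflow compares them against `'true'` and `'false'`.

```python
from typing import Dict


def should_parse_integration_tests(
    outputs: Dict[str, str], succeeded: bool, cancelled: bool
) -> bool:
    """Model of the added `if:` gate.

    Runs when upstream jobs succeeded, or when the run is a CI workload or a
    forced full run, but never for merge-group runs or cancelled workflows.
    A missing `is_merge_group` output is not equal to "false", so it blocks the job.
    """
    wants_run = (
        succeeded
        or outputs.get("is_ci_workload") == "true"
        or outputs.get("force_run_all") == "true"
    )
    return wants_run and outputs.get("is_merge_group") == "false" and not cancelled


def skipping_is_allowed(outputs: Dict[str, str]) -> bool:
    """Model of SKIPPING_IS_ALLOWED after this commit (is_ci_workload is the new term)."""
    keys = ("docs_only", "is_deployment_workflow", "is_merge_group", "is_ci_workload")
    return any(outputs.get(key) == "true" for key in keys)
```

Note the asymmetry in the gate: the CI-workload and force-run flags can override a failed upstream job, but nothing overrides a merge-group run or a cancellation.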

.github/workflows/community-bot.yml

Lines changed: 2 additions & 1 deletion
@@ -21,6 +21,7 @@ on:

 jobs:
   community-bot:
-    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_community_bot.yml@v0.49.1
+    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_community_bot.yml@v0.65.10
     secrets:
       GH_TOKEN: ${{ secrets.PAT }}
+    environment: main

.github/workflows/copyright-check.yml

Lines changed: 2 additions & 1 deletion
@@ -30,8 +30,9 @@ jobs:
     needs: [pre-flight]
     if: |
       !(needs.pre-flight.outputs.docs_only == 'true'
+      || needs.pre-flight.outputs.is_merge_group == 'true'
       || needs.pre-flight.outputs.is_deployment_workflow == 'true')
-    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/[email protected].9
+    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/[email protected].11

   copyright-check-summary:
     needs: [pre-flight, copyright-check]

.gitlab-ci.yml

Lines changed: 6 additions & 2 deletions
@@ -18,6 +18,8 @@ workflow:
     - if: $CI_PROJECT_NAMESPACE != "ADLR" || ($CI_PIPELINE_SOURCE == "merge_request_event" && $CI_MERGE_REQUEST_PROJECT_PATH != "ADLR/megatron-lm")
       when: never

+    - if: $CI_PIPELINE_SOURCE == "schedule" && ($CI_COMMIT_BRANCH == 'ci-approve-dev' || $CI_COMMIT_BRANCH == 'ci-approve-main')
+
     # ci-branches only for schedule
     - if: $CI_COMMIT_BRANCH =~ /ci-/ && $CI_PIPELINE_SOURCE != "schedule"
       when: never
@@ -31,15 +33,15 @@ workflow:
     - if: $CI_PIPELINE_SOURCE == "web"

     # For push to main
-    - if: $CI_PIPELINE_SOURCE == 'push' && ($CI_COMMIT_BRANCH == "main" || $CI_COMMIT_BRANCH == "dev")
+    - if: $CI_PIPELINE_SOURCE == 'push' && ($CI_COMMIT_BRANCH == "main" || $CI_COMMIT_BRANCH == "dev" || $CI_COMMIT_BRANCH =~ /^core_/)
       variables:
         UNIT_TEST: "no"
         INTEGRATION_TEST: "no"
         FUNCTIONAL_TEST: "yes"
         FUNCTIONAL_TEST_SCOPE: mr
         FUNCTIONAL_TEST_REPEAT: 5
         FUNCTIONAL_TEST_RECORD_CHECKPOINTS: "no"
-        FUNCTIONAL_TEST_TIME_LIMIT: 2700
+        FUNCTIONAL_TEST_TIME_LIMIT: 3600
         CLUSTER_A100: ""
         CLUSTER_H100: ""
         PUBLISH: "no"
@@ -154,6 +156,8 @@ default:
     when: runner_system_failure

 variables:
+  BUILD:
+    value: "yes"
   UNIT_TEST:
     value: "yes"
     options:
0 commit comments

Comments
 (0)