From 27e3971842cd953e9e4d7d1770d143f751908af5 Mon Sep 17 00:00:00 2001
From: abukharin-nv
Date: Wed, 11 Jun 2025 14:16:14 +0000
Subject: [PATCH 1/4] Update guide to include minimum compute requirement

Signed-off-by: abukharin-nv
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index 5beddf168..daa743b0a 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -19,7 +19,7 @@ At the end of each stage, you need to specify the Hugging Face checkpoint to con
 uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
 ```
 
-When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node or on a single 8XA100 80GB node.
+When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training is 8XA100 80GB for 8K training and 8XH100 80GB for 16K and 24K training.
 
 ## Training Curve
 When using the above commands, we get the following training curve:

From 1905fd831da8e686d766531570472287f402f75c Mon Sep 17 00:00:00 2001
From: abukharin-nv
Date: Wed, 11 Jun 2025 18:05:15 +0000
Subject: [PATCH 2/4] Update min requirements for A100

Signed-off-by: abukharin-nv
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index daa743b0a..f5fd14519 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -19,7 +19,7 @@ At the end of each stage, you need to specify the Hugging Face checkpoint to con
 uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
 ```
 
-When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training is 8XA100 80GB for 8K training and 8XH100 80GB for 16K and 24K training.
+When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training on 8XA100 80GB nodes is one node for 8K training two 8XA100 80GB for 16K and 24K training.
 
 ## Training Curve
 When using the above commands, we get the following training curve:

From 684978ce7583e2846be3c4e675e487ab0673b20d Mon Sep 17 00:00:00 2001
From: abukharin-nv
Date: Wed, 11 Jun 2025 18:08:18 +0000
Subject: [PATCH 3/4] grammar

Signed-off-by: abukharin-nv
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index f5fd14519..2a9bd5391 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -19,7 +19,7 @@ At the end of each stage, you need to specify the Hugging Face checkpoint to con
 uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
 ```
 
-When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training on 8XA100 80GB nodes is one node for 8K training two 8XA100 80GB for 16K and 24K training.
+When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training on 8XA100 80GB nodes is one node for 8K training and two 8XA100 80GB for 16K and 24K training.
 
 ## Training Curve
 When using the above commands, we get the following training curve:

From 00b197718bbbd6d595596d262061b3c5df3baa38 Mon Sep 17 00:00:00 2001
From: abukharin-nv
Date: Fri, 27 Jun 2025 08:36:54 -0400
Subject: [PATCH 4/4] Update docs/guides/grpo-deepscaler.md

Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Signed-off-by: abukharin-nv
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index 2a9bd5391..a97ac5eab 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -19,7 +19,7 @@ At the end of each stage, you need to specify the Hugging Face checkpoint to con
 uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
 ```
 
-When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training on 8XA100 80GB nodes is one node for 8K training and two 8XA100 80GB for 16K and 24K training.
+When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. If you're running on 8XA100 80GB, you will need at least 1 node for 8K training and 2 nodes for 16K and 24K training.
 
 ## Training Curve
 When using the above commands, we get the following training curve:
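
For reference, the three-stage schedule described in the patched paragraph looks roughly like the sketch below. The checkpoint-conversion command for the 8K stage is taken verbatim from the guide; the 16K-stage result paths (`grpo-deepscaler-1.5b-16K`, `step_290`) and the placeholder comments standing in for the stage training commands are assumptions based on the guide's naming, not confirmed paths.

```bash
#!/usr/bin/env bash
# Sketch of the three-stage GRPO-DeepScaleR schedule (assumed layout).

# Stage 1: train with an 8K context window for 240 steps
# (launch the guide's 8K training command here; one 8XH100 or 8XA100 node).

# Convert the stage-1 DCP checkpoint to Hugging Face format
# (this command is verbatim from the guide):
uv run examples/convert_dcp_to_hf.py \
  --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml \
  --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights \
  --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf

# Stage 2: resume from results/grpo-deepscaler-1.5b-8K/step_240/hf and
# train with a 16K context window for 290 steps (one 8XH100 node, or at
# least two 8XA100 nodes per the patched paragraph), then convert again.
# The paths below assume the same results layout for the 16K run:
uv run examples/convert_dcp_to_hf.py \
  --config=results/grpo-deepscaler-1.5b-16K/step_290/config.yaml \
  --dcp-ckpt-path=results/grpo-deepscaler-1.5b-16K/step_290/policy/weights \
  --hf-ckpt-path=results/grpo-deepscaler-1.5b-16K/step_290/hf

# Stage 3: resume from the 16K Hugging Face checkpoint and train with a
# 24K context window for 50 steps.
```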