From 27e3971842cd953e9e4d7d1770d143f751908af5 Mon Sep 17 00:00:00 2001
From: abukharin-nv
Date: Wed, 11 Jun 2025 14:16:14 +0000
Subject: [PATCH 1/4] Update guide to include minimum compute requirement

Signed-off-by: abukharin-nv
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index 5beddf168..daa743b0a 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -19,7 +19,7 @@ At the end of each stage, you need to specify the Hugging Face checkpoint to con
 uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
 ```
 
-When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node or on a single 8XA100 80GB node.
+When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training is 8XA100 80GB for 8K training and 8XH100 80GB for 16K and 24K training.
 
 ## Training Curve
 When using the above commands, we get the following training curve:

From 1905fd831da8e686d766531570472287f402f75c Mon Sep 17 00:00:00 2001
From: abukharin-nv
Date: Wed, 11 Jun 2025 18:05:15 +0000
Subject: [PATCH 2/4] Update min requirements for A100

Signed-off-by: abukharin-nv
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index daa743b0a..f5fd14519 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -19,7 +19,7 @@ At the end of each stage, you need to specify the Hugging Face checkpoint to con
 uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
 ```
 
-When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training is 8XA100 80GB for 8K training and 8XH100 80GB for 16K and 24K training.
+When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training on 8XA100 80GB nodes is one node for 8K training two 8XA100 80GB for 16K and 24K training.
 
 ## Training Curve
 When using the above commands, we get the following training curve:

From 684978ce7583e2846be3c4e675e487ab0673b20d Mon Sep 17 00:00:00 2001
From: abukharin-nv
Date: Wed, 11 Jun 2025 18:08:18 +0000
Subject: [PATCH 3/4] grammar

Signed-off-by: abukharin-nv
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index f5fd14519..2a9bd5391 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -19,7 +19,7 @@ At the end of each stage, you need to specify the Hugging Face checkpoint to con
 uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
 ```
 
-When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training on 8XA100 80GB nodes is one node for 8K training two 8XA100 80GB for 16K and 24K training.
+When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training on 8XA100 80GB nodes is one node for 8K training and two 8XA100 80GB for 16K and 24K training.
 
 ## Training Curve
 When using the above commands, we get the following training curve:

From 00b197718bbbd6d595596d262061b3c5df3baa38 Mon Sep 17 00:00:00 2001
From: abukharin-nv
Date: Fri, 27 Jun 2025 08:36:54 -0400
Subject: [PATCH 4/4] Update docs/guides/grpo-deepscaler.md

Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Signed-off-by: abukharin-nv
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index 2a9bd5391..a97ac5eab 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -19,7 +19,7 @@ At the end of each stage, you need to specify the Hugging Face checkpoint to con
 uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
 ```
 
-When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. The minimum requirement for training on 8XA100 80GB nodes is one node for 8K training and two 8XA100 80GB for 16K and 24K training.
+When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node. If you're running on 8XA100 80GB, you will need at least 1 node for 8K training and 2 nodes for 16K and 24K training.
 
 ## Training Curve
 When using the above commands, we get the following training curve:
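
For reference, the three-stage schedule described in the patched paragraph looks roughly like the sketch below. The checkpoint-conversion command for the 8K stage is taken verbatim from the guide; the 16K-stage result paths (`grpo-deepscaler-1.5b-16K`, `step_290`) and the placeholder comments standing in for the stage training commands are assumptions based on the guide's naming, not confirmed paths.

```bash
#!/usr/bin/env bash
# Sketch of the three-stage GRPO-DeepScaleR schedule (assumed layout).

# Stage 1: train with an 8K context window for 240 steps
# (launch the guide's 8K training command here; one 8XH100 or 8XA100 node).

# Convert the stage-1 DCP checkpoint to Hugging Face format
# (this command is verbatim from the guide):
uv run examples/convert_dcp_to_hf.py \
  --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml \
  --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights \
  --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf

# Stage 2: resume from results/grpo-deepscaler-1.5b-8K/step_240/hf and
# train with a 16K context window for 290 steps (one 8XH100 node, or at
# least two 8XA100 nodes per the patched paragraph), then convert again.
# The paths below assume the same results layout for the 16K run:
uv run examples/convert_dcp_to_hf.py \
  --config=results/grpo-deepscaler-1.5b-16K/step_290/config.yaml \
  --dcp-ckpt-path=results/grpo-deepscaler-1.5b-16K/step_290/policy/weights \
  --hf-ckpt-path=results/grpo-deepscaler-1.5b-16K/step_290/hf

# Stage 3: resume from the 16K Hugging Face checkpoint and train with a
# 24K context window for 50 steps.
```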