Releases: NVIDIA/dgxc-benchmarking

v26.02

24 Mar 16:23
1b172ed

Added

  • B300 support
    • Pretrain recipes: Llama 3.1, DeepSeek V3, Nemotron-H, Qwen3
    • NCCL benchmark
    • CPU overhead microbenchmark
  • GPT-OSS pretrain recipe.
  • DeepSeek V3 Torchtitan FP8 support for GB300 and GB200.
  • DeepSeek V3 proxy models for 64 GB300/GB200 GPUs.
  • System info script for IB, container, and enroot diagnostics.
  • llmb-run archive command to package experiment logs into a tarball.
  • Exemplar program documentation and tooling.

Changed

  • Updated recipes to NeMo 26.02.00 where applicable.
  • Llama3 LoRA finetuning ported to Megatron Bridge.
  • Torchtitan optimizations for DeepSeek V3.
  • Centralized peak throughput (TFLOP/GPU) as primary performance metric in READMEs.
  • Removed FP8 support for Qwen3 235B on GB200.

Removed

  • Run:ai support.

Known Issues

  • Recipes using the NeMo 26.02.00 container will not work with EFA; see the Known Issues section of the README for a workaround.
  • DeepSeek V3 on EFA clusters may encounter connectivity issues.

v25.12.02

12 Feb 17:07
42ff42a

Fixed

  • Pin uv to <=0.9.28 in install.sh to avoid strict parsing failures when installing pinned nemo_run commits with uv 0.9.29+.
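The pin amounts to a version gate ahead of dependency resolution. The snippet below is an illustrative sketch of that constraint only; the variable names and comparison logic are ours, not the exact contents of install.sh:

```shell
# Illustrative version gate: uv 0.9.29+ fails strict parsing of the pinned
# nemo_run git-commit requirements, so cap uv at 0.9.28.
required="0.9.28"
installed="0.9.29"   # example of a too-new uv version
newest="$(printf '%s\n' "$installed" "$required" | sort -V | tail -n1)"
if [ "$newest" != "$required" ]; then
  echo "uv $installed is too new; pinning to <=$required"
  # pip install "uv<=$required"   # the actual downgrade/pin step
fi
```

`sort -V` performs version-aware ordering, so the gate also accepts older uv releases such as 0.9.10 without string-comparison pitfalls.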

v25.10.02

12 Feb 17:06
e9153ec

Fixed

  • Pin uv to <=0.9.28 in install.sh to avoid strict parsing failures when installing pinned nemo_run commits with uv 0.9.29+.

v25.08.02

12 Feb 17:05
4eca801

Fixed

  • Pin uv to <=0.9.28 in install.sh to avoid strict parsing failures when installing pinned nemo_run commits with uv 0.9.29+.

v25.12.01

05 Feb 21:46
cf259e0

[v25.12.01] - 2026-02-05

Changed

  • For Megatron Bridge models, download model configs in addition to tokenizers.
  • Add --container-writable flag to Megatron Bridge SLURM job scripts.
  • Use the passthrough packager for Megatron Bridge recipes.
  • Standardize Torchtitan log location and naming.
  • Updated DeepSeek V3 B200 scales to match tested configurations.

Fixed

  • Inference and microbenchmark job submission.
  • Headless installation.
  • Ensure Qwen handles custom mounts correctly.
  • Resolve llmb-install Transformers version issues.
  • Llama3.1 70B scale documentation for H100.

Known Issues

  • Qwen3 requires internet connectivity and may encounter Hugging Face Hub access or rate limit errors during benchmark runs.

v25.12

08 Jan 00:02
db1b14c

Added

  • Qwen3 pretrain recipes 30B-A3B and 235B-A22B.
  • DeepSeek V3 Torchtitan pretrain recipe.

Changed

  • Updated recipes to NeMo 25.11.01 where applicable.
  • Consolidated llmb-run submit commands (see cli/llmb-run/CHANGELOG.md for details).

v25.10.01

06 Jan 18:09
90f511b

Added

  • NVCF support to inference recipes deployable via Helm Charts.
  • Offline mode support for Grok1 and Nemotron4 (15B and 340B) pretrain recipes on SLURM clusters. Tokenizers are pre-downloaded during installation and mounted into containers at runtime, eliminating the need for HuggingFace API access during workload execution.
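The pre-download-and-mount flow can be sketched as below. The paths, mount flags, and tokenizer repo are illustrative placeholders, not the recipes' exact values:

```shell
# Hypothetical sketch of the offline-tokenizer pattern: fetch files once at
# install time, then mount them into the container and disable Hub access.
TOKENIZER_DIR="$HOME/.llmb/tokenizers"   # illustrative download location
mkdir -p "$TOKENIZER_DIR"

# Install time (network available); repo id is a placeholder:
# hf download <tokenizer-repo> --local-dir "$TOKENIZER_DIR/<model>"

# Run time: mount the pre-downloaded files and force offline operation so no
# HuggingFace API calls are made during the workload:
export HF_HUB_OFFLINE=1
# srun --container-mounts="$TOKENIZER_DIR:/tokenizers" ... <workload>
echo "HF_HUB_OFFLINE=$HF_HUB_OFFLINE"
```

Setting HF_HUB_OFFLINE=1 makes the Hugging Face libraries resolve everything from local files, which is what prevents rate-limit (HTTP 429) failures mid-run.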

Fixed

  • Nemotron 340B runtime failures caused by rate limiting (HTTP 429 errors) when connecting to HuggingFace Hub. The workload now operates in offline mode using pre-downloaded tokenizer files, preventing API rate limit exhaustion during training runs.

v25.10

04 Dec 18:54
8e77d60

[v25.10] - 2025-12-03

Added

  • GB300 support
    • Pretrain recipes: Nemotron4, Llama3.1, DS V3, Grok1 and Nemotron-H
  • Micro-benchmark for measuring CPU overhead
  • NCCL benchmark
  • Inference recipes deployable via Helm Charts for K8s platform
  • GPT OSS inference recipes for Dynamo K8s platform
  • Llama3 LoRA finetuning recipe

Changed

  • Updated DS V3, Grok1, Llama 3.1, Nemotron4 and Nemotron-H pretrain and finetune recipes to reduce install footprint
  • Updated to NeMo 25.09.00 where applicable

Removed

  • DeepSeek R1 NIM inference recipe
  • RAG Blueprint inference recipe
  • Llama4 pretrain, fine tuning, inference recipes

v25.08.01

07 Nov 20:03
4376ecc

Changed

  • Enforce 'nvidia-modelopt==0.35.1' during install to work around a bug caused by the latest torch version.
  • Switched inference downloads to use 'hf download' instead of 'git clone'.
  • Updated llmb-run and llmb-install packages.
  • Updated 'install.sh' script.
  • Updated inf_nim/deepseek-r1 launch script.
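The two download styles above can be contrasted side by side; the repo id is a placeholder, not one of the recipes' actual models:

```shell
repo="org/model"   # placeholder Hugging Face repo id
# Before: clones the full git repository, including history and LFS pointers
old_cmd="git clone https://huggingface.co/$repo"
# After: fetches only the model files, with resumable transfers and no .git overhead
new_cmd="hf download $repo --local-dir ./model"
echo "$old_cmd"
echo "$new_cmd"
```

'hf download' also skips the git history entirely, so the on-disk footprint is just the model files themselves.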

v25.05.05

22 Oct 21:37
c5c765f

Changed

  • Llama3.1 documentation: GA parameter fix.