Releases: NVIDIA/dgxc-benchmarking
v26.02
Added
- B300 support
- Pretrain recipes: Llama 3.1, DeepSeek V3, Nemotron-H, Qwen3
- NCCL benchmark
- CPU overhead microbenchmark
- GPT-OSS pretrain recipe
- DeepSeek V3 Torchtitan FP8 support for GB300 and GB200
- DeepSeek V3 proxy models for 64 GB300/GB200 GPUs
- System info script for IB, container, and enroot diagnostics
- llmb-run archive command to package experiment logs into a tarball
- Exemplar program documentation and tooling
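The archiving step can be sketched as follows. This is an illustrative sketch only, not the actual llmb-run implementation; the function name, compression choice, and layout are assumptions.

```python
import tarfile
from pathlib import Path

def archive_logs(log_dir: str, out_path: str) -> str:
    """Package an experiment log directory into a gzipped tarball.

    Illustrative only: the real `llmb-run archive` command may use
    different naming, compression, or directory layout.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        # Store the directory under its own name so extraction
        # recreates a single top-level folder.
        tar.add(log_dir, arcname=Path(log_dir).name)
    return out_path
```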
Changed
- Updated recipes to NeMo 26.02.00 where applicable.
- Llama3 LoRA finetuning ported to Megatron Bridge.
- Torchtitan optimizations for DeepSeek V3.
- Centralized peak throughput (TFLOP/GPU) as primary performance metric in READMEs.
- Removed FP8 support for Qwen3 235B on GB200.
Removed
- Run:ai support.
Known Issues
- Recipes using the NeMo 26.02.00 container will not work with EFA; see the Known Issues section of the README for a workaround.
- DeepSeek V3 on EFA clusters may encounter connectivity issues.
v25.12.02
v25.10.02
v25.08.02
[v25.12.01] - 2026-02-05
Changed
- For Megatron Bridge models, download model configs in addition to tokenizers.
- Add --container-writable flag to Megatron Bridge SLURM job scripts.
- Use the passthrough packager for Megatron Bridge recipes.
- Standardize Torchtitan log location and naming.
- Updated DSV3 B200 scales to match tested configurations.
Fixed
- Inference and microbenchmark job submission.
- Headless installation.
- Ensure Qwen handles custom mounts correctly.
- Resolve llmb-install Transformers version issues.
- Llama3.1 70B scale documentation for H100.
Known Issues
- Qwen3 requires internet connectivity and may encounter Hugging Face Hub access or rate limit errors during benchmark runs.
v25.12
v25.10.01
Added
- NVCF support to inference recipes deployable via Helm Charts.
- Offline mode support for Grok1 and Nemotron4 (15B and 340B) pretrain recipes on SLURM clusters. Tokenizers are pre-downloaded during installation and mounted into containers at runtime, eliminating the need for HuggingFace API access during workload execution.
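The offline-mode mechanism described above amounts to pointing the Hugging Face libraries at a pre-staged cache and disabling Hub access. A minimal sketch follows; the cache path is an assumption, and the actual recipes wire this up via container mounts rather than a helper function.

```python
import os

def enable_hf_offline(cache_dir: str) -> dict:
    """Force Hugging Face libraries to use a pre-staged tokenizer cache.

    Sketch of the offline-mode pattern; `cache_dir` would hold the
    tokenizers downloaded at install time (exact layout is an assumption).
    """
    env = {
        "HF_HOME": cache_dir,         # point HF at the pre-downloaded cache
        "HF_HUB_OFFLINE": "1",        # disable all Hub API calls
        "TRANSFORMERS_OFFLINE": "1",  # transformers loads from cache only
    }
    os.environ.update(env)
    return env
```

With these variables set, a workload never contacts the Hugging Face API at runtime, which also avoids HTTP 429 rate-limit failures.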
Fixed
- Fixed Nemotron 340B runtime failures caused by rate limiting (HTTP 429 errors) when connecting to HuggingFace Hub. The workload now operates in offline mode using pre-downloaded tokenizer files, preventing API rate limit exhaustion during training runs.
[v25.10] - 2025-12-03
Added
- GB300 support
- Pretrain recipes: Nemotron4, Llama3.1, DS V3, Grok1 and Nemotron-H
- Micro-benchmark for measuring CPU overhead
- NCCL benchmark
- Inference recipes deployable via Helm Charts for K8s platform
- GPT OSS inference recipes for Dynamo K8s platform
- Llama3 LoRA finetuning recipe
Changed
- Updated DS V3, Grok1, Llama 3.1, Nemotron4 and Nemotron-H pretrain and finetune recipes to reduce install footprint
- Updated to NeMo 25.09.00 where applicable
Removed
- DeepSeek R1 NIM inference recipe
- RAG Blueprint inference recipe
- Llama4 pretrain, finetuning, and inference recipes
v25.08.01
Changed
- Enforce 'nvidia-modelopt==0.35.1' during install to work around a bug caused by the latest torch version.
- Switched inference downloads to use 'hf download' instead of 'git clone'.
- Updated llmb-run and llmb-install packages
- Updated 'install.sh' script.
- Updated inf_nim/deepseek-r1 launch script.