diff --git a/benchmarks/llama3-8b_h200_202506_trainy-whitefiber.md b/benchmarks/llama3-8b_h200_202506_trainy-whitefiber.md new file mode 100644 index 000000000..9ba1490f3 --- /dev/null +++ b/benchmarks/llama3-8b_h200_202506_trainy-whitefiber.md @@ -0,0 +1,54 @@ +This was performed by Trainy team on WhiteFiber in June 2025, to get a baseline of performance +of the Trainy platform on H200s platform over multiple hosts. + +### Models + +Llama 3.1 8B + +### Hardware + +Each host has + +- 8 NVIDIA H200 GPUs connected via NVLink. +- Hosts are inter-connected with a backend RDMA fabric with 400Gb/s (Mellanox CX-7) per GPU. + +### Configuration + +Runs were invoked with the following, where `NUM_NODES` was `4` and `8` +``` + torchrun \ + --nnodes $NUM_NODES \ + --nproc_per_node 8 \ + --rdzv_id 101 \ + --rdzv_backend c10d \ + --rdzv_endpoint "$MASTER_ADDR:29500" \ + torchtitan/train.py \ + --job.config-file torchtitan/models/llama3/train_configs/llama3_8b.toml \ + --metrics.enable_wandb \ + --training.local_batch_size=2 \ + --training.compile \ + --model.converters="float8" \ + --float8.enable_fsdp_float8_all_gather \ + --float8.precompute_float8_dynamic_scale_for_fsdp \ + --float8.force_recompute_fp8_weight_in_bwd \ + --profiling.profile_freq 1000000 + --training.steps 2000 +``` + +### Results + +Detailed performance results and training configurations can be found in the tables below along and can visualized in [this WandB report](https://api.wandb.ai/links/asaiacai/w4c46stp). `TPS` and `Memory(GiB)` are arbitrarily sampled at the 100th iteration: + +| NUM_NODES | TPS/GPU | Memory(GiB) | +| ----- | ----: | ----: | +| 4 | 10938 | 47.96 | +| 8 | 10753 | 46.97 | + + +### Versions and Dates + +| repo | commit | date | +| --- | --- | --- | +| torch | [2.8.0a0+5228986c39](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-05.html) | 2025/05/29 | +| torchao | [0afa4c1](https://github.com/pytorch/ao/commit/0afa4c1bd28c82921e360ddbd1b27c9d6da5b947) | 2025/06/13 | +| torchtitan | [e7c0cae](https://github.com/pytorch/torchtitan/commit/e7c0cae934df78d6e9c2835f42ff1f757dc3fddc) | 2025/06/13 |