
Commit edd369a

Merge pull request #47 from NVIDIA/davidm/readme-updates

readme updated and filename updates

2 parents: 5275af7 + 89bac02

5 files changed: +3 −6 lines


README.md: 3 additions & 6 deletions
@@ -1155,8 +1155,8 @@ rm_extracted: True # Preprocess script will remove extracted files after preproc
 #### 5.2.1. Predefined Configurations of GPT Models
 <a id="markdown-predefined-configurations-of-gpt-models" name="predefined-configurations-of-gpt-models"></a>
 
-We provide five configurations for several different GPT model sizes: 126M, 5B, 20B,
-40B, and 175B parameters. These configurations include carefully selected
+We provide nine configurations for several different GPT model sizes: 126M, 400M_improved, 1B_improved, 5B, 7B_improved, 20B,
+40B, 40B_improved, and 175B parameters. These configurations include carefully selected
 hyperparameters, which should be used as a guideline for any custom model
 configurations. All these configurations are provided in the `conf/training/gpt3/`
 directory. The desired configuration can be chosen by selecting the `training`
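The config selection described in the changed paragraph is a Hydra-style override. A hypothetical invocation might look like the following; the `main.py` entry point and exact override path are assumptions for illustration, not taken from this diff:

```shell
# Pick the 5B GPT preset from conf/training/gpt3/ via a Hydra-style override.
# Entry-point name and override syntax are assumed; check the launcher docs.
python main.py training=gpt3/5b
```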
@@ -5545,7 +5545,7 @@ The table and chart below show the performance results.
 * Tensor and Pipeline Parallelism Conversion Support for GPT and T5
 * Supervised Fine-Tuning Support for GPT
 * RLHF (Reinforcement Learning from Human Feedback) for GPT
-* New GPT model sizes - 843M, 2B, 8B, 43B based on new and improved model configurations.
+* New GPT model sizes - 400M_improved, 1B_improved, 7B_improved, 40B_improved based on new and improved model configurations
 * List of GPT model configuration changes
 
 | Configuration | Previous | New |
@@ -5557,9 +5557,6 @@ The table and chart below show the performance results.
 | Bias terms | Yes | No |
 | Normalization | LayerNorm | LayerNorm1p |
 
-* Added the option to use RMSNorm normalization with GPT models. Can be configured by setting `model.normalization` to `rmsnorm`. Default is `layernorm1p`.
-* Added `fast` versions of SwiGLU, GeGLU and ReGLU. Can be configured by setting `model.activation=fast-swiglu`, `model.activation=fast-reglu` or `model.activation=fast-geglu`. Checkpoints trained with `fast` and regular versions of SwiGLU, GeGLU and ReGLU are *not* compatible with each other since the weight state dictionaries are different.
-
 **NeMo Framework 23.03**
 * Per micro-batch data loader for GPT and BERT
 * SquaredReLU and SwiGLU activation function support for GPT and T5
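The removed lines above note that `fast` and regular GLU checkpoints are incompatible because the weight state dictionaries differ. A minimal NumPy sketch of why, assuming the common fused-projection formulation of a "fast" variant (illustrative only, not NeMo's actual implementation):

```python
import numpy as np

def silu(z):
    """SiLU ("Swish") activation used inside SwiGLU."""
    return z / (1.0 + np.exp(-z))

def swiglu(x, w_gate, w_up):
    """Regular SwiGLU: two separate projection matrices (two state-dict keys)."""
    return silu(x @ w_gate) * (x @ w_up)

def fast_swiglu(x, w_fused):
    """'Fast' SwiGLU: one fused projection, output split in half.
    The single fused weight has a different key and shape, which is why
    fast/regular checkpoints cannot be loaded interchangeably."""
    gate, up = np.split(x @ w_fused, 2, axis=-1)
    return silu(gate) * up

# Concatenating the two regular weights column-wise reproduces the fast output.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_gate, w_up = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
w_fused = np.concatenate([w_gate, w_up], axis=1)
assert np.allclose(swiglu(x, w_gate, w_up), fast_swiglu(x, w_fused))
```

The two forms compute the same function, but one stores a single fused matrix where the other stores two, so converting a checkpoint between them requires an explicit weight split or concatenation.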
