@@ -1155,7 +1155,7 @@ rm_extracted: True # Preprocess script will remove extracted files after preproc
#### 5.2.1. Predefined Configurations of GPT Models
<a id="markdown-predefined-configurations-of-gpt-models" name="predefined-configurations-of-gpt-models"></a>

- We provide nine configurations for several different GPT model sizes: 126M, 400M, 1B, 5B, 7B, 20B,
+ We provide nine configurations for several different GPT model sizes: 126M, 400M_improved, 1B_improved, 5B, 7B_improved, 20B,
40B, 40B_improved, and 175B parameters. These configurations include carefully selected
hyperparameters, which should be used as a guideline for any custom model
configurations. All these configurations are provided in the `conf/training/gpt3/`
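
To make that guideline concrete, here is a minimal sketch of selecting one of the predefined configurations through the launcher's Hydra-style overrides. The exact file names under `conf/training/gpt3/` (e.g. `5b.yaml`) and the override keys shown are assumptions for illustration, not verified against this commit.

```bash
# Sketch: pick a predefined GPT configuration and override a few of its
# carefully chosen hyperparameters. Assumes Hydra resolves training=gpt3/5b
# to conf/training/gpt3/5b.yaml; adjust the name to a file actually present.
python3 main.py \
    training=gpt3/5b \
    training.trainer.num_nodes=4 \
    training.model.global_batch_size=2048
```

Starting from one of these files and overriding individual fields in this way is the intended route to a custom model configuration.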
@@ -5545,7 +5545,7 @@ The table and chart below show the performance results.
* Tensor and Pipeline Parallelism Conversion Support for GPT and T5
* Supervised Fine-Tuning Support for GPT
* RLHF (Reinforcement Learning from Human Feedback) for GPT
- * New and improved GPT model sizes - 400M, 1B, 7B, 40B based on new and improved model configurations.
+ * New GPT model sizes - 400M_improved, 1B_improved, 7B_improved, 40B_improved based on new and improved model configurations.
* List of GPT model configuration changes

| Configuration | Previous | New |