
Commit c6feb18

Fixing broken links (#1736) (#926)
* Fixing broken links

Co-authored-by: Sadaf Rasool <[email protected]>
1 parent 012c018 commit c6feb18

2 files changed: +2 -2 lines changed


libraries/neuronx-distributed/activation_memory_reduction.rst

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ In the activation memory equation, we have a quadratic term of `5abs^2`. As the
 faster rate. This quadratic term comes from the softmax computation. `Vijay Korthikanti et.al <https://browse.arxiv.org/pdf/2205.05198.pdf>`__
 propose `Selective activation checkpointing` where they only recompute the softmax and attention computation and thereby avoid saving the activations coming
 from softmax and attention computation. This completely gets rid of the quadratic term and brings down the activation memory per layer to
-`34sbh/t`. The LLama-7B example in `this tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama2_7b.html#llama2-7b-tp-zero1-tutorial>`__
+`34sbh/t`. The LLama-7B example in `this tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.html#llama2-7b-tp-zero1-tutorial>`__
 used selective activation checkpointing.

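As a hedged restatement of the excerpt above (a sketch based only on the two expressions quoted in the changed passage, not part of the commit itself), with sequence length s, batch size b, hidden size h, attention heads a, and tensor-parallel degree t, the per-layer activation memory behaves roughly as:

    % sketch in LaTeX; the placement of the 1/t factor on the quadratic term
    % is not spelled out in this excerpt, so only the first term carries it here
    M_{\text{layer}} \approx \frac{34\,sbh}{t} + 5\,abs^{2}
    \quad\xrightarrow{\ \text{selective activation checkpointing}\ }\quad
    M_{\text{layer}} \approx \frac{34\,sbh}{t}

That is, recomputing the softmax and attention activations during the backward pass removes the quadratic s^2 term instead of storing it.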
libraries/neuronx-distributed/tutorials/training_codegen25_7b.rst

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 Training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer (``neuronx-distributed``)
 ==============================================================================================

-In this tutorial, we showcase how to pretrain a CodeGen2.5 7B model for program synthesis. Since Codegen2.5's architecture is identical to the one of Llama2, you may want to take a look at our `Llama2 tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama2_7b.html>`__ first.
+In this tutorial, we showcase how to pretrain a CodeGen2.5 7B model for program synthesis. Since Codegen2.5's architecture is identical to the one of Llama2, you may want to take a look at our `Llama2 tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.html>`__ first.

 After setting up the environment and installing ``neuronx-distributed``, we need to download a data set containing source code (in this case Java code) and then preprocess and tokenize it to match the code-infill format (more about this below). Use the following commands to download the required files. Note, that we reuse our llama2 training files.