conf.py
Lines changed: 1 addition & 1 deletion
@@ -157,7 +157,7 @@
#top_banner_message="<span>⚠</span><a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/setup-troubleshooting.html#gpg-key-update'> Neuron repository GPG key for Ubuntu installation has expired, see instructions how to update! </a>"
-
top_banner_message="Neuron 2.20.0 is released! check <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html#latest-neuron-release'> What's New </a> and <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/announcements/index.html'> Announcements </a>"
+
top_banner_message="Neuron 2.20.1 is released! check <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html#latest-neuron-release'> What's New </a> and <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/announcements/index.html'> Announcements </a>"
containers/tutorials/k8s-multiple-scheduler.rst
Lines changed: 11 additions & 22 deletions
@@ -5,35 +5,28 @@ In cluster environments where there is no access to default scheduler, the neuro
use this new scheduler. Neuron scheduler extension is added to this new scheduler. EKS natively does not yet support the neuron scheduler extension and so in the EKS environment this is the only way to add the neuron scheduler extension.
* Make sure :ref:`Neuron device plugin<k8s-neuron-device-plugin>` is running
-
* Download the my scheduler :download:`my-scheduler.yml </src/k8/my-scheduler.yml>`
-
* Download the scheduler extension :download:`k8s-neuron-scheduler-eks.yml </src/k8/k8s-neuron-scheduler-eks.yml>`
I1012 15:30:21.629611 1 scheduler.go:604] "Successfully bound pod to node" pod="kube-system/k8s-neuron-scheduler-5d9d9d7988-xcpqm" node="ip-192-168-2-25.ec2.internal" evaluatedNodes=1 feasibleNodes=1
* When running new pods that need to use the neuron scheduler extension, make sure they use my-scheduler as the scheduler. A sample pod spec is below
-
::
+
.. code:: bash
apiVersion: v1
kind: Pod
@@ -57,20 +50,19 @@ use this new scheduler. Neuron scheduler extension is added to this new schedule
* Once the neuron workload pod is running, make sure the logs of the k8s neuron scheduler show successful filter/bind requests
2022/10/12 15:41:16 Succesfully updated the DevUsageMap [true true true true true true true true true false false false false false false false] and otherDevUsageMap [true true true false] after alloc for node ip-192-168-2-25.ec2.internal
containers/tutorials/k8s-neuron-device-plugin.rst
Lines changed: 5 additions & 7 deletions
@@ -6,27 +6,25 @@ Deploy Neuron Device Plugin
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Make sure the :ref:`prerequisites<k8s-prerequisite>` are satisfied
-
* Download the neuron device plugin yaml file. :download:`k8s-neuron-device-plugin.yml </src/k8/k8s-neuron-device-plugin.yml>`
-
* Download the neuron device plugin rbac yaml file. This enables permissions for device plugin to update the node and Pod annotations. :download:`k8s-neuron-device-plugin-rbac.yml </src/k8/k8s-neuron-device-plugin-rbac.yml>`
* Apply the Neuron device plugin as a daemonset on the cluster with the following command
containers/tutorials/k8s-neuron-problem-detector-and-recovery.rst
Lines changed: 8 additions & 14 deletions
@@ -4,30 +4,24 @@ Neuron node problem detector and recovery artifact checks the health of Neuron d
* The Neuron node problem detector and recovery requires Neuron driver 2.15+, and it requires the runtime to be at SDK 2.18 or later.
* Make sure prerequisites are satisfied. This includes prerequisites for getting started with Kubernetes containers and prerequisites for the Neuron node problem detector and recovery.
-
* Download the Neuron node problem detector and recovery YAML file: :download:`k8s-neuron-problem-detector-and-recovery.yml </src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery.yml>`.
+
* Install the Neuron node problem detector and recovery as a DaemonSet on the cluster with the following command:
.. note::
-
This YAML pulls the container image from the upstream repository for node problem detector registry.k8s.io/node-problem-detector.
-
-
* Download the Neuron node problem detector and recovery configuration file: :download:`k8s-neuron-problem-detector-and-recovery-config.yml </src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery-config.yml>`.
-
* Download the Neuron node problem detector and recovery RBAC YAML file. This enables permissions for the Neuron node problem detector and recovery to update the node condition: :download:`k8s-neuron-problem-detector-and-recovery-rbac.yml </src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery-rbac.yml>`.
-
* By default, the Neuron node problem detector and recovery has monitor only mode enabled. To enable the recovery functionality, update the environment variable in the YAML file:
+
The installation pulls the container image from the upstream repository for node problem detector registry.k8s.io/node-problem-detector.
* Verify that the Neuron device plugin is running:
.. code:: bash
@@ -44,4 +38,4 @@ Verify that the Neuron device plugin is running:
node-problem-detector-vpjtk 1/1 Running 0 59s
-
When any unrecoverable error occurs, Neuron node problem detector and recovery publishes a metric under the CloudWatch namespace NeuronHealthCheck. It also reflects in NodeCondition and can be seen with kubectl describe node.
+
* When an unrecoverable error occurs, the Neuron node problem detector and recovery publishes a metric under the CloudWatch namespace NeuronHealthCheck. The error is also reflected in NodeCondition and can be seen with ``kubectl describe node``.
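As a rough illustration of checking the node condition mentioned above, the sketch below filters ``kubectl describe node`` output down to its conditions section. The helper name ``node_conditions`` and the ``NODE_NAME`` variable are hypothetical and not part of the Neuron tooling, and the sketch assumes the ``Conditions:`` section is immediately followed by an ``Addresses:`` section in the describe output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Neuron tooling): reads `kubectl describe node`
# output on stdin and prints only the "Conditions:" section.
# Assumption: the section ends where the "Addresses:" section begins.
node_conditions() {
  sed -n '/^Conditions:/,/^Addresses:/p'
}

# Usage against a live cluster (NODE_NAME is a placeholder you must set):
#   kubectl describe node "$NODE_NAME" | node_conditions
```

Piping through a small filter keeps the check read-only; the exact condition type reported by the detector is not shown in this change, so look for it in the printed table rather than matching on a guessed name.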
* - Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2)
-
- torch-neuronx, neuronx-distributed
-
- /opt/aws_neuron_venv_pytorch
-
-
* - Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2)
-
- torch-neuron
-
- /opt/aws_neuron_venv_pytorch_inf1
-
* - Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 20.04)
- tensorflow-neuronx
- /opt/aws_neuron_venv_tensorflow
-
* - Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2)
-
- tensorflow-neuronx
-
- /opt/aws_neuron_venv_tensorflow
-
-
You can easily get started with the single framework DLAMI through the AWS console by following one of the corresponding setup guides. If you are looking to
use the Neuron DLAMI in your cloud automation flows, Neuron also supports :ref:`SSM parameters <ssm-parameter-neuron-dlami>` to easily retrieve the latest DLAMI ID.
- Added support for Amazon Linux 2023 to Neuron Multi Framework DLAMI. Customers will have two operating system options when using the multi framework DLAMI. See :ref:`neuron-dlami-overview`.
release-notes/containers/neuron-dlc.rst
Lines changed: 7 additions & 0 deletions
@@ -8,6 +8,13 @@ Neuron DLC Release Notes
:depth: 1
+
Neuron 2.20.1
+
-------------
+
+
Date: 10/25/2024
+
- Neuron 2.20.1 DLC includes prerequisites for `Neuronx Distributed Training framework <https://github.com/aws-neuron/neuronx-distributed-training/blob/main/docs/general/installation_guide.rst#building-apex>`_. Customers can expect to use NxDT out of the box.