add GPU docs #706
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
fujitatomoya
left a comment
overall lgtm, this would be useful.
| @@ -0,0 +1,296 @@ | |||
| --- | |||
| title: Edge Pods use GPU | |||
It looks like this section is dedicated to the Nvidia GPU and how to bind it to the application container. In that case, it would probably be better to say so clearly in the section title?
| title: Edge Pods use GPU | |
| title: Edge Pods use Nvidia GPU |
docs/advanced/gpu.md
Outdated
| root@edgenode:~/release-v1.16.0-rc.1-experimental/packages/ubuntu18.04/amd64# ls | ||
| libnvidia-container1_1.16.0~rc.1-1_amd64.deb libnvidia-container-tools_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-operator-extensions_1.16.0~rc.1-1_amd64.deb | ||
| libnvidia-container1-dbg_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit_1.16.0~rc.1-1_amd64.deb | ||
| libnvidia-container-dev_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-base_1.16.0~rc.1-1_amd64.deb |
indent is off?
| root@edgenode:~/release-v1.16.0-rc.1-experimental/packages/ubuntu18.04/amd64# ls | |
| libnvidia-container1_1.16.0~rc.1-1_amd64.deb libnvidia-container-tools_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-operator-extensions_1.16.0~rc.1-1_amd64.deb | |
| libnvidia-container1-dbg_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit_1.16.0~rc.1-1_amd64.deb | |
| libnvidia-container-dev_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-base_1.16.0~rc.1-1_amd64.deb | |
| root@edgenode:~/release-v1.16.0-rc.1-experimental/packages/ubuntu18.04/amd64# ls | |
| libnvidia-container1_1.16.0~rc.1-1_amd64.deb libnvidia-container-tools_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-operator-extensions_1.16.0~rc.1-1_amd64.deb | |
| libnvidia-container1-dbg_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit_1.16.0~rc.1-1_amd64.deb | |
| libnvidia-container-dev_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-base_1.16.0~rc.1-1_amd64.deb |
docs/advanced/gpu.md
Outdated
| Execute the following command in this directory to complete the installation: | ||
| ```shell | ||
| sudo apt install ./* |
| sudo apt install ./* | |
| sudo apt install ./* |
Fixed
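As a side note on the `sudo apt install ./*` step discussed above: the glob passes every file in the directory to apt, not only the packages. A minimal sketch, assuming the extracted directory may also hold non-package files; `list_debs` is a hypothetical helper name and is not part of the PR:

```shell
# Hypothetical helper (not from the PR): lists the .deb packages in a
# directory, one per line, so the set handed to apt can be reviewed first.
list_debs() {
  dir="$1"
  for pkg in "$dir"/*.deb; do
    # When nothing matches, the pattern stays unexpanded; skip it.
    if [ -e "$pkg" ]; then
      echo "$pkg"
    fi
  done
}

# Review the packages in the current directory.
list_debs .
```

After reviewing the list, `sudo apt install ./*.deb` installs the same set of files.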
Signed-off-by: wbc6080 <[email protected]> Co-authored-by: ming.tang <[email protected]>
/gemini review
Code Review
This pull request introduces documentation for configuring and using GPUs in KubeEdge. The documentation covers setting up the GPU environment, managing GPU nodes, and testing GPU allocation. The documentation is provided in both English and Chinese. The review focuses on improving clarity and flow in both languages.
| For specific installation guides, please refer to [Container Runtime](https://kubeedge.io/docs/setup/prerequisites/runtime) | ||
| :::tip | ||
| Since KubeEdge v1.14, support for Dockershim has been removed, and use Docker runtime to manage edge containers is no longer supported. If you still need to use Docker, you need to install [cri-dockerd](https://kubeedge.io/docs/setup/prerequisites/runtime#docker-engine) after installing Docker. |
Consider rephrasing to improve clarity: "Since KubeEdge v1.14, direct Docker runtime support for managing edge containers has been removed. If you still need to use Docker, install cri-dockerd after installing Docker."
| Since KubeEdge v1.14, support for Dockershim has been removed, and use Docker runtime to manage edge containers is no longer supported. If you still need to use Docker, you need to install [cri-dockerd](https://kubeedge.io/docs/setup/prerequisites/runtime#docker-engine) after installing Docker. | |
| Since KubeEdge v1.14, direct Docker runtime support for managing edge containers has been removed. If you still need to use Docker, install cri-dockerd after installing Docker. |
| keadm join --cgroupdriver=systemd \ | ||
| --cloudcore-ipport="THE-EXPOSED-IP":10000 \ | ||
| --kubeedge-version=v1.17.0 \ | ||
| --token="YOUR TOKEN" | ||
| --remote-runtime-endpoint=unix:///var/run/cri-dockerd.sock |
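In the hunk above, the `--token` line has no trailing backslash, so a shell would end the `keadm join` command there and try to run `--remote-runtime-endpoint=...` as a separate command. A minimal sketch of the continuation fix, assuming placeholder values; `build_join_cmd` is a hypothetical helper that only prints the command so the line breaks can be checked before anything runs on an edge node:

```shell
# Hypothetical helper (not from the PR): prints the keadm join command
# instead of executing it. The flags are the ones quoted in the hunk above.
build_join_cmd() {
  cloudcore_ip="$1"
  token="$2"
  # Every line except the last ends with a backslash, so all flags
  # belong to one command.
  echo keadm join --cgroupdriver=systemd \
    --cloudcore-ipport="${cloudcore_ip}:10000" \
    --kubeedge-version=v1.17.0 \
    --token="${token}" \
    --remote-runtime-endpoint=unix:///var/run/cri-dockerd.sock
}

# Placeholders, as in the PR text.
build_join_cmd "THE-EXPOSED-IP" "YOUR-TOKEN"
```

Dropping the `echo` turns the printed line into the actual join call.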
| ```shell | ||
| # After deployment, check whether it is successfully deployed on the edge node | ||
| [root@master-01 ~]# kubectl get daemonsets.apps -n kube-system|grep nvidia |
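Besides checking the DaemonSet, it may help to confirm that the edge node actually advertises GPU capacity. A minimal sketch, assuming the NVIDIA device plugin registers the `nvidia.com/gpu` resource on the node; `gpu_allocatable` is a hypothetical helper name and the node name is supplied by the caller:

```shell
# Hypothetical helper (not from the PR): prints how many GPUs a node
# reports as allocatable under the nvidia.com/gpu resource name.
gpu_allocatable() {
  node="$1"
  # The dot inside the resource name is escaped for kubectl's JSONPath.
  kubectl get node "$node" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
}
```

Calling `gpu_allocatable <edge-node-name>` from the master prints the count; empty output would suggest the plugin has not registered the resource on that node.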
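| - If the edge node can access the external network directly, install by following the [official documentation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) | ||
| - If the edge node cannot access the external network directly, download the [official offline installation package](https://github.com/NVIDIA/nvidia-container-toolkit/releases) on a machine where the network is reachable, and pass the package to the edge node to complete the extraction. | ||
| After extraction, the directory should contain the following files: |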
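Consider rephrasing for better flow: "If the edge node cannot access the external network directly, download the official offline installation package on a host that has network access, then transfer the package to the edge node and extract it."
| - If the edge node can access the external network directly, install by following the [official documentation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) | |
| - If the edge node cannot access the external network directly, download the [official offline installation package](https://github.com/NVIDIA/nvidia-container-toolkit/releases) on a machine where the network is reachable, and pass the package to the edge node to complete the extraction. | |
| After extraction, the directory should contain the following files: | |
| If the edge node cannot access the external network directly, download the [official offline installation package](https://github.com/NVIDIA/nvidia-container-toolkit/releases) on a host that has network access, then transfer the package to the edge node and extract it. |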
| The Docker and Containerd container runtimes are used as examples here: | ||
| ```shell |
| Check whether k8s-device-plugin has been deployed successfully: | ||
| ```shell | ||
| # After deployment, check whether it is successfully deployed on the edge node |
Which issue(s) this PR fixes:
Fixes #
docs update
Added documentation for managing edge GPU nodes and introduced how to use GPU resources in edge applications.