Skip to content

Conversation

@wbc6080
Copy link
Contributor

@wbc6080 wbc6080 commented Jun 23, 2025

  • Please check if the PR fulfills these requirements
  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

Which issue(s) this PR fixes:

Fixes #

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

docs update

  • What is the new behavior (if this is a feature change)?

Added documentation for managing edge GPU nodes and introduce how to use GPU resources in edge applications.

@kubeedge-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign fisherxu after the PR has been reviewed.
You can assign the PR to them by writing /assign @fisherxu in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 23, 2025
Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, this would be useful.

@@ -0,0 +1,296 @@
---
title: Edge Pods use GPU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this section dedicates for the Nvidia GPU, how to bind that to the application container. in that case, it probably would be better to clearly add it to the section title?

Suggested change
title: Edge Pods use GPU
title: Edge Pods use Nvidia GPU

Comment on lines 38 to 41
root@edgenode:~/release-v1.16.0-rc.1-experimental/packages/ubuntu18.04/amd64# ls
libnvidia-container1_1.16.0~rc.1-1_amd64.deb libnvidia-container-tools_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-operator-extensions_1.16.0~rc.1-1_amd64.deb
libnvidia-container1-dbg_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit_1.16.0~rc.1-1_amd64.deb
libnvidia-container-dev_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-base_1.16.0~rc.1-1_amd64.deb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent is off?

Suggested change
root@edgenode:~/release-v1.16.0-rc.1-experimental/packages/ubuntu18.04/amd64# ls
libnvidia-container1_1.16.0~rc.1-1_amd64.deb libnvidia-container-tools_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-operator-extensions_1.16.0~rc.1-1_amd64.deb
libnvidia-container1-dbg_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit_1.16.0~rc.1-1_amd64.deb
libnvidia-container-dev_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-base_1.16.0~rc.1-1_amd64.deb
root@edgenode:~/release-v1.16.0-rc.1-experimental/packages/ubuntu18.04/amd64# ls
libnvidia-container1_1.16.0~rc.1-1_amd64.deb libnvidia-container-tools_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-operator-extensions_1.16.0~rc.1-1_amd64.deb
libnvidia-container1-dbg_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit_1.16.0~rc.1-1_amd64.deb
libnvidia-container-dev_1.16.0~rc.1-1_amd64.deb nvidia-container-toolkit-base_1.16.0~rc.1-1_amd64.deb

Execute the following command in this directory to complete the installation:

```shell
sudo apt install ./*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sudo apt install ./*
sudo apt install ./*

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Signed-off-by: wbc6080 <[email protected]>
Co-authored-by: ming.tang <[email protected]>
@kevin-wangzefeng
Copy link
Member

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces documentation for configuring and using GPUs in KubeEdge. The documentation covers setting up the GPU environment, managing GPU nodes, and testing GPU allocation. The documentation is provided in both English and Chinese. The review focuses on improving clarity and flow in both languages.

For specific installation guides, please refer to [Container Runtime](https://kubeedge.io/docs/setup/prerequisites/runtime)

:::tip
Since KubeEdge v1.14, support for Dockershim has been removed, and use Docker runtime to manage edge containers is no longer supported. If you still need to use Docker, you need to install [cri-dockerd](https://kubeedge.io/docs/setup/prerequisites/runtime#docker-engine) after installing Docker.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider rephrasing to improve clarity: "Since KubeEdge v1.14, direct Docker runtime support for managing edge containers has been removed. If you still need to use Docker, install cri-dockerd after installing Docker."

Suggested change
Since KubeEdge v1.14, support for Dockershim has been removed, and use Docker runtime to manage edge containers is no longer supported. If you still need to use Docker, you need to install [cri-dockerd](https://kubeedge.io/docs/setup/prerequisites/runtime#docker-engine) after installing Docker.
Since KubeEdge v1.14, direct Docker runtime support for managing edge containers has been removed. If you still need to use Docker, install cri-dockerd after installing Docker.

Comment on lines +97 to +101
keadm join --cgroupdriver=systemd \
--cloudcore-ipport="THE-EXPOSED-IP":10000 \
--kubeedge-version=v1.17.0 \
--token="YOUR TOKEN"
--remote-runtime-endpoint=unix:///var/run/cri-dockerd.sock

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The command examples for Docker and Containerd could benefit from a brief explanation of the parameters used, such as --cgroupdriver, --cloudcore-ipport, and --remote-runtime-endpoint. This would help users understand the purpose of each parameter.


```shell
# After deployment, check whether it is successfully deployed on the edge node
[root@master-01 ~]# kubectl get daemonsets.apps -n kube-system|grep nvidia

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider adding a link to the KubeEdge documentation that explains how to deploy applications on edge nodes. This would provide users with a more complete understanding of the process.

Comment on lines +29 to +31
- 如果边缘节点能够直接访问外部网络,可以直接按照[官方文档](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)进行安装
- 如果边缘节点无法直接访问外部网络,则需要在网络可以联通的机器上下载[官方离线安装包](https://github.com/NVIDIA/nvidia-container-toolkit/releases),将安装包传入边缘节点完成解压。
解压后目录中应该出现如下的文件:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider rephrasing for better flow: "如果边缘节点无法直接访问外部网络,则需要在可以联通网络的主机上下载官方离线安装包,然后将安装包传入边缘节点并解压。"

Suggested change
- 如果边缘节点能够直接访问外部网络,可以直接按照[官方文档](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)进行安装
- 如果边缘节点无法直接访问外部网络,则需要在网络可以联通的机器上下载[官方离线安装包](https://github.com/NVIDIA/nvidia-container-toolkit/releases),将安装包传入边缘节点完成解压。
解压后目录中应该出现如下的文件:
如果边缘节点无法直接访问外部网络,则需要在可以联通网络的主机上下载[官方离线安装包](https://github.com/NVIDIA/nvidia-container-toolkit/releases),然后将安装包传入边缘节点并解压。


这里以 Docker 和 Containerd 容器运行时作为示例:

```shell

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider rephrasing for better clarity: "这里以 Docker 和 Containerd 容器运行时为例:"

Suggested change
```shell
这里以 Docker 和 Containerd 容器运行时为例:

检查 k8s-device-plugin 是否成功部署:

```shell
# After deployment, check whether it is successfully deployed on the edge node

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider rephrasing for better readability: "检查 k8s-device-plugin 是否成功部署:"

Suggested change
# After deployment, check whether it is successfully deployed on the edge node
检查 k8s-device-plugin 是否成功部署:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants