
kata-manager pod does not start #1871

@ys928

Description


Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.

Describe the bug

I installed following the documentation, but the kata-manager and related containers never started. Looking at the gpu-operator logs, I see these errors:

{"level":"error","ts":1762424337.8351557,"msg":"Reconciler error","controller":"clusterpolicy-controller","object":{"name":"cluster-policy"},"namespace":"","name":"cluster-policy","reconcileID":"2281e293-6efc-49c8-ac29-2711707ecb58","error":"Operation cannot be fulfilled on clusterpolicies.nvidia.com "cluster-policy": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"info","ts":1762424337.935488,"logger":"controllers.ClusterPolicy","msg":"WARNING: failed to get GPU workload config for node; using default","NodeName":"k8s-10-1-3-198","SandboxEnabled":true,"Error":"invalid GPU workload config: kata","defaultGPUWorkloadConfig":"container"}

The pods running at this point are as follows:

root@k8s-10-1-3-198:~# kubectl get pods -n gpu-operator
NAME                                                               READY   STATUS      RESTARTS      AGE
gpu-feature-discovery-l9cdd                                        1/1     Running     4 (21h ago)   21h
gpu-operator-1762423735-node-feature-discovery-gc-67489989g9vfl   1/1     Running     0             21h
gpu-operator-1762423735-node-feature-discovery-master-5cbfhmfx4   1/1     Running     0             21h
gpu-operator-1762423735-node-feature-discovery-worker-fhtdk        1/1     Running     0             21h
gpu-operator-58c88f459d-9dk8z                                      1/1     Running     0             21h
nvidia-container-toolkit-daemonset-jnmqr                           1/1     Running     0             21h
nvidia-cuda-validator-4kn5w                                        0/1     Completed   0             21h
nvidia-dcgm-exporter-h8hlg                                         1/1     Running     2 (21h ago)   21h
nvidia-device-plugin-daemonset-zb68n                               1/1     Running     3 (21h ago)   21h
nvidia-operator-validator-nmn96                                    1/1     Running     0             21h

root@k8s-10-1-3-198:~# nvidia-smi
Fri Nov 7 07:59:50 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 0% 35C P8 18W / 370W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

To Reproduce
Detailed steps to reproduce the issue.

Expected behavior
A clear and concise description of what you expected to happen.

Environment (please provide the following information):

  • GPU Operator Version: v25.10.-
  • OS: Ubuntu 22.04.1
  • Kernel Version: 6.8.0-86-generic
  • Container Runtime Version: v1.7.27
  • Kubernetes Distro and Version: k8s v1.33.1

Information to attach (optional if deemed irrelevant)

  • kubernetes pods status: kubectl get pods -n OPERATOR_NAMESPACE
  • kubernetes daemonset status: kubectl get ds -n OPERATOR_NAMESPACE
  • If a pod/ds is in an error or pending state: kubectl describe pod -n OPERATOR_NAMESPACE POD_NAME
  • If a pod/ds is in an error or pending state: kubectl logs -n OPERATOR_NAMESPACE POD_NAME --all-containers
  • Output from running nvidia-smi from the driver container: kubectl exec DRIVER_POD_NAME -n OPERATOR_NAMESPACE -c nvidia-driver-ctr -- nvidia-smi
  • containerd logs: journalctl -u containerd > containerd.log (a consolidated sketch of these commands follows below)
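For this report the operator namespace is gpu-operator, so the commands above could be batched roughly as follows (a sketch only; the operator pod name is taken from the listing earlier in this issue):

NS=gpu-operator
kubectl get pods -n "$NS"
kubectl get ds -n "$NS"
# no kata-manager pod exists yet, so first check whether its daemonset was created at all
kubectl get ds -n "$NS" | grep -i kata || echo "no kata-manager daemonset found"
# operator logs (pod name from the listing above)
kubectl logs -n "$NS" gpu-operator-58c88f459d-9dk8z --tail=200
journalctl -u containerd > containerd.log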

Collecting full debug bundle (optional):

curl -o must-gather.sh -L https://raw.githubusercontent.com/NVIDIA/gpu-operator/main/hack/must-gather.sh
chmod +x must-gather.sh
./must-gather.sh

NOTE: please refer to the must-gather script for debug data collected.

This bundle can be submitted to us via email: [email protected]
