/kind bug
What happened?
We deploy our cluster with kops, so the EBS CSI driver version is a bit old, sorry for that! Every time we create a new cluster, our Strimzi deployment fails: a few replicas (1 or 2 out of 7) report MountVolume.MountDevice failed.
Problem: MountVolume.MountDevice failed for volume "pvc-f4d9245b-8e05-45e9-a6d3-c67a36b2bde1" : rpc error: code = Aborted desc = An operation with the given volume="vol-0ec38943edd360ec2" is already in progress
The volume is attached to the correct node just fine:
The Pod is assigned to `nodeName: i-04a771ff5fadd43fb`.
It seems reasonable to look in Kubernetes for the problem:
✦ ➜ kubectl get volumeattachment
NAME ATTACHER PV NODE ATTACHED AGE
csi-0115e55fc9e565c8ea85605d078d4a1c94cd02878d8e825f4b7737d996d5f931 ebs.csi.aws.com pvc-d79555a8-f769-4d76-92e9-04c194702753 i-0d8ddc76ef940a784 true 6h7m
csi-51039995feb12d3abde3eb78137b12bcb3bc617c9032f31ead98fafb7204c9a6 ebs.csi.aws.com pvc-208b446b-ec1f-4487-ac05-c9f5d14b40b5 i-0d6198b0b546dac69 true 6h7m
csi-5cf2c1b890e66aca0b2e3f9e753478b1b35e6d0dc140022ab314c439da9180a4 ebs.csi.aws.com pvc-cb84f7b0-b8fd-4749-8bbd-6440412d6ab9 i-06d68249d4a7d0cb8 true 6h7m
csi-954a71cfaa8a33354320408ab19f0c2f3b88ca6b742a8522222242dcae1fc6f2 ebs.csi.aws.com pvc-1bbed81d-86e6-4f34-ac1c-66dfc739a72e i-06d68249d4a7d0cb8 true 6h7m
csi-a87059f2e70cd2f489687b91b2c7b67e4741ca7e2a9e0f8a35f9cf92dfc0d96f ebs.csi.aws.com pvc-6dc054e8-a09a-4011-8be7-3fba24761fe6 i-08a1e2a74f11872f0 true 6h7m
csi-ad2817e61aa04c7824b18bff188c2b99e6f82b637946f5ced965fb78ddb1a707 ebs.csi.aws.com pvc-f56fee11-e15c-4029-9292-e38208fcedbe i-08a1e2a74f11872f0 true 6h7m
csi-c6ba5d163cd17699ec1d27b391374573e96011470eb79f08575ee1a0aae8fbcb ebs.csi.aws.com pvc-f4d9245b-8e05-45e9-a6d3-c67a36b2bde1 i-04a771ff5fadd43fb true 6h7m
csi-d7c979ec26d1b26f63e5d69e12b2b45d1b6ab4ef19a98b22c363ee067a6dd145 ebs.csi.aws.com pvc-27dba443-9377-4e19-a8d4-7440b847ca99 i-0d6198b0b546dac69 true 6h7m
csi-e31029884739e2e203a0b284337331b0cec184aa5987aefea4c8ec7dffe62814 ebs.csi.aws.com pvc-2c8259af-48e3-421d-9fae-7015ad622d31 i-023eef5364eb2146e true 6h7m

✦ ➜ kubectl get volumeattachment -o json | jq '.items[] | select(.spec.source.persistentVolumeName=="pvc-f4d9245b-8e05-45e9-a6d3-c67a36b2bde1")'
{
"apiVersion": "storage.k8s.io/v1",
"kind": "VolumeAttachment",
"metadata": {
"annotations": {
"csi.alpha.kubernetes.io/node-id": "i-04a771ff5fadd43fb"
},
"creationTimestamp": "2025-11-10T08:09:33Z",
"finalizers": [
"external-attacher/ebs-csi-aws-com"
],
"name": "csi-c6ba5d163cd17699ec1d27b391374573e96011470eb79f08575ee1a0aae8fbcb",
"resourceVersion": "13495",
"uid": "4af77cfb-675e-4ea3-a02d-8033757895c4"
},
"spec": {
"attacher": "ebs.csi.aws.com",
"nodeName": "i-04a771ff5fadd43fb",
"source": {
"persistentVolumeName": "pvc-f4d9245b-8e05-45e9-a6d3-c67a36b2bde1"
}
},
"status": {
"attached": true,
"attachmentMetadata": {
"devicePath": "/dev/xvdaa"
}
}
}

The EBS node Pod is saying the following not-too-happy things :'(
✦ ➜ k logs -n kube-system ebs-csi-node-qv926 -f
Defaulted container "ebs-plugin" out of: ebs-plugin, node-driver-registrar, liveness-probe
I1110 07:45:09.250327 1 main.go:147] "Region provided via AWS_REGION environment variable" region="eu-central-1"
I1110 07:45:09.250381 1 main.go:149] "Node service requires metadata even if AWS_REGION provided, initializing metadata"
E1110 07:45:14.251672 1 metadata.go:51] "Retrieving IMDS metadata failed, falling back to Kubernetes metadata" err="could not get EC2 instance identity metadata: operation error ec2imds: GetInstanceIdentityDocument, canceled, context deadline exceeded"
I1110 07:45:14.258415 1 metadata.go:55] "Retrieved metadata from Kubernetes"
I1110 07:45:14.258734 1 driver.go:69] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.38.1"
I1110 07:45:15.264842 1 node.go:936] "CSINode Allocatable value is set" nodeName="i-04a771ff5fadd43fb" count=26
I1110 08:09:39.645772 1 mount_linux.go:618] Disk "/dev/nvme1n1" appears to be unformatted, attempting to format as type: "ext4" with options: [-F -m0 /dev/nvme1n1]
I1110 08:09:46.975266 1 mount_linux.go:629] Disk successfully formatted (mkfs): ext4 - /dev/nvme1n1 /var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/ffeb8e27681e00386ca34317ead7f6aa8189a2f1c4b8080e94779e3a4029071a/globalmount
I1110 08:09:46.975369 1 mount_linux.go:295] Detected OS without systemd
E1110 08:11:40.216138 1 driver.go:108] "GRPC error" err="rpc error: code = Aborted desc = An operation with the given volume=\"vol-0ec38943edd360ec2\" is already in progress"
E1110 08:11:41.321444 1 driver.go:108] "GRPC error" err="rpc error: code = Aborted desc = An operation with the given volume=\"vol-0ec38943edd360ec2\" is already in progress"
E1110 08:11:43.333803 1 driver.go:108] "GRPC error" err="rpc error: code = Aborted desc = An operation with the given volume=\"vol-0ec38943edd360ec2\" is already in progress"
E1110 08:11:47.355531 1 driver.go:108] "GRPC error" err="rpc error: code = Aborted desc = An operation with the given volume=\"vol-0ec38943edd360ec2\" is already in progress"

The controller seems happy:
✦ ❯ kubectl logs -n kube-system deployment/ebs-csi-controller -c ebs-plugin | \
grep "vol-0ec38943edd360ec2" | tail -20
Found 2 pods, using pod/ebs-csi-controller-5569459555-mqkcv
I1110 08:09:33.632537 1 controller.go:397] "ControllerPublishVolume: called" args="volume_id:\"vol-0ec38943edd360ec2\" node_id:\"i-04a771ff5fadd43fb\" volume_capability:{mount:{fs_type:\"ext4\"} access_mode:{mode:SINGLE_NODE_WRITER}} volume_context:{key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1762760519115-7247-ebs.csi.aws.com\"}"
I1110 08:09:33.632643 1 controller.go:410] "ControllerPublishVolume: attaching" volumeID="vol-0ec38943edd360ec2" nodeID="i-04a771ff5fadd43fb"
I1110 08:09:34.763429 1 cloud.go:960] "[Debug] AttachVolume" volumeID="vol-0ec38943edd360ec2" nodeID="i-04a771ff5fadd43fb" resp={"AssociatedResource":null,"AttachTime":"2025-11-10T08:09:34.564Z","DeleteOnTermination":null,"Device":"/dev/xvdaa","InstanceId":"i-04a771ff5fadd43fb","InstanceOwningService":null,"State":"attaching","VolumeId":"vol-0ec38943edd360ec2","ResultMetadata":{}}
I1110 08:09:35.251660 1 manager.go:190] "[Debug] Releasing in-process" attachment entry="/dev/xvdaa" volume="vol-0ec38943edd360ec2"
I1110 08:09:35.251674 1 controller.go:419] "ControllerPublishVolume: attached" volumeID="vol-0ec38943edd360ec2" nodeID="i-04a771ff5fadd43fb" devicePath="/dev/xvdaa"
I1110 08:09:35.251695 1 inflight.go:74] "Node Service: volume operation finished" key="vol-0ec38943edd360ec2i-04a771ff5fadd43fb"

I tried to delete the relevant Pod ebs-csi-node-qv926, but it is stuck in the Terminating state. Before doing anything more drastic, I wanted to ask here for advice.
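For reference, the more drastic step I am considering is a forced delete of the stuck node Pod. As I understand it (please correct me if this is wrong), this only removes the Pod object from the API server; it does not unmount anything on the node, and the DaemonSet immediately schedules a replacement on the same node:

```shell
# Force-remove the stuck ebs-csi-node Pod from the API server.
# The DaemonSet controller will recreate it on the same node right away.
kubectl delete pod -n kube-system ebs-csi-node-qv926 \
  --grace-period=0 --force
```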
What you expected to happen?
The volume is attached and the Pod mounts the PVC
How to reproduce it (as minimally and precisely as possible)?
Difficult to reproduce; I am not sure how. It happens intermittently on fresh cluster creation.
Anything else we need to know?:
Environment
Client Version: v1.33.3
Kustomize Version: v5.6.0
Server Version: v1.33.5
Driver Version: aws-ebs-csi-driver v1.38.1
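In case it is useful, here is how I cross-checked the attachment state from the AWS side (assuming AWS CLI access to the account); it confirms the volume shows as attached to the expected instance:

```shell
# Show the EC2-side attachment state of the affected volume
# (vol-0ec38943edd360ec2 is the volume from the error message above).
aws ec2 describe-volumes \
  --volume-ids vol-0ec38943edd360ec2 \
  --query 'Volumes[0].Attachments' \
  --region eu-central-1
```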