@nikita-dubrovskii
Contributor

Firstboot of RHCOS on IBM zKVM intermittently fails during "File System Check".
This happens because the systemd unit uses the old filesystem UUID from the pristine qcow2 image,
not the regenerated one:

```
coreos-boot-edit: + lsblk -o NAME,LABEL,UUID --paths --pairs /dev/disk/by-label/boot
coreos-boot-edit: NAME="/dev/mapper/crypt_bootfs" LABEL="boot" UUID="96d15588-3596-4b3c-adca-a2ff7279ea63"
coreos-boot-edit: + blkid /dev/disk/by-label/boot
coreos-boot-edit: /dev/disk/by-label/boot: LABEL="boot" UUID="eee55c4f-c2df-47e9-a284-992e9e122a97" BLOCK_SIZE="1024" TYPE="ext4"
coreos-boot-edit: + rdcore bind-boot /sysroot /mnt/boot_partition
.....
coreos-boot-mount-generator: ++ cat /run/coreos/bootfs_uuid
coreos-boot-mount-generator: + bootdev=/dev/disk/by-uuid/96d15588-3596-4b3c-adca-a2ff7279ea63
```

Signed-off-by: Nikita Dubrovskii <[email protected]>
@jlebon
Member

jlebon commented Jul 12, 2022

Hmm, so the bootfs UUID reported by lsblk is stale? Do we know why?

@nikita-dubrovskii
Contributor Author

> Hmm, so the bootfs UUID reported by lsblk is stale? Do we know why?

I guess the issue is somewhere between the old RHEL kernel and udev on zKVM. FCOS works just fine. Maybe I'm wrong.

@cgwalters
Member

Hmm. I think this may be that lsblk uses the kernel's cached view of things by reading from /sys, but blkid opens the block device directly.

(Comparing e.g. strace -f lsblk /dev/vda vs strace -f blkid /dev/vda in a cosa run shell)
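The mismatch in the PR description can be checked mechanically. The following is a minimal sketch, assuming (as suggested above, not confirmed) that `lsblk` reports a cached view while `blkid -p` probes the device itself. The helper names `uuid_from_pairs`, `uuids_agree`, and `check_device` are hypothetical and not part of any CoreOS tooling.

```sh
# Print the value of the UUID="..." field from one line of
# `lsblk --pairs` or `blkid` output (prints nothing if the field is absent).
# The leading whitespace in the pattern keeps PARTUUID= from matching.
uuid_from_pairs() {
    printf '%s\n' "$1" | sed -n 's/.*[[:space:]]UUID="\([^"]*\)".*/\1/p'
}

# Succeed only if both lines carry the same non-empty UUID.
uuids_agree() (
    cached=$(uuid_from_pairs "$1")
    probed=$(uuid_from_pairs "$2")
    [ -n "$cached" ] && [ "$cached" = "$probed" ]
)

# Hypothetical helper: compare both views of a real block device.
# `blkid -p` does low-level probing and bypasses the blkid cache.
check_device() (
    dev=$1
    cached_line=$(lsblk --nodeps --pairs --paths -o NAME,LABEL,UUID "$dev")
    probed_line=$(blkid -p "$dev")
    uuids_agree "$cached_line" "$probed_line"
)
```

Something like `check_device /dev/disk/by-label/boot || echo "stale UUID"` (run as root) would then flag the condition seen in the log, where `lsblk` printed `96d15588-...` and `blkid` printed `eee55c4f-...`.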

This to me signals that the real problem is likely that we need to synchronously wait for a partprobe.
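A rough sketch of what "synchronously wait" could look like, assuming that `partprobe` followed by `udevadm settle` is enough to refresh the cached view (that is the hypothesis under discussion, not a verified fix). `wait_until` and `resync_and_verify` are hypothetical names, and the device paths in the usage line are placeholders.

```sh
# Retry a command once per second until it succeeds or the attempts run out.
wait_until() (
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Only sleep if another attempt is coming.
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
)

# Hypothetical: ask the kernel to re-read the partition table, let the udev
# event queue drain, then check that the cached and probed UUIDs agree.
resync_and_verify() (
    disk=$1
    part=$2
    partprobe "$disk"
    udevadm settle --timeout=30
    cached=$(lsblk --nodeps --noheadings --raw -o UUID "$part")
    probed=$(blkid -p -s UUID -o value "$part")
    [ -n "$probed" ] && [ "$cached" = "$probed" ]
)
```

Usage would be along the lines of `wait_until 10 resync_and_verify /dev/vda /dev/vda3`. Bounding the retries keeps a genuinely stale kernel view from hanging firstboot indefinitely.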

@jlebon
Member

jlebon commented Jul 14, 2022

Do we still need this now that we're using an Ignition config for the reprovisioning in coreos/fedora-coreos-config#1819?

@nikita-dubrovskii
Contributor Author

> Do we still need this now that we're using an Ignition config for the reprovisioning in coreos/fedora-coreos-config#1819?

I'd prefer to have this. I wasn't able to test Ignition+LUKS on RHCOS, because it again switched to an old kernel (or hasn't picked up a fixed one): https://bugzilla.redhat.com/show_bug.cgi?id=2075085. Switching to /dev/vda instead of coreos-boot-disks doesn't help much, so I'm still debugging why the /dev/disk/by-*/ directories are partially empty after Ignition.

Member

@cgwalters cgwalters left a comment

FWIW, while I think ensuring we can know the kernel state is correct here would be better, I'm personally fine with this too.

@jlebon
Member

jlebon commented Aug 10, 2022

Right, my concern with this is that this feels like it's working around what could possibly be a deeper issue. We're fixing it for rdcore but other code (present and future) may still be using the wrong information. If Secure Execution is triggering this, let's try to find out why that is and fix it.
