-
Notifications
You must be signed in to change notification settings - Fork 178
kola: Add soft-reboot support for external tests #4119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Skipping CI for Draft Pull Request. |
depends on: coreos/coreos-assembler#4119
Is there a case where a soft-reboot would be a better test than a hard reboot? Is the goal here to reduce test time? |
Right now soft-reboots are not really supported by ostree but we are implementing soft-reboots for ostree on : ostreedev/ostree#3420 |
Ok, Just comment here when you move this out of draft |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall! Main issue is code duplication with the "hard" reboot path but that's probably hard to fix and not worth doing.
echo "test beginning" | ||
# Check that boot_id stays the same across soft-reboot | ||
INITIAL_BOOT_ID=$(cat /proc/sys/kernel/random/boot_id) | ||
echo "Initial boot ID: $INITIAL_BOOT_ID" | logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the |logger
, we always log tests to the journal
# Verify boot_id is the same (soft-reboot should not change it) | ||
CURRENT_BOOT_ID=$(cat /proc/sys/kernel/random/boot_id) | ||
echo "Current boot ID after soft-reboot: $CURRENT_BOOT_ID" | logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this verified though? I think we need to save the boot id to a persistent file like /var/cache/kola-boot-id
or something in the first boot and then compare it here.
mark2) | ||
echo "test in mark2" | ||
FINAL_BOOT_ID=$(cat /proc/sys/kernel/random/boot_id) | ||
echo "Final boot ID after forced soft-reboot: $FINAL_BOOT_ID" | logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
OK test now dies with a timeout which is the same behavior I see when I do edit: looking at the console log the VM appears to come back. It's just cosa that loses the ability to ssh back in, even during a |
OK needed to modify qemu.go to get the test to comeback too. Now need to figureout this failed service.
|
4944160
to
d0743f6
Compare
I am not sure of the story of that service but it looks like we could ignore that failed service without any real issues. |
My offhand guess as to what's happening here is that service isn't prepared for soft reboots. https://github.com/coreos/fedora-coreos-config/blob/20feb176f19c3142b7256c1eb5bf1cb7c53b29b9/overlay.d/05core/usr/lib/systemd/system/coreos-ignition-firstboot-complete.service#L4 is still triggered because we didn't change the kernel commandline across the soft reboot. I think CoreOS probably does need to fix this because soft rebooting from the initial boot is a sane thing to do and I see real world use cases for it. It is perhaps as simple as |
On coreos/coreos-assembler#4119 surfaced that this service would fail on soft-reboots, it's non fatal but would make the Kola tests fail. It looks like originally this was set to fail instead of risking not running it: https://github.com/coreos/fedora-coreos-config/blob/20feb176f19c3142b7256c1eb5bf1cb7c53b29b9/overlay.d/05core/usr/libexec/coreos-ignition-firstboot-complete#L14
coreos/coreos-assembler#4119 surfaced that this service would fail on soft-reboots, it's non fatal but would make the Kola tests fail. It looks like originally this was set to fail instead of risking not running it: https://github.com/coreos/fedora-coreos-config/blob/20feb176f19c3142b7256c1eb5bf1cb7c53b29b9/overlay.d/05core/usr/libexec/coreos-ignition-firstboot-complete#L14
Holy cow that was painful to figure out but fixes in #4133 get us improved error message handling to the point where I now see
Which was the real bug here - with soft reboots |
coreos/coreos-assembler#4119 surfaced that this service would fail on soft-reboots, it's non fatal but would make the Kola tests fail. It looks like originally this was set to fail instead of risking not running it: https://github.com/coreos/fedora-coreos-config/blob/20feb176f19c3142b7256c1eb5bf1cb7c53b29b9/overlay.d/05core/usr/libexec/coreos-ignition-firstboot-complete#L14
coreos/coreos-assembler#4119 surfaced that this service would fail on soft-reboots, it's non fatal but would make the Kola tests fail. It looks like originally this was set to fail instead of risking not running it: https://github.com/coreos/fedora-coreos-config/blob/20feb176f19c3142b7256c1eb5bf1cb7c53b29b9/overlay.d/05core/usr/libexec/coreos-ignition-firstboot-complete#L14 Co-authored-by: Colin Walters <[email protected]>
…t-reboots coreos/coreos-assembler#4119 surfaced that this service would fail on soft-reboots, it's non fatal but would make the Kola tests fail. It looks like originally this was set to fail instead of risking not running it: https://github.com/coreos/fedora-coreos-config/blob/20feb176f19c3142b7256c1eb5bf1cb7c53b29b9/overlay.d/05core/usr/libexec/coreos-ignition-firstboot-complete#L14 Co-authored-by: Colin Walters <[email protected]>
coreos/coreos-assembler#4119 surfaced that this services using ConditionKernelCommandLine=ignition.firstboot would fail on soft-reboots, it's non fatal but would make the Kola tests fail. Co-authored-by: Colin Walters <[email protected]>
Implements soft-reboot capabilities for Kola, it enables tests to use systemd's soft-reboot functionality. The implementation follows the same pattern as regular reboots but for `systemctl soft-reboot`, tracks systemd boot timestamps rather than kernel boot IDs for state detection. Co-Authored-By: Colin Walters <[email protected]> Co-Authored-By: Claude <[email protected]> Signed-off-by: Colin Walters <[email protected]> Signed-off-by: Joseph Marrero Corchado <[email protected]>
/test rhcos |
@jmarrero: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
coreos/coreos-assembler#4119 surfaced that this services using ConditionKernelCommandLine=ignition.firstboot would fail on soft-reboots, it's non fatal but would make the Kola tests fail. Co-authored-by: Jonathan Lebon <[email protected]> Co-authored-by: Colin Walters <[email protected]>
coreos/coreos-assembler#4119 surfaced that this services using ConditionKernelCommandLine=ignition.firstboot would fail on soft-reboots, it's non fatal but would make the Kola tests fail. Co-authored-by: Jonathan Lebon <[email protected]> Co-authored-by: Colin Walters <[email protected]>
Implements soft-reboot capabilities for Kola,
it enables tests to use systemd's soft-reboot functionality.
The implementation follows the same pattern as regular reboots but for
systemctl soft-reboot
, tracks systemd boottimestamps rather than kernel boot IDs for state detection.