Freeze container during live checkpoint for consistency #28963

Open
rst0git wants to merge 1 commit into podman-container-tools:main from rst0git:checkpoint-leave-running-fs-consistency

@rst0git rst0git commented Jun 18, 2026

Contributor

When checkpointing a container with --leave-running, the processes inside the container continue running after CRIU captures the container's runtime state. Because the rootfs diff and named volumes are saved afterward (in exportCheckpoint() / createCheckpointImage()), this can result in an inconsistent checkpoint, where the CRIU images reflect an earlier point in time than the captured filesystem state. To fix this, we freeze the container cgroup for the duration of the checkpoint operation, similar to the approach used by other engines (e.g. CRI-O, containerd).

@packit-as-a-service

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

@mheon

mheon commented Jun 18, 2026

Contributor

Arguably a breaking change? But, given we did the same to podman commit in 6.0, might be worth including in 6.0?

@mheon mheon added the 6.0 Breaking changes for Podman 6.0 label Jun 18, 2026
@mheon

mheon commented Jun 18, 2026

Contributor

After discussion, let's get this in 6.0

@Luap99 Luap99 left a comment

Member

seems reasonable overall.
some nits for the test

Also please squash the commits into one, we like feature/bug and test in one commit.

Comment thread test/e2e/checkpoint_test.go Outdated
Comment thread test/e2e/checkpoint_test.go Outdated
Comment thread test/e2e/checkpoint_test.go Outdated
Comment thread test/e2e/checkpoint_test.go Outdated
Comment thread libpod/container_internal_common.go Outdated
Comment thread libpod/container_internal_common.go Outdated
Comment thread test/e2e/checkpoint_test.go Outdated
Comment thread test/e2e/checkpoint_test.go Outdated
@rst0git

rst0git commented Jun 19, 2026

Contributor Author

> Arguably a breaking change?

@mheon It shouldn't be a breaking change. The only difference is that we keep the container paused until the checkpoint is fully written when --leave-running is used. Once the checkpoint is written, the container continues running exactly as before. This just fixes a problem with the internal behavior of the checkpointing functionality.

cc: @adrianreber

@rst0git rst0git force-pushed the checkpoint-leave-running-fs-consistency branch 2 times, most recently from 065ebd0 to 068d2c2 Compare June 19, 2026 11:13
When checkpointing a container with --leave-running, libpod dumps the
container's memory via the OCI runtime (CRIU) first and only captures
the rootfs diff and named volumes afterwards. CRIU thaws the container
as soon as the memory dump finishes, so the processes inside the
container continue to run between the memory snapshot and the
file-system capture. As a result, the checkpoint can be inconsistent:
the CRIU images and the file system can reflect different points in time.

To fix this, we freeze the container's cgroup before invoking the OCI
runtime and thaw it again only after the checkpoint image/archive has
been written. The OCI runtime calls CRIU with the freezer cgroup and
restores it to its previous state once the dump completes, so a
container that was already frozen stays frozen across the dump and
the file system is captured at the same instant as the CRIU images.
This mirrors the approach used by other engines (e.g. CRI-O and containerd).

The default (stopping) checkpoint functionality is not affected by this
issue because CRIU leaves the tasks dead after the dump.

This patch also adds a regression test for the consistency of live
(--leave-running) checkpoints. The container runs a workload that
keeps an in-memory counter in sync with a value written to a file
on its root file system, maintaining the invariant that the on-disk
value never gets ahead of the in-memory counter.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
@rst0git rst0git force-pushed the checkpoint-leave-running-fs-consistency branch from 068d2c2 to 44d1e68 Compare June 19, 2026 11:26

@Luap99 Luap99 left a comment

Member

LGTM
