
fix(uffd): register guest memory with WRITE_PROTECT in addition to MISSING#12

Closed
ValentaTomas wants to merge 1 commit into main from fix/uffd-register-write-protect

Conversation

@ValentaTomas
Member

Changes

In guest_memory_from_uffd (src/vmm/src/persist.rs), switch the per-region UFFD registration from the Uffd::register(...) convenience wrapper (which uses UFFDIO_REGISTER_MODE_MISSING only) to Uffd::register_with_mode(... MISSING | WRITE_PROTECT).

To use RegisterMode::WRITE_PROTECT, also enable the linux5_7 feature on the userfaultfd crate dependency in src/vmm/Cargo.toml. The flag is named after the kernel that introduced UFFDIO_WRITEPROTECT (Linux 5.7), which is well below Firecracker's supported kernel range.

Reason

Any UFFD handler that asks the kernel to keep a copied page write-protected by passing UFFDIO_COPY_MODE_WP gets a synchronous EINVAL on the very first read fault, because the destination range was never registered with UFFDIO_REGISTER_MODE_WP. WRITE_PROTECT registration is what enables the standard CoW snapshot pattern:

  1. Handler receives a MISSING fault, serves the page via UFFDIO_COPY with MODE_WP so the page lands write-protected.
  2. Guest writes to the page; kernel re-faults to the handler.
  3. Handler now knows which pages were dirtied after restore (without MODE_WP it cannot distinguish "page was just populated" from "page was modified after population").

Without WP registration, step 1 fails immediately with EINVAL, breaking the resume path for any handler that opts into WP tracking. We hit this in practice with the e2b-dev/infra orchestrator's UFFD handler.
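
The three steps above can be modelled without a real userfaultfd. The following is a minimal sketch, not the orchestrator's handler: `Fault`, `Action`, and `DirtyTracker` are hypothetical names introduced only for illustration, and the "actions" merely name the ioctl a real handler would be expected to issue.

```rust
use std::collections::HashSet;

/// Simplified model of the fault kinds a handler sees (hypothetical type,
/// not the real userfaultfd event struct).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fault {
    /// The page has never been populated.
    Missing,
    /// The page is present but write-protected, and the guest wrote to it.
    WriteProtect,
}

/// What the handler would ask the kernel to do in response (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    /// UFFDIO_COPY with UFFDIO_COPY_MODE_WP: install the page write-protected.
    CopyWriteProtected,
    /// UFFDIO_WRITEPROTECT with the WP mode cleared: let the write proceed.
    RemoveWriteProtect,
}

/// Tracks which pages were populated and which were written after restore.
#[derive(Default)]
struct DirtyTracker {
    populated: HashSet<u64>,
    dirty: HashSet<u64>,
}

impl DirtyTracker {
    fn on_fault(&mut self, page: u64, fault: Fault) -> Action {
        match fault {
            // Step 1: serve the page, keeping it write-protected.
            Fault::Missing => {
                self.populated.insert(page);
                Action::CopyWriteProtected
            }
            // Steps 2 and 3: the guest wrote to a populated page, so record
            // it as dirty and lift the protection.
            Fault::WriteProtect => {
                self.dirty.insert(page);
                Action::RemoveWriteProtect
            }
        }
    }
}
```

In this model a page that is only read ends up in `populated` but never in `dirty`, which is the distinction step 3 relies on.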

This change is a strict superset of the previous registration. Handlers that never pass MODE_WP behave exactly as before — WRITE_PROTECT registration on its own is a no-op until the handler explicitly write-protects pages.
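
The "strict superset" claim can be checked at the level of the mode bitmask. This is a small sketch: the two constants mirror the values in the kernel's `linux/userfaultfd.h` uapi header, while `is_superset` is a hypothetical helper written only for this illustration.

```rust
// Register-mode bits as defined in <linux/userfaultfd.h>.
const UFFDIO_REGISTER_MODE_MISSING: u64 = 1 << 0;
const UFFDIO_REGISTER_MODE_WP: u64 = 1 << 1;

/// Hypothetical helper: true if every mode bit set in `old` is also set in `new`.
fn is_superset(new: u64, old: u64) -> bool {
    (new & old) == old
}

/// Mode used before this patch (MISSING only).
const OLD_MODE: u64 = UFFDIO_REGISTER_MODE_MISSING;
/// Mode used by this patch (MISSING | WRITE_PROTECT).
const NEW_MODE: u64 = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP;
```

The bitmask check only says that MISSING faults are still requested; it does not say whether the kernel accepts the extra WP bit on a given platform.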

Repro

Trivial against the unpatched binary:

// inside any UFFD handler responding to a MISSING fault
let mut copy = uffdio_copy { ... mode: UFFDIO_COPY_MODE_WP, .. };
ioctl(uffd, UFFDIO_COPY, &mut copy);  // -> EINVAL on stock FC
                                      // -> Ok with this patch

Reproduced end-to-end from the orchestrator's resume path: every read fault returned "failed to handle uffd: failed to handle uffd: invalid argument" (the propagated EINVAL) until WRITE_PROTECT was registered on the FC side.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • cargo check -p vmm --target x86_64-unknown-linux-gnu passes.
  • cargo build -p vmm --target x86_64-unknown-linux-gnu passes.
  • tools/devtool checkbuild --all — not run locally (this is a draft).
  • tools/devtool checkstyle — not run locally (this is a draft).
  • I have described what is done in these changes, why they are needed, and how they are solving the problem.
  • CHANGELOG.md entry — happy to add if reviewers want one.
  • I have tested all new and changed functionalities in unit tests and/or integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm (it's purely a Firecracker registration change).

…SSING

`guest_memory_from_uffd` registered each guest memory region with
`Uffd::register(...)`, which is the convenience wrapper that uses
`UFFDIO_REGISTER_MODE_MISSING` only. As a result, any UFFD handler that
asks the kernel to keep a copied page write-protected (by passing
`UFFDIO_COPY_MODE_WP`) gets a synchronous EINVAL on the very first read
fault, because the destination range was never registered with
`UFFDIO_REGISTER_MODE_WP`.

WRITE_PROTECT registration is what enables the standard CoW snapshot
pattern: the handler serves a missing page via UFFDIO_COPY with
MODE_WP, the kernel re-faults on the next write to that page, and the
handler observes (and can record) which pages got dirtied after
restore. Without WP registration the first such UFFDIO_COPY fails with
EINVAL, which breaks the resume path.

Switch `register(...)` to `register_with_mode(... MISSING |
WRITE_PROTECT)`. `RegisterMode::WRITE_PROTECT` lives behind the
`linux5_7` feature of the `userfaultfd` crate (UFFDIO_WRITEPROTECT was
added in Linux 5.7), so also enable that feature on the dependency in
`src/vmm/Cargo.toml`. Firecracker's minimum supported kernel is well
past 5.7.

This is a strict superset of the previous registration: existing
handlers that don't pass MODE_WP behave identically.

Signed-off-by: ValentaTomas <valenta.and.thomas@gmail.com>
@cursor

cursor Bot commented Apr 18, 2026

PR Summary

Medium Risk
Touches snapshot restore and userfaultfd registration behavior; mis-registration could break UFFD-backed resume or change fault handling on supported kernels.

Overview
Fixes UFFD-backed snapshot restore by registering each guest memory region with RegisterMode::MISSING | RegisterMode::WRITE_PROTECT (instead of MISSING-only), allowing external UFFD handlers to use UFFDIO_COPY_MODE_WP for CoW-style write tracking without hitting EINVAL.

Enables the userfaultfd crate’s linux5_7 feature so WRITE_PROTECT registration is available.

Reviewed by Cursor Bugbot for commit 32330a4. Bugbot is set up for automated code reviews on this repo. Configure here.

@ValentaTomas
Member Author

Closing — this is a duplicate. Babis already implemented the WRITE_PROTECT registration (and went further with WP_ASYNC and immediate uffd.write_protect on hugetlbfs) in 8fc760f61 on firecracker-v1.12-direct-mem (PR #2), which is the actual production fork base.

I missed that branch when investigating because I only searched origin/main. The EINVAL we hit during resume-build testing was because we built our binary from lazy-mem-copy, which forked from upstream v1.14.1 before that fix existed, so it still does uffd.register(...) (MISSING-only). The right action is to cherry-pick 8fc760f61 (and its dependencies, if any) onto lazy-mem-copy, not to open a separate fix against main.

@ValentaTomas ValentaTomas deleted the fix/uffd-register-write-protect branch April 18, 2026 23:05

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Comment thread src/vmm/src/persist.rs
mem_region.size() as _,
RegisterMode::MISSING | RegisterMode::WRITE_PROTECT,
)
.map_err(GuestMemoryFromUffdError::Register)?;

UFFD WRITE_PROTECT registration breaks aarch64 on kernels below 6.10

High Severity

Unconditionally registering with RegisterMode::WRITE_PROTECT will cause UFFDIO_REGISTER to fail with EINVAL on aarch64 with kernels before 6.10, because arm64 userfaultfd write-protect page table support (pgtable_uffd_wp_supported()) was only added in Linux 6.10. Firecracker supports aarch64 on kernels 5.10 and 6.1, both of which lack this support. This is a regression that breaks all UFFD-based snapshot restores on aarch64, not just WP-tracking ones, since the registration itself fails.
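
One way to avoid the regression described here would be to request WRITE_PROTECT only where the host is expected to support it. The sketch below is an assumption-laden illustration, not code from this PR or from Firecracker: `Arch` and `register_mode` are hypothetical, the 5.7 threshold comes from the PR description, and the 6.10 arm64 threshold is taken from this review comment without independent verification.

```rust
/// Host architecture (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Arch {
    X86_64,
    Aarch64,
}

// Register-mode bits as defined in <linux/userfaultfd.h>.
const MODE_MISSING: u64 = 1 << 0;
const MODE_WP: u64 = 1 << 1;

/// Hypothetical helper: choose the registration mode for a host.
/// `kernel` is (major, minor). Thresholds are assumptions: 5.7 per the PR
/// description, 6.10 on arm64 per the review comment.
fn register_mode(arch: Arch, kernel: (u32, u32)) -> u64 {
    let wp_min = match arch {
        Arch::X86_64 => (5, 7),
        Arch::Aarch64 => (6, 10),
    };
    // Tuples compare lexicographically, so (6, 1) < (6, 10).
    if kernel >= wp_min {
        MODE_MISSING | MODE_WP
    } else {
        MODE_MISSING
    }
}
```

Probing the features the kernel reports through UFFDIO_API is likely more robust than comparing version numbers, since distribution kernels may backport or omit support.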


Comment thread src/vmm/src/persist.rs
.create()
.map_err(GuestMemoryFromUffdError::Create)?;

// Register every region for both MISSING and WRITE_PROTECT faults.

Missing WP feature negotiation during UFFDIO_API handshake

Medium Severity

The code registers regions with RegisterMode::WRITE_PROTECT but never negotiates FeatureFlags::PAGEFAULT_FLAG_WP during the UFFDIO_API handshake. The Linux man page states the user needs to check availability of UFFD_FEATURE_PAGEFAULT_FLAG_WP via UFFDIO_API before using write-protect mode. Adding this to require_features would also provide an early, clear failure on platforms that lack WP support instead of a less informative error at registration time.
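
The early-failure behaviour suggested here can be illustrated with a small model of the feature check. This is a sketch only: it does not call the kernel or the userfaultfd crate, `negotiate` and `MissingFeatures` are hypothetical names, and the constant mirrors `UFFD_FEATURE_PAGEFAULT_FLAG_WP` from `linux/userfaultfd.h`.

```rust
/// UFFD_FEATURE_PAGEFAULT_FLAG_WP as defined in <linux/userfaultfd.h>.
const UFFD_FEATURE_PAGEFAULT_FLAG_WP: u64 = 1 << 0;

/// Feature bits that were required but not offered (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
struct MissingFeatures(u64);

/// Hypothetical model of the UFFDIO_API handshake: succeed only if every
/// required feature bit is among those the kernel offers.
fn negotiate(required: u64, offered: u64) -> Result<u64, MissingFeatures> {
    let missing = required & !offered;
    if missing == 0 {
        Ok(required)
    } else {
        // Fail before any region is registered, naming the missing bits.
        Err(MissingFeatures(missing))
    }
}
```

With a check like this, a host without WP support would be rejected at handshake time with a specific reason, rather than at registration time with a bare EINVAL.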

