Foundation types: platform traits, event extensions, sync changes, AnyMap clone#743
Foundation types: platform traits, event extensions, sync changes, AnyMap clone#743wdcui wants to merge 24 commits intowdcui/stacked/pr2-packager-crossplatformfrom
Conversation
…encoding New trampoline format changes for the syscall rewriter: Rewriter (litebox_syscall_rewriter): - Add redzone reservation (LEA RSP,[RSP-0x80]) before syscall callback entry on x86-64, allowing the callback to use the 128-byte red zone - Add R11 restart address (LEA R11,[RIP+disp32]) pointing back to the call-site JMP, enabling SA_RESTART signal re-execution - Re-encode RIP-relative memory operands in pre-syscall instructions when they are copied to the trampoline, using iced_x86::Encoder at the trampoline IP so displacements remain correct - Guard post-syscall instructions with RIP-relative operands by delegating to hook_syscall_before_and_after instead of raw-copying - Append header-only marker (trampoline_size=0) when no syscall instructions are found, so the loader can distinguish checked binaries from unpatched ones - Add 5 inline unit tests for Bun detection and RIP-relative encoding Loader (litebox_common_linux): - Handle trampoline_size==0 as a valid no-op (checked, no syscalls) - Add UnpatchedBinary error variant for binaries missing the magic - Add has_trampoline() accessor Platform/shim (litebox_platform_linux_userland): - Add saved_r11 TLS slot and save R11 on syscall callback entry - Add syscall_callback_redzone entry point that undoes red zone reservation before saving registers - Return syscall_callback_redzone from get_syscall_entry_point() Shim loader (litebox_shim_linux): - Treat UnpatchedBinary as non-fatal in parse_trampoline calls, allowing unpatched binaries to load without a trampoline
- Gate syscall_callback_redzone behind #[cfg(target_arch = "x86_64")] on Linux since the asm symbol only exists in the x86_64 asm block, fixing the i686 linker error. - Add syscall_callback_redzone entry point to the Windows platform so the new trampoline format (with redzone reservation) works correctly on the Windows emulator. Uses mov+add to SCRATCH to avoid clobbering rax. - Fix rustfmt import ordering in litebox_shim_linux/src/loader/elf.rs.
Add runtime syscall patching in the shim's mmap hook: when an ELF segment with PROT_EXEC is mapped, patch syscall instructions in-place and set up a trampoline region. The loader also patches the main binary at load time when it lacks a trampoline. Remove rtld_audit entirely: gut build.rs, remove the audit .so injection from the runner, and remove the REQUIRE_RTLD_AUDIT global. Supporting changes: - Add ReadAt impl for &[u8] in litebox_common_linux - Hook finalize_elf_patch into sys_close to mprotect trampolines RX - Add elf_patch_cache on GlobalState and suppress_elf_runtime_patch on Task - Update ratchet test (runner has zero globals now)
…with new trampoline format)
…UserPointer The new trampoline format loads a restart address into R11 (for SA_RESTART) before jumping to the callback. On Windows, the TLS index lookup clobbers R11, so we temporarily stash R11 in the per-thread TEB.ArbitraryUserPointer slot (gs:[0x28]) for the ~20 instructions of inline asm between callback entry and pt_regs save. Also removes the dead syscall_callback entry point (only syscall_callback_redzone is used since get_syscall_entry_point always returns the redzone variant).
… discriminate rewriter errors - Remove litebox_rtld_audit/ directory entirely (Makefile, rtld_audit.c, .gitignore) - Replace litebox_packager/build.rs with no-op (was building rtld_audit.so) - Remove rtld_audit tar entry from litebox_packager/src/lib.rs - Remove fixup_env and set_load_filter from both Linux and LoW runners - Fix RFLAGS clobber on Windows: use lea+mov instead of mov+add - Simplify is_at_syscall_callback: x86 checks syscall_callback, x86_64 checks syscall_callback_redzone - Discriminate trampoline parse errors: only UnpatchedBinary triggers runtime patching - Discriminate rewriter errors: expected non-fatal vs unexpected with logging - Restore fork-vfork patch error path from PR 1c - Simplify suppress_elf_runtime_patch logic - Clean up rtld_audit references in comments across codebase
…ck, add x86_64 comment, add post-syscall RIP-relative comment, fix formatting
…d guards, remove unused ElfPatchState fields
Deleting litebox_runner_linux_userland/build.rs (rtld_audit removal) also
removed Cargo's OUT_DIR env var from integration tests. Replace the three
call sites with env!("CARGO_TARGET_TMPDIR"), a compile-time macro
available since Rust 1.68 that requires no build.rs.
…olution, rewrite-include flag Make litebox_packager compile and work on non-Linux hosts (primarily Windows) by: - Remove #![cfg(target_os = "linux")] crate-level gate and the dual-main pattern; gate only the host-mode code path behind cfg(target_os) - Add file_mode() helper with unix/non-unix variants to replace MetadataExt::mode() calls - Extract run_host_mode() behind #[cfg(target_os = "linux")] - Track OCI layer symlinks in-memory instead of creating OS symlinks (Windows requires special privileges for symlinks); materialize them after all layers are extracted via resolve_symlink_in_rootfs() - Add is_unix_absolute(), strip_unix_root(), normalize_path() helpers for cross-platform path handling - Force linux/amd64 platform when pulling OCI images - Normalize path separators to Unix-style in tar entries - Add --rewrite-include CLI flag for dlopen'd libraries - Change Bun executable detection from warning to hard error - Switch tar headers from GNU to UStar format
…tion
Two bugs found during review:
1. Opaque whiteouts (.wh..wh..opq) and regular whiteouts (.wh.<name>)
removed files from disk but did not prune corresponding entries from
the in-memory symlinks vec. This caused materialize_symlinks() to
resurrect deleted symlinks that a later layer intended to remove.
2. resolve_symlink_in_rootfs() could return Some(rootfs) when a
degenerate symlink target with excess .. segments normalized to an
empty path via normalize_path(). rootfs.join("") == rootfs, which
exists as a directory, causing the entire rootfs to be treated as
a resolution target. Guard against empty rel_path at function entry.
- Store Unix permission modes from tar headers in a HashMap during extraction, so permission bits are accurate on non-Unix hosts (Windows) instead of relying on the file_mode() heuristic which returns wrong answers (0o755 for most files). - Build symlink_map once in pull_and_extract and pass through to materialize_symlinks and scan_rootfs (was duplicated in both). - Add lookup_mode() helper that prefers tar header permissions, falls back to file_mode(), defaults to 0o644. - Add existence check for --rewrite-include in finalize_tar (was missing). - Remove redundant --include/--rewrite-include parsing from run_host_mode. - Replace Bun test with rewrite_elf_skips_non_elf_files test. - Add 22 unit tests for normalize_path, is_unix_absolute, strip_unix_root, resolve_symlink_in_rootfs, and lookup_mode. - Add litebox_packager to build_and_test_windows CI job.
f9f769e to
30947db
Compare
7f33b78 to
aa7e539
Compare
30947db to
0c440ee
Compare
jaybosamiya-ms
left a comment
There was a problem hiding this comment.
This is an overlarge PR that conflates a lot of distinct concepts together, and should be broken down. Nonetheless, I've left comments throughout. I do not think in the current state (i.e., as one big PR) this is merge-able, or indeed even that a proper review can be done, since a bunch of the pieces require deeper design analysis, while some parts are easy enough to merge on their own. By properly splitting it up, we might actually be able to move forward without it becoming bogged down for a long time.
There was a problem hiding this comment.
This is an unnecessarily complex change. The insert here forces any types that exist in the AnyMap to also be Clone (which is fine), but that means that the storage could just declare + Clone, and not need to do all this song and dance about type-erased clone functions. Is there an actual reason we cannot just declare storage as HashMap<TypeId, Box<dyn Any + Send + Sync + Clone>>?
There was a problem hiding this comment.
These changes should be factored out into a separate PR, they are orthogonal to the rest of the changes here.
| //! Address-space management types and traits for multi-process support. | ||
| //! | ||
| //! The [`AddressSpaceProvider`] trait is an **optional** South interface that | ||
| //! platforms implement to manage per-process address spaces. On kernel platforms | ||
| //! (e.g., LVBS) each process gets its own page table; on userland platforms the | ||
| //! single host address space is partitioned into non-overlapping VA sub-ranges. | ||
| //! | ||
| //! See the multi-process design document at | ||
| //! `docs/multi-process-single-address-space.md` §4 for the full | ||
| //! rationale. |
There was a problem hiding this comment.
Please don't keep misleading docs (e.g., multi-process-single-address-space.md).
Also, currently this doc string is not telling me anything at all about multi process support. Why is there documentation about AddressSpaceProvider being optional here, why not literally at AddressSpaceProvider?
| pub enum ForkedAddressSpace<Id> { | ||
| /// Kernel platforms: independent copy-on-write copy with the full address | ||
| /// range. The child has its own page tables; CoW faults are resolved by | ||
| /// the platform. | ||
| Independent(Id), | ||
| /// Userland platforms: a new VA-range partition is assigned to the child. | ||
| /// Parent memory is shared (vfork-style semantics); the shim is | ||
| /// responsible for copying pages as needed. | ||
| SharedWithParent(Id), | ||
| } |
There was a problem hiding this comment.
I'm not convinced that asking both the shim and the platform to separately handle things is the right design. Ideally, if every shim needs to be copy pages, then that should be provided by the core, not handled at the shim.
| pub enum AddressSpaceError { | ||
| /// No free address-space slots or VA ranges available. | ||
| #[error("no address space slots available")] | ||
| NoSpace, | ||
| /// The given address-space ID is not valid (already destroyed, never | ||
| /// created, etc.). | ||
| #[error("invalid address space id")] | ||
| InvalidId, | ||
| /// The platform does not support this operation. | ||
| #[error("operation not supported by this platform")] | ||
| NotSupported, | ||
| } |
There was a problem hiding this comment.
Nit: generally speaking for error types, we've only maintained #[error(...)] and no doc strings, because duplicating / keeping almost-equiv text is not valuable.
litebox/src/sync/futex.rs
Outdated
| /// Supports both private and shared futexes. On userland platforms where all | ||
| /// processes share a single flat virtual address space (with non-overlapping VA | ||
| /// partitions), both private and shared futexes naturally work with | ||
| /// `address_space_id = 0` because the same virtual address always refers to | ||
| /// the same backing memory. On kernel platforms, the `address_space_id` | ||
| /// discriminator prevents false aliasing between processes that might have | ||
| /// overlapping virtual address ranges. |
There was a problem hiding this comment.
This includes internal details (such as address_space_id) inside a public doc string.
litebox/src/sync/futex.rs
Outdated
| /// Discriminator to prevent false aliasing across address spaces. | ||
| /// On userland (non-overlapping VA partitions) this is always 0. | ||
| /// On kernel platforms each process passes its AddressSpaceId as u64. | ||
| address_space_id: u64, |
There was a problem hiding this comment.
Do we need to know what it is in userspace/kernel space at this level? The code here does not care whether it is 0 or not, right?
litebox/src/sync/futex.rs
Outdated
| /// | ||
| /// `address_space_id` is an opaque discriminator that distinguishes futexes | ||
| /// at the same virtual address in different address spaces. On userland | ||
| /// platforms (non-overlapping VA partitions) pass `0`. On kernel platforms | ||
| /// pass the process's `AddressSpaceId` converted to `u64`. |
There was a problem hiding this comment.
How is an AddressSpaceId converted to a u64? Isn't it supposed to be opaque? Why not just use an opaque identifier?
| /// `address_space_id` must match the value passed to the corresponding | ||
| /// [`wait`](Self::wait) call. | ||
| /// |
There was a problem hiding this comment.
What happens if it does not should be documented
There was a problem hiding this comment.
I'm very confused by this change, why is this needed?
Add prune_dead_observers() to Subject, called during register_observer() to eagerly clean up stale weak references. Previously dead observers were only cleaned up during notify_observers(), which could leave orphaned entries indefinitely if notifications were infrequent. Includes a test verifying that registering a new observer prunes stale entries from a previously dropped observer.
e619f83 to
0bee7b2
Compare
Clone is not object-safe, so Box<dyn Any + Send + Sync + Clone> is not valid Rust. Instead, store a type-erased clone function pointer alongside each value that knows the concrete type and can clone through the trait object. This requires all types stored in AnyMap to implement Clone. Update the Clone bound on insert() and set_entry_metadata()/set_fd_metadata() in fd/mod.rs, and add #[derive(Clone)] to shim types that are stored as fd metadata: StdioStatusFlags, PipeStatusFlags, SocketOptions, SocketOFlags, SocketProxy.
0bee7b2 to
6c425be
Compare
Add two new default methods to the IOPollable trait: - needs_host_poll(): indicates the pollable cannot deliver async observer notifications (e.g. host-backed stdin), so callers should poll periodically. - should_block_read(): indicates whether reads should block when no data is available. Returns false for fds that use epoll/poll and expect EAGAIN. Both default to the backward-compatible behavior (no host poll needed, reads block). Forward the methods through the Arc<T> blanket impl.
…ngth Two changes to page_mgmt: 1. Add a 'noreserve' bool parameter to PageManagementProvider::allocate_pages(). When true, platforms that support it (Linux userland via MAP_NORESERVE) avoid reserving swap/commit upfront, enabling sparse memory reservations. Kernel, LVBS, and Windows platforms accept but ignore the parameter. All callers currently pass false (no behavior change). 2. Fix a bug in the default remap_pages() copy loop: the to_owned_slice() call was using old_range.len() (the entire range length) instead of the per-chunk length, causing each iteration to read beyond its chunk boundary.
…essor_number, stdin guard Small independent cleanups: - Make MockInstant::time and MockSystemTime::time pub(crate) so tests in other modules can construct instances with specific values. - Add #[allow(unused_imports)] to log_println! macro use-statements to suppress warnings when the macro is invoked in contexts where the import is redundant. - Add SystemInfoProvider::current_processor_number() default method returning 0, for platforms that don't expose processor topology. - Add empty-buffer early return in MockPlatform::read_from_stdin() to avoid unnecessarily consuming queued input on zero-length reads.
Add SendError::Io(i32) and ReceiveError::{ProtocolError, Eof} variants.
Introduce RawMessageProvider trait for direct guest-broker byte-stream
messaging (bypassing IP stack), with default stubs. Add Provider
supertrait bound. Implement (empty defaults) for all platform crates.
Update net/phy.rs to match on new error variants.
Add StdioReadError::WouldBlock, StdioIoctlError, TerminalAttributes, WindowSize, SetTermiosWhen, and HostTtyDeviceInfo types. Add read_from_stdin_nonblocking, get/set_terminal_attributes, get/set_window_size, get_terminal_input_bytes, poll_stdin_readable, cancel_stdin, and host_stdin_tty_device_info methods to StdioProvider with sensible defaults. Add mock implementations for get_terminal_input_bytes and poll_stdin_readable. Add test for nonblocking stdin reads.
…ment Introduce address_space.rs with AddressSpaceProvider trait, ForkedAddressSpace enum, and AddressSpaceError. The trait defines create/destroy/fork/activate/with_address_space operations with NotSupported defaults for platforms that lack multi-process support. Add Provider supertrait bound. Implement (with defaults) for all platform crates and mock.
…upport Add address_space_id: u64 discriminator to FutexEntry, bucket hashing, wait(), and wake() to prevent false aliasing between processes with overlapping virtual address ranges. On userland (non-overlapping VA partitions) callers pass 0. Update all shim callers and tests.
Add trace_fs feature to litebox Cargo.toml. When enabled (without lock_tracing), RawSyncPrimitivesProvider additionally requires DebugLogProvider, allowing filesystem tracing code to log through the platform's debug output.
Summary
Layer 0 of the core
liteboxcrate changes for multi-process support — platform traits, event extensions, sync changes, and AnyMap clone support.Platform traits
AddressSpaceProvidertrait (new file) for per-process address space managementRawMessageProviderfor direct guest–broker byte channels (bypasses IP stack)StdioProviderexpansion: terminal attributes, window size, nonblocking stdin, cancel, host TTY device infoSendError::Io,ReceiveError::ProtocolError/Eof,StdioReadError::WouldBlockSystemInfoProvider::current_processor_number()Providersupertrait now requires+ RawMessageProvider + AddressSpaceProviderEvent / sync
IOPollable::needs_host_poll()andshould_block_read()default methodsObserver::prune_dead_observers()— GCs staleWeakrefs on registerWaker::ptr_eq()utilityFutexManager::wait/wake—address_space_idparameter for multi-process futex isolationtrace_fsfeature gate forRawSyncPrimitivesProviderUtilities
AnyMap:Clonebound oninsert, type-erasedCloneimpl for the mapPageManagementProvider::allocate_pages:noreserveparametercopy_pages: fix chunk-length bug (was usingold_range.len()per iteration instead ofmin(remaining, ALIGN))Caller updates
RawMessageProvider+AddressSpaceProviderimpls,noreserveparamlitebox_shim_linux:Clonederives for metadata types,address_space_idin futex callsStack