Runtime ELF patching, trampoline format changes, and rtld_audit removal#739
Runtime ELF patching, trampoline format changes, and rtld_audit removal#739wdcui wants to merge 14 commits intowdcui/stacked/pr1c-rewriter-interfacefrom
Conversation
17cfce6 to
aa7dd83
Compare
35d845d to
434eb38
Compare
| push r9 // pt_regs->r9 | ||
| push r10 // pt_regs->r10 | ||
| push [rsp + 88] // pt_regs->r11 = rflags | ||
| push QWORD PTR gs:saved_r11@tpoff // pt_regs->r11 (syscall call-site from rewriter) |
There was a problem hiding this comment.
Currently, saved_r11 doesn't store rflags. Better to read the current (guest) rflags and store it to r11.
| push r10 // pt_regs->r10 | ||
| push [rsp + 88] // pt_regs->r11 = rflags | ||
| mov r10, gs:[0x28] // recover guest R11 saved at entry | ||
| push r10 // pt_regs->r11 = guest R11 (restart addr from rewriter) |
There was a problem hiding this comment.
similarly, this no longer stores rflags, which is different from PtRegs's expectation.
| if file_size < 32 { | ||
| return (false, 0, 0, 0); | ||
| } | ||
| let mut tail = [0u8; 32]; |
There was a problem hiding this comment.
nits. this is incompatible with 32-bit ELF.
| self.init_elf_patch_state(fd, mapped_addr.as_usize()); | ||
| } | ||
|
|
||
| let mut cache = self.global.elf_patch_cache.lock(); |
There was a problem hiding this comment.
wonder whether this heavy lock is acceptable.
There was a problem hiding this comment.
I think we don't expect a process to load executables concurrently?
| pub(crate) fn sys_close(&self, fd: i32) -> Result<(), Errno> { | ||
| // Finalize any in-progress ELF patching for this fd (mprotect | ||
| // trampoline RW→RX) before closing the descriptor. | ||
| self.finalize_elf_patch(fd); |
There was a problem hiding this comment.
nits. fd based trampoline management might be incompatible with dup2/3. Not so sure whether a program dup ELF files though.
There was a problem hiding this comment.
fixed it in sys_dup. please help check if it's correct.
litebox_syscall_rewriter/src/lib.rs
Outdated
| let r11_disp = i64::try_from(replace_start).unwrap() | ||
| - i64::try_from(trampoline_base_addr + trampoline_data.len() as u64 + 7).unwrap(); |
There was a problem hiding this comment.
nits. we might want to use checked_add/sub.
- Use EINVAL instead of ENODATA for trampoline parse failures (loader.rs) - Handle UnpatchedBinary as non-fatal in OptEE ELF loader (optee/elf.rs) - Document R11 restart-address contract in rewriter (lib.rs) - Replace unchecked arithmetic with checked_add_u64 in rewriter (lib.rs) - Rename saved_r11 to saved_restart_addr in Linux userland TLS (lib.rs) - Store RFLAGS from stack ([rsp+88]) instead of TLS in Linux/Windows userland pt_regs->r11 (lib.rs) - Save R11 restart address to TlsState on Windows userland (lib.rs) - Add cleanup-leak TODO comment in PatchedMapper::map_file (elf.rs) - Restore trampoline RX on mprotect failure path (mm.rs) - Make check_trampoline_magic pointer-width aware (mm.rs) - Validate e_phentsize before parsing program headers (mm.rs) - Clarify elf_patch_cache lock scope comment (mm.rs) - Finalize ELF patch for implicitly-closed fd in dup2/dup3 (file.rs)
…encoding New trampoline format changes for the syscall rewriter: Rewriter (litebox_syscall_rewriter): - Add redzone reservation (LEA RSP,[RSP-0x80]) before syscall callback entry on x86-64, allowing the callback to use the 128-byte red zone - Add R11 restart address (LEA R11,[RIP+disp32]) pointing back to the call-site JMP, enabling SA_RESTART signal re-execution - Re-encode RIP-relative memory operands in pre-syscall instructions when they are copied to the trampoline, using iced_x86::Encoder at the trampoline IP so displacements remain correct - Guard post-syscall instructions with RIP-relative operands by delegating to hook_syscall_before_and_after instead of raw-copying - Append header-only marker (trampoline_size=0) when no syscall instructions are found, so the loader can distinguish checked binaries from unpatched ones - Add 5 inline unit tests for Bun detection and RIP-relative encoding Loader (litebox_common_linux): - Handle trampoline_size==0 as a valid no-op (checked, no syscalls) - Add UnpatchedBinary error variant for binaries missing the magic - Add has_trampoline() accessor Platform/shim (litebox_platform_linux_userland): - Add saved_r11 TLS slot and save R11 on syscall callback entry - Add syscall_callback_redzone entry point that undoes red zone reservation before saving registers - Return syscall_callback_redzone from get_syscall_entry_point() Shim loader (litebox_shim_linux): - Treat UnpatchedBinary as non-fatal in parse_trampoline calls, allowing unpatched binaries to load without a trampoline
- Gate syscall_callback_redzone behind #[cfg(target_arch = "x86_64")] on Linux since the asm symbol only exists in the x86_64 asm block, fixing the i686 linker error. - Add syscall_callback_redzone entry point to the Windows platform so the new trampoline format (with redzone reservation) works correctly on the Windows emulator. Uses mov+add to SCRATCH to avoid clobbering rax. - Fix rustfmt import ordering in litebox_shim_linux/src/loader/elf.rs.
Add runtime syscall patching in the shim's mmap hook: when an ELF segment with PROT_EXEC is mapped, patch syscall instructions in-place and set up a trampoline region. The loader also patches the main binary at load time when it lacks a trampoline. Remove rtld_audit entirely: gut build.rs, remove the audit .so injection from the runner, and remove the REQUIRE_RTLD_AUDIT global. Supporting changes: - Add ReadAt impl for &[u8] in litebox_common_linux - Hook finalize_elf_patch into sys_close to mprotect trampolines RX - Add elf_patch_cache on GlobalState and suppress_elf_runtime_patch on Task - Update ratchet test (runner has zero globals now)
…with new trampoline format)
…UserPointer The new trampoline format loads a restart address into R11 (for SA_RESTART) before jumping to the callback. On Windows, the TLS index lookup clobbers R11, so we temporarily stash R11 in the per-thread TEB.ArbitraryUserPointer slot (gs:[0x28]) for the ~20 instructions of inline asm between callback entry and pt_regs save. Also removes the dead syscall_callback entry point (only syscall_callback_redzone is used since get_syscall_entry_point always returns the redzone variant).
… discriminate rewriter errors - Remove litebox_rtld_audit/ directory entirely (Makefile, rtld_audit.c, .gitignore) - Replace litebox_packager/build.rs with no-op (was building rtld_audit.so) - Remove rtld_audit tar entry from litebox_packager/src/lib.rs - Remove fixup_env and set_load_filter from both Linux and LoW runners - Fix RFLAGS clobber on Windows: use lea+mov instead of mov+add - Simplify is_at_syscall_callback: x86 checks syscall_callback, x86_64 checks syscall_callback_redzone - Discriminate trampoline parse errors: only UnpatchedBinary triggers runtime patching - Discriminate rewriter errors: expected non-fatal vs unexpected with logging - Restore fork-vfork patch error path from PR 1c - Simplify suppress_elf_runtime_patch logic - Clean up rtld_audit references in comments across codebase
…ck, add x86_64 comment, add post-syscall RIP-relative comment, fix formatting
…d guards, remove unused ElfPatchState fields
Deleting litebox_runner_linux_userland/build.rs (rtld_audit removal) also
removed Cargo's OUT_DIR env var from integration tests. Replace the three
call sites with env!("CARGO_TARGET_TMPDIR"), a compile-time macro
available since Rust 1.68 that requires no build.rs.
- Use EINVAL instead of ENODATA for trampoline parse failures (loader.rs) - Handle UnpatchedBinary as non-fatal in OptEE ELF loader (optee/elf.rs) - Document R11 restart-address contract in rewriter (lib.rs) - Replace unchecked arithmetic with checked_add_u64 in rewriter (lib.rs) - Rename saved_r11 to saved_restart_addr in Linux userland TLS (lib.rs) - Store RFLAGS from stack ([rsp+88]) instead of TLS in Linux/Windows userland pt_regs->r11 (lib.rs) - Save R11 restart address to TlsState on Windows userland (lib.rs) - Add cleanup-leak TODO comment in PatchedMapper::map_file (elf.rs) - Restore trampoline RX on mprotect failure path (mm.rs) - Make check_trampoline_magic pointer-width aware (mm.rs) - Validate e_phentsize before parsing program headers (mm.rs) - Clarify elf_patch_cache lock scope comment (mm.rs) - Finalize ELF patch for implicitly-closed fd in dup2/dup3 (file.rs)
ecf9130 to
15eb87e
Compare
- Add fork_to_vfork_patch computation to metadata extraction block - Rename replace_with_ud2 -> replace_with_trap (pr1c rename) - Adapt hook_syscalls_in_elf callers to (Vec<u8>, Vec<u64>) return type - Adapt patch_code_segment callers to 4-arg signature - Update error variant names (UnsupportedExecutable, UnsupportedObjectFile) - Remove dead has_bun_footer_marker (pr1c uses ends_with directly) - Run cargo fmt
15eb87e to
8ea3756
Compare
get_syscall_entry_point() now returns 0 when seccomp_interception_enabled is set, preventing unnecessary binary patching of syscall instructions when the systrap/seccomp backend is handling interception via SIGSYS.
| } | ||
| let p_vaddr = u64::from_le_bytes(ph[16..24].try_into().unwrap()); | ||
| let p_memsz = u64::from_le_bytes(ph[40..48].try_into().unwrap()); | ||
| let end = p_vaddr + p_memsz; |
| } | ||
|
|
||
| // Read program headers to find max PT_LOAD end | ||
| let phdrs_size = e_phentsize * e_phnum; |
| // Recover the restart address from the TEB slot and store it in TLS. | ||
| // We use SCRATCH as a temporary since all guest GPRs must be preserved | ||
| // and RSP modifications would break the stack pointer recovery below. | ||
| push QWORD PTR gs:[0x28] |
| self.parsed.load(&mut mapper, &mut &*platform) | ||
| } else { | ||
| self.parsed.load(&mut self.file, &mut &*platform) |
There was a problem hiding this comment.
if this load fails, self.file.task.suppress_elf_runtime_patch.set(false) is never executed.
| Some(tramp_addr), | ||
| tramp_len, | ||
| ProtFlags::PROT_READ | ProtFlags::PROT_WRITE, | ||
| MapFlags::MAP_ANONYMOUS | MapFlags::MAP_PRIVATE | MapFlags::MAP_FIXED, |
There was a problem hiding this comment.
MAP_FIXED_NOREPLACE might be better.
| // Patch fork → vfork: overwrite the first bytes of __libc_fork with a | ||
| // JMP to __libc_vfork. This prevents glibc's fork wrapper from running | ||
| // post-fork handlers that corrupt shared state under vfork semantics. | ||
| if let Some((fork_file_offset, fork_patch_end, rel32)) = fork_to_vfork_patch { |
There was a problem hiding this comment.
This patch is introduced again?
| Ok(()) | Err(litebox_common_linux::loader::ElfParseError::UnpatchedBinary) => { | ||
| // Unpatched binary is expected in the LVBS scenario where not | ||
| // all binaries are rewritten. Proceed without a trampoline. | ||
| } |
There was a problem hiding this comment.
Better to have TODO or open an issue to track this.
There was a problem hiding this comment.
Out of curiosity: how is OPTEE-on-Linux supposed to work for unpatched binaries?
| } | ||
|
|
||
| /// Find fork and vfork symbols in the ELF and compute the patch needed to | ||
| /// redirect fork -> vfork. Returns `Some((fork_file_offset, jmp_rel32))` if |
There was a problem hiding this comment.
This function now returns 3-tuple.
There was a problem hiding this comment.
Now that the entire body of this has been removed, I don't think this file is needed anymore, right?
| The packager discovers dependencies, rewrites all ELFs, and creates a tar. | ||
| The rewritten main binary is extracted | ||
| from the tar and placed alongside it. |
| // When the seccomp/systrap backend is active, syscall instructions are | ||
| // trapped via SIGSYS — no binary rewriting needed. | ||
| #[cfg(feature = "systrap_backend")] | ||
| if self | ||
| .seccomp_interception_enabled | ||
| .load(std::sync::atomic::Ordering::SeqCst) | ||
| { | ||
| return 0; | ||
| } | ||
|
|
There was a problem hiding this comment.
I'm surprised by this conditional here. Do we need to give a 0 here? If it is not needed, it can just be kept at the non-tweaked default behavior, no?
| #[cfg(target_arch = "x86")] | ||
| let is_at_syscall_callback = ip == syscall_callback as *const () as usize; | ||
| #[cfg(target_arch = "x86_64")] | ||
| let is_at_syscall_callback = ip == syscall_callback_redzone as *const () as usize | ||
| || ip == syscall_callback as *const () as usize; | ||
| if is_at_syscall_callback { |
There was a problem hiding this comment.
Seems somewhat repetitive, and also somewhat surprising. get_syscall_entry_point only uses redzone variant, but this one checks both. Wouldn't it be correct to just do a ip == get_syscall_entry_point()?
| // When using the rewriter backend, the shim's mmap hook handles | ||
| // syscall patching at runtime — no audit library needed. | ||
| match cli_args.interception_backend { | ||
| InterceptionBackend::Rewriter => { | ||
| #[cfg(not(target_arch = "x86_64"))] | ||
| eprintln!("WARN: litebox_rtld_audit not currently supported on non-x86_64 arch"); | ||
| #[cfg(target_arch = "x86_64")] | ||
| in_mem.with_root_privileges(|fs| { | ||
| let rwxr_xr_x = Mode::RWXU | Mode::RGRP | Mode::XGRP | Mode::ROTH | Mode::XOTH; | ||
| let _ = fs.mkdir("/lib", rwxr_xr_x); | ||
| let fd = fs | ||
| .open( | ||
| "/lib/litebox_rtld_audit.so", | ||
| litebox::fs::OFlags::WRONLY | litebox::fs::OFlags::CREAT, | ||
| rwxr_xr_x, | ||
| ) | ||
| .expect("Failed to create /lib/litebox_rtld_audit.so"); | ||
| fs.initialize_primarily_read_heavy_file( | ||
| &fd, | ||
| include_bytes!(concat!(env!("OUT_DIR"), "/litebox_rtld_audit.so")).into(), | ||
| ); | ||
| fs.close(&fd) | ||
| .expect("Failed to close /lib/litebox_rtld_audit.so"); | ||
| }); | ||
| } | ||
| InterceptionBackend::Seccomp => { | ||
| // No need to include rtld_audit.so for seccomp backend | ||
| } | ||
| InterceptionBackend::Rewriter | InterceptionBackend::Seccomp => {} | ||
| } |
There was a problem hiding this comment.
Superfluous code left in
There was a problem hiding this comment.
Afaict, the shim should not be doing the rewriting, that should be the platform's job. The shim is in charge of the actual handling of syscalls, the platform is in charge of making sure that syscalls actually show up to the shim. allocate_pages/update_permissions from the platform traits are probably the correct places to hook in for the platform to trigger things. If you want to keep shared pieces together across platforms, then litebox_common_linux is probably where the common bits should sit. The actual invocation of the rewriter is not a shim-level concern, imho.
There was a problem hiding this comment.
Due to this above comment, I have not done a proper review of the exact code in this file, since I think it should (fully, or at least mostly) be left untouched by this PR. I will look at the migrated code after we're done updating it.
There was a problem hiding this comment.
Similar to the elf.rs thing, nothing about elf patching should really be showing up here, right?
| Ok(()) | Err(litebox_common_linux::loader::ElfParseError::UnpatchedBinary) => { | ||
| // Unpatched binary is expected in the LVBS scenario where not | ||
| // all binaries are rewritten. Proceed without a trampoline. | ||
| } |
There was a problem hiding this comment.
Out of curiosity: how is OPTEE-on-Linux supposed to work for unpatched binaries?
| seq-macro = "0.3" | ||
| ringbuf = { version = "0.4.8", default-features = false, features = ["alloc"] } | ||
| zerocopy = { version = "0.8", default-features = false, features = ["derive"] } | ||
| litebox_syscall_rewriter = { version = "0.1.0", path = "../litebox_syscall_rewriter", default-features = false } |
There was a problem hiding this comment.
See comment at elf.rs, the shim itself should not need to even be aware of the syscall rewriter.
Summary