
Runtime ELF patching, trampoline format changes, and rtld_audit removal #739

Open
wdcui wants to merge 14 commits into wdcui/stacked/pr1c-rewriter-interface from wdcui/stacked/pr1b-trampoline-format

Conversation

@wdcui
Member

@wdcui wdcui commented Apr 2, 2026

Summary

  • Runtime ELF patching: The shim's mmap hook now patches syscall instructions in executable segments on the fly, and the loader patches the main binary at load time when it lacks a trampoline.
  • Trampoline format changes: Redzone reservation (128-byte SUB RSP), R11-based restart instead of LEA RCX, and RIP-relative instruction re-encoding in the rewriter.
  • rtld_audit removal: The audit library build, injection, and LD_AUDIT environment setup are fully removed. Runtime patching replaces the audit library's role.

@wdcui wdcui marked this pull request as draft April 2, 2026 13:52
@wdcui wdcui force-pushed the wdcui/stacked/pr1b-trampoline-format branch from 17cfce6 to aa7dd83 Compare April 2, 2026 14:43
@wdcui wdcui changed the title from "Trampoline format: redzone reservation, R11 restart, RIP-relative re-encoding" to "Runtime ELF patching, trampoline format changes, and rtld_audit removal" Apr 2, 2026
@wdcui wdcui marked this pull request as ready for review April 3, 2026 04:43
@wdcui wdcui requested review from CvvT and sangho2 April 3, 2026 04:43
@wdcui wdcui force-pushed the wdcui/stacked/pr1b-trampoline-format branch from 35d845d to 434eb38 Compare April 5, 2026 04:14
push r9 // pt_regs->r9
push r10 // pt_regs->r10
push [rsp + 88] // pt_regs->r11 = rflags
push QWORD PTR gs:saved_r11@tpoff // pt_regs->r11 (syscall call-site from rewriter)
Contributor

Currently, saved_r11 doesn't store rflags. Better to read the current (guest) rflags and store it to r11.

Member Author

Fixed.

push r10 // pt_regs->r10
push [rsp + 88] // pt_regs->r11 = rflags
mov r10, gs:[0x28] // recover guest R11 saved at entry
push r10 // pt_regs->r11 = guest R11 (restart addr from rewriter)
Contributor

Similarly, this no longer stores rflags, which differs from what PtRegs expects.

Member Author

Fixed.

if file_size < 32 {
return (false, 0, 0, 0);
}
let mut tail = [0u8; 32];
Contributor

Nit: this is incompatible with 32-bit ELF.

Member Author

Fixed.
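
For illustration, a minimal sketch of how a pointer-width-aware check could first classify the file from its ELF identification bytes before deciding how many trailer bytes to read. The helper name `elf_pointer_width` is hypothetical and is not taken from this PR; only the `EI_CLASS` values (1 for ELFCLASS32, 2 for ELFCLASS64) come from the ELF specification. The trailer layout itself is not shown here because the thread does not spell it out.

```rust
/// Hypothetical helper (not from this PR): pointer width in bytes implied by
/// the ELF identification bytes at the start of a file.
/// Returns `None` if the magic is wrong or the class byte is unknown.
fn elf_pointer_width(ident: &[u8]) -> Option<usize> {
    const ELF_MAGIC: [u8; 4] = [0x7f, b'E', b'L', b'F'];
    const EI_CLASS: usize = 4; // index of the class byte in e_ident

    if ident.get(..4)? != ELF_MAGIC.as_slice() {
        return None;
    }
    match *ident.get(EI_CLASS)? {
        1 => Some(4), // ELFCLASS32
        2 => Some(8), // ELFCLASS64
        _ => None,
    }
}
```

A caller could then size the trailer buffer from the returned width instead of hard-coding 32 bytes.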

self.init_elf_patch_state(fd, mapped_addr.as_usize());
}

let mut cache = self.global.elf_patch_cache.lock();
Contributor

I wonder whether holding this heavyweight lock here is acceptable.

Member Author

I think we don't expect a process to load executables concurrently?

pub(crate) fn sys_close(&self, fd: i32) -> Result<(), Errno> {
// Finalize any in-progress ELF patching for this fd (mprotect
// trampoline RW→RX) before closing the descriptor.
self.finalize_elf_patch(fd);
Contributor

Nit: fd-based trampoline management might be incompatible with dup2/dup3. I'm not sure whether programs actually dup ELF file descriptors, though.

Member Author

Fixed in sys_dup. Please check whether it's correct.
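
To make the concern concrete, here is a small sketch of the bookkeeping involved. It assumes in-progress patches are tracked per fd; `PendingPatches` and its methods are hypothetical names, not the types used in this PR. The point is that dup2/dup3 implicitly close the target descriptor, so any pending patch keyed on that descriptor has to be finalized just as it would be in sys_close.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for per-fd ELF patch bookkeeping.
#[derive(Default)]
struct PendingPatches {
    fds: HashSet<i32>,
}

impl PendingPatches {
    /// Record that patching has started for `fd`.
    fn begin(&mut self, fd: i32) {
        self.fds.insert(fd);
    }

    /// Finalize and forget `fd`; returns true if there was pending work.
    fn finalize(&mut self, fd: i32) -> bool {
        self.fds.remove(&fd)
    }

    /// dup2/dup3(old_fd, new_fd): `new_fd` is closed implicitly when it
    /// differs from `old_fd`, so finalize whatever was pending on it.
    fn on_dup2(&mut self, old_fd: i32, new_fd: i32) -> bool {
        if old_fd == new_fd {
            return false; // nothing is closed in this case
        }
        self.finalize(new_fd)
    }
}
```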

Comment on lines +574 to +575
let r11_disp = i64::try_from(replace_start).unwrap()
- i64::try_from(trampoline_base_addr + trampoline_data.len() as u64 + 7).unwrap();
Contributor

Nit: we might want to use checked_add/checked_sub here.

Member Author

Fixed.
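
A sketch of what the checked version of the displacement computation could look like. This is not the code that landed: the function name `r11_disp` and its signature are hypothetical, and the constant 7 is assumed to be the encoded length of `LEA R11, [RIP+disp32]`, matching the `+ 7` in the snippet under review.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: disp32 for `LEA R11, [RIP + disp32]` placed at the
/// current end of the trampoline, pointing back at `replace_start`.
/// Returns `None` on any overflow or if the target is out of rel32 range.
fn r11_disp(replace_start: u64, trampoline_base_addr: u64, trampoline_len: u64) -> Option<i32> {
    const LEA_LEN: u64 = 7; // assumed length of the LEA instruction in bytes

    // RIP-relative displacements are measured from the end of the instruction.
    let next_ip = trampoline_base_addr
        .checked_add(trampoline_len)?
        .checked_add(LEA_LEN)?;
    let target = i64::try_from(replace_start).ok()?;
    let next_ip = i64::try_from(next_ip).ok()?;
    let disp = target.checked_sub(next_ip)?;
    i32::try_from(disp).ok()
}
```

Returning `Option` lets the caller turn a failure into a rewriter error instead of panicking through `unwrap()`.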

@wdcui wdcui requested a review from jaybosamiya-ms April 8, 2026 16:11
wdcui added a commit that referenced this pull request Apr 8, 2026
- Use EINVAL instead of ENODATA for trampoline parse failures (loader.rs)
- Handle UnpatchedBinary as non-fatal in OptEE ELF loader (optee/elf.rs)
- Document R11 restart-address contract in rewriter (lib.rs)
- Replace unchecked arithmetic with checked_add_u64 in rewriter (lib.rs)
- Rename saved_r11 to saved_restart_addr in Linux userland TLS (lib.rs)
- Store RFLAGS from stack ([rsp+88]) instead of TLS in Linux/Windows
  userland pt_regs->r11 (lib.rs)
- Save R11 restart address to TlsState on Windows userland (lib.rs)
- Add cleanup-leak TODO comment in PatchedMapper::map_file (elf.rs)
- Restore trampoline RX on mprotect failure path (mm.rs)
- Make check_trampoline_magic pointer-width aware (mm.rs)
- Validate e_phentsize before parsing program headers (mm.rs)
- Clarify elf_patch_cache lock scope comment (mm.rs)
- Finalize ELF patch for implicitly-closed fd in dup2/dup3 (file.rs)
wdcui added 12 commits April 8, 2026 17:57
…encoding

New trampoline format changes for the syscall rewriter:

Rewriter (litebox_syscall_rewriter):
- Add redzone reservation (LEA RSP,[RSP-0x80]) before syscall callback
  entry on x86-64, allowing the callback to use the 128-byte red zone
- Add R11 restart address (LEA R11,[RIP+disp32]) pointing back to the
  call-site JMP, enabling SA_RESTART signal re-execution
- Re-encode RIP-relative memory operands in pre-syscall instructions
  when they are copied to the trampoline, using iced_x86::Encoder at
  the trampoline IP so displacements remain correct
- Guard post-syscall instructions with RIP-relative operands by
  delegating to hook_syscall_before_and_after instead of raw-copying
- Append header-only marker (trampoline_size=0) when no syscall
  instructions are found, so the loader can distinguish checked
  binaries from unpatched ones
- Add 5 inline unit tests for Bun detection and RIP-relative encoding

Loader (litebox_common_linux):
- Handle trampoline_size==0 as a valid no-op (checked, no syscalls)
- Add UnpatchedBinary error variant for binaries missing the magic
- Add has_trampoline() accessor

Platform/shim (litebox_platform_linux_userland):
- Add saved_r11 TLS slot and save R11 on syscall callback entry
- Add syscall_callback_redzone entry point that undoes red zone
  reservation before saving registers
- Return syscall_callback_redzone from get_syscall_entry_point()

Shim loader (litebox_shim_linux):
- Treat UnpatchedBinary as non-fatal in parse_trampoline calls,
  allowing unpatched binaries to load without a trampoline
- Gate syscall_callback_redzone behind #[cfg(target_arch = "x86_64")] on
  Linux since the asm symbol only exists in the x86_64 asm block, fixing
  the i686 linker error.
- Add syscall_callback_redzone entry point to the Windows platform so the
  new trampoline format (with redzone reservation) works correctly on the
  Windows emulator. Uses mov+add to SCRATCH to avoid clobbering rax.
- Fix rustfmt import ordering in litebox_shim_linux/src/loader/elf.rs.
Add runtime syscall patching in the shim's mmap hook: when an ELF
segment with PROT_EXEC is mapped, patch syscall instructions in-place
and set up a trampoline region. The loader also patches the main
binary at load time when it lacks a trampoline.

Remove rtld_audit entirely: gut build.rs, remove the audit .so
injection from the runner, and remove the REQUIRE_RTLD_AUDIT global.

Supporting changes:
- Add ReadAt impl for &[u8] in litebox_common_linux
- Hook finalize_elf_patch into sys_close to mprotect trampolines RX
- Add elf_patch_cache on GlobalState and suppress_elf_runtime_patch on Task
- Update ratchet test (runner has zero globals now)
…UserPointer

The new trampoline format loads a restart address into R11 (for
SA_RESTART) before jumping to the callback.  On Windows, the TLS
index lookup clobbers R11, so we temporarily stash R11 in the
per-thread TEB.ArbitraryUserPointer slot (gs:[0x28]) for the ~20
instructions of inline asm between callback entry and pt_regs save.

Also removes the dead syscall_callback entry point (only
syscall_callback_redzone is used since get_syscall_entry_point
always returns the redzone variant).
… discriminate rewriter errors

- Remove litebox_rtld_audit/ directory entirely (Makefile, rtld_audit.c, .gitignore)
- Replace litebox_packager/build.rs with no-op (was building rtld_audit.so)
- Remove rtld_audit tar entry from litebox_packager/src/lib.rs
- Remove fixup_env and set_load_filter from both Linux and LoW runners
- Fix RFLAGS clobber on Windows: use lea+mov instead of mov+add
- Simplify is_at_syscall_callback: x86 checks syscall_callback, x86_64 checks syscall_callback_redzone
- Discriminate trampoline parse errors: only UnpatchedBinary triggers runtime patching
- Discriminate rewriter errors: expected non-fatal vs unexpected with logging
- Restore fork-vfork patch error path from PR 1c
- Simplify suppress_elf_runtime_patch logic
- Clean up rtld_audit references in comments across codebase
…ck, add x86_64 comment, add post-syscall RIP-relative comment, fix formatting
…d guards, remove unused ElfPatchState fields
Deleting litebox_runner_linux_userland/build.rs (rtld_audit removal) also
removed Cargo's OUT_DIR env var from integration tests. Replace the three
call sites with env!("CARGO_TARGET_TMPDIR"), a compile-time macro
available since Rust 1.68 that requires no build.rs.
@wdcui wdcui force-pushed the wdcui/stacked/pr1b-trampoline-format branch from ecf9130 to 15eb87e Compare April 8, 2026 18:03
- Add fork_to_vfork_patch computation to metadata extraction block
- Rename replace_with_ud2 -> replace_with_trap (pr1c rename)
- Adapt hook_syscalls_in_elf callers to (Vec<u8>, Vec<u64>) return type
- Adapt patch_code_segment callers to 4-arg signature
- Update error variant names (UnsupportedExecutable, UnsupportedObjectFile)
- Remove dead has_bun_footer_marker (pr1c uses ends_with directly)
- Run cargo fmt
@wdcui wdcui force-pushed the wdcui/stacked/pr1b-trampoline-format branch from 15eb87e to 8ea3756 Compare April 8, 2026 18:12
get_syscall_entry_point() now returns 0 when seccomp_interception_enabled
is set, preventing unnecessary binary patching of syscall instructions
when the systrap/seccomp backend is handling interception via SIGSYS.
}
let p_vaddr = u64::from_le_bytes(ph[16..24].try_into().unwrap());
let p_memsz = u64::from_le_bytes(ph[40..48].try_into().unwrap());
let end = p_vaddr + p_memsz;
Contributor

Nit: use checked_add.
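
For example, a sketch of the segment-end computation that rejects malformed headers instead of wrapping or panicking. The function name `pt_load_end` is hypothetical; the byte offsets (16..24 for p_vaddr, 40..48 for p_memsz) are the ELF64 program header offsets already used in the snippet above.

```rust
use std::convert::TryInto;

/// Hypothetical helper: end address (p_vaddr + p_memsz) of one ELF64
/// program header given as raw little-endian bytes.
/// Returns `None` if the slice is too short or the sum overflows.
fn pt_load_end(ph: &[u8]) -> Option<u64> {
    let p_vaddr = u64::from_le_bytes(ph.get(16..24)?.try_into().ok()?);
    let p_memsz = u64::from_le_bytes(ph.get(40..48)?.try_into().ok()?);
    p_vaddr.checked_add(p_memsz)
}
```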

}

// Read program headers to find max PT_LOAD end
let phdrs_size = e_phentsize * e_phnum;
Contributor

Nit: use checked_mul.
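
A possible shape for this, as a sketch only. It assumes both values are already `usize` and that the file size is known; the `file_size` bound is an extra sanity check added for illustration and is not something the thread asks for. The name `phdrs_size` mirrors the local variable above but is a hypothetical free function.

```rust
/// Hypothetical helper: total size of the program header table.
/// Returns `None` if the multiplication overflows or the table could not
/// possibly fit inside a file of `file_size` bytes.
fn phdrs_size(e_phentsize: usize, e_phnum: usize, file_size: usize) -> Option<usize> {
    let size = e_phentsize.checked_mul(e_phnum)?;
    if size > file_size {
        return None;
    }
    Some(size)
}
```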

// Recover the restart address from the TEB slot and store it in TLS.
// We use SCRATCH as a temporary since all guest GPRs must be preserved
// and RSP modifications would break the stack pointer recovery below.
push QWORD PTR gs:[0x28]
Contributor

Could this push overflow the stack?

Comment on lines +372 to +374
self.parsed.load(&mut mapper, &mut &*platform)
} else {
self.parsed.load(&mut self.file, &mut &*platform)
Contributor

If this load fails, self.file.task.suppress_elf_runtime_patch.set(false) is never executed.
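
One way to make the reset unconditional is a drop guard. The sketch below assumes the flag is a `Cell<bool>` (suggested by the `.set(false)` call); `SuppressGuard` and `load_with_suppression` are hypothetical names, not code from this PR.

```rust
use std::cell::Cell;

/// Hypothetical RAII guard: sets the flag on creation and clears it on drop,
/// so early returns and `?` on the load path cannot leave it set.
struct SuppressGuard<'a> {
    flag: &'a Cell<bool>,
}

impl<'a> SuppressGuard<'a> {
    fn new(flag: &'a Cell<bool>) -> Self {
        flag.set(true);
        SuppressGuard { flag }
    }
}

impl<'a> Drop for SuppressGuard<'a> {
    fn drop(&mut self) {
        self.flag.set(false);
    }
}

/// Run `load` with runtime patching suppressed; the flag is cleared whether
/// `load` succeeds or fails.
fn load_with_suppression<T, E>(
    flag: &Cell<bool>,
    load: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    let _guard = SuppressGuard::new(flag);
    load()
}
```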

Some(tramp_addr),
tramp_len,
ProtFlags::PROT_READ | ProtFlags::PROT_WRITE,
MapFlags::MAP_ANONYMOUS | MapFlags::MAP_PRIVATE | MapFlags::MAP_FIXED,
Contributor

MAP_FIXED_NOREPLACE might be better.

// Patch fork → vfork: overwrite the first bytes of __libc_fork with a
// JMP to __libc_vfork. This prevents glibc's fork wrapper from running
// post-fork handlers that corrupt shared state under vfork semantics.
if let Some((fork_file_offset, fork_patch_end, rel32)) = fork_to_vfork_patch {
Contributor

Is this patch being reintroduced?

Comment on lines +199 to +202
Ok(()) | Err(litebox_common_linux::loader::ElfParseError::UnpatchedBinary) => {
// Unpatched binary is expected in the LVBS scenario where not
// all binaries are rewritten. Proceed without a trampoline.
}
Contributor

Better to add a TODO or open an issue to track this.

Member

Out of curiosity: how is OPTEE-on-Linux supposed to work for unpatched binaries?

}

/// Find fork and vfork symbols in the ELF and compute the patch needed to
/// redirect fork -> vfork. Returns `Some((fork_file_offset, jmp_rel32))` if
Contributor

This function now returns a 3-tuple.

Member

Now that the entire body of this has been removed, I don't think this file is needed anymore, right?

Comment on lines +64 to 66
The packager discovers dependencies, rewrites all ELFs, and creates a tar.
The rewritten main binary is extracted
from the tar and placed alongside it.
Member

Nit: reflow text

Comment on lines +2080 to +2089
// When the seccomp/systrap backend is active, syscall instructions are
// trapped via SIGSYS — no binary rewriting needed.
#[cfg(feature = "systrap_backend")]
if self
.seccomp_interception_enabled
.load(std::sync::atomic::Ordering::SeqCst)
{
return 0;
}

Member

I'm surprised by this conditional here. Do we need to give a 0 here? If it is not needed, it can just be kept at the non-tweaked default behavior, no?

Comment on lines +2764 to +2769
#[cfg(target_arch = "x86")]
let is_at_syscall_callback = ip == syscall_callback as *const () as usize;
#[cfg(target_arch = "x86_64")]
let is_at_syscall_callback = ip == syscall_callback_redzone as *const () as usize
|| ip == syscall_callback as *const () as usize;
if is_at_syscall_callback {
Member

Seems somewhat repetitive, and also somewhat surprising. get_syscall_entry_point only uses redzone variant, but this one checks both. Wouldn't it be correct to just do a ip == get_syscall_entry_point()?

Comment on lines +306 to 310
// When using the rewriter backend, the shim's mmap hook handles
// syscall patching at runtime — no audit library needed.
match cli_args.interception_backend {
InterceptionBackend::Rewriter => {
#[cfg(not(target_arch = "x86_64"))]
eprintln!("WARN: litebox_rtld_audit not currently supported on non-x86_64 arch");
#[cfg(target_arch = "x86_64")]
in_mem.with_root_privileges(|fs| {
let rwxr_xr_x = Mode::RWXU | Mode::RGRP | Mode::XGRP | Mode::ROTH | Mode::XOTH;
let _ = fs.mkdir("/lib", rwxr_xr_x);
let fd = fs
.open(
"/lib/litebox_rtld_audit.so",
litebox::fs::OFlags::WRONLY | litebox::fs::OFlags::CREAT,
rwxr_xr_x,
)
.expect("Failed to create /lib/litebox_rtld_audit.so");
fs.initialize_primarily_read_heavy_file(
&fd,
include_bytes!(concat!(env!("OUT_DIR"), "/litebox_rtld_audit.so")).into(),
);
fs.close(&fd)
.expect("Failed to close /lib/litebox_rtld_audit.so");
});
}
InterceptionBackend::Seccomp => {
// No need to include rtld_audit.so for seccomp backend
}
InterceptionBackend::Rewriter | InterceptionBackend::Seccomp => {}
}
Member

Superfluous code left in

Member

Afaict, the shim should not be doing the rewriting, that should be the platform's job. The shim is in charge of the actual handling of syscalls, the platform is in charge of making sure that syscalls actually show up to the shim. allocate_pages/update_permissions from the platform traits are probably the correct places to hook in for the platform to trigger things. If you want to keep shared pieces together across platforms, then litebox_common_linux is probably where the common bits should sit. The actual invocation of the rewriter is not a shim-level concern, imho.

Member

Due to this above comment, I have not done a proper review of the exact code in this file, since I think it should (fully, or at least mostly) be left untouched by this PR. I will look at the migrated code after we're done updating it.

Member

Similar to the elf.rs thing, nothing about elf patching should really be showing up here, right?

Member

Similar to elf.rs

seq-macro = "0.3"
ringbuf = { version = "0.4.8", default-features = false, features = ["alloc"] }
zerocopy = { version = "0.8", default-features = false, features = ["derive"] }
litebox_syscall_rewriter = { version = "0.1.0", path = "../litebox_syscall_rewriter", default-features = false }
Member

See comment at elf.rs, the shim itself should not need to even be aware of the syscall rewriter.

@jaybosamiya-ms jaybosamiya-ms added the expmt:shadow-kiln label (tag to quickly find the different PRs as part of the "shadow kiln" experiment) Apr 11, 2026
3 participants