Skip to content

Commit ef72d77

Browse files
authored
fix(ci): sandbox coreutils drift generation (#1611)
## What Hardens the coreutils argument-drift path so third-party generated Rust is not executed with repository write credentials. ## Why The weekly/manual drift workflow consumes `uutils/coreutils`, generates Rust from `uu_app()`, then builds and tests bashkit. A malicious upstream `uu_app()` could previously preserve executable statements in generated command builders and run them in a job with `contents: write` and `pull-requests: write`. ## How - Add args-mode validation in `bashkit-coreutils-port` so `uu_app()` must be a single clap `Command::new(...)` builder chain and rejects executable prefix statements/block control flow before emission. - Add regression fixtures proving a write+abort payload is rejected and a normal builder expression is accepted. - Split `.github/workflows/coreutils-args-drift.yml` into read-only regeneration/test and write-scoped PR creation jobs; read-only checkouts disable persisted credentials. - Document TM-INF-025 in the threat model and coreutils args-port spec. ## Risk - Low / Medium / High: Medium - What can break: future upstream uutils `uu_app()` shapes that are not simple builder chains will fail codegen and require an explicit generator update instead of silently emitting executable Rust. ## Checklist - [x] Tests added or updated - [x] Backward compatibility considered Verified: - `cargo test -p bashkit-coreutils-port` - `cargo test -p bashkit --test spec_tests bash_spec_tests` - `ruby -e 'require "yaml"; YAML.load_file(".github/workflows/coreutils-args-drift.yml"); puts "ok"'` - `git diff --check` - `just pre-pr`
1 parent 6931716 commit ef72d77

4 files changed

Lines changed: 280 additions & 13 deletions

File tree

.github/workflows/coreutils-args-drift.yml

Lines changed: 64 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,11 @@ name: Coreutils Args Drift
1111
# The PR's intermediate commits are bot-authored (this is automated drift
1212
# detection, not a code change). Squash-merge it as a human so the merge
1313
# commit lands under a real identity per AGENTS.md commit-attribution rules.
14+
#
15+
# Security boundary: uutils source is third-party input. Regeneration,
16+
# build, and tests run without repository write permission and without
17+
# persisted checkout credentials. Only the later PR job gets write
18+
# permission, and it only commits generated files after tests pass.
1419

1520
on:
1621
schedule:
@@ -19,8 +24,7 @@ on:
1924
workflow_dispatch:
2025

2126
permissions:
22-
contents: write
23-
pull-requests: write
27+
contents: read
2428

2529
env:
2630
CARGO_TERM_COLOR: always
@@ -31,17 +35,27 @@ jobs:
3135
name: Regenerate uutils argument surfaces
3236
runs-on: ubuntu-latest
3337
timeout-minutes: 20
38+
permissions:
39+
contents: read
40+
outputs:
41+
drifted: ${{ steps.drift.outputs.drifted }}
42+
pinned: ${{ steps.rev.outputs.pinned }}
43+
rev: ${{ steps.regen.outputs.rev }}
44+
utils: ${{ steps.regen.outputs.utils }}
45+
modules: ${{ steps.regen_modules.outputs.modules }}
3446
steps:
3547
- name: Checkout bashkit
3648
uses: actions/checkout@v6
3749
with:
3850
path: bashkit
51+
persist-credentials: false
3952

4053
- name: Checkout uutils/coreutils
4154
uses: actions/checkout@v6
4255
with:
4356
repository: uutils/coreutils
4457
path: uutils
58+
persist-credentials: false
4559
# Full history so we can checkout the pinned rev below; depth=1
4660
# would only have HEAD.
4761
fetch-depth: 0
@@ -173,6 +187,47 @@ jobs:
173187
BASHKIT_RUN_COREUTILS_DIFF: '1'
174188
run: cargo test -p bashkit --test coreutils_differential_tests
175189

190+
- name: Check for generated drift
191+
id: drift
192+
working-directory: bashkit
193+
run: |
194+
set -euo pipefail
195+
if [ -z "$(git status --porcelain -- crates/bashkit/src/builtins/generated)" ]; then
196+
echo "drifted=false" >> "$GITHUB_OUTPUT"
197+
else
198+
echo "drifted=true" >> "$GITHUB_OUTPUT"
199+
fi
200+
201+
- name: Upload generated drift
202+
if: steps.drift.outputs.drifted == 'true'
203+
uses: actions/upload-artifact@v4
204+
with:
205+
name: coreutils-generated-drift
206+
path: bashkit/crates/bashkit/src/builtins/generated/
207+
if-no-files-found: error
208+
209+
open-pr:
210+
name: Open drift pull request
211+
needs: regenerate
212+
if: needs.regenerate.outputs.drifted == 'true'
213+
runs-on: ubuntu-latest
214+
timeout-minutes: 10
215+
permissions:
216+
contents: write
217+
pull-requests: write
218+
steps:
219+
- name: Checkout bashkit
220+
uses: actions/checkout@v6
221+
with:
222+
path: bashkit
223+
persist-credentials: false
224+
225+
- name: Download generated drift
226+
uses: actions/download-artifact@v4
227+
with:
228+
name: coreutils-generated-drift
229+
path: bashkit/crates/bashkit/src/builtins/generated/
230+
176231
- name: Open pull request if drifted
177232
uses: peter-evans/create-pull-request@v7
178233
with:
@@ -186,19 +241,19 @@ jobs:
186241
chore(builtins): sync uutils/coreutils argument surfaces and vendored modules
187242
188243
Regenerated by `bashkit-coreutils-port` against
189-
uutils/coreutils@${{ steps.regen.outputs.rev }}.
244+
uutils/coreutils@${{ needs.regenerate.outputs.rev }}.
190245
191-
Utilities: ${{ steps.regen.outputs.utils }}
192-
Vendored modules: ${{ steps.regen_modules.outputs.modules }}
246+
Utilities: ${{ needs.regenerate.outputs.utils }}
247+
Vendored modules: ${{ needs.regenerate.outputs.modules }}
193248
body: |
194249
Automated regeneration of argument surfaces and vendored
195250
modules ported from
196251
[uutils/coreutils](https://github.com/uutils/coreutils).
197252
198-
**Source revision:** `${{ steps.regen.outputs.rev }}`
199-
(was `${{ steps.rev.outputs.pinned }}`)
200-
**Utilities:** ${{ steps.regen.outputs.utils }}
201-
**Vendored modules:** ${{ steps.regen_modules.outputs.modules }}
253+
**Source revision:** `${{ needs.regenerate.outputs.rev }}`
254+
(was `${{ needs.regenerate.outputs.pinned }}`)
255+
**Utilities:** ${{ needs.regenerate.outputs.utils }}
256+
**Vendored modules:** ${{ needs.regenerate.outputs.modules }}
202257
**Trigger:** `.github/workflows/coreutils-args-drift.yml`
203258
204259
This PR bumps `UUTILS_REVISION` in

crates/bashkit-coreutils-port/src/args.rs

Lines changed: 197 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,9 @@ use anyhow::{Context, Result, anyhow, bail};
2929
use proc_macro2::TokenStream;
3030
use quote::quote;
3131
use syn::visit_mut::{self, VisitMut};
32-
use syn::{Expr, ExprCall, ExprMacro, Item, ItemFn, ItemImpl, ItemMod, LitStr, Type, parse_quote};
32+
use syn::{
33+
Expr, ExprCall, ExprMacro, Item, ItemFn, ItemImpl, ItemMod, LitStr, Stmt, Type, parse_quote,
34+
};
3335

3436
pub fn run(uutils_dir: &Path, util: &str, rev: &str) -> Result<String> {
3537
let src_path = uutils_dir
@@ -136,6 +138,8 @@ pub fn run(uutils_dir: &Path, util: &str, rev: &str) -> Result<String> {
136138
);
137139
}
138140

141+
validate_uu_app_body(&uu_app)?;
142+
139143
let cmd_fn_name = syn::Ident::new(&format!("{util}_command"), proc_macro2::Span::call_site());
140144
uu_app.sig.ident = cmd_fn_name.clone();
141145
let uu_app_block = &uu_app.block;
@@ -638,6 +642,109 @@ fn collect_option_constants(file: &syn::File) -> Vec<Item> {
638642
.collect()
639643
}
640644

645+
// THREAT[TM-INF-025]: args-mode consumes third-party Rust from uutils.
646+
// Only a single clap builder expression may become executable bashkit
647+
// code; prefix statements would run during CI/parser construction.
648+
fn validate_uu_app_body(uu_app: &ItemFn) -> Result<()> {
649+
let command_expr = match uu_app.block.stmts.as_slice() {
650+
[Stmt::Expr(expr, None)] => expr,
651+
stmts => bail!(
652+
"unsafe uu_app body: expected a single clap Command builder expression, found {} statements",
653+
stmts.len()
654+
),
655+
};
656+
657+
let mut validator = UuAppExprValidator { errors: vec![] };
658+
syn::visit::Visit::visit_expr(&mut validator, command_expr);
659+
if !validator.errors.is_empty() {
660+
bail!("unsafe uu_app body: {}", validator.errors.join("; "));
661+
}
662+
if !chain_roots_at_command_new(command_expr) {
663+
bail!("unsafe uu_app body: tail expression must be a clap Command::new(...) builder chain");
664+
}
665+
Ok(())
666+
}
667+
668+
struct UuAppExprValidator {
669+
errors: Vec<String>,
670+
}
671+
672+
impl<'ast> syn::visit::Visit<'ast> for UuAppExprValidator {
673+
fn visit_expr_block(&mut self, node: &'ast syn::ExprBlock) {
674+
self.errors.push(format!(
675+
"block expression is not allowed in command builder: {}",
676+
quote::quote!(#node)
677+
));
678+
}
679+
680+
fn visit_expr_unsafe(&mut self, node: &'ast syn::ExprUnsafe) {
681+
self.errors.push(format!(
682+
"unsafe block is not allowed in command builder: {}",
683+
quote::quote!(#node)
684+
));
685+
}
686+
687+
fn visit_expr_async(&mut self, node: &'ast syn::ExprAsync) {
688+
self.errors.push(format!(
689+
"async block is not allowed in command builder: {}",
690+
quote::quote!(#node)
691+
));
692+
}
693+
694+
fn visit_expr_loop(&mut self, node: &'ast syn::ExprLoop) {
695+
self.errors.push(format!(
696+
"loop is not allowed in command builder: {}",
697+
quote::quote!(#node)
698+
));
699+
}
700+
701+
fn visit_expr_while(&mut self, node: &'ast syn::ExprWhile) {
702+
self.errors.push(format!(
703+
"while is not allowed in command builder: {}",
704+
quote::quote!(#node)
705+
));
706+
}
707+
708+
fn visit_expr_for_loop(&mut self, node: &'ast syn::ExprForLoop) {
709+
self.errors.push(format!(
710+
"for loop is not allowed in command builder: {}",
711+
quote::quote!(#node)
712+
));
713+
}
714+
715+
fn visit_expr_match(&mut self, node: &'ast syn::ExprMatch) {
716+
self.errors.push(format!(
717+
"match is not allowed in command builder: {}",
718+
quote::quote!(#node)
719+
));
720+
}
721+
}
722+
723+
fn chain_roots_at_command_new(mut expr: &Expr) -> bool {
724+
while let Expr::MethodCall(mc) = expr {
725+
expr = &mc.receiver;
726+
}
727+
728+
let Expr::Call(call) = expr else {
729+
return false;
730+
};
731+
path_ends_with_command_new(&call.func)
732+
}
733+
734+
fn path_ends_with_command_new(func: &Expr) -> bool {
735+
let Expr::Path(p) = func else {
736+
return false;
737+
};
738+
let segs = path_segments(&p.path);
739+
let last_two: Vec<&str> = segs
740+
.iter()
741+
.rev()
742+
.take(2)
743+
.map(String::as_str)
744+
.collect::<Vec<_>>();
745+
matches!(last_two.as_slice(), ["new", "Command"])
746+
}
747+
641748
fn find_fn<'a>(file: &'a syn::File, name: &str) -> Option<&'a ItemFn> {
642749
file.items.iter().find_map(|it| match it {
643750
Item::Fn(f) if f.sig.ident == name => Some(f),
@@ -876,3 +983,92 @@ fn matches_shortcut_value_parser(segs: &[String]) -> bool {
876983
.collect::<Vec<_>>();
877984
matches!(last_two.as_slice(), ["new", "ShortcutValueParser"])
878985
}
986+
987+
#[cfg(test)]
988+
mod tests {
989+
use super::*;
990+
use std::fs;
991+
use tempfile::TempDir;
992+
993+
fn fixture(files: &[(&str, &str)]) -> (TempDir, PathBuf) {
994+
let tmp = tempfile::tempdir().unwrap();
995+
let uutils = tmp.path().join("uutils");
996+
for (rel, content) in files {
997+
let path = uutils.join(rel);
998+
fs::create_dir_all(path.parent().unwrap()).unwrap();
999+
fs::write(path, content).unwrap();
1000+
}
1001+
(tmp, uutils)
1002+
}
1003+
1004+
#[test]
1005+
fn rejects_executable_statements_in_uu_app() {
1006+
let (_tmp, uutils) = fixture(&[
1007+
(
1008+
"src/uu/cat/src/cat.rs",
1009+
r#"
1010+
mod options {
1011+
pub static FILE: &str = "file";
1012+
}
1013+
1014+
pub fn uu_app() -> clap::Command {
1015+
std::fs::write("coreutils-port-poc", b"owned").unwrap();
1016+
std::process::abort();
1017+
Command::new("cat").arg(Arg::new(options::FILE))
1018+
}
1019+
"#,
1020+
),
1021+
("src/uu/cat/locales/en-US.ftl", ""),
1022+
]);
1023+
1024+
let err = run(&uutils, "cat", "poc").unwrap_err();
1025+
let msg = format!("{err:#}");
1026+
assert!(msg.contains("unsafe uu_app"), "got: {msg}");
1027+
}
1028+
1029+
#[test]
1030+
fn accepts_single_command_builder_expression() {
1031+
let (_tmp, uutils) = fixture(&[
1032+
(
1033+
"src/uu/cat/src/cat.rs",
1034+
r#"
1035+
mod options {
1036+
pub static FILE: &str = "file";
1037+
}
1038+
1039+
pub fn uu_app() -> clap::Command {
1040+
Command::new("cat").arg(Arg::new(options::FILE))
1041+
}
1042+
"#,
1043+
),
1044+
("src/uu/cat/locales/en-US.ftl", ""),
1045+
]);
1046+
1047+
let body = run(&uutils, "cat", "poc").unwrap();
1048+
assert!(body.contains("pub fn cat_command() -> clap::Command"));
1049+
assert!(body.contains("Command::new(\"cat\")"));
1050+
}
1051+
1052+
#[test]
1053+
fn rejects_non_command_tail_expression() {
1054+
let (_tmp, uutils) = fixture(&[
1055+
(
1056+
"src/uu/cat/src/cat.rs",
1057+
r#"
1058+
mod options {
1059+
pub static FILE: &str = "file";
1060+
}
1061+
1062+
pub fn uu_app() -> clap::Command {
1063+
std::process::abort()
1064+
}
1065+
"#,
1066+
),
1067+
("src/uu/cat/locales/en-US.ftl", ""),
1068+
]);
1069+
1070+
let err = run(&uutils, "cat", "poc").unwrap_err();
1071+
let msg = format!("{err:#}");
1072+
assert!(msg.contains("Command::new"), "got: {msg}");
1073+
}
1074+
}

specs/coreutils-args-port.md

Lines changed: 18 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,13 @@ codegen**, not by depending on `uu_*` crates at runtime.
4444
empty). Per-builtin opt-in: a builtin chooses whether to wire
4545
through the shim — if it does, every uutils env-default
4646
auto-lights as that option's bashkit support lands.
47-
5. Emits a generated file under
47+
5. Validates the rewritten `uu_app()` before emission: args mode accepts
48+
only a single tail clap `Command` builder expression. Prefix
49+
statements, block expressions, loops, matches, async blocks, and
50+
unsafe blocks are rejected before any generated Rust is written.
51+
This keeps third-party uutils source from smuggling arbitrary
52+
executable statements into `<util>_command()` (TM-INF-025).
53+
6. Emits a generated file under
4854
`crates/bashkit/src/builtins/generated/<util>_args.rs` with a clean
4955
`pub fn <util>_command() -> clap::Command`.
5056

@@ -297,8 +303,17 @@ and on `workflow_dispatch`. It:
297303
4. Verifies bashkit still builds and the cat/tac spec tests pass.
298304
5. Builds the uutils multicall from the same checkout and runs the
299305
differential harness with `BASHKIT_RUN_COREUTILS_DIFF=1`.
300-
6. Opens a PR with the regenerated files + bumped pin if `git diff` is
301-
non-empty.
306+
6. Uploads generated files as an artifact if `git diff` is non-empty.
307+
7. A separate `open-pr` job downloads that artifact and opens a PR with
308+
the regenerated files + bumped pin.
309+
310+
Security boundary: the regeneration/build/test job treats uutils as
311+
third-party input. It has only `contents: read` permission and both
312+
checkouts set `persist-credentials: false`, so generated Rust can be
313+
compiled and exercised without a repository write token. The later
314+
`open-pr` job has `contents: write`/`pull-requests: write`, but it does
315+
not build or execute the generated code; it only commits the already
316+
tested generated files.
302317

303318
The PR's intermediate commits are bot-authored (this is automated drift
304319
detection, not a code change). Maintainers must **squash-merge as a human**

specs/threat-model.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -439,6 +439,7 @@ All execution stays within the virtual interpreter — no OS subprocess is spawn
439439
| TM-INF-022 | Library Debug shapes leak via stderr | Builtins wrapping external libs (jaq, regex, serde_json, semver, chrono, …) format errors with `format!("{:?}", e)` and dump internal struct shapes into stderr — e.g. jq printed `(File { code: "<800 chars of prepended compat-defs source>", path: () }, [("@tsv", Filter(0))])` to LLM agents | Three-layer enforcement, all run by `cargo test`: (1) static — `builtins::tests::no_debug_fmt_in_builtin_source` walks every `crates/bashkit/src/builtins/*.rs` and forbids `{:?}` / `{:#?}` / `{name:?}` (per-line opt-out: `// debug-ok: <reason>`); (2) dynamic per-tool — each tool's `mod tests` calls `bashkit::testing::assert_no_leak` with malformed inputs; (3) fuzz — every cargo-fuzz target plus 5 proptest cases call `bashkit::testing::assert_fuzz_invariants` against random input and check the same invariants on stderr/stdout | **FIXED** |
440440
| TM-INF-023 | jq `halt` / `halt_error` terminate the host process | jaq-std 3.0's `halt(N)` native calls `std::process::exit(exit_code)`. `defs.jq` wraps it with `def halt: halt(0);` and `def halt_error(...): ..., halt(...);`. Untrusted jq filters can invoke `halt` / `halt_error` to exit the entire embedding process, escaping the `ExecResult`-based sandbox boundary and causing process-wide DoS for any caller hosting bashkit. | Strip the upstream `halt` native from `jaq_std::funs::<D>()` and chain in a safe replacement that pops the variable argument and returns `Error::str("halt is disabled in the bashkit sandbox")`. The wrapper defs (`halt`, `halt_error`) still resolve, so callers see a normal jq runtime error (exit 5) instead of the host being killed. Regression tests in `builtins::jq::tests`: `halt_does_not_terminate_host_process`, `halt_with_arg_does_not_terminate_host_process`, `halt_error_does_not_terminate_host_process`. | **FIXED** |
441441
| TM-INF-024 | Host env side-channel via clap `Arg::env(...)` | Clap-based builtins ported from uutils (currently `ls`, with `.env("TABSIZE")` and `.env("TIME_STYLE")` on `uu_app()`) resolve those defaults from `std::env`, not bashkit's `ctx.env`. This breaks the sandbox boundary two ways: (1) presence-probe — a script can tell whether the host process has `TIME_STYLE`/`TABSIZE` set by observing `ls`'s behaviour; (2) availability — a host- or container-set `TIME_STYLE` materialises as a clap value source for an option bashkit hasn't implemented yet, so the unsupported-option gate fires on every plain `ls` and breaks the builtin for unrelated tenants | Four-layer fix: (1) **codegen strips runtime `.env(...)` and harvests it** — `bashkit-coreutils-port` rewrites `.env("FOO")` chained method calls out of every Arg builder it ports AND emits a sidecar `<UTIL>_ENV_DEFAULTS: &[EnvDefault]` table recording each stripped annotation; (2) **virtual-env shim** — `crate::builtins::clap_env::apply_env_defaults` rewrites argv from `ctx.env` (never `std::env`) before `try_get_matches_from`, emulating clap's "argv > env > default" precedence so uutils' env-default UX is preserved without re-opening the host channel; (3) **static guards** — `no_clap_env_in_generated_parsers` forbids runtime `.env(` calls in `generated/*.rs`, and `every_generated_parser_emits_env_defaults_table` enforces every util emits the sidecar (uniform surface, no per-util conditionals); (4) **defence-in-depth** — workspace `clap` dep drops the `env` cargo feature, so a re-introduced `.env(...)` fails to compile rather than silently re-opening the channel. Coverage: ls runtime tests pin both directions — `ls_ignores_host_time_style_and_tabsize` (host `std::env` ignored) and `ls_honors_virtual_env_time_style` (virtual `ctx.env` honoured) | **FIXED** |
442+
| TM-INF-025 | Untrusted generated Rust runs in drift CI with repository write token | The coreutils argument-drift workflow consumes third-party `uutils/coreutils` source, regenerates Rust from `uu_app()`, then builds/tests bashkit. If the generator preserved arbitrary upstream statements and the same job held `contents: write`/`pull-requests: write`, malicious upstream code could execute during parser construction with repository write impact. | Two-layer fix: (1) `bashkit-coreutils-port` validates args-mode `uu_app()` before emission and rejects anything other than a single tail clap `Command` builder expression; regression test `rejects_executable_statements_in_uu_app` proves write+abort payloads fail codegen. (2) `.github/workflows/coreutils-args-drift.yml` separates privilege: regeneration/build/test has only `contents: read` and `persist-credentials: false` on both checkouts; the later `open-pr` job gets write permission but does not build or execute generated Rust, only commits the tested artifact. | **FIXED** |
442443

443444
**TM-INF-022**: Generalizes TM-INF-016 to cover the whole builtin surface.
444445
The originating bug was `builtins/jq/` formatting jaq's compile/parse

0 commit comments

Comments
 (0)