Skip to content

Vendor uucore::format and migrate printf.rs to use it #1534

@chaliy

Description

@chaliy

Background

First consumer of the module-vendor mode tool capability tracked in #1533.

crates/bashkit/src/builtins/printf.rs is ~766 LoC, of which roughly 600 are a handwritten format-specifier parser. uutils has a battle-tested, fuzzed equivalent in src/uucore/src/lib/features/format/. Verification confirmed:

  • The format/ module's imports are platform-clean: std, bigdecimal, num-traits, unit-prefix, os_display, and a handful of uucore-internal types (extendedbigdecimal, UError, NonUtf8OsStrError). No rustix, fluent, or icu_locale come from inside format/.
  • FormatError's Display impl uses Display-only formatting (no {:?}) — TM-INF-022 clean by inspection.
  • format/ has no width/precision DoS caps: format/num_format.rs calls String::with_capacity(precision) and iter::repeat_n('0', N) without bounds. A wrapper enforcing the existing MAX_FORMAT_WIDTH = 10_000 cap is mandatory.

This issue is the end-to-end use of #1533: register format/ in the vendoring manifest, generate the module, migrate printf.rs, prove the build matrix.

Acceptance criteria

  • Depends on Add module-vendor mode to bashkit-coreutils-port #1533 (module-vendor mode merged first).
  • format/ registered in the vendoring manifest introduced by Add module-vendor mode to bashkit-coreutils-port #1533, pinned to the same uutils revision the args codegen currently uses.
  • crates/bashkit/src/builtins/generated/format/ produced by port-module, committed to the repo.
  • Internal-type substitutions documented in the manifest:
    • extendedbigdecimal → inline (small file)
    • UError → replace with bashkit's crate::error::Error or stripped if only used for error-propagation traits we don't need
    • NonUtf8OsStrError → inline or replace with a local equivalent
  • New direct deps in crates/bashkit/Cargo.toml: bigdecimal, num-traits, unit-prefix, os_display. Versions aligned to uutils' workspace pins.
  • Width/precision wrapper:
    • Pre-validates the format string before calling sprintf. Suggested implementation: iterate parse_spec_only(...), reject any spec where width or precision exceeds MAX_FORMAT_WIDTH.
    • Wrapper lives in printf.rs (or a small helper module under builtins/) — not inside the generated format/ directory, so regenerating the module never overwrites the cap.
    • Unit tests cover: width over cap, precision over cap, asterisk-resolved width over cap (%*s with arg 999_999), nested sub-spec ignoring cap (verify wrapper still catches it).
  • crates/bashkit/src/builtins/printf.rs switches its format-spec parsing to call into super::generated::format::sprintf(...) (or chosen entry point). Handwritten format-spec parsing is removed; body shrinks to VFS/stdout glue.
  • TM-INF-022: a new unit test exercises every FormatError variant via bashkit::testing::assert_no_leak to confirm Display-only output.
  • All existing printf spec tests stay green (tests/spec_cases/bash/printf*.test.sh and any inline tests).
  • Build matrix:
    • cargo build -p bashkit --release — green.
    • cargo build -p bashkit --target wasm32-unknown-unknown — green (this is the whole point — uucore-as-runtime-dep broke this; vendor must preserve it).
    • cargo build -p bashkit --no-default-features — green.
    • cargo clippy --workspace --all-targets --all-features -- -D warnings — green.
  • Release build time on bashkit-cli does not regress beyond +20 s vs current main. The probe-measured uucore-runtime cost was +98 s; vendor-only cost should be a fraction.
  • specs/coreutils-args-port.md "Vendored modules" section (introduced by Add module-vendor mode to bashkit-coreutils-port #1533) gets a format entry with the substitution decisions used.

Verification artefacts (probe branch)

Metric Baseline uucore runtime dep Delta
bashkit-cli size 24,260,000 B 24,269,344 B +9 KB (declared, unused)
Cold release build time 1m 37s 3m 15s +98 s
wasm32-unknown-unknown build ✅ pass ❌ fail (errno) hard blocker

This issue should restore wasm ✅, deliver real format parity, and add only the four small numeric-deps' compile cost.

Out of scope

Open questions

  • Width-cap location. Inside the wrapper in printf.rs (recommended — generated module stays untouched on regen) vs as a patch applied by the codegen tool itself (more invasive, easier to forget). Recommend the wrapper.
  • Internal-type substitutions. Whether UError is needed at all once format/ is divorced from uucore's wider error machinery, or whether a tiny local Error shim is enough. Decide during the port — the manifest captures the decision.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions