
fix: remove pre-installed Rust stable toolchain to stabilize CI cache keys #4511

Merged
lwshang merged 3 commits into master from lwshang/fix-ci-cache
Mar 27, 2026

lwshang commented Mar 26, 2026

Summary

  • Remove the pre-installed Rust stable toolchain before setup-rust-toolchain runs in all cached CI workflows (unit, lint, e2e, release)
  • Fix icx-asset ls truncating output beyond 128 assets by blocking when the async log channel is full (instead of silently dropping messages) and flushing the channel before the process exits

Why

CI cache instability

The Swatinem/rust-cache action (used by setup-rust-toolchain) hashes all installed Rust toolchains into the cache environment key. GitHub's runner fleet, especially on macOS, mixes runner image versions, and different VMs ship with different pre-installed Rust stable versions (e.g. 1.93.1 vs 1.94.0). Which image a job lands on is effectively random, so the environment hash is non-deterministic:

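Run        | Platform   | Pre-installed stable | Env hash | Cache result
-----------|------------|----------------------|----------|-------------
master e2e | macOS ARM  | 1.93.1               | 5b4abfef | saved
PR e2e     | macOS ARM  | 1.94.0               | a57e084d | miss
master e2e | Ubuntu x64 | 1.94.0               | 35ab6fff | saved
PR e2e     | Ubuntu x64 | 1.94.0               | 35ab6fff | hit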

Ubuntu happened to be consistent across VMs, but this is fragile and could break on the next runner image rollout.

Removing the stable toolchain before the cache key is computed leaves only the project toolchain (1.88.0 from rust-toolchain.toml), so the environment hash is deterministic regardless of runner image version.
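To make the failure mode concrete, here is a toy model in Rust. It assumes, as described above, that the environment key is derived from every installed toolchain. The `env_hash` and `keep_only_project` helpers and the toolchain labels are hypothetical; this is not rust-cache's actual hashing code, only an illustration of why an extra, varying toolchain changes the key.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a cache "environment hash": it hashes the sorted
/// list of installed toolchains, so every installed toolchain affects the key.
/// (DefaultHasher is only stable within one build, which is fine for comparing.)
fn env_hash(installed: &[&str]) -> u64 {
    let mut toolchains: Vec<&str> = installed.to_vec();
    // Sort so the key does not depend on the order toolchains were listed in.
    toolchains.sort_unstable();
    let mut hasher = DefaultHasher::new();
    toolchains.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical model of the CI fix: drop every toolchain except the pinned
/// project toolchain before the key is computed.
fn keep_only_project<'a>(installed: &[&'a str], project: &str) -> Vec<&'a str> {
    installed.iter().copied().filter(|t| *t == project).collect()
}
```

Two VMs that differ only in their pre-installed stable version (say `["1.88.0", "stable-1.93.1"]` vs `["1.88.0", "stable-1.94.0"]`) get different keys from `env_hash`, while after `keep_only_project(..., "1.88.0")` both reduce to the same list and therefore the same key.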

icx-asset ls truncated output

The icx-asset ls command logs each asset via slog_async::Async, which has a default channel capacity of 128. Two issues combined to truncate output:

  1. Dropped messages: The default OverflowStrategy::DropAndReport silently drops messages when the channel is full. With 151 assets logged in a tight loop, messages 129+ were dropped before ever entering the channel.
  2. Unflushed channel: The process could exit before the background logging thread drained queued messages.
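The drop-on-overflow behavior can be reproduced with a bounded `std::sync::mpsc::sync_channel` and `try_send`. This is a std-only analogy to slog_async's `OverflowStrategy::DropAndReport`, not slog_async's implementation. It assumes the worst case, where the consumer does not drain at all while the loop runs, so the result is deterministic; `drop_when_full` is a hypothetical helper.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Toy model of a bounded log channel with a "drop when full" policy
/// (loosely analogous to slog_async's OverflowStrategy::DropAndReport).
/// Assumption: the consumer is stalled while the producer runs.
/// Returns (delivered, dropped).
fn drop_when_full(messages: usize, capacity: usize) -> (usize, usize) {
    let (tx, rx) = sync_channel::<usize>(capacity);
    let mut dropped = 0;
    for i in 0..messages {
        match tx.try_send(i) {
            Ok(()) => {}
            // Channel full: the message is discarded and can never be recovered.
            Err(TrySendError::Full(_)) => dropped += 1,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
    // Close the sending side so the drain below terminates.
    drop(tx);
    // The stalled consumer finally drains whatever made it into the buffer.
    let delivered = rx.iter().count();
    (delivered, dropped)
}
```

In this worst case, `drop_when_full(151, 128)` returns `(128, 23)`: the 23 messages that found the channel full are gone, and no later flush can bring them back.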

Fixed by switching to OverflowStrategy::Block (sender waits when channel is full) and using build_with_guard() (joins the background thread on drop to flush remaining messages).
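The same toy model can show the two properties of the fix: a blocking `send` (analogous to `OverflowStrategy::Block`) and joining the consumer thread before returning (analogous to dropping the `AsyncGuard` returned by `build_with_guard()`). In slog_async itself these are set through `AsyncBuilder::overflow_strategy` and `AsyncBuilder::build_with_guard`; the hypothetical `block_and_join` below is a std-only illustration, not the PR's code.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Toy model of the fix: the producer blocks when the bounded channel is full,
/// and the background consumer is joined before returning, so nothing is lost.
fn block_and_join(messages: usize, capacity: usize) -> Vec<usize> {
    let (tx, rx) = sync_channel::<usize>(capacity);
    // Background "logging thread": collects everything it receives, in order.
    let consumer = thread::spawn(move || rx.iter().collect::<Vec<usize>>());
    for i in 0..messages {
        // send() waits for free space instead of discarding the message.
        tx.send(i).expect("consumer thread is alive");
    }
    // Closing the channel lets the consumer's receive loop finish.
    drop(tx);
    // Joining guarantees every queued message was processed before we return,
    // mirroring a flush-on-drop guard.
    consumer.join().expect("consumer thread panicked")
}
```

`block_and_join(151, 128)` returns all 151 messages in order. The trade-off is that a slow consumer now applies backpressure to the producer instead of losing lines, which suits a command whose log output is its actual result.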

Test plan

  • Verify CI cache hits on macOS runners after the first run populates the cache
  • Verify no regressions on Ubuntu runners
  • Verify icx-asset e2e test passes consistently with all 151 assets listed

🤖 Generated with Claude Code

lwshang and others added 3 commits March 26, 2026 13:48
… keys

The macOS runner fleet has a mix of runner image versions with different
pre-installed Rust stable versions (e.g. 1.93.1 vs 1.94.0). Since
Swatinem/rust-cache hashes all installed toolchains into the cache
environment key, the hash becomes non-deterministic and causes constant
cache misses on macOS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The `icx-asset ls` command used `slog_async::Async` with a default channel
capacity of 128. When listing >128 assets, the process could exit before
the background logging thread drained all messages, causing flaky test
failures where only 128 of 151 assets appeared in output.

Use `build_with_guard()` so the AsyncGuard joins the background thread on
drop, ensuring all log lines are flushed before the process exits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The previous flush-on-exit fix (AsyncGuard) was necessary but not
sufficient. slog_async defaults to OverflowStrategy::DropAndReport,
which silently drops messages when the 128-capacity channel is full.
The `icx-asset ls` command fires 151 info!() calls in a tight loop,
so messages 129+ were dropped before entering the channel — no amount
of flushing can recover them.

Switch to OverflowStrategy::Block so the sender waits when the channel
is full, guaranteeing all log lines are delivered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lwshang marked this pull request as ready for review March 26, 2026 18:55
lwshang requested a review from a team as a code owner March 26, 2026 18:55
lwshang enabled auto-merge (squash) March 26, 2026 18:55
lwshang merged commit 1ea80b5 into master Mar 27, 2026
115 checks passed
lwshang deleted the lwshang/fix-ci-cache branch March 27, 2026 08:38
lwshang added a commit to dfinity/cdk-rs that referenced this pull request Mar 27, 2026
GitHub runners ship with varying stable Rust versions that pollute the
rust-cache environment hash, causing non-deterministic cache keys and
constant cache misses. Remove the pre-installed stable toolchain before
setup-rust-toolchain in all jobs that use caching.

See also: dfinity/sdk#4511

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lwshang added a commit to dfinity/cdk-rs that referenced this pull request Mar 30, 2026
…-types (#704)

* fix(ci): remove pre-installed stable toolchain to fix rust-cache misses

GitHub runners ship with varying stable Rust versions that pollute the
rust-cache environment hash, causing non-deterministic cache keys and
constant cache misses. Remove the pre-installed stable toolchain before
setup-rust-toolchain in all jobs that use caching.

See also: dfinity/sdk#4511

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(ci): bump actions/checkout from v4 to v6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove ic-management-canister-types from workspace

The crate has been migrated to the dfinity/ic repo. Use it as an
external crates.io dependency instead of a local workspace member.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: add publish workflow for trusted publishing to crates.io

Uses OIDC-based authentication via rust-lang/crates-io-auth-action.
Each crate has a toggle input, ordered by dependency graph. ic-cdk and
ic-cdk-macros share a single toggle and are published together.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: fmt

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
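The cdk-rs publish workflow orders crates by their dependency graph, because a crate can only be published once the versions it depends on are already on crates.io. A minimal sketch of computing such an order with Kahn's algorithm follows. The crate names are hypothetical placeholders, this is not the workflow's actual logic (which, per the commit message, uses per-crate toggle inputs), and grouping ic-cdk with ic-cdk-macros under one toggle is not modeled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns an order in which every crate appears after all of its
/// dependencies (Kahn's algorithm), or None if the graph has a cycle.
/// Ties are broken alphabetically so the output is deterministic.
/// `deps` maps each crate to the crates it depends on.
fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    // Number of each crate's dependencies that are not yet "published".
    let mut pending: BTreeMap<&'a str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> crates that depend on it.
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (&krate, ds) in deps {
        pending.entry(krate).or_insert(0);
        for &d in ds {
            *pending.entry(krate).or_insert(0) += 1;
            // Dependencies without their own entry still need a counter.
            pending.entry(d).or_insert(0);
            dependents.entry(d).or_default().push(krate);
        }
    }
    // Crates with no outstanding dependencies can be published right away.
    let mut ready: BTreeSet<&'a str> = pending
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&k, _)| k)
        .collect();
    let mut order = Vec::with_capacity(pending.len());
    while let Some(next) = ready.pop_first() {
        order.push(next);
        // Publishing `next` satisfies one dependency of each of its dependents.
        for &dependent in dependents.get(next).map(Vec::as_slice).unwrap_or(&[]) {
            let n = pending.get_mut(dependent).expect("every crate has a counter");
            *n -= 1;
            if *n == 0 {
                ready.insert(dependent);
            }
        }
    }
    // Anything left unpublished is part of a cycle.
    (order.len() == pending.len()).then_some(order)
}
```

With the hypothetical graph `app -> {lib, macros}`, `lib -> core`, `macros -> core`, this yields `core, lib, macros, app`; a cyclic graph returns `None`.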