
feat(workers): migrate harness — 47 modular workers across 5 waves#66

Draft
ytallo wants to merge 47 commits into main from feat/harness-modular-workers


@ytallo ytallo commented May 1, 2026

Summary

Migrates the harness workspace into modular standalone workers, each self-contained per the workers repo CI contract (per-worker cargo build, no cross-worker path deps). Each worker is published independently via iii worker add <name>.

Worker dependency model

Two kinds of deps in play:

  • Worker-to-worker (iii.worker.yaml dependencies:) — the iii package manager. Resolves at iii worker add time. Used when a worker calls another worker's bus functions at runtime.
  • Rust library deps (crates/<dep>/) — vendored under each worker for cargo to build against. Used for shared types and helpers that have no bus surface (harness-types, provider-base, auth-credentials (as a type-only re-export source), sandbox-helpers).

Where a vendored crate had a bus equivalent (e.g. models-catalog, session-tree), the bus-call path is preferred and the vendor dropped — see context-compaction for the cleanest example.

Workers

| Worker | Purpose | yaml deps |
| --- | --- | --- |
| audit-log | Append-only JSON-lines audit log of every tool call + result on agent::after_tool_call. | |
| auth-credentials | Provider credential vault under auth::* for API keys and OAuth tokens. Defaults to iii-state-backed storage; opt into in-memory via AUTH_CREDENTIALS_STORE=memory. | |
| auth-rbac | HMAC API keys and workspace roles (owner/admin/member/viewer) under auth::rbac::*. | |
| context-compaction | Subscriber to agent::events that triggers session compaction once context-window thresholds are reached. | models-catalog, session-tree |
| dlp-scrubber | Hook subscriber on agent::after_tool_call that redacts common secret shapes (AWS/OpenAI/GitHub/Stripe/Google). | |
| document-extract | PDF/Word text extraction under document::extract for agent context ingestion. | |
| guardrails | Local heuristics for PII, leaked API keys, jailbreak keywords, and toxicity under guardrails::*. | |
| hook-fanout | Reusable publish-collect primitive under hooks::publish_collect; fans an event to subscribers and collects responses. | |
| llm-budget | Workspace + agent LLM spend caps with alerts, forecast, and period rollover under budget::*. | |
| models-catalog | Model capabilities knowledge base under models::* (list/get/supports/register). | |
| oauth-anthropic | Anthropic Claude Pro/Max OAuth (PKCE localhost flow) under oauth::anthropic::*. | |
| oauth-github-copilot | GitHub Copilot OAuth (device-code flow) under oauth::github_copilot::*. | |
| oauth-google-antigravity | Google Antigravity OAuth (PKCE localhost flow) under oauth::google_antigravity::*. | |
| oauth-google-gemini-cli | Google Gemini CLI OAuth (PKCE localhost flow) under oauth::google_gemini_cli::*. | |
| oauth-openai-codex | OpenAI Codex OAuth (PKCE localhost flow) under oauth::openai_codex::*. | |
| policy-denylist | Hook subscriber on agent::before_tool_call that blocks calls whose name matches a configured denylist. | |
| provider-anthropic | Native Anthropic Messages API streaming provider under provider::anthropic::*. | auth-credentials |
| provider-azure-openai | Azure OpenAI Responses provider under provider::azure-openai::*. | auth-credentials |
| provider-bedrock | AWS Bedrock provider under provider::bedrock::*. (stub today; emits a not-implemented error.) | auth-credentials |
| provider-cerebras | OpenAI-compatible provider under provider::cerebras::*. | auth-credentials |
| provider-cli | Wrap installed coding CLIs (claude, codex, opencode, openclaw, hermes, pi, gemini, cursor-agent) as iii providers. | shell-bash |
| provider-deepseek | OpenAI-compatible provider under provider::deepseek::*. | auth-credentials |
| provider-fireworks | OpenAI-compatible provider under provider::fireworks::*. | auth-credentials |
| provider-google | Google Gemini provider under provider::google::*. | auth-credentials |
| provider-google-vertex | Vertex AI Gemini provider under provider::google-vertex::*. | auth-credentials |
| provider-groq | OpenAI-compatible provider under provider::groq::*. | auth-credentials |
| provider-huggingface | OpenAI-compatible provider under provider::huggingface::*. | auth-credentials |
| provider-kimi-coding | OpenAI-compatible provider under provider::kimi-coding::*. | auth-credentials |
| provider-minimax | OpenAI-compatible provider under provider::minimax::*. | auth-credentials |
| provider-mistral | OpenAI-compatible provider under provider::mistral::*. | auth-credentials |
| provider-openai | OpenAI Chat Completions provider under provider::openai::*. | auth-credentials |
| provider-openai-responses | OpenAI Responses API provider under provider::openai-responses::*. | auth-credentials |
| provider-opencode-go | OpenAI-compatible provider under provider::opencode-go::*. | auth-credentials |
| provider-opencode-zen | OpenAI-compatible provider under provider::opencode-zen::*. | auth-credentials |
| provider-openrouter | OpenAI-compatible provider under provider::openrouter::*. | auth-credentials |
| provider-router | router::stream_assistant provider router plus router::abort and router::push_steering / push_followup helpers. | session-inbox, llm-budget |
| provider-vercel-ai-gateway | OpenAI-compatible provider under provider::vercel-ai-gateway::*. | auth-credentials |
| provider-xai | OpenAI-compatible provider under provider::xai::*. | auth-credentials |
| provider-zai | OpenAI-compatible provider under provider::zai::*. | auth-credentials |
| session-corpus | Dataset publishing pipeline for completed sessions: secret scanning, redaction, review, and publish. | |
| session-inbox | Per-session durable inboxes under inbox::* (push, atomic drain, peek), backed by iii state. | |
| session-tree | Session storage as a parent-id tree of typed entries under session::*. Defaults to iii-state-backed storage with scope-per-session layout; opt into in-memory via SESSION_TREE_STORE=memory. | |
| shell-bash | Sandboxed shell execution under shell::bash::*; wraps the engine sandbox::exec primitive with no host fall-through. | |
| shell-filesystem | Sandboxed filesystem operations under shell::fs::* (read, write, list, stat, glob). | |
| subagent | Spawn child agent sessions under subagent::start via run::start_and_wait. | turn-orchestrator |
| turn-orchestrator | Durable run::start state machine driving each agent turn through provisioning, assistant, tools, steering, completion. | session-inbox, hook-fanout, provider-router |

iii-state-backed storage for auth-credentials and session-tree

Both workers previously shipped only an in-memory store; restart meant data loss. This branch adds iii-state-backed adapters and flips the default away from in-memory.

  • auth-credentials: IiiStateCredentialStore over state::* (scope auth_credentials, key credential:<provider>; the value wraps { provider, credential } so list() recovers provider names from state::list). The trait now returns anyhow::Result<...>. An IIITrigger abstraction enables unit tests with a mock; a trait-parity suite runs every test against both impls. Default selected via AUTH_CREDENTIALS_STORE (iii_state default, memory opt-in; unknown values warn and fall back to default).
  • session-tree: IiiStateSessionStore with a scope-per-session layout (scope session_tree:<sid> keyed by entry-id, scope session_tree_meta keyed by session-id). This bounds state::list scan cost to entries-in-session. append() does a fatal entry write followed by a non-fatal meta updated_at refresh (logs a warn on failure, entry still persists). load_entries sorts by entry-id and returns an empty Vec for unknown sessions, matching in-memory parity. Default selected via SESSION_TREE_STORE.
  • E2E restart tests for both workers: write data, kill the worker process, restart, read back the same data.
  • Both READMEs gain a "Storage backends" section documenting env vars, persistence semantics, and failure behavior.
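
The default-selection rule described above (iii_state by default, memory as opt-in, unknown values warn and fall back) can be sketched in a few lines. This is a minimal illustration, not the worker's actual code: `StoreKind`, `select_store`, and `select_store_from_env` are hypothetical names, and treating an empty value like an unset one is an assumption.

```rust
/// Which backing store a worker should use. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoreKind {
    IiiState,
    Memory,
}

/// Resolve the store from the raw value of an env var such as
/// AUTH_CREDENTIALS_STORE or SESSION_TREE_STORE.
/// Returns the chosen kind plus an optional warning for unknown values.
/// Assumption: an empty value is treated the same as an unset variable.
pub fn select_store(raw: Option<&str>) -> (StoreKind, Option<String>) {
    match raw.map(str::trim) {
        None | Some("") | Some("iii_state") => (StoreKind::IiiState, None),
        Some("memory") => (StoreKind::Memory, None),
        Some(other) => (
            StoreKind::IiiState,
            Some(format!("unknown store {other:?}; falling back to iii_state")),
        ),
    }
}

/// Convenience wrapper that reads the variable from the process environment.
pub fn select_store_from_env(var: &str) -> (StoreKind, Option<String>) {
    select_store(std::env::var(var).ok().as_deref())
}
```

Returning the warning instead of logging it keeps the selection logic pure, so it can be unit-tested without capturing log output; a caller would pass the message to its logger.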

Renames + namespace cutover

Worker name + bus namespace renamed at every surface (no aliases — pre-production).

  • durable-queue → session-inbox. Disambiguates from the iii-queue engine builtin (job queue with retries/DLQ); this worker is a session inbox (pull, batch, on-demand). Function ids: queue::push|drain|peek → inbox::push|drain|peek. Drain is now atomic: it uses state::update with Set { path: "", value: [] } and reads old_value from the response, replacing the previous state::get + state::set [] pair (a concurrent producer could lose items in the gap).
  • harness-runtime → provider-router. The README itself flagged the confusion: turn execution lives in turn-orchestrator, not here. This worker is a provider router. Function ids: agent::stream_assistant|abort|push_steering|push_followup → router::*.
  • shell-subagent → subagent. Nothing about this worker involves a shell; it spawns agent sessions via run::start_and_wait. Function id: shell::subagent::start → subagent::start. README rewritten; dropped unregistered wait/cancel claims.
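
The atomic-drain change can be illustrated with an in-process model. The sketch below does not use the iii state API; a `Mutex` stands in for the engine, and `Inbox` and `concurrent_push_and_drain` are hypothetical names. The point it shows is that swapping in an empty list and returning the old value in one step leaves no gap in which a concurrent push can be lost.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// In-process stand-in for one inbox value held in state (hypothetical type).
#[derive(Clone, Default)]
pub struct Inbox {
    items: Arc<Mutex<Vec<u32>>>,
}

impl Inbox {
    pub fn push(&self, item: u32) {
        self.items.lock().unwrap().push(item);
    }

    /// Atomic drain: replace the list with an empty one and return the old
    /// value inside a single critical section, analogous to reading the old
    /// value back from one update call instead of a get followed by a set.
    pub fn drain(&self) -> Vec<u32> {
        let mut guard = self.items.lock().unwrap();
        std::mem::take(&mut *guard)
    }

    pub fn peek(&self) -> Vec<u32> {
        self.items.lock().unwrap().clone()
    }
}

/// Push `n` items from one thread while the caller drains repeatedly.
/// Returns the total number of items drained, including a final drain
/// after the producer has finished.
pub fn concurrent_push_and_drain(n: u32) -> usize {
    let inbox = Inbox::default();
    let producer = {
        let inbox = inbox.clone();
        thread::spawn(move || (0..n).for_each(|i| inbox.push(i)))
    };
    let mut drained = 0;
    while !producer.is_finished() {
        drained += inbox.drain().len();
    }
    producer.join().unwrap();
    drained + inbox.drain().len()
}
```

With a separate read and clear, an item pushed between the two steps would be overwritten by the clear; with the single swap, every pushed item is returned by exactly one drain.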

state-flag deleted; inlined as direct state::* calls

state-flag was a 100-line wrapper that only normalized a session-keyed bool over state::set/state::get. Its two callers (provider-router::abort and the turn-orchestrator steering check) now call iii-state directly. Key conventions are hardcoded at the call sites so they stay greppable: session/<sid>/abort_signal for abort, session/<sid>/flags/<name> otherwise. The worker is deleted and its dependency removed from provider-router, harness, turn-orchestrator, and registry/index.json.
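
For reference, the two key conventions look like this when written out. The PR hardcodes the strings at the call sites, so the helper functions below are purely illustrative; `read_flag` is a hypothetical stand-in for a state read, and treating a missing key as false is an assumption.

```rust
use std::collections::HashMap;

/// Key for the abort signal of a session: session/<sid>/abort_signal.
pub fn abort_signal_key(session_id: &str) -> String {
    format!("session/{session_id}/abort_signal")
}

/// Key for any other named flag: session/<sid>/flags/<name>.
pub fn flag_key(session_id: &str, name: &str) -> String {
    format!("session/{session_id}/flags/{name}")
}

/// Hypothetical stand-in for reading a bool from state.
/// Assumption: a key that was never written reads as false.
pub fn read_flag(state: &HashMap<String, bool>, key: &str) -> bool {
    state.get(key).copied().unwrap_or(false)
}
```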

README drift fixes

  • shell-bash/README.md — corrected sandbox attribution (provided by the iii-worker sandbox surface, not the iii-exec builtin).
  • subagent/README.md — dropped unregistered wait/cancel rows from the function table.
  • llm-budget/README.md — reconciled function-name table with the actual register.rs (record/reset/alert_set/usage/forecast/enforce/exempt/pause).

Tests

  • Subscriber wiring + handler tests for hook subscribers (audit-log, dlp-scrubber, policy-denylist, context-compaction).
  • OAuth handler tests across all 5 OAuth workers; copilot device-code header construction extracted and tested.
  • Serde round-trip + public-API tests across all 22 providers (replaced the previous symbol-smoke test).
  • Adapter unit tests for both iii-state stores (mock IIITrigger covers happy + error paths for every method).
  • Adapter integration tests that round-trip against a live engine (gated on IIITEST_ENGINE_URL).
  • Trait-parity test suites: every behavior runs against in-memory and iii-state impls.
  • E2E restart-survival tests for auth-credentials and session-tree (gated on IIITEST_WORKER_BIN).
  • session-inbox: atomic-drain test across concurrent push + drain.
  • provider-router: handler tests covering router::abort writing the abort_signal state key directly.
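
The trait-parity idea (one behavioral check run against every implementation) can be sketched as follows. Everything here is a simplified assumption: the trait is synchronous and returns `Result<_, String>` instead of `anyhow::Result`, `FakeStateStore` is an in-process map rather than a real state adapter, and all type and function names are hypothetical. The scope and key shapes follow the description above (scope auth_credentials, key credential:<provider>, value carrying the provider name).

```rust
use std::collections::BTreeMap;

/// Simplified, synchronous stand-in for the credential store trait.
pub trait CredentialStore {
    fn set(&mut self, provider: &str, credential: &str) -> Result<(), String>;
    fn get(&self, provider: &str) -> Result<Option<String>, String>;
    fn list(&self) -> Result<Vec<String>, String>;
}

#[derive(Default)]
pub struct MemoryStore {
    by_provider: BTreeMap<String, String>,
}

impl CredentialStore for MemoryStore {
    fn set(&mut self, provider: &str, credential: &str) -> Result<(), String> {
        self.by_provider.insert(provider.to_string(), credential.to_string());
        Ok(())
    }
    fn get(&self, provider: &str) -> Result<Option<String>, String> {
        Ok(self.by_provider.get(provider).cloned())
    }
    fn list(&self) -> Result<Vec<String>, String> {
        Ok(self.by_provider.keys().cloned().collect())
    }
}

const SCOPE: &str = "auth_credentials";

fn state_key(provider: &str) -> (String, String) {
    (SCOPE.to_string(), format!("credential:{provider}"))
}

/// Fake state-backed store: (scope, key) -> (provider, credential).
/// The value carries the provider name so list() can recover it.
#[derive(Default)]
pub struct FakeStateStore {
    state: BTreeMap<(String, String), (String, String)>,
}

impl CredentialStore for FakeStateStore {
    fn set(&mut self, provider: &str, credential: &str) -> Result<(), String> {
        self.state
            .insert(state_key(provider), (provider.to_string(), credential.to_string()));
        Ok(())
    }
    fn get(&self, provider: &str) -> Result<Option<String>, String> {
        Ok(self.state.get(&state_key(provider)).map(|(_, c)| c.clone()))
    }
    fn list(&self) -> Result<Vec<String>, String> {
        Ok(self
            .state
            .iter()
            .filter(|((scope, _), _)| scope.as_str() == SCOPE)
            .map(|(_, (p, _))| p.clone())
            .collect())
    }
}

/// The shared behavioral check: any implementation must pass it unchanged.
pub fn check_parity<S: CredentialStore>(store: &mut S) -> Result<(), String> {
    if store.get("anthropic")?.is_some() {
        return Err("fresh store should be empty".into());
    }
    store.set("anthropic", "key-a")?;
    store.set("openai", "key-b")?;
    if store.get("anthropic")?.as_deref() != Some("key-a") {
        return Err("get should return the stored credential".into());
    }
    if store.list()? != vec!["anthropic".to_string(), "openai".to_string()] {
        return Err("list should return every stored provider".into());
    }
    Ok(())
}
```

Because `check_parity` is generic over the trait, adding a third backend only requires one more call site rather than a copied test body.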

Verification

Per worker, all green:

  • cargo fmt --all -- --check
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --all-features
  • cargo machete — no unused declared deps

pr-checks per the workers CI contract: every worker has README.md, iii.worker.yaml (parseable, with name/language/deploy/manifest), non-empty tests/, and version 0.1.0.

Trade-offs and notes

  • Vendoring duplication: harness-types, provider-base, and auth-credentials (type-only re-export) are vendored across multiple workers. The alternative — publishing those as crates.io packages — is the next follow-up.
  • provider-bedrock is a stub today (its register_with_iii returns a not-implemented error); published to reserve the namespace.
  • turn-orchestrator declares provider-router: "^0.1.0", the runtime it routes to. Same for subagent → turn-orchestrator.
  • context-compaction is the most architecturally interesting worker: it found and fixed a real bug (the previous subscribe-on-stream trigger never fired), split the reactive watcher from the proactive compactor cleanly, and uses bus calls for both models-catalog and session-tree instead of vendoring.

Skipped

  • harnessd (all-in-one daemon) — separate distribution decision; was the bundled harness/ pilot, now superseded by the modular workers above.
  • harness-cli, harness-tui — user-facing apps, not registry workers.
  • fixtures-gen, replay-test, hook-example, provider-faux — test/dev fixtures.
  • harness-types, provider-base, overflow-classify, sandbox-helpers — pure Rust libraries with no register_with_iii. Vendored inside consumers.

Test plan

  • CI green across all changed workers
  • Smoke install of one worker via iii worker add once published (e.g. auth-rbac or provider-anthropic)
  • Verify iii.worker.yaml dependencies: resolution by adding a worker that has deps (e.g. provider-anthropic should pull in auth-credentials)
  • Run provider-router end-to-end against a live engine with its consumers (session-inbox, llm-budget, plus a provider)
  • Persistence smoke: iii-cli auth set anthropic --api-key=..., restart auth-credentials worker, iii-cli auth get anthropic returns the key
  • Persistence smoke: append entries via session-tree, restart worker, load_entries returns the same set


coderabbitai Bot commented May 1, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dba0d7e0-f6fc-45ab-bc28-fcdfc2ba9d40

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


ytallo changed the title from "Feat/harness modular workers" to "feat(workers): migrate harness — 47 modular workers across 5 waves" on May 4, 2026
ytallo added 28 commits May 5, 2026 18:52
…naries

Migrate seven leaf harness crates into root-level workers, each vendoring its
required workspace deps under <worker>/crates/ so cargo can build them in
isolation per the workers repo CI contract.

Each new worker registers its functions on the iii engine via a thin main.rs
wrapper and runs until Ctrl-C. Initial version 0.1.0 across the set; entries
added to README modules table and registry/index.json.

Workers landed in this wave:
  - auth-credentials (auth::*)
  - auth-rbac (auth::rbac::*)
  - document-extract (document::extract)
  - guardrails (guardrails::*)
  - llm-budget (budget::*)
  - models-catalog (models::*)
  - session-tree (session::*)

Verified per worker: cargo fmt --check, cargo clippy --all-targets
--all-features -- -D warnings, cargo test --all-features.

Constraint: workers repository CI runs cargo per-worker and forbids cross-
worker path dependencies, so harness library crates are vendored under each
worker's crates/ folder rather than referenced as workspace path deps.

Confidence: high

Scope-risk: medium (10 new module rows in registry; per-worker default_config
is empty pending real config schema decisions for budget/session/models)
…y main

Each Wave 1 worker had a duplicated layout where its own source lived under
crates/<worker>/ alongside an empty top-level src/. Move the worker's own
code to src/ at top level and let crates/ hold only true vendored
dependencies (harness-types). Three pure-leaf workers (auth-rbac,
llm-budget, guardrails) drop crates/ entirely.

main.rs simplified: drop the list_functions trigger wrapper and the
parse_args helper. The binary connects to III_URL (env var, default
ws://127.0.0.1:49134), calls <crate>::register_with_iii, and waits for
Ctrl-C. No iii-SDK boilerplate beyond what's strictly necessary.

Lints carried forward from the harness root so the existing source code
clippy-clean surface keeps holding (pedantic + nursery enabled with the
same allow list).

Verified per worker: cargo fmt --check, cargo clippy --all-targets
--all-features -- -D warnings, cargo test --all-features.

Confidence: high
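
The endpoint lookup that the simplified main.rs performs (III_URL from the environment, defaulting to ws://127.0.0.1:49134) might look roughly like this. The function and constant names are hypothetical, the SDK connection call is deliberately left out, and treating a blank value as unset is an assumption.

```rust
/// Default engine endpoint named in the commit message above.
pub const DEFAULT_III_URL: &str = "ws://127.0.0.1:49134";

/// Pick the engine URL from an optional env value.
/// Assumption: a blank value falls back to the default.
pub fn resolve_iii_url(env_value: Option<String>) -> String {
    env_value
        .filter(|v| !v.trim().is_empty())
        .unwrap_or_else(|| DEFAULT_III_URL.to_string())
}

/// Read III_URL from the process environment.
pub fn iii_url_from_env() -> String {
    resolve_iii_url(std::env::var("III_URL").ok())
}
```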
Land four standalone workers from the harness primitives and shells groups,
following the same layout principles as Wave 1:

  - turn-orchestrator (run::start, run::start_and_wait, turn::step)
  - shell-bash (shell::bash::*, wraps engine sandbox::exec)
  - shell-filesystem (shell::fs::*)
  - shell-subagent (shell::subagent::* via run::start_and_wait)

Worker code lives at top-level src/; only true vendored Rust deps live
under crates/. Three workers vendor harness-types (and hook-fanout /
durable-queue / state-flag for turn-orchestrator; sandbox-helpers for the
two shells). shell-subagent has no vendored crates.

shell-subagent declares iii.worker.yaml dependencies on turn-orchestrator
because it calls run::start_and_wait via the bus at runtime — the iii
worker resolver pulls turn-orchestrator on `iii worker add shell-subagent`.

main.rs minimal across all four: connect via III_URL env, call
register_with_iii, wait for Ctrl-C. No iii-SDK boilerplate.

Verified per worker: cargo fmt --check, cargo clippy --all-targets
--all-features -- -D warnings, cargo test --all-features.

Confidence: high
Audited every Wave 1 + Wave 2 worker that vendored harness-types and
removed the vendor where the worker doesn't actually use it.

Trim outcome:
  - auth-credentials: dep was declared but no source-code use. Drop.
  - document-extract: same — dep was vestigial. Drop.
  - models-catalog: only used 4 small types (ThinkingLevel, ThinkingBudgets,
    Transport, CacheRetention; ~30 lines total). Inline locally and drop
    the vendored crate.
  - turn-orchestrator/crates/hook-fanout: Cargo.toml listed harness-types
    but the crate doesn't import it. Trim the manifest entry. The
    workspace still vendors harness-types because turn-orchestrator's own
    src code uses AgentEvent / AgentMessage / ToolCall / ContentBlock.

Net diff: -2063 lines, +68 lines (mostly the inlined enums in
models-catalog).

Workers still vendoring harness-types (heavy, real use):
  - session-tree (AgentMessage, ContentBlock, AgentContext, ...).
  - turn-orchestrator (AgentEvent, AgentMessage, ToolCall, ToolResult,
    ContentBlock, TextContent, AssistantMessage, ToolResultMessage,
    UserMessage).
  - shell-bash + shell-filesystem (sandbox-helpers depends on
    ContentBlock/TextContent/ToolResult to build error-as-tool-result
    payloads).

Verified per worker: cargo fmt --check, cargo clippy --all-targets
--all-features -- -D warnings, cargo test --all-features.

Confidence: high
…orkers

Ran cargo-machete across every Wave 1 + Wave 2 worker (top-level package
plus every vendored crate) and removed every dependency that was declared
but never imported in source.

Direct trims (worker Cargo.toml [dependencies]):
  - auth-credentials: chrono, thiserror
  - document-extract: async-trait
  - guardrails: tracing
  - llm-budget: async-trait, thiserror
  - shell-subagent: serde (kept serde_json)
  - turn-orchestrator: uuid

Vendored trims:
  - harness-types/Cargo.toml (4 copies): chrono
  - sandbox-helpers/Cargo.toml (2 copies): serde, tokio

Re-ran cargo-machete after the change — every worker reports "clean".

Verified per worker: cargo clippy --all-targets --all-features -- -D
warnings, cargo test --all-features. Test counts unchanged across the
matrix (37, 16, 16, 11, 10, 8, 8, 5, 2, 2, 1).

Confidence: high
…r own workers

These three primitives were vendored under turn-orchestrator/crates/ even
though turn-orchestrator only consumed three string constants from them
(DRAIN_ID, FUNCTION_ID, IS_SET_ID) and called the underlying functions via
the bus. They are real iii workers — each registers handlers on the bus —
not implementation details of turn-orchestrator.

Promote them to root-level workers:
  - durable-queue (queue::push, queue::drain, queue::peek)
  - hook-fanout (hooks::publish_collect)
  - state-flag (flag::set, flag::clear, flag::is_set)

In turn-orchestrator:
  - Replace the three const refs with inline string literals
  - Drop crates/durable-queue, crates/hook-fanout, crates/state-flag
  - Drop the corresponding workspace members + path-dep entries
  - Add iii.worker.yaml dependencies block declaring all three at "^0.1.0"
    so `iii worker add turn-orchestrator` resolves and installs them

Verified per worker: cargo fmt --check, cargo clippy --all-targets
--all-features -- -D warnings, cargo test --all-features.

Confidence: high
…olicy subscribers

Six more workers landed:

  - harness-runtime (agent::stream_assistant + agent::abort/push helpers)
  - session-corpus (corpus::*; leaf, no Rust deps)
  - context-compaction (subscriber on agent::events; vendors session-tree
    + models-catalog + harness-types)
  - policy-denylist (split out of policy-subscribers)
  - audit-log (split out of policy-subscribers)
  - dlp-scrubber (split out of policy-subscribers)

policy-subscribers was a multi-binary crate; the three subscribers are now
distinct workers because each is independently installable and runs as
its own process. Each pulls only the helpers (Subscriber struct,
unwrap_envelope, write_hook_reply) it needs.

harness-runtime in modular form drops the bundled hook-fanout /
durable-queue / state-flag / shell-* registrations — those are standalone
workers — and only owns the agent::* surface. The two const refs
state_flag::SET_ID and durable_queue::PUSH_ID become inline string
literals.

iii.worker.yaml dependencies declared:
  - harness-runtime → durable-queue, state-flag, llm-budget
  - turn-orchestrator → adds harness-runtime to its existing block,
    closing the dep gap noted in the previous audit

session-corpus, context-compaction, policy-*, audit-log, dlp-scrubber
have no worker-to-worker deps — they only call engine builtins (state::*,
stream::*, publish, pubsub topics).

Verified per worker: cargo fmt --check, cargo clippy --all-targets
--all-features -- -D warnings, cargo test --all-features. cargo-machete
reports clean across the matrix.

Confidence: high
…p the vendor

context-compaction was vendoring the entire models-catalog crate (~490
lines + 144-line models.json) just to read one u64 (context_window) per
session. models-catalog ships as its own worker now — call it via the
bus instead of bundling its source.

Changes:
  - Drop crates/models-catalog/.
  - Replace `use models_catalog::get` with an inline async helper
    `lookup_context_window(iii, provider, model_id)` that calls
    `models::get` via iii.trigger and reads context_window from the
    response.
  - Split CompactionConfig::new (sync, uses DEFAULT_CONTEXT_WINDOW) from
    a new CompactionConfig::resolve (async, hits the bus).
  - Declare `models-catalog: "^0.1.0"` in iii.worker.yaml so
    `iii worker add context-compaction` resolves and installs it.
  - Drop the test that asserted a specific catalog value (was effectively
    testing models-catalog data, not context-compaction logic).

Verified: cargo fmt, cargo clippy --all-targets --all-features
-- -D warnings, cargo test --all-features (16 pass). cargo-machete
clean.

Confidence: high
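
The split between CompactionConfig::new and CompactionConfig::resolve described in the commit above can be sketched as follows. This is a simplified, synchronous illustration: the real resolve is async and calls models::get over the bus, whereas here a `lookup` closure stands in for that call. The single-field struct, the value of DEFAULT_CONTEXT_WINDOW, and the fallback to the default when the lookup returns nothing are all assumptions.

```rust
/// Assumed fallback value; the source does not state the real constant.
pub const DEFAULT_CONTEXT_WINDOW: u64 = 128_000;

/// Hypothetical, reduced config carrying only the context window.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompactionConfig {
    pub context_window: u64,
}

impl CompactionConfig {
    /// Sync constructor: no bus access, uses the default window.
    pub fn new() -> Self {
        Self { context_window: DEFAULT_CONTEXT_WINDOW }
    }

    /// Resolve the window through `lookup`, which stands in for the
    /// bus call to models::get (provider, model id -> context window).
    /// Assumption: a missing answer falls back to the default.
    pub fn resolve<F>(provider: &str, model_id: &str, lookup: F) -> Self
    where
        F: Fn(&str, &str) -> Option<u64>,
    {
        match lookup(provider, model_id) {
            Some(context_window) => Self { context_window },
            None => Self::new(),
        }
    }
}
```

Keeping a constructor that never touches the bus is what lets tests build a config without a running engine.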
…the vendor

Convert the compactor's two session-tree calls (load_messages, compact) to
bus calls (session::messages, session::compact). Drop the vendored
session-tree crate (~1238 lines).

Trade-offs mitigated by introducing a small abstraction:

  - New `IiiBus` trait with the two operations the compactor needs.
  - `IiiSdkBus` wraps `iii_sdk::III` and translates each call to a
    typed `iii.trigger(...)` request.
  - `InMemoryBus` (test-only) holds messages + a Vec<CompactionRecord>
    in-process. Tests assert against typed compaction records via
    `bus.recorded_compactions()` instead of matching `SessionEntry`
    JSON tags — cleaner than the old approach.

Other changes:
  - Inline `CompactionDetails` (4 lines) since session-tree no longer
    sources it. Wire shape is unchanged — `serde_json` round-trips
    correctly with the session-tree worker on the bus.
  - `Compactor<S, F>` becomes `Compactor<F>` taking `Arc<dyn IiiBus>`.
  - `CompactionError::Storage(SessionError)` becomes
    `Storage(String)` since SessionError is no longer in scope.
  - Declare `session-tree: "^0.1.0"` in iii.worker.yaml so
    `iii worker add context-compaction` resolves the runtime dep.

Net diff: -1421 / +232 lines (≈ -1190 net). All 16 tests pass with the
in-memory bus; cargo fmt, clippy --all-targets --all-features
-- -D warnings, and machete are all clean.

context-compaction/crates/ now holds only `harness-types/` — the last
remaining genuine Rust-library dep that has no bus equivalent.

Confidence: high
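
The bus abstraction from the commit above (a two-operation trait, a production wrapper, and a recording in-memory double) can be sketched like this. It is a synchronous simplification with hypothetical signatures: messages are plain strings, errors are `String`, and `compact_if_over` is an invented caller used only to exercise the trait. Only the names IiiBus, InMemoryBus, CompactionRecord, and recorded_compactions come from the source.

```rust
use std::sync::Mutex;

/// What the test double records for each compact call (fields assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompactionRecord {
    pub session_id: String,
    pub summary: String,
}

/// The two operations the compactor needs from the bus.
pub trait IiiBus: Send + Sync {
    fn load_messages(&self, session_id: &str) -> Result<Vec<String>, String>;
    fn compact(&self, session_id: &str, summary: &str) -> Result<(), String>;
}

/// Test double: messages and compaction records held in-process.
#[derive(Default)]
pub struct InMemoryBus {
    messages: Mutex<Vec<(String, String)>>,
    compactions: Mutex<Vec<CompactionRecord>>,
}

impl InMemoryBus {
    pub fn with_message(self, session_id: &str, text: &str) -> Self {
        self.messages.lock().unwrap().push((session_id.into(), text.into()));
        self
    }

    pub fn recorded_compactions(&self) -> Vec<CompactionRecord> {
        self.compactions.lock().unwrap().clone()
    }
}

impl IiiBus for InMemoryBus {
    fn load_messages(&self, session_id: &str) -> Result<Vec<String>, String> {
        let messages = self.messages.lock().unwrap();
        Ok(messages
            .iter()
            .filter(|(sid, _)| sid.as_str() == session_id)
            .map(|(_, text)| text.clone())
            .collect())
    }

    fn compact(&self, session_id: &str, summary: &str) -> Result<(), String> {
        self.compactions.lock().unwrap().push(CompactionRecord {
            session_id: session_id.to_string(),
            summary: summary.to_string(),
        });
        Ok(())
    }
}

/// Hypothetical caller: compact a session once it exceeds `max_messages`.
pub fn compact_if_over(
    bus: &dyn IiiBus,
    session_id: &str,
    max_messages: usize,
) -> Result<bool, String> {
    let messages = bus.load_messages(session_id)?;
    if messages.len() <= max_messages {
        return Ok(false);
    }
    bus.compact(session_id, &format!("{} messages summarised", messages.len()))?;
    Ok(true)
}
```

Asserting on typed records from `recorded_compactions()` avoids matching on serialized entry JSON, which is the benefit the commit message calls out.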
…te modules

Single worker, two responsibilities, each in its own module with the correct
stream-trigger wiring.

Module layout:

  src/lib.rs       — coordinator. register_with_iii(iii, summariser) calls
                     both register_watcher and register_compactor and
                     returns a compound Handles.
  src/watcher.rs   — reactive: payload_signals_overflow + a
                     trigger_type:"stream" subscriber that republishes
                     overflow events to agent::transform_context.
  src/compactor.rs — proactive: Compactor + IiiBus + IiiSdkBus +
                     CompactorRegistry + extract_file_ops + a
                     trigger_type:"stream" subscriber that routes events
                     by group_id (= session id) to per-session Compactors.
  src/main.rs      — instantiates a NoopSummariser and registers both.

Also fixes a real bug: the previous register_with_iii used
trigger_type:"subscribe" with topic "agent::events". `agent::events` is an
iii stream (turn-orchestrator publishes via stream::set), not a pubsub
topic — that subscriber never fired in production. Both new register
functions use trigger_type:"stream" with config { stream_name }.

CompactorRegistry::handle reads payload.get("group_id") for session id and
payload.get("data") for the AgentEvent (matches the engine's stream-item
envelope shape).

CompactorRegistry production constructor takes III for
CompactionConfig::resolve; a #[cfg(test)] for_tests constructor takes None
and falls back to CompactionConfig::new (no bus call). seed() helper pre-
populates the registry for routing tests.

Tests: 19 pass (was 16). Three new registry tests:
  - registry_routes_event_by_group_id
  - registry_skips_envelope_without_group_id
  - registry_skips_envelope_with_non_event_data

cargo fmt, clippy --all-targets --all-features -- -D warnings, machete
all clean.

Confidence: high
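
The routing rule in CompactorRegistry::handle (session id from group_id, event from data, skip when either is missing) can be modelled without the engine. In this sketch the envelope is a plain struct with optional string fields rather than a JSON payload, per-session state is a list of received events rather than a Compactor, and `Envelope`, `Registry`, and `Routed` are hypothetical names.

```rust
use std::collections::HashMap;

/// Reduced model of a stream-item envelope (the real payload is JSON).
pub struct Envelope {
    pub group_id: Option<String>,
    pub data: Option<String>,
}

/// Outcome of handling one envelope.
#[derive(Debug, PartialEq, Eq)]
pub enum Routed {
    Delivered,
    SkippedNoGroupId,
    SkippedNoData,
}

/// Per-session event lists stand in for per-session Compactors.
#[derive(Default)]
pub struct Registry {
    per_session: HashMap<String, Vec<String>>,
}

impl Registry {
    /// Route the event to the session named by group_id; skip otherwise.
    pub fn handle(&mut self, envelope: &Envelope) -> Routed {
        let Some(session_id) = envelope.group_id.as_deref() else {
            return Routed::SkippedNoGroupId;
        };
        let Some(event) = envelope.data.as_deref() else {
            return Routed::SkippedNoData;
        };
        self.per_session
            .entry(session_id.to_string())
            .or_default()
            .push(event.to_string());
        Routed::Delivered
    }

    /// Events delivered to a session so far; empty for unknown sessions.
    pub fn events_for(&self, session_id: &str) -> &[String] {
        self.per_session
            .get(session_id)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```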
Land all provider adapters as standalone modular workers:

  Native APIs (7):
    - provider-anthropic, provider-openai, provider-openai-responses,
      provider-google, provider-google-vertex, provider-azure-openai,
      provider-bedrock

  OpenAI-compatible (14):
    - provider-openrouter, provider-groq, provider-cerebras, provider-xai,
      provider-deepseek, provider-mistral, provider-fireworks,
      provider-kimi-coding, provider-minimax, provider-zai,
      provider-huggingface, provider-vercel-ai-gateway,
      provider-opencode-zen, provider-opencode-go

  Special (1):
    - provider-cli — wraps installed coding CLIs via shell::bash::*

Standard providers vendor four shared crates (harness-types,
overflow-classify, auth-credentials, provider-base) under crates/. Per-
provider Cargo.toml top deps trimmed via cargo-machete to the minimum
each lib actually imports — clean across all 22.

iii.worker.yaml dependencies declared per provider:
  - 21 standard providers depend on auth-credentials (provider-base
    fetches API keys via auth::get_token at runtime).
  - provider-cli depends on shell-bash (calls shell::bash::which and
    shell::bash::exec).

provider-bedrock is a stub — its register_with_iii returns a
not-implemented error; it's published to keep the namespace reserved
and ship later when AWS SDK wiring lands.

Verified per worker: cargo fmt --check, cargo clippy --all-targets
--all-features -- -D warnings, cargo test --all-features. cargo-machete
clean across the matrix.

Confidence: high
Land all OAuth providers as standalone modular workers:

  - oauth-anthropic (PKCE localhost)
  - oauth-openai-codex (PKCE localhost)
  - oauth-github-copilot (device-code)
  - oauth-google-gemini-cli (PKCE localhost)
  - oauth-google-antigravity (PKCE localhost)

Each vendors only auth-credentials (used as a Rust library for the
Credential type re-export — oauth workers don't store tokens themselves;
they return the resulting Credential to the caller, which in turn writes
via auth::set_token).

iii.worker.yaml has no dependencies block — oauth workers register their
own oauth::* surface and don't call other workers' bus functions.

Per-provider Cargo.toml top deps machete-trimmed to the minimum each lib
actually imports. cargo-machete clean across all 5.

Verified per worker: cargo fmt --check, cargo clippy --all-targets
--all-features -- -D warnings, cargo test --all-features.

This closes Wave 5 of the original harness migration plan. Total modular
workers in the registry from harness sources: 47 (Wave 1: 7, Wave 2: 4,
primitive-promotions: 3, Wave 3: 6, Wave 4: 22, Wave 5: 5).

Confidence: high
…nt/ToolResult, drop harness-types vendor

Both shell workers were vendoring harness-types only because their
vendored sandbox-helpers used three types from it (ContentBlock,
TextContent, ToolResult) — all three constructed locally and serialized
out. The shell workers themselves never imported harness-types directly.

Inline wire-compatible mirrors of those types into
sandbox-helpers/src/errors.rs:
  - ContentBlock (Text, Image, ToolCall, ToolResult variants)
  - TextContent
  - ImageContent
  - ToolResult

Serde tags + field names match canonical harness_types byte-for-byte, so
downstream consumers (turn-orchestrator, agents) deserialize the produced
JSON back as harness_types::ToolResult without any conversion. Tests
include round-trip + canonical-tag assertions to lock the wire format.

Drop:
  - shell-bash/crates/harness-types/ (8 files, ~600 lines)
  - shell-filesystem/crates/harness-types/ (8 files, ~600 lines)
  - shell-bash/Cargo.toml workspace member + path dep
  - shell-filesystem/Cargo.toml workspace member + path dep

Re-add `serde = { workspace = true }` to each vendored sandbox-helpers
Cargo.toml (now needed since the inlined types use derive(Serialize,
Deserialize)). machete clean across both workers.

Net diff: -1252 / +152 lines.

Verified per worker: cargo fmt --check, cargo clippy --all-targets
--all-features -- -D warnings, cargo test --all-features.

Confidence: high
Per plan at .claude/plans/look-at-users-...md (eng-reviewed).

audit-log, policy-denylist: pub(crate) ReplyBus trait + InMemoryBus mock
recording write_reply + register_function + register_trigger calls.
Each gets >=1 wiring test (function id + trigger config) and >=1
handler-behavior test (envelope -> bus calls). dlp-scrubber adds wiring
+ unwrap_envelope extraction for parity.

TODOS.md created with audit-log silent-failure follow-up entry — the
Phase 1 silent_on_unwritable_path test surfaces the bug; production
fix tracked separately.

Acceptance gate per worker: cargo fmt, clippy --all-targets
--all-features -- -D warnings, test --all-features, machete clean.
… public-API tests across 22 providers

Per plan at .claude/plans/look-at-users-...-quirky-duckling.md (eng-reviewed).

Native API providers (7): assert serde round-trip on representative
request shapes plus pub const + helper-fn assertions where exposed.
provider-google-vertex and provider-azure-openai also assert their
api_url() template substitution.

OpenAI-compat providers (14): each gets the same 4-test template:
register-symbol smoke, config serde round-trip, with_credential
API-key path assertion, OpenAI Chat Completions request shape
round-trip. Catches the meaningful drift class (config field renames,
wire-format renames).

provider-cli: known-CLI bin/tag inspection over CLI_SHAPES.

provider-anthropic visibility:
  - content_block_to_wire promoted private→pub (consumed by external
    integration test).
  - build_content kept pub(crate) because it consumes private
    PartialState; tests added inline in src/lib.rs #[cfg(test)] mod.

Per-worker acceptance gate met: cargo fmt --check, cargo clippy
--all-targets --all-features -- -D warnings, cargo test --all-features,
cargo machete .
…rs extract

Per plan at .claude/plans/look-at-users-...md (eng-reviewed).

Each oauth-* worker gets a pub(crate) CredentialBus trait with
record_function + record_trigger + set_token, an IiiSdkBus production
impl, and an InMemoryBus test impl. Tests assert (a) wiring: each
oauth::* function id and stream/subscribe trigger config registered
correctly, (b) handler-behavior: canned token response stores the
right Credential via the bus.
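
A minimal sketch of the trait-plus-mock seam described above. The three method names come from this commit message. The signatures, the `Credential` fields, the function id `oauth::example::login`, and the `register` / `handle_token_response` helpers are assumptions made for illustration, not the workers' actual code.

```rust
use std::collections::HashMap;

/// Hypothetical credential shape; the real `Credential` type is richer.
#[derive(Debug, Clone, PartialEq)]
pub struct Credential {
    pub access_token: String,
}

/// Seam between handler logic and the bus. Signatures are assumed.
pub trait CredentialBus {
    fn record_function(&mut self, function_id: &str);
    fn record_trigger(&mut self, trigger_type: &str, function_id: &str);
    fn set_token(&mut self, provider: &str, credential: Credential);
}

/// Test double that records every call so tests can assert on it.
#[derive(Default)]
pub struct InMemoryBus {
    pub functions: Vec<String>,
    pub triggers: Vec<(String, String)>,
    pub tokens: HashMap<String, Credential>,
}

impl CredentialBus for InMemoryBus {
    fn record_function(&mut self, function_id: &str) {
        self.functions.push(function_id.to_string());
    }
    fn record_trigger(&mut self, trigger_type: &str, function_id: &str) {
        self.triggers
            .push((trigger_type.to_string(), function_id.to_string()));
    }
    fn set_token(&mut self, provider: &str, credential: Credential) {
        self.tokens.insert(provider.to_string(), credential);
    }
}

/// Hypothetical wiring step: registers one function and one trigger.
pub fn register(bus: &mut dyn CredentialBus) {
    bus.record_function("oauth::example::login");
    bus.record_trigger("subscribe", "oauth::example::login");
}

/// Hypothetical handler: stores the token from a canned response.
pub fn handle_token_response(bus: &mut dyn CredentialBus, provider: &str, token: &str) {
    bus.set_token(
        provider,
        Credential {
            access_token: token.to_string(),
        },
    );
}
```

A wiring test then inspects `functions` and `triggers` after `register`, and a handler-behavior test inspects `tokens` after `handle_token_response`.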

oauth-github-copilot: extract copilot_headers() helper from inline
register-handler code; unit-test header values.

Acceptance gate per worker: fmt, clippy -D warnings, test, machete.
CI uses Rust stable 1.95.0; local was 1.94.1, so three new lints surfaced
only on CI:

  - clippy::duration_suboptimal_units in
    provider-base/src/openai_compat.rs:236 — Duration::from_secs(120)
    becomes Duration::from_mins(2). Applied to all 21 vendored copies
    (every native-API and OpenAI-compat provider).

  - clippy::sort_by_key in harness-runtime/src/resume.rs:263 and
    session-corpus/src/lib.rs:315 — sort_by(|a,b| b.x.cmp(&a.x))
    becomes sort_by_key(|s| std::cmp::Reverse(s.x)).

  - clippy::manual_is_some_and in provider-cli/src/register.rs:121 —
    .filter(|s| !s.is_empty()).is_some() collapses into
    .is_some_and(|s| !s.is_empty()).

No behavior change; pure clippy-clean rewrites.
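
For reference, a small sketch of two of the rewrites side by side, showing that the old and new forms agree. The `Session` struct and its `updated_at` field are hypothetical stand-ins for the real sorted types.

```rust
use std::cmp::Reverse;

/// Hypothetical record; stands in for the structs sorted in the real code.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub updated_at: u64,
}

/// Old form: descending sort via a reversed comparator.
pub fn newest_first_old(mut sessions: Vec<Session>) -> Vec<Session> {
    sessions.sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
    sessions
}

/// New form: same ordering expressed with sort_by_key + Reverse.
pub fn newest_first_new(mut sessions: Vec<Session>) -> Vec<Session> {
    sessions.sort_by_key(|s| Reverse(s.updated_at));
    sessions
}

/// Old form: filter followed by is_some.
pub fn non_empty_old(value: Option<&str>) -> bool {
    value.filter(|s| !s.is_empty()).is_some()
}

/// New form: a single is_some_and call.
pub fn non_empty_new(value: Option<&str>) -> bool {
    value.is_some_and(|s| !s.is_empty())
}
```

Both sorts are stable, so ties keep their relative order in either form.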
…vider lib.rs

Each native-API provider builds its own reqwest client with a 120s
timeout in src/lib.rs (separate from the vendored provider-base copy
fixed in 5260c98). Apply the same Duration::from_secs(120) ->
Duration::from_mins(2) rewrite to:

  - provider-anthropic/src/lib.rs:249
  - provider-azure-openai/src/lib.rs:295
  - provider-google/src/lib.rs:271
  - provider-google-vertex/src/lib.rs:291
  - provider-openai-responses/src/lib.rs:282
…w::Result

CredentialStore::{get, set, clear, list} now return anyhow::Result<...>
so adapters can surface transient backend failures (e.g. iii-state trigger
errors) to bus callers. InMemoryStore wraps its return values in Ok. The
register_with_iii closures propagate errors with ?, mapping anyhow::Error
to IIIError::Handler.

Prepares the trait for the iii-state-backed adapter landing in subsequent
tasks (E2-E8).
Introduces a small trait that mirrors the iii_sdk::III::trigger surface
that store adapters need. Blanket impl on iii_sdk::III preserves
production behavior; tests in subsequent tasks (E3-E5) will provide a
mock for unit-level adapter testing.

Compile-time assertion in tests catches future iii_sdk::III::trigger
signature changes that would break the blanket impl.
New iii-state-backed adapter struct that takes Arc<dyn IIITrigger> for
testability (E2). get() issues state::get under scope auth_credentials,
key credential:<provider>, deserializes the response, surfaces trigger
or deserialize errors as anyhow::Err.

set/clear/list are unimplemented!() stubs; landing in E4-E5.

Tests use MockTrigger to verify payload shape, hit/miss handling, and
error propagation without a live engine.
set issues state::set with the credential serialized as the value;
clear issues state::delete. Both surface trigger failures as
anyhow::Err per Task E1's trait contract.

list still unimplemented; lands in E5.
…rapping

state::list returns values without keys, so list() must recover the
provider name from the value itself. Schema evolves to
{ provider, credential } so list can return Vec<(String, Credential)>.

set wraps with provider on write; get unwraps on read; list iterates
the array and pulls (provider, credential) tuples. Existing get/set
tests updated to assert the wrapped shape; 3 new list tests cover
populated scope, empty scope, trigger failure.
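
A rough sketch of the provider-wrapped value, with JSON serialization left out. The `credential:<provider>` key format and the `{ provider, credential }` shape are from the commit messages; the `Credential` field and the helper names are assumptions.

```rust
/// Hypothetical credential; the real type also covers OAuth tokens.
#[derive(Debug, Clone, PartialEq)]
pub struct Credential {
    pub api_key: String,
}

/// Stored value: the provider name travels with the credential because
/// state::list returns values without their keys.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredCredential {
    pub provider: String,
    pub credential: Credential,
}

/// Key used under the auth_credentials scope.
pub fn state_key(provider: &str) -> String {
    format!("credential:{provider}")
}

/// set: wrap with the provider name on write.
pub fn wrap(provider: &str, credential: Credential) -> StoredCredential {
    StoredCredential {
        provider: provider.to_string(),
        credential,
    }
}

/// get: unwrap on read.
pub fn unwrap_value(stored: StoredCredential) -> Credential {
    stored.credential
}

/// list: recover (provider, credential) tuples from key-less values.
pub fn list_from_values(values: Vec<StoredCredential>) -> Vec<(String, Credential)> {
    values
        .into_iter()
        .map(|stored| (stored.provider, stored.credential))
        .collect()
}
```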

Adapter is now feature-complete; default-flip + integration test land
in E6.
main.rs picks IiiStateCredentialStore unless AUTH_CREDENTIALS_STORE=memory
is set. Logs the active backend at startup so operators can confirm.

tests/integration_iii_state.rs exercises the wire path (set, get, clear)
against a live engine, gated on IIITEST_ENGINE_URL. The test skips
cleanly when the env var is unset so cargo test stays green in CI.
Adds a Storage backends section covering AUTH_CREDENTIALS_STORE, the
iii_state schema (provider-wrapped value), and failure semantics
(trait now returns Result; bus errors surface to callers).

Fixes the now-stale 'Defaults to in-memory' claim — the default is
iii_state after E6.

Function-id table drift (auth::set vs auth::set_token, etc.) is left
for a separate doc-cleanup pass; not in this task's scope.
Five behaviors exercised on both backends: round-trip set/get/clear,
set overwrites existing, clear is idempotent, list includes set
entries, get on missing returns None.

InMemoryStore tests run unconditionally; IiiStateCredentialStore tests
gate on IIITEST_ENGINE_URL and skip cleanly when unset. Drift between
the two impls fails here before it can surprise a caller.
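
One way to picture the parity suite is a single contract check that takes the trait object and runs against every backend. This is a simplified sketch, not the actual test file: the trait here is synchronous, uses `Result<_, String>` in place of anyhow::Result to stay dependency-free, and stores credentials as plain strings. The provider name `openai` is only an example.

```rust
use std::collections::BTreeMap;

/// Stand-in for anyhow::Result, to keep the sketch dependency-free.
pub type StoreResult<T> = Result<T, String>;

/// Simplified, synchronous version of the store trait (assumed shape).
pub trait CredentialStore {
    fn get(&self, provider: &str) -> StoreResult<Option<String>>;
    fn set(&mut self, provider: &str, credential: String) -> StoreResult<()>;
    fn clear(&mut self, provider: &str) -> StoreResult<()>;
    fn list(&self) -> StoreResult<Vec<(String, String)>>;
}

#[derive(Default)]
pub struct InMemoryStore {
    entries: BTreeMap<String, String>,
}

impl CredentialStore for InMemoryStore {
    fn get(&self, provider: &str) -> StoreResult<Option<String>> {
        Ok(self.entries.get(provider).cloned())
    }
    fn set(&mut self, provider: &str, credential: String) -> StoreResult<()> {
        self.entries.insert(provider.to_string(), credential);
        Ok(())
    }
    fn clear(&mut self, provider: &str) -> StoreResult<()> {
        self.entries.remove(provider);
        Ok(())
    }
    fn list(&self) -> StoreResult<Vec<(String, String)>> {
        Ok(self.entries.iter().map(|(k, v)| (k.clone(), v.clone())).collect())
    }
}

/// The five behaviors, written once and run against any backend.
pub fn check_contract(store: &mut dyn CredentialStore) -> StoreResult<()> {
    // get on missing returns None
    assert_eq!(store.get("missing")?, None);
    // round-trip set/get
    store.set("openai", "k1".to_string())?;
    assert_eq!(store.get("openai")?, Some("k1".to_string()));
    // set overwrites existing
    store.set("openai", "k2".to_string())?;
    assert_eq!(store.get("openai")?, Some("k2".to_string()));
    // list includes set entries
    assert!(store.list()?.contains(&("openai".to_string(), "k2".to_string())));
    // clear is idempotent
    store.clear("openai")?;
    store.clear("openai")?;
    assert_eq!(store.get("openai")?, None);
    Ok(())
}
```

An engine-backed store would be passed to the same `check_contract` only when IIITEST_ENGINE_URL is set.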
Mirrors the auth-credentials abstraction (commit 938c734). Lets the
upcoming iii-state SessionStore adapter (E10-E13) be unit-tested with
a mock without spinning up a live engine.

Compile-time assertion in tests catches future iii_sdk::III::trigger
signature changes.
ytallo added 19 commits May 5, 2026 18:52
New iii-state-backed adapter struct that takes Arc<dyn IIITrigger>
for testability. create() issues state::set under scope
session_tree_meta, key <session_id>, with the SessionMeta serialized
as the value.

append/load_entries/load_meta/list are unimplemented!() stubs;
landing in E11-E13.

Storage layout (scope-per-session, bounded scan cost):
  scope session_tree:<sid> for entries (E11-E12)
  scope session_tree_meta keyed by sid for metadata (this task + E13)
…efresh

append issues state::set on scope session_tree:<sid>, key <entry_id>;
this is fatal on failure (entry must persist).

Then it refreshes SessionMeta::updated_at via a load-mutate-write
sequence on scope session_tree_meta. Failures in the refresh log a
warning and return Ok — entry persistence > meta freshness, and a
stale updated_at is acceptable per plan-eng-review decision.

If meta doesn't exist yet (no prior create), refresh is a no-op.
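
The fatal-write / best-effort-refresh split can be sketched as follows. The scope names are from the commit messages; the `State` trait, the `MemState` mock, and the reduction of SessionMeta to a bare timestamp string are simplifications assumed for illustration. The real adapter goes through the trigger trait with JSON payloads.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the state::set / state::get bus calls.
pub trait State {
    fn set(&mut self, scope: &str, key: &str, value: String) -> Result<(), String>;
    fn get(&self, scope: &str, key: &str) -> Result<Option<String>, String>;
}

pub const META_SCOPE: &str = "session_tree_meta";

/// Scope-per-session layout for entries.
pub fn entries_scope(sid: &str) -> String {
    format!("session_tree:{sid}")
}

pub fn append(
    state: &mut dyn State,
    sid: &str,
    entry_id: &str,
    entry: String,
    now: u64,
) -> Result<(), String> {
    // Fatal on failure: the entry must persist.
    state.set(&entries_scope(sid), entry_id, entry)?;
    // Best effort: a stale updated_at is acceptable, so only warn.
    if let Err(err) = refresh_updated_at(state, sid, now) {
        eprintln!("warn: meta refresh failed for {sid}: {err}");
    }
    Ok(())
}

/// Load-mutate-write; a no-op when no meta exists yet.
fn refresh_updated_at(state: &mut dyn State, sid: &str, now: u64) -> Result<(), String> {
    match state.get(META_SCOPE, sid)? {
        Some(_old_meta) => state.set(META_SCOPE, sid, now.to_string()),
        None => Ok(()),
    }
}

/// In-memory mock that can simulate failing meta writes.
#[derive(Default)]
pub struct MemState {
    pub data: HashMap<(String, String), String>,
    pub fail_meta_writes: bool,
}

impl State for MemState {
    fn set(&mut self, scope: &str, key: &str, value: String) -> Result<(), String> {
        if self.fail_meta_writes && scope == META_SCOPE {
            return Err("simulated trigger failure".to_string());
        }
        self.data.insert((scope.to_string(), key.to_string()), value);
        Ok(())
    }
    fn get(&self, scope: &str, key: &str) -> Result<Option<String>, String> {
        Ok(self.data.get(&(scope.to_string(), key.to_string())).cloned())
    }
}
```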
state::list on scope session_tree:<sid> returns all entries for the
session (one value per state::set issued by append). Deserializes each
as SessionEntry and sorts by entry.id() lexicographically for stable
ordering across runs.

Empty scope returns Ok(vec![]); trigger failure returns
SessionError::Storage.
load_meta: state::get on session_tree_meta. null response maps to
SessionError::NotFound(sid); deserialization or trigger failure maps
to SessionError::Storage.

list: state::list on session_tree_meta. Empty scope returns
Ok(vec![]); each value deserializes as SessionMeta.

Adapter is now feature-complete: all 5 SessionStore trait methods
implemented over state::*. Default-flip + integration test land in
E14.
main.rs picks IiiStateSessionStore unless SESSION_TREE_STORE=memory
is set. Logs the active backend at startup so operators can confirm.

register_with_iii is generalized to accept Arc<dyn SessionStore> by
adding ?Sized to its type parameter; existing concrete-Arc call sites
continue to compile unchanged.
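
A stripped-down illustration of why `?Sized` is what makes this work. The trait method, the unit structs, and the return value are placeholders; the real register_with_iii takes more arguments and registers bus functions.

```rust
use std::sync::Arc;

/// Placeholder trait; the real SessionStore has five storage methods.
pub trait SessionStore {
    fn backend_name(&self) -> &'static str;
}

pub struct InMemoryStore;
impl SessionStore for InMemoryStore {
    fn backend_name(&self) -> &'static str {
        "memory"
    }
}

pub struct IiiStateSessionStore;
impl SessionStore for IiiStateSessionStore {
    fn backend_name(&self) -> &'static str {
        "iii_state"
    }
}

/// Type parameters carry an implicit `Sized` bound. `?Sized` lifts it, so
/// `S` may be the unsized type `dyn SessionStore` as well as any concrete
/// store.
pub fn register_with_iii<S: SessionStore + ?Sized>(store: Arc<S>) -> &'static str {
    store.backend_name()
}
```

Without `?Sized`, passing an `Arc<dyn SessionStore>` fails to compile because `dyn SessionStore` does not implement `Sized`. Callers passing `Arc<InMemoryStore>` are unaffected either way.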

tests/integration_iii_state.rs exercises the wire path (create, append,
load_entries, load_meta) against a live engine, gated on
IIITEST_ENGINE_URL. Test skips cleanly when the env var is unset.
Adds a Storage backends section covering SESSION_TREE_STORE, the
scope-per-session iii-state layout (session_tree:<sid> for entries,
session_tree_meta for metadata), and failure semantics including the
non-fatal meta refresh in append().

Fixes the now-stale 'default is in-memory' claim — the default is
iii_state after E14.
Four behaviors exercised on both backends: create→load_meta,
append×2→load_entries, load_meta→NotFound for missing, list includes
created sessions.

InMemoryStore tests run unconditionally; IiiStateSessionStore tests
gate on IIITEST_ENGINE_URL and skip cleanly when unset.

The parity tests deliberately do not exercise one known divergence:
in-memory returns entries in insertion order, while iii-state sorts by
entry id. Because the tests append in id order, both backends produce
the same sequence. A follow-up may either make in-memory sort or
document the difference.
…tree

Spawns the worker binary as a child process pointed at a live iii
engine, writes data, kills + restarts the worker, asserts data
survives. Marked #[ignore] so cargo test default runs skip; opt in
with --ignored after building the worker in release mode and setting
IIITEST_ENGINE_URL + IIITEST_WORKER_BIN.

Each test tries to clean up by deleting the session or credential it
wrote. If cleanup fails, the leftover is tolerated: restart survival is
the only assertion.
…unknown sessions

Aligns InMemoryStore with IiiStateSessionStore (and the README contract:
load_entries never NotFound; only load_meta does). Whole-feature review
caught the parity divergence.
Both AUTH_CREDENTIALS_STORE and SESSION_TREE_STORE now explicitly match
'memory' and 'iii_state' (plus unset → default). Unknown values log a
warning naming the valid choices instead of silently falling into the
persistent backend.

Whole-feature review flagged a typo footgun: AUTH_CREDENTIALS_STORE=mem
intended for tests would silently write to durable state.
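
A sketch of the explicit match, under the assumption that parsing is separated from the fallback decision. The commit does not say which backend is used after the warning, so this version returns the warning text as an error and leaves that choice to the caller. `Backend` and `parse_backend` are illustrative names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Memory,
    IiiState,
}

/// Maps the raw env value to a backend. Unset selects the default
/// (iii_state); anything other than the two known names is rejected with
/// a message naming the valid choices.
pub fn parse_backend(raw: Option<&str>) -> Result<Backend, String> {
    match raw {
        None => Ok(Backend::IiiState),
        Some("memory") => Ok(Backend::Memory),
        Some("iii_state") => Ok(Backend::IiiState),
        Some(other) => Err(format!(
            "unknown store backend {other:?}; valid choices: memory, iii_state"
        )),
    }
}
```

A caller would feed it `std::env::var("AUTH_CREDENTIALS_STORE").ok().as_deref()` and log the error string as the warning.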
shell-bash: sandbox::exec is provided by the iii-worker sandbox
surface, not iii-exec (which is a startup pipeline daemon and exposes
no bus functions).

shell-subagent: only shell::subagent::start is registered;
shell::subagent::wait and shell::subagent::cancel were never
implemented.

llm-budget: function id table now matches src/register.rs (record vs
record_spend; reset vs rollover; usage/enforce/exempt/pause are real
ids; alert_clear/exemption_grant/exemption_revoke/log_list never
existed).
Rename worker, function namespace, and library:
- Worker dir / Cargo / yaml: durable-queue → session-inbox
- Binary: iii-durable-queue → iii-session-inbox
- Library crate: durable_queue → session_inbox
- Function ids: queue::{push,drain,peek} → inbox::{push,drain,peek}
- Public API: queue_key → inbox_key, QueueRequest → InboxRequest

Atomic drain (was non-atomic get-then-set): use state::update with a
single Set { path: "", value: [] } op, which atomically returns the
prior value AND writes []. One round-trip; no race window where a
concurrent push between read and reset would be lost.
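
An in-process analogy for the race and its fix, using a Mutex in place of the state backend. This is not the worker's code: the real drain is a single state::update call over the bus. `std::mem::take` plays the role of "return the prior value and write []" in one step.

```rust
use std::sync::Mutex;

#[derive(Default)]
pub struct Inbox {
    items: Mutex<Vec<String>>,
}

impl Inbox {
    pub fn push(&self, item: String) {
        self.items.lock().unwrap().push(item);
    }

    /// Old shape: read, then reset, as two separate critical sections.
    /// A push landing between them is cleared without being returned.
    pub fn drain_get_then_set(&self) -> Vec<String> {
        let snapshot = self.items.lock().unwrap().clone();
        // <- a concurrent push here would be lost
        self.items.lock().unwrap().clear();
        snapshot
    }

    /// New shape: swap in an empty list and return the prior contents
    /// under one lock, so there is no window between read and reset.
    pub fn drain(&self) -> Vec<String> {
        std::mem::take(&mut *self.items.lock().unwrap())
    }
}
```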

Callers updated:
- harness-runtime/src/register.rs:448 (inbox::push)
- turn-orchestrator/src/states/steering.rs:129 (inbox::drain)

Dependency manifests updated:
- harness-runtime, harness, turn-orchestrator iii.worker.yaml
- registry/index.json

Why: 'durable-queue' collided conceptually with iii-queue (the engine
builtin job/message queue with retries and DLQ). This worker is a
session inbox: a pull-mode list, drained at session boundaries. The
old name was misleading; session-inbox describes what it does.
Worker dir, binary, library crate, and namespace renames:
- Dir / Cargo / yaml: harness-runtime → provider-router
- Binary: iii-harness-runtime → iii-provider-router
- Library: harness_runtime → provider_router
- Function ids: agent::{stream_assistant,abort,push_steering,push_followup}
  → router::{stream_assistant,abort,push_steering,push_followup}

Why: the README itself documented the confusion. Turn execution lives
in turn-orchestrator; this worker is a provider router. The 'harness-
runtime' name implied the loop runs here, which is wrong.

HTTP triggers under agent/{session_id}/... kept stable for backwards
compat with anything calling the HTTP surface.

Callers updated:
- turn-orchestrator/src/states/assistant.rs (router::stream_assistant)
- turn-orchestrator comments referencing harness-runtime/ paths
- audit-log, dlp-scrubber, policy-denylist README mentions
- harness/src/lib.rs EXPECTED_WORKERS array
- harness/README.md example JSON

Dependency manifests:
- harness, turn-orchestrator iii.worker.yaml
- registry/index.json
- WORKERS-MIGRATING.md, README.md
state-flag was a 100-line wrapper over state::set/get keyed by session.
Two callers — both now do the bus call directly:

- provider-router/src/register.rs router::abort:
    state::set { scope: "agent", key: session/<id>/abort_signal, value: true }
- turn-orchestrator/src/states/steering.rs abort_set():
    state::get { scope: "agent", key: session/<id>/abort_signal }

Convention preserved: name 'abort' maps to key
session/<id>/abort_signal (vs session/<id>/flags/<other> for
non-abort flags). Documented inline at each call site since the
convention now lives there instead of in a shared crate.
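
The key convention, written out as a function purely to make the mapping explicit. The helper name `flag_key` is hypothetical; after this commit the two call sites build their keys inline rather than through a shared helper.

```rust
/// Scope used by both call sites.
pub const SCOPE: &str = "agent";

/// 'abort' maps to a dedicated key; any other flag name lives under
/// flags/.
pub fn flag_key(session_id: &str, name: &str) -> String {
    if name == "abort" {
        format!("session/{session_id}/abort_signal")
    } else {
        format!("session/{session_id}/flags/{name}")
    }
}
```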

Removed:
- workers/state-flag/ directory
- state-flag dependency from provider-router, harness, turn-orchestrator iii.worker.yaml
- state-flag entry from registry/index.json
- state-flag mentions across READMEs and HARNESS-WORKER-PUBLISHING-MAP.md
- build_flag_payload helper + its test in provider-router/src/register.rs
- unused Value import in turn-orchestrator/src/states/steering.rs

Why: per the plan-eng-review scope decision, the abort_signal
convention is just two call sites. A 100-line worker wrapper for two
literal keys was overkill; direct state::* calls are greppable and
remove a layer of indirection.
Worker dir, binary, library, and namespace renames:
- Dir / Cargo / yaml: shell-subagent → subagent
- Binary: iii-shell-subagent → iii-subagent
- Library: shell_subagent → subagent
- Function id: shell::subagent::start → subagent::start

Why: nothing about this worker involves a shell. It spawns child
agent sessions and awaits their result. The shell- prefix was
historical and misleading.

Plus residual cleanup from the A/B/D rename passes that landed in
this commit: Cargo.lock churn from the moves, missed sed targets in
turn-orchestrator/src/{events,run_start,state}.rs comments
referencing the old paths, and HARNESS-WORKER-PUBLISHING-MAP.md row
updates.

Dependency manifests:
- harness/iii.worker.yaml: shell-subagent → subagent
- registry/index.json
- README.md, WORKERS-MIGRATING.md, HARNESS-WORKER-PUBLISHING-MAP.md
- harness/README.md, harness/src/lib.rs (EXPECTED_WORKERS array)
The harness/ all-in-one orchestrator is the bundled-pilot variant,
superseded by the modular workers in this PR. It is listed as skipped
in the PR description, but its iii.worker.yaml and lib.rs were tracked,
so CI discovered it as a Rust worker and failed at cargo fmt
(Cargo.toml is local-only and was never committed).

Defer harness/ to a follow-up PR with full sources committed.
Worker constants for hook topics, limits, denylist entries, and log paths were hardcoded in the workers while registry publishing duplicated the same defaults. The workers now load manifest config with env overrides while preserving the existing registration defaults.

Constraint: Worker defaults must be declared on iii.worker.yaml and remain user-overridable
Rejected: Leave constants as the source of truth | registry and runtime defaults would keep drifting
Confidence: high
Scope-risk: moderate
Directive: Keep iii.worker.yaml config, registry default_config, and runtime config loaders aligned when adding worker defaults
Tested: cargo test for audit-log, policy-denylist, dlp-scrubber, document-extract, context-compaction, shell-bash, shell-filesystem
Tested: cargo clippy -- -D warnings for the same workers
Tested: cargo fmt --check, YAML/JSON parse check, git diff --check
Not-tested: Full end-to-end registry publish workflow against remote CI
These planning/review notes are useful locally but should not be versioned with worker source. The tracked docs are removed from the index and matching filenames are ignored so future local edits stay out of commits.

Constraint: User asked to remove the docs from git while keeping local copies
Rejected: Delete the files from disk | user explicitly asked to keep them locally
Confidence: high
Scope-risk: narrow
Directive: Do not re-add these local planning docs unless they become maintained repository documentation
Tested: Confirmed HARNESS-WORKER-PUBLISHING-MAP.md, WORKERS-MIGRATING.md, and WORKERS-PR-REVIEW.md still exist locally after git rm --cached
Not-tested: N/A
@ytallo ytallo force-pushed the feat/harness-modular-workers branch from 51f30a0 to 61d613b Compare May 5, 2026 21:59