feat(runtime,bridge,channels): MCP bridge for CC subprocesses + shell pre-gate uplift + approval push surface #1182
Standalone crate exposing OpenFang's tool surface to MCP clients (primarily
Claude Code subprocesses) over stdio. Per architectural decision in ANAI-22:
not folded into openfang-runtime — keeps the protocol adapter out of the
kernel/compactor blast radius and the dep graph clean.
This commit is scaffolding only:
* Cargo manifest with rmcp 1.x (server, transport-io, macros)
* lib.rs: ToolDispatcher seam trait (runtime-implements, bridge-consumes,
one-way dep), ToolDispatchError enum, Bridge struct wrapping an
Option<Arc<dyn ToolDispatcher>>, single stub `ping` tool
* main.rs: stdio MCP server entrypoint, tracing -> stderr (stdout is the
transport), no dispatcher attached
* Workspace members updated
Identity is bound at Bridge construction time, not per-call — the security
invariant tracked by ANAI-31. Real tool surface mapping lands in ANAI-30.
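A minimal sketch of the seam described above, assuming a synchronous trait for brevity (the real trait in lib.rs is likely async, and its exact signature is not shown in this commit). The method signature, the `Unavailable` and `NotPermitted` variants, and `call_tool` are hypothetical names chosen for illustration; only `ToolDispatcher`, `ToolDispatchError`, `Bridge`, and the `ping` stub come from the commit message.

```rust
use std::sync::Arc;

/// Hypothetical error variants; the commit only names the enum.
#[derive(Debug, PartialEq)]
pub enum ToolDispatchError {
    /// No dispatcher is attached (the scaffolding state).
    Unavailable,
    /// The tool is not exposed to this caller.
    NotPermitted(String),
}

/// Seam trait: the runtime implements it, the bridge only consumes it,
/// so the dependency points one way.
pub trait ToolDispatcher: Send + Sync {
    fn call(&self, agent_id: &str, tool: &str, args: &str) -> Result<String, ToolDispatchError>;
}

pub struct Bridge {
    /// Identity is fixed at construction time, never supplied per call.
    agent_id: String,
    dispatcher: Option<Arc<dyn ToolDispatcher>>,
}

impl Bridge {
    pub fn new(agent_id: impl Into<String>, dispatcher: Option<Arc<dyn ToolDispatcher>>) -> Self {
        Self { agent_id: agent_id.into(), dispatcher }
    }

    /// Handles the stub `ping` tool locally and forwards everything else.
    pub fn call_tool(&self, tool: &str, args: &str) -> Result<String, ToolDispatchError> {
        if tool == "ping" {
            return Ok("pong".to_string());
        }
        match &self.dispatcher {
            Some(dispatcher) => dispatcher.call(&self.agent_id, tool, args),
            None => Err(ToolDispatchError::Unavailable),
        }
    }
}
```

Because `call_tool` takes no identity argument, a caller cannot claim a different agent per request, which is the shape the ANAI-31 invariant asks for.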
cargo check -p openfang-mcp-bridge: clean.
cargo check --workspace: clean (pre-existing imap-proto future-incompat
warning unrelated).
Refs: ANAI-22, ANAI-29
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the daemon-side foundation for the MCP bridge per the ANAI-30 plan
(topology 1b: daemon → CC → bridge → unix socket → daemon dispatcher).
- New `protocol` module in openfang-mcp-bridge: Frame/Hello/HelloAck/
CallRequest/CallResponse types with length-prefixed JSON framing
(1 MiB cap, 4-byte BE length prefix). Gated by `ipc-codec` feature
so type-only consumers can drop the tokio io traits.
- New `bridge_ipc` module in openfang-api: BridgeIpcServer binds
<home_dir>/run/bridge.sock (0600), accept loop with graceful
shutdown via Notify, per-connection Hello validation and CallRequest
→ CallResponse loop.
- run_daemon spawns the listener; failure is non-fatal (HTTP keeps
serving; bridge just unavailable). Socket file removed on shutdown.
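The framing described above (4-byte big-endian length prefix, 1 MiB cap) can be sketched with blocking `std::io` traits. This is an illustration only: the actual `protocol` module uses tokio I/O behind the `ipc-codec` feature, and the names `MAX_FRAME_LEN`, `write_frame`, and `read_frame` are assumptions, not the crate's API.

```rust
use std::io::{self, Read, Write};

/// 1 MiB cap on a single frame payload.
pub const MAX_FRAME_LEN: usize = 1024 * 1024;

/// Writes one frame: 4-byte big-endian length, then the payload bytes.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame exceeds 1 MiB cap"));
    }
    let len = payload.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one frame, rejecting an oversized length before allocating.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds 1 MiB cap"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the length before allocating the buffer keeps a malformed or hostile peer from forcing a large allocation with a single 4-byte header.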
Step 1 stub: the dispatcher returns CallResult::Error
("not yet wired"). Step 2 replaces this with a call into
openfang_runtime::tool_runner::execute_tool, scoped to the four-tool
allowlist (file_read, file_list, agent_list, channel_send). Identity
binding + token-table auth land in ANAI-31.
Tests: 3 protocol roundtrip tests + 4 IPC handler tests
(handshake/dispatch end-to-end via tempfile socket, version mismatch
rejection, empty-token rejection).
Refs ANAI-30, ANAI-22.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the step-1 stub in `BridgeIpcServer` with a real call into
`openfang_runtime::tool_runner::execute_tool`, mirroring the argument
bundle used by the HTTP /mcp endpoint in routes.rs.
- Added ALLOWED_TOOLS allowlist: file_read, file_list, agent_list,
channel_send. Rejection happens at the protocol layer (CallResult::Error)
before any kernel touch.
- Added dispatch_call(): snapshots the skill registry, builds a
KernelHandle from Arc<OpenFangKernel>, and invokes execute_tool.
- ToolResult mapped to CallResult::Ok { content, is_error }, preserving
the Ok/Error distinction (Error = bridge couldn't dispatch; Ok with
is_error = tool ran but returned an error).
- Identity stub: caller_agent_id taken at face value from
CallRequest::agent_id. Real per-spawn token-bound identity lands in
ANAI-31.
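The allowlist gate and the Ok/Error mapping above can be sketched as follows. The closure parameter stands in for the real call into `execute_tool`, whose arguments are not reproduced here; the `dispatch_call` signature and the error message text are assumptions for illustration.

```rust
/// The four-tool slice exposed over the bridge in this step.
pub const ALLOWED_TOOLS: &[&str] = &["file_read", "file_list", "agent_list", "channel_send"];

#[derive(Debug, PartialEq)]
pub enum CallResult {
    /// The tool ran; `is_error` says whether the tool itself reported failure.
    Ok { content: String, is_error: bool },
    /// The bridge could not dispatch the call at all.
    Error(String),
}

/// `run_tool` is a hypothetical stand-in for the kernel call.
pub fn dispatch_call<F>(tool: &str, run_tool: F) -> CallResult
where
    F: FnOnce() -> Result<String, String>,
{
    // Reject at the protocol layer, before anything touches the kernel.
    if !ALLOWED_TOOLS.contains(&tool) {
        return CallResult::Error(format!("tool '{tool}' is not in the bridge allowlist"));
    }
    match run_tool() {
        Ok(content) => CallResult::Ok { content, is_error: false },
        Err(message) => CallResult::Ok { content: message, is_error: true },
    }
}
```

A failing tool still produces `CallResult::Ok`, so the MCP client can tell "the tool said no" apart from "the bridge never reached the tool".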
Test: ipc_handshake_and_allowlist_gate verifies wire shape end-to-end:
disallowed tool gets allowlist Error, allowed tool gets Ok response. Real
execute_tool integration tests come once the daemon spawns the bridge
for real (ANAI-31).
…l surface
Replaces the stub `ping` tool with the four ANAI-30 tools (file_read,
file_list, agent_list, channel_send) and wires the bridge binary to forward
each `tools/call` over the daemon IPC socket established in step 1.
Library (lib.rs):
- ToolDispatcher::call now returns DispatchOk { content, is_error }
preserving the tool-error-vs-dispatch-error distinction across the seam
- built_in_tools() declares the four-tool slice; schemas mirror
runtime::tool_runner::builtin_tool_definitions() (kept in lockstep)
- Bridge: manual ServerHandler impl (drops the #[tool_router] macro). Filters
advertised tools by intersecting built_in_tools() with
ToolDispatcher::allowed_tools(); double-checks before dispatch
- Bridge::new now requires a dispatcher (was Option<_>)
Binary (main.rs):
- Reads OPENFANG_BRIDGE_SOCKET / TOKEN / AGENT_ID env vars (last is stub for
ANAI-30; ANAI-31 derives identity from token)
- Connects to daemon, performs Hello/HelloAck handshake, exits on rejection
- IpcDispatcher: bridge-side ToolDispatcher impl. Forwards each call via mpsc
to an actor task that owns the stream; correlation-by-request_id with a
PendingMap<u64, oneshot> so concurrent tools/call invocations don't
serialize at the dispatcher layer
- Reader task drains pending oneshots with an error on connection close so
in-flight calls don't hang; production path exits the process so CC
notices and tears down (gated behind cfg(not(test)))
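The request_id correlation above can be sketched with the standard library alone. This is a simplified stand-in: the bridge uses tokio mpsc and oneshot channels with an actor task, while this sketch uses `std::sync::mpsc` channels as one-shot reply slots. The method names `register`, `complete`, and `fail_all` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

type Reply = Result<String, String>;

/// Maps in-flight request ids to the channel that delivers each reply.
#[derive(Default)]
pub struct PendingMap {
    next_id: AtomicU64,
    waiters: Mutex<HashMap<u64, Sender<Reply>>>,
}

impl PendingMap {
    /// Allocates a fresh request id and a receiver for its reply.
    pub fn register(&self) -> (u64, Receiver<Reply>) {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().insert(id, tx);
        (id, rx)
    }

    /// Called by the reader when a response arrives; returns false for
    /// unknown or already-completed ids.
    pub fn complete(&self, request_id: u64, reply: Reply) -> bool {
        match self.waiters.lock().unwrap().remove(&request_id) {
            Some(tx) => tx.send(reply).is_ok(),
            None => false,
        }
    }

    /// Called on connection close so in-flight calls fail instead of hanging.
    pub fn fail_all(&self, reason: &str) {
        for (_, tx) in self.waiters.lock().unwrap().drain() {
            let _ = tx.send(Err(reason.to_string()));
        }
    }
}
```

Responses may arrive in any order; each caller waits only on its own receiver, so a slow call does not block a fast one.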
Tests:
- lib: built_in_tools_has_anai30_slice, permitted_tools_intersects_with_dispatcher_allowed
- main: ipc_dispatcher_round_trip_and_correlation — fake daemon listener,
full handshake, two concurrent calls, verifies per-id correlation and the
NotPermitted gate
Workspace check clean. Daemon-side bridge_ipc tests still pass (4/4).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end topology now exists at the type level:
daemon → claude (per-prompt) → openfang-mcp-bridge → IPC → daemon
- Add `caller_agent_id: Option<String>` to CompletionRequest. Plumbed
through all construction sites; agent_loop populates it with
session.agent_id, everywhere else passes None.
- Daemon (`server.rs::run_daemon`): after BridgeIpcServer starts,
publish OPENFANG_BRIDGE_SOCKET and OPENFANG_BRIDGE_BIN as process env
for subprocess drivers to discover. Bridge bin defaults to a sibling
of current_exe; operators can override with OPENFANG_BRIDGE_BIN. Both
set with `unsafe` (edition 2024) but only during single-threaded
daemon startup, before any subprocess spawns.
- BridgeIpcServer gains `socket_path()` accessor.
- ClaudeCodeDriver: per-spawn `try_build_bridge_mcp_config`. When
caller_agent_id is set AND both discovery env vars are present,
generate a UUID token, write `<home>/run/cc-mcp-<uuid>.json` (0600),
and add `--mcp-config <path> --strict-mcp-config` to the claude args.
RAII guard removes the file on drop so per-spawn token lifetime is
bounded by the CC subprocess.
- apply_env_filter extended to strip OPENFANG_BRIDGE_* from CC's child
env. Bridge gets these only via the explicit `env` map in the
mcp-config — CC inheriting them would risk a stray bridge picking up
the daemon socket without a fresh per-spawn token.
- Tests:
- test_build_bridge_mcp_config_shape — verifies wire shape claude
expects: mcpServers.openfang.{command,args,env} with exactly the
three discovery vars in env (no extras to leak state).
- test_apply_env_filter_strips_bridge_discovery_vars — confirms
filter removes all four bridge vars from CC's child env.
- test_bridge_mcp_config_drop_removes_file — RAII cleanup invariant.
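Two of the mechanisms above lend themselves to a short sketch: the RAII guard that bounds the config file's lifetime, and the env filter that strips bridge discovery vars. This assumes a std-only guard (the driver may well use a temp-file crate instead), and `McpConfigGuard` and `strip_bridge_vars` are hypothetical names rather than the driver's actual items.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

/// Owns a per-spawn config file and removes it when dropped.
pub struct McpConfigGuard {
    path: PathBuf,
}

impl McpConfigGuard {
    pub fn create(path: PathBuf, contents: &str) -> std::io::Result<Self> {
        let mut opts = fs::OpenOptions::new();
        opts.write(true).create_new(true);
        // On unix, create the file as 0600 so the token is never world-readable.
        #[cfg(unix)]
        {
            use std::os::unix::fs::OpenOptionsExt;
            opts.mode(0o600);
        }
        let mut file = opts.open(&path)?;
        file.write_all(contents.as_bytes())?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for McpConfigGuard {
    fn drop(&mut self) {
        // Best effort: the file may already be gone.
        let _ = fs::remove_file(&self.path);
    }
}

/// Drops every OPENFANG_BRIDGE_* entry from the child's environment.
pub fn strip_bridge_vars(env: Vec<(String, String)>) -> Vec<(String, String)> {
    env.into_iter()
        .filter(|(key, _)| !key.starts_with("OPENFANG_BRIDGE_"))
        .collect()
}
```

Holding the guard for as long as the CC child runs ties the token file's lifetime to that subprocess; dropping it on any exit path, including early returns, removes the file.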
Stub points still flagged: token validated as non-empty (ANAI-31
replaces with daemon-issued per-spawn token table); agent_id taken
in-band from CallRequest (ANAI-31 derives from token).
11 CC driver tests pass. bridge_ipc (4) and bridge crate (6) tests
unchanged. Workspace check clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…NABLED
Default-off kill switch so we can deploy the bridge code path without
inlining it into every CC invocation. When the gate is unset or not in
{1, true}, try_build_bridge_mcp_config returns None and CC is spawned
exactly as it was pre-step-4 — no --mcp-config, no temp file, no bridge
child. Validation flow: deploy with gate off (sanity), launchctl setenv
OPENFANG_BRIDGE_ENABLED 1, bounce daemon, observe; if anything regresses,
flip back to 0 and bounce for instant recovery.
Daemon still starts the IPC listener and publishes BRIDGE_SOCKET/BIN env
unconditionally — both are harmless without a bridge child connecting.
Pure additive switch; zero behavior change when off.
Test exercises the full truth table for bridge_enabled() (unset, truthy
variants, falsy/garbage variants) and confirms the gate suppresses
config generation regardless of other env. Single test owns the global
env var so no serial_test infra needed.
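The gate's truth table is easiest to test when the parsing is separated from the environment read. The sketch below assumes exact matches on `1` and `true` only; whether the real `bridge_enabled()` also accepts other casings or surrounding whitespace is not stated here, and `bridge_enabled_from` is a hypothetical helper.

```rust
/// Pure parsing step: only "1" and "true" enable the bridge.
pub fn bridge_enabled_from(value: Option<&str>) -> bool {
    matches!(value, Some("1") | Some("true"))
}

/// Reads the gate from the process environment; unset means off.
pub fn bridge_enabled() -> bool {
    bridge_enabled_from(std::env::var("OPENFANG_BRIDGE_ENABLED").ok().as_deref())
}
```

Keeping the pure function separate lets most cases run without touching the global env var at all.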
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bridge IPC handshake works standalone (bridge binary connects + Hello/HelloAck ok against the live socket), and the daemon-side `wired CC --mcp-config for OpenFang bridge` debug line confirms the flag is being passed to claude. But no `bridge IPC accepted connection` events ever fire, meaning claude is launched with `--mcp-config` but isn't spawning the MCP server subprocess.
Without `--debug`, claude swallows MCP launch errors silently. And we drop CC's stderr on successful spawns, so any silent rejection is invisible.
Add (both spawn paths):
- `--debug` flag when bridge config is wired, so MCP errors print to stderr.
- Always log a 4 KB tail of CC stderr at info when bridge_wired, regardless of success/failure.
Streaming path now drains stderr concurrently to avoid pipe deadlock under chatty --debug output.
Existing 12/12 claude_code unit tests still pass; workspace check clean.
Diagnostic only: once the cause is identified we'll pare back to bounded on-demand logging.
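Logging a bounded tail of captured stderr needs care at the cut point, since slicing a `&str` in the middle of a multi-byte character panics. A minimal sketch, assuming the stderr bytes have already been decoded to a string (for example with `String::from_utf8_lossy`); `utf8_tail` is a hypothetical helper, not the driver's function.

```rust
/// Returns at most the last `max_bytes` bytes of `s`, moving the cut
/// forward to the next character boundary so the slice is valid UTF-8.
pub fn utf8_tail(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut start = s.len() - max_bytes;
    // `s.len()` is always a boundary, so this loop terminates.
    while !s.is_char_boundary(start) {
        start += 1;
    }
    &s[start..]
}
```

For the 4 KB tail mentioned above, the call would be `utf8_tail(&stderr_text, 4096)`; the result can be slightly shorter than 4096 bytes when the cut lands inside a character.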
- bridge_ipc: promote handshake/dispatch events to INFO and add an `accepted connection` log on accept. Operators can now observe the full bridge lifecycle from daemon stderr without crawling through ~/.claude/debug/<uuid>.txt.
- claude_code driver: gate --debug + the 4 KB CC-stderr-tail diagnostic behind a new OPENFANG_BRIDGE_DEBUG env var (off by default). With proper INFO logs daemon-side, the noisy --debug output and the per-spawn ~/.claude/debug/ files are no longer load-bearing.
- server: validate operator-supplied OPENFANG_BRIDGE_BIN path at boot and log the resolution outcome (override vs. probe). Catches deploy ordering bugs where the env points at a binary that doesn't exist.
Stderr is still drained concurrently in the streaming path: required whenever --debug might be on, cheap when it isn't.
Relocate builtin_tool_definitions() from runtime::tool_runner to
openfang_types::tool::registry as the single source of truth. Bridge
now derives its advertised surface from the registry, filtered by a
substrate-level BRIDGE_DENY denylist (currently empty).
CC sees the full kernel surface; per-agent gating remains agent.toml.
web_fetch and web_search are no longer carved out — treat CC as an
API model: tools route through OpenFang, not hidden channels.
- openfang-types::tool: tool.rs → tool/{mod,registry}.rs
- tool_runner re-exports builtin_tool_definitions for callsite stability
- openfang-mcp-bridge: adds openfang-types dep (types-only, runtime-free
invariant preserved); built_in_tools() is now ~7 lines
- Tests: drift sentinel for BRIDGE_DENY, full-surface assertion,
ANAI-32 canonical-nine sanity (8/8 passing)
Validated end-to-end against live daemon: file_list, file_read,
agent_list, memory_recall, web_fetch, file_write all round-trip.
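Deriving the advertised surface from the registry reduces to a filter over tool names. The sketch below assumes tools are identified by name only and takes the registry as a slice; the real `built_in_tools()` works on full tool definitions, and `bridge_tools` is a hypothetical name.

```rust
/// Substrate-level denylist for the bridge; empty means the full surface.
pub const BRIDGE_DENY: &[&str] = &[];

/// Returns every registry tool that is not on the denylist, in registry order.
pub fn bridge_tools(registry: &[&'static str], deny: &[&str]) -> Vec<&'static str> {
    registry
        .iter()
        .copied()
        .filter(|name| !deny.contains(name))
        .collect()
}
```

With an empty denylist the bridge advertises exactly what the registry declares, so a tool added to the registry shows up on the bridge without a second edit.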
Lift the pure-syntactic shell validators -- metacharacter denylist and exec_policy allowlist -- out of the per-tool match arm in execute_tool and run them BEFORE the approval gate. Without this, denied commands were sent for human approval, approved, and only then rejected by the metachar denylist inside the per-tool arm. That wasted operator attention on commands guaranteed to fail. Validators remain inside the shell_exec arm as defense-in-depth.
Scoped to shell_exec only -- not all is_shell_tool entries. process_start has a different input shape and its own validators. Widening the pre-gate is a separate change.
For the deferred path -- commands that clear the pre-gate but still need human approval -- to reach a human, add the missing push surface:
- ApprovalManager: Tokio broadcast of Submitted / Resolved / TimedOut lifecycle events, plus a subscribe API. Lag-tolerant: slow subscribers get RecvError::Lagged and resync via list_pending. Tracing now includes agent_id, tool_name, risk, and decided_by on every lifecycle line.
- channel_bridge: spawn an approval surfacer that consumes those events, resolves the agent bindings via the registry, and pushes a formatted prompt to the most-specific bound channel and channel_id. Submission prompts include short id, agent, tool, risk, action summary (truncated), and timeout, with /approve and /reject hints. Resolved and TimedOut events post a follow-up so the prompt is not left dangling.
Tests added on the approval side cover Submitted+TimedOut and Resolved event delivery, plus UTF-8-safe log truncation.
Validated live on the daemon -- tests A through F:
- Metachar denial is synchronous, no approval burned.
- Allowlist match on argv0 basename clears without prompting.
- Approval surfacer delivers prompts to the bound channel for commands that fall through to approval.
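The ordering this commit establishes (syntactic checks first, approval only for what survives them) can be sketched as a three-way decision. The metacharacter set below is a hypothetical example, not the runtime's actual denylist, and `PreGate` and `shell_pre_gate` are illustrative names.

```rust
#[derive(Debug, PartialEq)]
pub enum PreGate {
    /// Rejected synchronously; no approval request is created.
    Denied(String),
    /// argv0 basename is allowlisted; runs without prompting.
    Allowed,
    /// Passed the syntactic checks but still needs a human decision.
    NeedsApproval,
}

/// Hypothetical metacharacter denylist for illustration.
const METACHARS: &[char] = &[';', '|', '&', '$', '`', '>', '<', '\n'];

pub fn shell_pre_gate(command: &str, allowlist: &[&str]) -> PreGate {
    // 1. Metacharacter denylist: fail before anyone is asked to approve.
    if let Some(c) = command.chars().find(|c| METACHARS.contains(c)) {
        return PreGate::Denied(format!("shell metacharacter {c:?} is not allowed"));
    }
    // 2. Allowlist match on the basename of argv0.
    let argv0 = command.split_whitespace().next().unwrap_or("");
    let basename = argv0.rsplit('/').next().unwrap_or(argv0);
    if !basename.is_empty() && allowlist.contains(&basename) {
        return PreGate::Allowed;
    }
    // 3. Everything else falls through to the approval gate.
    PreGate::NeedsApproval
}
```

Only the `NeedsApproval` outcome produces a lifecycle event for the surfacer to push, which is what keeps guaranteed-to-fail commands out of the operator's channel.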
Apply rustfmt to files introduced or modified by this branch. Upstream-drift files (kernel, agent_loop, channels, anthropic/openai drivers, host_functions, model_catalog, types message) intentionally left untouched as a separate concern.
Both bridge-MCP-config wiring sites in the Claude Code driver were
using .map(|cfg| { side-effects; cfg }) on Option<NamedTempFile>,
which clippy flags as manual_inspect. .inspect() expresses intent
directly. No behavior change.
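The lint fix can be illustrated on a plain `Option<String>`. Both functions below behave the same; the second is the form clippy's `manual_inspect` suggests. `Option::inspect` is stable since Rust 1.76. The function names and the log vector are hypothetical stand-ins for the driver's side effects.

```rust
/// Before: `map` used only to run a side effect and hand the value back.
pub fn wire_with_map(cfg: Option<String>, log: &mut Vec<String>) -> Option<String> {
    cfg.map(|c| {
        log.push(format!("wired {c}"));
        c
    })
}

/// After: `inspect` borrows the value for the side effect and returns
/// the Option unchanged.
pub fn wire_with_inspect(cfg: Option<String>, log: &mut Vec<String>) -> Option<String> {
    cfg.inspect(|c| log.push(format!("wired {c}")))
}
```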
The MCP bridge IPC is unix-domain-socket-only by construction (daemon
listens on a unix socket; bridge subprocess connects to it). The bridge
crate and the daemon-side `bridge_ipc` module unconditionally imported
`tokio::net::{UnixStream, UnixListener}`, which broke Windows CI with
E0432 unresolved-import errors in `openfang-mcp-bridge::main` and
`openfang-api::bridge_ipc`.
Gates:
- `openfang-mcp-bridge::main` — entire body cfg-gated to `unix`; on
non-unix the binary is a no-op stub that prints a clear message and
exits non-zero. Tests gated `cfg(all(test, unix))`.
- `openfang-api::lib` — `pub mod bridge_ipc` gated to `unix`.
- `openfang-api::server::run_daemon` — `BridgeIpcServer::start` call
gated to `unix`; non-unix logs a single info line and proceeds without
bridge IPC. The CC driver's existing missing-socket fallthrough means
CC subprocesses spawn without `--mcp-config` on Windows, matching the
bridge-disabled path.
No behavioral change on Linux/macOS. Windows users get a daemon that
boots without bridge support; MCP-routed tools are unavailable until a
Windows-native transport (named pipes / TCP loopback) lands as a
follow-up.
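The gating pattern above can be sketched with paired `cfg` attributes. This is a simplified stand-in: the stub `bridge_ipc::start` below returns a string instead of binding a real socket, and `start_bridge_ipc` is a hypothetical wrapper rather than the code in `run_daemon`.

```rust
/// The module only exists on unix, mirroring the gated `pub mod bridge_ipc`.
#[cfg(unix)]
pub mod bridge_ipc {
    /// Stand-in for the real listener start; no socket is bound here.
    pub fn start(socket_path: &str) -> Result<String, String> {
        Ok(format!("listening on {socket_path}"))
    }
}

/// Unix: start the listener; failure is non-fatal, so errors become None.
#[cfg(unix)]
pub fn start_bridge_ipc(socket_path: &str) -> Option<String> {
    bridge_ipc::start(socket_path).ok()
}

/// Non-unix: log once and continue without bridge IPC.
#[cfg(not(unix))]
pub fn start_bridge_ipc(_socket_path: &str) -> Option<String> {
    eprintln!("bridge IPC is unavailable on this platform; continuing without it");
    None
}
```

Because the non-unix variant never names `bridge_ipc`, the unresolved-import error cannot recur there, and callers see the same `Option` either way.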
Verified: cargo check --workspace, cargo check --workspace --tests,
cargo test -p openfang-mcp-bridge -p openfang-api --lib, cargo fmt
--check, and cargo clippy all clean on macOS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Superseded by ANAI-45; closing unmerged.
This PR is being closed in favor of ANAI-45 (trust pipeline unification), which absorbs the capability gate into a unified
Rationale: this PR plus #1183 plus the (unpushed) ANAI-44 work each introduced their own "skip the approval prompt" path with different contracts. Consolidating them as three incremental PRs would mean reviewers seeing the design through a keyhole; ANAI-45 ships the whole shape coherently. Zero reviews here and zero installed users either side make a clean replacement the simpler move.
Branch
Closes #1180.
Summary
- `openfang-mcp-bridge` crate: a stdio MCP server that forwards `tools/call` requests over a Unix-socket IPC channel back into the daemon's tool dispatcher, so Claude Code subprocesses can reach OpenFang's full tool surface (`file_read`, `file_list`, `agent_list`, `channel_send`, `memory_recall`, `web_fetch`, etc.) instead of being stuck with their built-ins.
- `shell_exec` pre-gate (metacharacter denylist + `exec_policy` allowlist) ahead of the approval gate, so denied commands fail synchronously without burning operator attention.
- `ApprovalManager` lifecycle broadcast (`Submitted` / `Resolved` / `TimedOut`) and a `channel_bridge` approval surfacer that pushes formatted prompts to the most-specific bound channel for any command that falls through to "needs human".
- Relocates `builtin_tool_definitions()` from `runtime::tool_runner` to `openfang-types::tool::registry` so the bridge and the runtime share one source of truth (re-exported from `tool_runner` for callsite stability).
Topology
- Daemon binds `<home>/run/bridge.sock` (0600), publishes `OPENFANG_BRIDGE_SOCKET` / `OPENFANG_BRIDGE_BIN` for subprocess drivers.
- Per-spawn `<home>/run/cc-mcp-<uuid>.json` MCP-config, passed via `--mcp-config <path> --strict-mcp-config`. RAII guard removes the file on drop.
- `OPENFANG_BRIDGE_*` env is stripped from CC's child env via `apply_env_filter`; the bridge gets the discovery vars only via the explicit `env` map in the mcp-config, so a stray bridge can't pick up the daemon socket without a fresh per-spawn token.
- Correlation by request_id with a `PendingMap<u64, oneshot>` so concurrent calls don't serialize at the dispatcher layer. Reader task drains pending oneshots with an error on connection close.
Default-off kill switch
`OPENFANG_BRIDGE_ENABLED` env gate. Unset / not in `{1, true}` → `try_build_bridge_mcp_config` returns `None` and CC is spawned exactly as it was before this PR: no `--mcp-config`, no temp file, no bridge child. Daemon still starts the IPC listener and publishes discovery env unconditionally (both harmless without a bridge child connecting). Pure additive switch; zero behavior change when off.
Notable invariants
- Identity is taken in-band from `CallRequest::agent_id` (stub). A daemon-issued per-spawn token table is the natural follow-up, flagged in the issue.
- Pre-gate is scoped to `shell_exec` only; `process_start` has a different input shape and its own validators. Widening is a separate change.
- Bridge surface is the registry filtered by the `BRIDGE_DENY` denylist (currently empty). Per-agent gating remains in `agent.toml`.
Test plan
- `cargo check --workspace` clean.
- `cargo clippy -D warnings` clean (only pre-existing transitive `imap-proto v0.10.2` future-incompat note, not ours).
- `openfang-mcp-bridge` lib + binary tests:
  - protocol round-trip (`Frame` / `Hello` / `HelloAck` / `CallRequest` / `CallResponse`)
  - `built_in_tools_has_*_slice` (drift sentinel against the registry)
  - `permitted_tools_intersects_with_dispatcher_allowed`
  - `ipc_dispatcher_round_trip_and_correlation` (fake daemon, full handshake, two concurrent calls + `NotPermitted` gate)
- `bridge_ipc` (4 tests): handshake + dispatch end-to-end via tempfile socket, version-mismatch rejection, empty-token rejection, allowlist gate.
- CC driver: `test_build_bridge_mcp_config_shape`, `test_apply_env_filter_strips_bridge_discovery_vars`, `test_bridge_mcp_config_drop_removes_file`, full `bridge_enabled()` truth-table.
- `ApprovalManager`: `Submitted` + `TimedOut` event delivery; `Resolved` event delivery; UTF-8-safe log truncation.
- Live daemon: `file_list`, `file_read`, `agent_list`, `memory_recall`, `web_fetch`, `file_write` all round-trip via the bridge; metacharacter denial is synchronous (no approval burned); allowlist match on argv0 basename clears without prompting; approval surfacer delivers prompts to the bound channel for fall-through commands.
🤖 Generated with Claude Code