RFC: Governance patterns for behavioral control inside the OpenShell sandbox #1848
Thanks for sharing governance patterns for behavioral control inside the OpenShell sandbox, which is an interesting topic that requires more discussion and exploration.
Following up with the reference architecture repo mentioned above: https://github.com/ljefford2-cmyk/local-first-ai-orchestration
The governance patterns are detailed in the main README: Workflow Autonomy Levels (WAL 0–3), an append-only audit log, and a structural privacy gate (Context Packager). Domain-specific implementations are under /domain-applications, covering federal regulatory, clinical, K-12 education, and small business contexts. The core design question these address is the same one this RFC is working through: how do you let an agent act with graduated trust while keeping a human in the decision loop and maintaining an auditable record of every action? Happy to discuss fit with OpenShell's existing extension mechanisms.
Wrote up a formal integration profile mapping WAL 0–3 directly to OpenShell policy files: YAML examples, a durable state store for restart reconciliation, an escalation evaluator spec, a model-change taxonomy, and shadow evaluation for security patches. This is v3, post-adversarial-review.
The WAL state machine and append-only audit log are implemented and running in Docker with 700+ tests. Happy to share the implementation repo if it's useful as a reference.
Moving this to GitHub Discussions (Ideas category): this is exactly the kind of architectural conversation that benefits from a long-form thread rather than an issue. The WAL integration profile and Docker implementation are serious work, and we want them visible to the broader community. The full thread and all comments will carry over automatically.
NemoClaw and OpenShell solve runtime containment extremely well —
kernel-level isolation, network namespace enforcement, credential
management. What I haven't seen addressed yet is the behavioral
governance layer: how an agent earns the right to take progressively
more autonomous action, how context is curated before it reaches cloud
inference, and how you maintain an immutable record that the governance
system can actually depend on.
I've been developing these patterns independently across four regulated
domains (federal, clinical, SMB, and personal-scale), stress-tested
through multi-model adversarial review. Here are four that map directly
to NemoClaw's architecture.
Pattern 1: Workflow Autonomy Levels (WAL 0–3)
Failure mode addressed: Uncalibrated default autonomy. Too
permissive and the agent acts without oversight on day one. Too
restrictive and it's useless. There's no middle path that earns trust
through evidence.
Pattern: A graduated trust state machine. All new capabilities
start at WAL-0 (human reviews every output before any action is taken).
Promotion requires measurable thresholds — minimum transaction count,
minimum approval rate, zero sensitive data mishandling incidents.
Anomalies trigger automatic demotion. Model version changes reset the
trust clock entirely.
NemoClaw integration point: WAL layers over OpenShell's existing
policy YAML. Instead of static allow/deny rules, WAL provides a dynamic
state machine that adjusts agent permissions based on operational
history recorded in the audit log. The policy file becomes a runtime
artifact that reflects earned trust rather than assumed trust.
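A minimal sketch of the state machine described above, in Python. The threshold values, the class and method names, and the choice to demote by exactly one level are all assumptions made for illustration; the pattern itself only names the promotion criteria, automatic demotion, and the reset on model change.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the pattern names the criteria, not the values.
MIN_TRANSACTIONS = 50
MIN_APPROVAL_RATE = 0.95
MAX_LEVEL = 3


@dataclass
class WalState:
    """Trust state for one capability under one model version."""
    model_version: str
    level: int = 0         # every new capability starts at WAL-0
    transactions: int = 0
    approvals: int = 0
    incidents: int = 0     # sensitive-data mishandling incidents

    def _reset_evidence(self) -> None:
        self.transactions = self.approvals = self.incidents = 0

    def record(self, approved: bool, incident: bool = False) -> None:
        """Record one human-reviewed transaction."""
        self.transactions += 1
        self.approvals += int(approved)
        self.incidents += int(incident)

    def on_anomaly(self) -> None:
        """Assumed policy: demote one level and discard accumulated evidence."""
        self.level = max(0, self.level - 1)
        self._reset_evidence()

    def on_model_change(self, new_version: str) -> None:
        """A model version change resets the trust clock entirely."""
        if new_version != self.model_version:
            self.model_version = new_version
            self.level = 0
            self._reset_evidence()

    def try_promote(self) -> bool:
        """Promote one level only if every threshold is met."""
        if self.level >= MAX_LEVEL or self.transactions < MIN_TRANSACTIONS:
            return False
        if self.incidents > 0:
            return False
        if self.approvals / self.transactions < MIN_APPROVAL_RATE:
            return False
        self.level += 1
        self._reset_evidence()  # evidence must be re-earned at the new level
        return True
```

In a real deployment the counters would be derived from the audit log rather than held in memory, and the current level would be what gets rendered into the policy file.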
Pattern 2: Context Packager with Memory Selection Policy
Failure mode addressed: The agent retrieves "relevant" context from
memory and attaches it to a cloud inference call. The retrieval
inadvertently includes sensitive material. OpenShell's Privacy Router
sanitizes known PII patterns but cannot detect contextual sensitivity —
information that is sensitive not because of its format but because of
what it reveals in combination.
Pattern: A four-step sequential pipeline — receive, retrieve,
redact, assemble — with a defined sensitivity taxonomy and memory
selection policy. Every step produces an audit log entry. Memory objects
carry two attributes: a confidence score (user-stated facts score
highest; inferences from documents score lower; summaries of summaries
score lowest) and a shareability classification (local-only by default;
cloud-eligible only when explicitly tagged). Only tagged-shareable,
high-confidence context leaves the local perimeter.
NemoClaw integration point: The Context Packager operates as a
preprocessing layer before payloads reach the inference.local Privacy
Router. The Packager handles semantic-level context curation; the
Privacy Router handles transport-level credential injection and PII
pattern matching. They are complementary, not redundant: the Packager
narrows what the Privacy Router has to scan.
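The four-step pipeline and the memory selection policy can be sketched roughly as follows. Everything named here (Confidence, MemoryObject, package, the MIN_CONFIDENCE floor, the shape of the audit entries) is hypothetical, and relevance-based retrieval is left out so that only the selection policy is shown. The redaction function is passed in by the caller rather than assumed.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Confidence(IntEnum):
    """Ordering taken from the pattern: user-stated facts rank highest."""
    SUMMARY_OF_SUMMARY = 1
    DOCUMENT_INFERENCE = 2
    USER_STATED = 3


@dataclass(frozen=True)
class MemoryObject:
    text: str
    confidence: Confidence
    cloud_eligible: bool = False   # local-only unless explicitly tagged


# Hypothetical floor for what counts as "high-confidence".
MIN_CONFIDENCE = Confidence.USER_STATED


def package(
    prompt: str,
    memory: list[MemoryObject],
    redact: Callable[[str], str],
    audit: list[dict],
) -> dict:
    """Receive, retrieve, redact, assemble; one audit entry per step."""
    # 1. Receive: log metadata only, never the prompt text itself.
    audit.append({"step": "receive", "prompt_chars": len(prompt)})

    # 2. Retrieve: keep only tagged-shareable, high-confidence objects.
    selected = [
        m for m in memory
        if m.cloud_eligible and m.confidence >= MIN_CONFIDENCE
    ]
    audit.append(
        {"step": "retrieve", "considered": len(memory), "selected": len(selected)}
    )

    # 3. Redact: apply the caller-supplied redaction to each selected item.
    redacted = [redact(m.text) for m in selected]
    audit.append({"step": "redact", "items": len(redacted)})

    # 4. Assemble: build the payload that leaves the local perimeter.
    payload = {"prompt": prompt, "context": redacted}
    audit.append({"step": "assemble", "context_items": len(redacted)})
    return payload
```

The audit entries carry counts rather than content, which keeps them compatible with a metadata-only log stream.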
Pattern 3: Append-Only Audit Log (Two-Stream Design)
Failure mode addressed: The tension between replayability and
privacy. A log that captures full payloads for debugging exposes
sensitive content. A log that strips content for privacy cannot support
replay or forensic accountability. Most implementations pick one and
accept the tradeoff. There's a better path.
Pattern: Two separate streams. Stream 1 (routing metadata) captures
every decision — event type, timestamp, job ID, routing decision, target
model, context classes requested, redaction actions taken, packager rule
version — without containing sensitive content. This stream is
sufficient for WAL governance, anomaly detection, and accountability.
Stream 2 (optional, encrypted at rest) captures full pre- and
post-redaction payloads for debugging, with separate retention policy
and access controls. The operator opts into Stream 2; Stream 1 is
non-optional.
The log is cryptographically hashed and append-only (chattr +a). The
agent cannot modify or delete its own logs.
NemoClaw integration point: OpenShell already generates telemetry.
The two-stream design provides the structured accountability layer that
the WAL state machine depends on — and that endpoint security platforms
can consume for behavioral analysis. Stream 1 is the evidence base.
Stream 2 is the debugger.
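One way to make Stream 1 tamper-evident is a hash chain, where each entry commits to the hash of the previous one. This is an assumption about how "cryptographically hashed" could be realized, not a description of the reference implementation; the function names and the JSON-lines layout are illustrative. The filesystem-level chattr +a protection is separate and would be applied to the file by the operator.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed sentinel "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def _last_hash(path: str) -> str:
    """Return the hash of the final entry, or GENESIS for a new log."""
    prev = GENESIS
    try:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                prev = json.loads(line)["hash"]
    except FileNotFoundError:
        pass
    return prev


def append_entry(path: str, event: dict) -> str:
    """Append one routing-metadata event; the file is opened append-only."""
    prev = _last_hash(path)
    entry = {"prev": prev, "event": event, "hash": _digest(prev, event)}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry["hash"]


def verify(path: str) -> bool:
    """Recompute the chain; any edited or removed entry breaks it."""
    prev = GENESIS
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["prev"] != prev:
                return False
            if entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
    return True
```

Rescanning the file on every append keeps the sketch short; a real implementation would cache the last hash. The events passed in should hold only the metadata fields listed for Stream 1, never payload content.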
Pattern 4: Job-Based UX with Human Decision Loop
Failure mode addressed: Operator approval fatigue. When the TUI
surfaces individual blocked network requests for real-time approval,
operators rubber-stamp to unblock workflows. The approval mechanism
exists but provides no real oversight. The agent can exploit this
pattern by generating high volumes of individually plausible requests.
Pattern: Interactions are jobs, not chat sessions or individual
permission prompts. Submit → Acknowledge → Process → Deliver. Background
processing with asynchronous notification. The human reviews the job's
results and intent — not each HTTP call. Every job is logged with full
provenance. This reduces the cognitive surface area for approval fatigue
by collapsing many micro-decisions into one meaningful review.
NemoClaw integration point: The job model wraps OpenShell's TUI
approval flow. Instead of surfacing individual network requests, the job
model batches related requests into a single reviewable unit with context
about what the agent is trying to accomplish. The human approves intent,
not mechanics.
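A rough sketch of the job model, assuming a simple linear lifecycle and a review summary that groups requests by host. The Job class, its fields, and the summary format are hypothetical; how this would hook into OpenShell's TUI approval flow is not specified here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class JobStatus(Enum):
    SUBMITTED = "submitted"
    ACKNOWLEDGED = "acknowledged"
    PROCESSING = "processing"
    DELIVERED = "delivered"


# Submit -> Acknowledge -> Process -> Deliver, with no skipping.
_NEXT = {
    JobStatus.SUBMITTED: JobStatus.ACKNOWLEDGED,
    JobStatus.ACKNOWLEDGED: JobStatus.PROCESSING,
    JobStatus.PROCESSING: JobStatus.DELIVERED,
}


@dataclass
class Job:
    job_id: str
    intent: str                               # what the agent is trying to do
    status: JobStatus = JobStatus.SUBMITTED
    requests: list = field(default_factory=list)   # (method, url) tuples
    approved: Optional[bool] = None

    def advance(self) -> None:
        nxt = _NEXT.get(self.status)
        if nxt is None:
            raise ValueError("job is already delivered")
        self.status = nxt

    def add_request(self, method: str, url: str) -> None:
        """Collect a network request instead of prompting for it."""
        if self.status is not JobStatus.PROCESSING:
            raise ValueError("requests can only be added while processing")
        self.requests.append((method, url))

    def review_summary(self) -> dict:
        """One reviewable unit: intent plus an aggregate of the requests."""
        hosts = sorted({urlparse(url).hostname for _, url in self.requests})
        return {
            "job_id": self.job_id,
            "intent": self.intent,
            "request_count": len(self.requests),
            "hosts": hosts,
        }

    def decide(self, approved: bool) -> None:
        """Record the single human decision for the whole job."""
        if self.status is not JobStatus.DELIVERED:
            raise ValueError("job must be delivered before review")
        self.approved = approved
```

The reviewer sees three requests to two hosts under one stated intent, rather than three separate prompts.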
Validation
These patterns were developed across four domains and stress-tested
through independent adversarial review by multiple frontier AI models.
A cross-domain reconciliation pass traced nine structural findings across
all implementations to separate documentation debt from genuine
architectural debt. Eight of nine were resolved; one (network perimeter
model for personal-scale deployments) required a new design decision.
Full specifications, adversarial review methodology, and cross-domain
validation results are available here:
https://github.com/ljefford2-cmyk/Local-AI-Orchestrator
These are offered as architectural contributions to the NemoClaw and
OpenClaw communities — not a competing product, no runtime, no SDK.
Design specifications that can be implemented within NemoClaw's existing
extension mechanisms.
Interested in any feedback from the team on fit with the current roadmap.