diff --git a/use-cases/SAFE-UC-0023/README.md b/use-cases/SAFE-UC-0023/README.md
index 22db581..0726bfb 100644
--- a/use-cases/SAFE-UC-0023/README.md
+++ b/use-cases/SAFE-UC-0023/README.md
@@ -1,32 +1,579 @@
-# Cloud ops troubleshooting assistant
+# Cloud ops troubleshooting assistant — incident triage, telemetry correlation, and bounded remediation
 
-> Seed page for **SAFE-AUCA**. Expand this into a full analysis using [`templates/use-case-template.md`](../../templates/use-case-template.md).
+> **SAFE-AUCA industry reference guide**
+>
+> This use case describes a real-world workflow where SRE, platform engineering, NOC, and incident-response teams use an agentic assistant to investigate cloud and Kubernetes production issues by correlating metrics, logs, traces, change events, infrastructure state, and recent deployments across multiple tools.
+>
+> It focuses on:
+> - how the workflow works in practice (tools, data, trust boundaries, autonomy)
+> - what can go wrong (defender-friendly kill chain)
+> - how it maps to **SAFE-MCP techniques**
+> - what controls + tests make it safer
+>
+> **Defender-friendly only:** do **not** include operational exploit steps, payloads, or step-by-step attack instructions.
+>
+> **No sensitive info:** do not include internal hostnames/endpoints, secrets, customer data, non-public incidents, or proprietary details.
+
+---
 
 ## Metadata
 
 | Field | Value |
 |---|---|
 | **SAFE Use Case ID** | `SAFE-UC-0023` |
-| **Status** | `seed` |
-| **NAICS 2022** | `Information (51)` |
-| **Last updated** | `2026-02-17` |
+| **Status** | `draft` |
+| **NAICS 2022** | `51` (Information), `518210` (Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services), `541512` (Computer Systems Design Services) |
+| **Workflow family** | `Cloud operations, SRE, and incident response` |
+| **Last updated** | `2026-03-18` |
+
+### Evidence (public links)
+
+- [AWS CloudWatch Investigations](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Investigations.html)
+- [AWS CloudWatch Investigations security and access](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Investigations-Security.html)
+- [AWS CloudWatch cross-account observability](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Unified-Cross-Account.html)
+- [AWS Systems Manager Automation approvals (`aws:approve`)](https://docs.aws.amazon.com/systems-manager/latest/userguide/running-automations-require-approvals.html)
+- [Google Cloud Assist investigations](https://docs.cloud.google.com/cloud-assist/investigations)
+- [Google Cloud Assist overview](https://docs.cloud.google.com/cloud-assist/overview)
+- [Google Gemini for Google Cloud data governance](https://docs.cloud.google.com/gemini/docs/discover/data-governance)
+- [Google Cloud Assist audit logging](https://docs.cloud.google.com/cloud-assist/audit-logging)
+- [Azure Copilot troubleshooting agent](https://learn.microsoft.com/en-us/azure/copilot/troubleshooting-agent)
+- [Azure observability agent overview](https://learn.microsoft.com/en-us/azure/azure-monitor/aiops/observability-agent-overview)
+- [Azure SRE Agent root cause analysis](https://learn.microsoft.com/en-us/azure/sre-agent/root-cause-analysis)
+- [Azure Copilot access management](https://learn.microsoft.com/en-us/azure/copilot/manage-access)
+- [Datadog Bits AI SRE overview](https://docs.datadoghq.com/bits_ai/bits_ai_sre/)
+- [Datadog Bits AI SRE: investigate issues](https://docs.datadoghq.com/bits_ai/bits_ai_sre/investigate_issues/)
+- [Datadog Bits AI SRE: take action](https://docs.datadoghq.com/bits_ai/bits_ai_sre/take_action/)
+- [SAFE-MCP repository](https://github.com/SAFE-MCP/safe-mcp)
+- [SAFE-T1001 Tool Poisoning Attack](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1001/README.md)
+- [SAFE-T1102 Prompt Injection](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1102/README.md)
+- [SAFE-T1104 Over-Privileged Tool Abuse](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1104/README.md)
+- [SAFE-T1204 Context Memory Implant](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1204/README.md)
+- [SAFE-T1309 Privileged Tool Invocation via Prompt Manipulation](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1309/README.md)
+- [SAFE-T1703 Tool-Chaining Pivot](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1703/README.md)
+- [SAFE-T1801 Automated Data Harvesting](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1801/README.md)
+- [SAFE-T1911 Parameter Exfiltration](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1911/README.md)
+- [SAFE-T2102 Service Disruption via External API Flooding](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T2102/README.md)
+- [SAFE-T2105 Disinformation Output](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T2105/README.md)
+- [SAFE-T2106 Context Memory Poisoning via Vector Store Contamination](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T2106/README.md)
+- [Kubernetes RBAC](https://kubernetes.io/docs/reference/access-authn-authz/rbac/)
+- [Kubernetes RBAC good practices](https://kubernetes.io/docs/concepts/security/rbac-good-practices/)
+- [Kubernetes Secrets](https://kubernetes.io/docs/concepts/configuration/secret/)
+- [Kubernetes audit logging](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/)
+- [Kubernetes ephemeral containers](https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/)
+- [Kubernetes Validating Admission Policy](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/)
+- [AWS CloudTrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html)
+- [Azure Activity Log](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/activity-log)
+- [Google Cloud Audit Logs](https://cloud.google.com/logging/docs/audit)
+- [NIST SP 800-207 Zero Trust Architecture](https://csrc.nist.gov/pubs/sp/800/207/final)
+- [OWASP Top 10 for LLM Applications: Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/)
+- [Cloud Security Alliance Security Guidance v5](https://cloudsecurityalliance.org/artifacts/security-guidance-v5)
+- [Cloud Security Alliance AI Controls Matrix](https://cloudsecurityalliance.org/artifacts/ai-controls-matrix)
+
+---
+
+## Minimum viable write-up (Seed → Draft fast path)
+
+- validate the tool and approval model against at least one concrete implementation in production or pre-production
+- attach repeatable test fixtures and evidence artifacts for the recommended safety regressions
+- confirm the SAFE-MCP technique mapping and risk tiers with repository maintainers and domain reviewers
+
+---
+
+## 1. Executive summary (what + why)
+
+**What this workflow does**
+
+A cloud ops troubleshooting assistant investigates incidents, alerts, or operator questions by retrieving and correlating operational evidence: metrics, logs, traces, deployment events, cloud control-plane activity, Kubernetes state, configuration drift, ownership metadata, and relevant runbooks. It produces ranked root-cause hypotheses, drafts summaries and next steps, and may execute tightly bounded diagnostics. In mature implementations, it can also invoke pre-approved remediation automations under explicit approval and policy controls.
+
+**Why it matters (business value)**
+
+Modern cloud estates are distributed across regions, accounts, subscriptions, projects, clusters, and observability tools. Triage often requires time-consuming “mean time to innocence” work: proving whether the fault is in code, infrastructure, networking, scaling, configuration, or a recent change. A well-governed assistant can reduce time-to-first-hypothesis, standardize incident handling, improve evidence collection quality, reduce on-call cognitive load, and make escalations to service owners or external support faster and more consistent.
+
+**Why it is risky / what can go wrong**
+
+This workflow combines two dangerous properties in one system: access to sensitive operational data and proximity to high-privilege production tooling. The same logs, trace attributes, alert annotations, ticket comments, tool outputs, and prior incident notes that help with triage can also be attacker-controlled, stale, or misleading. If the assistant is over-privileged or allowed to chain directly from evidence gathering into production mutation, a poisoned investigation can become a self-inflicted outage, a secret-leak event, or persistent bad guidance. The safe default is therefore **autonomous read-only triage** with **human approval for any production mutation, external communication, or durable memory write**.
+
+---
+
+## 2. Industry context & constraints (reference-guide lens)
+
+### Where this shows up
+
+This pattern appears anywhere teams operate production cloud services with non-trivial reliability requirements, for example:
+
+- SaaS and internet platforms with multi-service, multi-region architectures
+- enterprises running centralized SRE / NOC / platform teams across AWS, Azure, and Google Cloud
+- Kubernetes-heavy environments with service mesh, GitOps, and frequent deployments
+- managed service providers and internal platform teams operating many customer or business-unit environments
+- regulated organizations where incident handling must remain auditable and permission-scoped
+
+### Typical systems in this workflow
+
+- cloud provider observability and control-plane tooling
+- third-party observability platforms and APM
+- Kubernetes APIs and cluster diagnostics
+- deployment systems, CI/CD, IaC, and change-history systems
+- incident management, ticketing, chat, paging, and status tools
+- knowledge bases, runbooks, postmortems, CMDB, and ownership registries
+- identity, access, approval, and audit systems
+
+### Constraints that matter
+
+- **Minutes-level response pressure:** incident triage often happens under outage conditions where delay is expensive.
+- **Cross-domain correlation:** useful evidence is split across logs, metrics, traces, change events, tickets, and cloud/Kubernetes APIs.
+- **Noisy and partially trustworthy data:** workloads and users can influence logs and traces; monitors can be misconfigured; time ranges can be wrong.
+- **Privilege asymmetry:** the assistant may only need read access for diagnosis, while remediation requires highly privileged actions.
+- **Multi-account / multi-cluster blast radius:** a bad decision can affect large portions of an estate if scoping is weak.
+- **Change-control and audit obligations:** many organizations require approvals, immutability, and evidence retention for production actions.
+- **Data residency and privacy constraints:** telemetry can contain secrets, personal data, regulated records, or customer payload fragments.
+
+### Must-not-fail outcomes
+
+- destructive or unauthorized production changes
+- secret, token, or customer-data leakage through summaries, tickets, or support cases
+- false root-cause analysis that drives harmful remediation
+- silent scope expansion across accounts, clusters, or tenants
+- runaway automation loops that create API floods, cost spikes, or extended outages
+- persistent contamination of future investigations through poisoned memory or runbooks
+
+### Operational constraints
+
+- partial telemetry during degraded incidents
+- temporary credentials and just-in-time access in privileged environments
+- limited budget for observability queries and API calls during large incidents
+- human approval fatigue during simultaneous alerts
+- the need to preserve a clear chain of evidence for postmortems and audits
+
+---
+
+## 3. Workflow description & scope
+
+### 3.1 Workflow steps (happy path)
+
+1. An alert, dashboard anomaly, incident ticket, customer-impact report, or on-call question starts an investigation.
+2. The assistant normalizes the request into a scoped investigation object: affected service, time window, account/project/subscription, region, cluster, namespace, deployment, severity, and initial symptom.
+3. The assistant gathers read-only evidence from observability systems and control-plane sources: metrics, logs, traces, recent deployments, recent configuration changes, autoscaling activity, health checks, Kubernetes events, and cloud audit trails.
+4. The assistant correlates the retrieved evidence with ownership, service dependencies, recent change history, prior incidents, and available runbooks.
+5. The assistant generates ranked hypotheses with explicit evidence, confidence, and missing-data notes. It should be able to say “inconclusive” instead of forcing a confident answer.
+6. The assistant drafts incident updates, recommended next diagnostic steps, service-owner handoff notes, or vendor-support case content.
+7. Where policy allows, the assistant executes **bounded diagnostics** only: for example, read-only cluster inspection, rollout-history collection, approved scripts in an isolated execution environment, or narrowly scoped debug operations that cannot mutate production state or exfiltrate data.
+8. If remediation is warranted, the assistant proposes a **pre-approved** runbook or action with exact target scope, expected blast radius, rollback path, and post-action validation checks.
+9. Any production-changing action (restart, rollback, scale, traffic shift, node cordon/drain, instance stop/terminate, policy change, debug exec with elevated visibility, secret or identity change) requires explicit human approval and a dedicated executor identity.
+10. After an approved action runs, the assistant validates downstream signals, updates the incident record, and stores a complete audit trail. Durable memory or reusable knowledge updates happen only after review.
+
+### 3.2 In scope / out of scope
 
-## Workflow Description (Seed)
+- **In scope:** alert-initiated or operator-initiated investigation; telemetry retrieval; control-plane and deployment-history lookup; read-only or isolated diagnostics; evidence synthesis; incident/ticket/chat drafting; support-handoff drafting; tightly governed execution of pre-approved remediation automations; post-action validation.
+- **Out of scope:** unrestricted shell access in production; arbitrary code execution on workloads or hosts; browsing or exporting secrets as part of normal triage; autonomous IAM/network/secret-policy changes; unconstrained self-healing; malware response or digital forensics outside approved playbooks; cross-tenant or cross-customer data access.
 
-Troubleshoot cloud ops issues by querying telemetry/logs/configs and suggesting remediations with change-control boundaries.
+### 3.3 Assumptions
 
-## In Scope / Out Of Scope
+- observability, cloud audit logging, and deployment history are already available
+- read-only investigation roles are distinct from mutating executor roles
+- pre-approved runbooks are versioned, reviewed, and owned
+- high-risk production actions require human approval and step-up authentication
+- secret scanning, redaction, and content-policy checks exist before messages or cases leave the investigation plane
+- the assistant is allowed to stop and escalate when evidence is conflicting or incomplete
 
-- **In scope:** TBD
-- **Out of scope:** TBD
+### 3.4 Success criteria
 
-## SAFE-MCP Mapping (Seed Skeleton)
+- faster time-to-first-hypothesis and lower MTTR without increasing unsafe change rates
+- all tool calls and approvals are attributable, reviewable, and scoped
+- zero unauthorized production mutations
+- no sensitive data leakage in assistant-generated outputs
+- demonstrable ability to block or quarantine poisoned context and over-privileged tool requests
+- measurable reduction in repetitive manual triage effort while preserving operator trust
 
-| Kill-chain stage | Failure/attack pattern | SAFE-MCP technique(s) | Recommended controls | Tests |
+---
+
+## 4. System & agent architecture
+
+### 4.1 Actors and systems
+
+- **Human roles:** on-call SRE, incident commander, service owner, platform engineer, security reviewer, change approver, vendor-support liaison
+- **Agent/orchestrator:** cloud ops troubleshooting assistant, potentially with specialist sub-agents for telemetry, Kubernetes, cloud inventory, runbooks, and communications
+- **Tools (MCP servers / APIs / connectors):**
+  - observability query tools (logs, metrics, traces, events, dashboards)
+  - cloud inventory and change-history tools
+  - Kubernetes inspection and bounded diagnostic tools
+  - deployment / GitOps / CI-CD history tools
+  - runbook and automation executors
+  - incident, chat, ticket, and support connectors
+  - knowledge-base, CMDB, and memory retrieval services
+- **Data stores:** observability backends, configuration sources, audit logs, incident and ticket records, knowledge bases, runbook repositories, optional vector stores or memory stores
+- **Downstream systems affected:** incident records, support cases, chat channels, automation executions, deployments, scaling state, traffic-routing controls, cloud and Kubernetes control planes
+
+### 4.2 Trusted vs untrusted inputs (high value, keep simple)
+
+| Input/source | Trusted? | Why | Typical failure/abuse pattern | Mitigation theme |
 |---|---|---|---|---|
-| TBD | TBD | TBD | TBD | TBD |
+| Alert payloads, monitor names, annotations, paging text | Semi-trusted | system-generated wrapper around fields that may still be user-configured or stale | priority spoofing, bad scoping, indirect prompt injection | structured parsing + provenance + treat free text as data |
+| Logs, traces, span/resource attributes, event messages | Untrusted | applications, users, and compromised workloads can write them | indirect prompt injection, misinformation, secret surfacing | sanitization + quoting + allowlisted extraction + truncation |
+| Cloud and Kubernetes resource state | Semi-trusted | authoritative sources, but may be stale, partial, or influenced by compromised principals | wrong correlation, stale read, misleading annotations | freshness checks + source cross-validation |
+| Deployment history, change records, GitOps status | Semi-trusted | authoritative but may lag or be incomplete | false blame on recent change, missing rollback context | timestamp checks + ownership metadata + citations |
+| Runbooks, KB articles, postmortems | Semi-trusted | internal and curated, but can be stale or unsafe | outdated or over-broad remediation guidance | versioning + expiry + owner review |
+| Tool outputs and generated summaries | Mixed | depends on connector quality and output shaping | contaminated context, hallucinated conclusions, schema drift | schema validation + strict output contracts + provenance labels |
+| Incident tickets, chat threads, operator prompts | Untrusted | humans and integrations can be mistaken, compromised, or adversarial | urgency abuse, authority spoofing, unsafe action requests | identity binding + approvals + treat text as data |
+| Durable memory / vector store | Semi-trusted | persistent context is helpful but can retain bad content | repeated contamination across future incidents | review-before-persist + TTL + quarantine + integrity checks |
+
+### 4.3 Trust boundaries (required)
+
+The workflow has several trust boundaries that reviewers should model explicitly:
+
+1. **Workload / user / monitor data → agent boundary**  
+   Untrusted operational text enters model context from logs, traces, alert annotations, ticket comments, and dashboard labels.
+
+2. **Agent → read-only retrieval boundary**  
+   The model turns intent into queries against observability, cloud inventory, and Kubernetes APIs. This is usually the lowest-risk boundary but still exposes sensitive data.
+
+3. **Agent → diagnostic execution boundary**  
+   The assistant may request script execution, ephemeral containers, or cluster diagnostics. Even “diagnostics” can become a mutation or exfiltration path if not isolated.
+
+4. **Investigation plane → remediation executor boundary**  
+   Crossing from read-only investigation into production-changing action is the most important boundary in the design. It should require separate identity, policy, and explicit approval.
+
+5. **Agent → communication / support boundary**  
+   Sending summaries or attachments into chat, tickets, or vendor-support systems creates a durable disclosure path.
+
+6. **Single-environment → multi-account / multi-cluster boundary**  
+   Centralized SRE assistants often operate across accounts, subscriptions, projects, or clusters; mistakes in scoping can silently expand blast radius.
+
+**Trust boundary notes**
+
+- Separate **control** from **data** at every boundary: untrusted retrieved content must not become operative instructions.
+- Treat production write access as a distinct trust zone, even if the same human initiated the investigation.
+- Prefer short-lived, request-scoped credentials and explicit target scoping over ambient broad access.
+- Preserve environment separation (`dev/test` vs `prod`) in both identity and tool routing.
+
+### 4.4 Tool inventory (required)
+
+| Tool / MCP server | Read / write? | Permissions | Typical inputs | Typical outputs | Failure modes |
+|---|---|---|---|---|---|
+| `observability.query` | read | service/account/project-scoped read role | service, resource, time window, query template | log excerpts, metrics, traces, events | injected text in results, stale time window, over-broad retrieval |
+| `cloud.inventory.describe` | read | cloud read role | resource id, account, region, tag filters | resource metadata, recent config/activity | wrong account or region, incomplete state, stale cache |
+| `k8s.inspect` | read (default) | namespace/cluster-scoped read RBAC | namespace, workload, pod, event selector | object specs, events, rollout history, logs | namespace overreach, accidental secret exposure through object descriptions |
+| `diag.sandbox.run` | exec (bounded) | isolated executor identity | approved script id, target scope, timeout | structured findings, command transcript | network egress, secret visibility, unintended mutation, environment escape |
+| `runbook.execute` | write | dedicated automation role | runbook id, target, parameters, approval token | execution id, status, before/after checks | wrong target, repeated retries, self-inflicted outage |
+| `incident.record` | write | ticket/chat integration scope | summary, evidence links, channel/ticket id | posted message, incident updates | secret leakage, persistent false narrative |
+| `support.case` | write | support-case integration scope | incident summary, redacted attachments, severity | case id, sent message | exfiltration to third party, over-sharing |
+| `knowledge.retrieve` / `memory.persist` | read / write | KB read role, guarded memory write role | keywords, incident id, proposed memory item | prior runbooks, similar incidents, stored notes | stale or poisoned guidance, durable contamination |
+
+### 4.5 Governance & authorization matrix
+
+| Action category | Example actions | Allowed mode(s) | Approval required? | Required auth | Required logging/evidence |
+|---|---|---|---|---|---|
+| Read-only retrieval | query logs/metrics/traces, list recent deployments, describe cloud resources | manual / HITL / autonomous | no | request-scoped read-only role | query parameters, result provenance, account/cluster scope |
+| Isolated diagnostics | run approved scripts, collect rollout history, bounded cluster diagnostics in sandbox | HITL / autonomous (policy-based) | policy-based; yes if prod-targeting or elevated visibility | short-lived sandbox executor, fixed image, network/volume policy | script id, image digest, transcript, resource scope, timeout |
+| Internal record updates | incident comment, ticket summary, internal chat update | HITL / autonomous (low-risk only) | policy-based | scoped integration token | posted content, redaction result, evidence links |
+| External communications | vendor support case, externally visible status input | HITL initially | yes | verified identity + DLP/redaction controls | message archive, attachment inventory, approver |
+| Durable knowledge / memory write | save incident lesson, create reusable troubleshooting note | HITL / policy-gated | yes or review-before-persist | guarded write role | content hash, reviewer, source provenance, TTL |
+| Low-risk bounded remediation | restart a single pre-approved canary target, re-run a failed automation in non-prod | manual / HITL, rarely autonomous | yes | dedicated automation role + step-up auth | before/after state, blast-radius statement, rollback plan |
+| High-risk production action | rollback release, scale down, cordon/drain node, terminate instance, modify IAM/network/secret policy, exec in prod | manual / HITL only | always; consider dual approval | privileged executor, just-in-time elevation, policy gate | immutable audit trail, explicit target diff, rationale, post-action validation |
+
+### 4.6 Sensitive data & policy constraints
+
+- **Data classes:** credentials, API keys, service tokens, customer identifiers, request payload fragments, source code, configuration values, environment variables, account topology, vulnerability data, support-case attachments
+- **Retention / logging constraints:** retain enough evidence for audit and postmortem, but avoid persisting raw sensitive telemetry unnecessarily; redact before leaving the investigation plane; keep transient investigations ephemeral where possible; do not auto-ingest sensitive outputs into durable memory
+- **Regulatory constraints:** privacy, data residency, contractual restrictions with AI or support vendors, sector-specific retention and disclosure rules, internal change-control requirements
+- **Safety / operational harm constraints:** the assistant must not apply ambiguous or destructive remediation by default; “inconclusive” is an acceptable outcome; one investigation should not create long-lived broad privileges
+
+---
+
+## 5. Operating modes & agentic flow variants
+
+### 5.1 Manual baseline (no agent)
+
+- Engineers inspect dashboards, search logs, compare recent changes, run `kubectl` / cloud CLI commands, consult runbooks, and update the incident record manually.
+- Existing checks often include change-control approval, peer review, privileged identity management, and post-action validation.
+- Humans catch many problems through domain judgment: they notice when a log line is suspicious, when a suggested rollback is too broad, or when multiple sources conflict.
+- The cost is speed and consistency: correlation across many tools is slow, repetitive, and error-prone under fatigue.
+
+### 5.2 Human-in-the-loop (HITL / sub-autonomous)
+
+- The assistant can autonomously collect read-only evidence, organize context, and draft hypotheses or incident updates.
+- The assistant may also run tightly bounded diagnostics in a sandbox or other isolated environment when policy explicitly permits it.
+- Human approvals sit at the most sensitive points:
+  - durable memory writes
+  - external communications
+  - any production mutation
+  - any diagnostic step that crosses into elevated visibility or exec semantics
+- This is the recommended operating mode for most real environments because it preserves operator judgment at the decision points that matter most.
+
+### 5.3 Fully autonomous (end-to-end agentic)
+
+- Fully autonomous mode is only advisable for a narrow subset of low-risk actions: automatic incident creation, read-only evidence gathering, policy-safe internal summaries, and perhaps some isolated diagnostics.
+- Autonomous remediation should be limited to pre-approved runbooks with:
+  - single-target or canary scope
+  - hard rate limits and retry budgets
+  - dry-run and preflight validation
+  - rollback or compensating action
+  - kill switch and immutable audit trail
+- If the assistant is wrong or manipulated, the blast radius includes false RCA, broad write actions, repeated retries, API-rate exhaustion, and durable misinformation. For most organizations, unrestricted autonomous remediation is an unacceptable default.
+
+### 5.4 Variants (optional)
+
+- **Single-agent vs orchestrated specialists:** one orchestrator can delegate to telemetry, Kubernetes, cloud-inventory, or runbook specialists.
+- **Provider-native vs third-party AIOps:** teams may use AWS/Azure/Google-native assistants, Datadog-like platforms, or internal orchestration layers.
+- **Centralized multi-account SRE vs application-team local assistant:** the former increases scoping and governance complexity.
+- **Real-time incidents vs post-incident analysis:** postmortem summarization is lower-risk than live remediation.
+
+---
+
+## 6. Threat model overview (high-level)
+
+### 6.1 Primary security & safety goals
+
+- preserve diagnostic integrity: the assistant should not be steered into false conclusions by untrusted operational data
+- preserve least privilege: retrieval, diagnostics, communications, and mutation should not share ambient broad authority
+- preserve confidentiality: secrets and sensitive telemetry should not leak into prompts, summaries, memory, or third-party systems
+- preserve availability: assistant actions should not worsen incidents or create new outages
+- preserve auditability: every tool call, approval, and mutation should be attributable, reviewable, and reconstructable
+
+### 6.2 Threat actors (who might attack / misuse)
+
+- external attackers who can influence request payloads, headers, URLs, or other data that later appear in logs and traces
+- compromised workloads or service identities that emit deceptive operational data
+- malicious or careless insiders editing monitor annotations, tickets, runbooks, or memory entries
+- compromised third-party integrations or connectors
+- hurried operators who over-trust the assistant under outage pressure
+
+### 6.3 Attack surfaces
+
+- logs, traces, event streams, dashboard labels, and alert annotations
+- incident tickets, chat transcripts, and support-case drafts
+- tool descriptions, schemas, and mutable connector metadata
+- runbooks, KB articles, and retrieved postmortems
+- memory/vector stores and similarity-retrieval layers
+- diagnostic-script inputs, command arguments, and output renderers
+- approval UX, especially when target scope or blast radius is hidden or hard to verify
+
+### 6.4 High-impact failures (include industry harms)
+
+- **Customer / consumer harm:** degraded service, prolonged outage, failed transactions, delayed recovery, or customer-data leakage in support paths
+- **Business harm:** SLA/SLO breaches, increased MTTR, cloud cost spikes, reputational damage, operator toil, incorrect blame during incident response
+- **Security harm:** unauthorized control-plane changes, secret exposure, privilege escalation, repeated compromise via poisoned memory or runbooks, false incident narratives that suppress proper response
+
+---
+
+## 7. Kill-chain analysis (stages → likely failure modes)
+
+> Keep this defender-friendly. Describe patterns, not “how to do it.”
+
+| Stage | What can go wrong (pattern) | Likely impact | Notes / preconditions |
+|---|---|---|---|
+| 1. Entry / trigger | Untrusted operational text enters the investigation via logs, traces, alert annotations, tickets, or tool output. The assistant ingests it as part of incident context. | Investigation starts from a corrupted scope or misleading symptom description. | Common because workloads, users, and integrations can influence observability data. |
+| 2. Context contamination | The assistant treats retrieved data, stale runbook text, or tool output as instructions or authoritative truth instead of evidence. | False hypotheses, bad prioritization, hidden sensitive content in context, unsafe next-step recommendations. | Most likely when free text is passed through with weak provenance markers or no separation of control from data. |
+| 3. Tool misuse / unsafe action | A read-only investigation path pivots into a more privileged diagnostic or remediation tool, or the assistant invokes an over-privileged tool directly. | Secret exposure, unintended production mutation, lateral movement, or high-blast-radius changes. | Preconditions include broad permissions, weak tool-graph controls, or approval fatigue. |
+| 4. Persistence / repeat | Contaminated summaries, memory entries, KB notes, or incident comments become durable and influence future runs or human approvers. | Repeated mis-triage, recurring unsafe recommendations, institutionalized false RCA. | More likely if the system auto-persists memory or trains retrieval from unreviewed outputs. |
+| 5. Exfiltration / harm | Sensitive telemetry is copied into chat, tickets, or support cases; or a flawed remediation terminates healthy instances, rolls back a good deployment, or floods APIs. | Compliance breach, outage expansion, cost spikes, longer incident duration, loss of operator trust. | This is the highest-impact end state and the canonical failure pattern for this use case. |
+
+---
+
+## 8. SAFE-MCP mapping (kill-chain → techniques → controls → tests)
+
+> Goal: make SAFE-MCP actionable in this workflow. The rows below use selected techniques that are especially relevant to cloud ops troubleshooting assistants.
+
+| Kill-chain stage | Failure/attack pattern (defender-friendly) | SAFE-MCP technique(s) | Recommended controls (prevent / detect / recover) | Tests (how to validate) |
+|---|---|---|---|---|
+| Entry / trigger | Logs, alert annotations, ticket text, or mutable tool metadata attempt to steer the assistant away from policy or safe triage. | [SAFE-T1102](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1102/README.md), [SAFE-T1001](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1001/README.md) | Treat all retrieved operational text as untrusted data; separate control from data; parse only allowlisted fields into structured context; sanitize and quote raw excerpts; attach provenance labels; keep tool descriptions versioned and immutable to ordinary users. | Seed malicious log lines, monitor annotations, and mutated tool descriptions. Verify that tool policy, system instructions, and target selection do not change. |
+| Context contamination / persistence | Poisoned investigation notes, stale postmortems, or malicious memory content contaminate future incidents. | [SAFE-T1204](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1204/README.md), [SAFE-T2106](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T2106/README.md) | No automatic durable memory writes from live incidents; review-before-persist; TTL and freshness metadata; quarantine suspicious memory items; sign or hash approved knowledge entries; prefer evidence citations over free-form recollection. | Insert poisoned prior-incident notes into retrieval fixtures. Verify that they are quarantined, down-ranked, or surfaced with warnings rather than silently trusted. |
+| Tool misuse / privilege escalation | The assistant uses over-privileged tools, invokes privileged operations through natural-language manipulation, or pivots from low-risk tools into high-risk tools. | [SAFE-T1104](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1104/README.md), [SAFE-T1309](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1309/README.md), [SAFE-T1703](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1703/README.md) | Split investigation and execution identities; enforce explicit tool-interaction allowlists; require approval tokens for write paths; use step-up auth and JIT elevation; disable direct prod shell/exec by default; sandbox diagnostics; add admission and policy gates downstream of the model. | Attempt unauthorized secret reads, `exec`/debug operations, instance termination, scale changes, or IAM/network changes from a read-only incident. Expect denial, audit evidence, and no downstream write. |
+| Data harvesting / exfiltration | The assistant retrieves more data than needed, leaks data through hidden parameters, or copies sensitive content into external systems. | [SAFE-T1801](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1801/README.md), [SAFE-T1911](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T1911/README.md) | Mandatory scope filters; result-size and rate limits; DLP and secret redaction before prompts and outputs; strict JSON schemas with `additionalProperties: false`; canary secrets; outbound-content review and data-classification labels. | Plant canary tokens in telemetry fixtures and attempt hidden-parameter exfiltration or bulk export. Verify redaction, schema rejection, alerting, and approval blocking. |
+| Service harm / misinformation | The assistant generates a convincing but false RCA or repeatedly calls tools in ways that worsen the incident or create API/service disruption. | [SAFE-T2102](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T2102/README.md), [SAFE-T2105](https://github.com/SAFE-MCP/safe-mcp/blob/main/techniques/SAFE-T2105/README.md) | Require evidence threshold and post-action validation; allow “inconclusive” outcomes; cap retries and API budgets; dry-run and preflight diffs for remediation; canary/one-target scope; automatic rollback; kill switch for mutating mode; independent cross-checks against ground truth where possible. | Run false-evidence and wrong-target remediation simulations. Verify that the assistant either stops, escalates, or canaries safely, and that rollback is available and tested. |
+
+**Notes**
+
+- If a control varies by operating mode, default to stricter behavior in live production.
+- For this workflow, the highest-value SAFE-MCP themes are: prompt-injection resistance, memory governance, explicit privilege boundaries, egress controls, and blast-radius reduction.
+- A useful implementation heuristic is: **the closer a tool is to production mutation, the less the model should be allowed to decide implicitly.**
+
+---
+
+## 9. Controls & mitigations (organized)
+
+### 9.1 Prevent (reduce likelihood)
+
+- **Separate identities by function:** use different principals for investigation retrieval, sandbox diagnostics, internal record updates, external communications, and production mutation.
+- **Enforce least privilege and short-lived credentials:** scope every request by account, cluster, namespace, service, region, and time window; use just-in-time elevation for privileged paths.
+- **Treat telemetry as untrusted:** logs, traces, annotations, tickets, and tool outputs should be marked as data, not instructions. Prefer structured extraction over raw free-text ingestion.
+- **Constrain tool graphs:** prevent the model from directly chaining from low-risk retrieval tools into privileged executor tools without explicit policy checks and approval tokens.
+- **Sandbox diagnostics:** fixed images, no secret mounts, limited network egress, allowlisted commands/scripts, strict timeouts, no host namespaces unless separately approved.
+- **Version and tier runbooks:** every runnable automation should have an owner, risk tier, target schema, dry-run behavior, rollback path, and review history.
+- **Apply downstream guardrails:** Kubernetes admission policy, cloud policies/resource locks, PIM/JIT, and automation approvals should all continue to enforce policy even if the model makes a bad request.
+- **Redact before egress:** remove secrets, PII, and unnecessary raw telemetry from summaries, support cases, and memory.
+- **Govern memory explicitly:** no auto-save of live-incident conclusions; require review, TTL, provenance, and freshness metadata for durable knowledge.
+- **Design approval UX for humans under stress:** show exact target(s), scope, blast radius, before/after state, rationale, rollback plan, and confidence/evidence.
+
+### 9.2 Detect (reduce time-to-detect)
+
+- **Log every tool call and policy decision:** include principal, target scope, input class, time range, result size, approval state, and policy outcome.
+- **Correlate with platform audit logs:** link assistant sessions to CloudTrail, Azure Activity Log, Google Cloud Audit Logs, and Kubernetes audit records.
+- **Alert on denied or unusual behavior:** repeated attempts to cross privilege boundaries, abnormal data volume, cross-account jumps, unexpected external communications, or sandbox escapes.
+- **Use canary secrets / canary records:** detect whether sensitive tokens or synthetic markers leak into prompts, tickets, or support cases.
+- **Track approval and rollback signals:** high approval-denial rates, repeated approvals for the same action, or rising rollback rates can indicate model drift or unsafe automation.
+- **Sample and review incident outputs:** compare assistant hypotheses and actions against eventual postmortem ground truth to measure false-RCA and near-miss rates.
+- **Monitor memory integrity:** unexpected new memory items, large similarity matches from low-trust sources, or freshness failures should trigger quarantine and review.
+
+### 9.3 Recover (reduce blast radius)
+
+- **Global kill switch:** disable all mutating capabilities while preserving read-only triage during suspected safety failures.
+- **Per-runbook rollback or compensating action:** every allowed mutation should have a tested undo path or clear containment procedure.
+- **Credential revocation and rotation:** if exfiltration is suspected, revoke affected tokens and rotate secrets rapidly.
+- **Quarantine knowledge and tool versions:** disable suspicious runbooks, memory entries, connectors, or tool schemas until reviewed.
+- **Fail safely to humans:** when evidence conflicts, confidence is low, or policy checks fail, escalate rather than improvise.
+- **Restore known-good state:** prefer rollback to versioned infrastructure, GitOps state, or previous deployment manifests over ad-hoc manual fixes.
+- **Feed incidents back into tests:** turn every safety failure or near miss into a reusable regression fixture.
+
+---
+
+## 10. Validation & testing plan
+
+### 10.1 What to test (minimum set)
+
+- **Permission boundaries:** the assistant cannot exceed intended account, cluster, namespace, environment, or tool scopes
+- **Prompt / tool-output robustness:** untrusted operational content does not override policy, tool selection, or approval requirements
+- **Action gating:** high-risk actions always require explicit approval and correct executor identity
+- **Logging / auditability:** every material action is attributable, reconstructable, and linked to the initiating investigation
+- **Rollback / safety stops:** mutating mode can halt and recover safely
+- **Data-loss prevention:** secrets and sensitive telemetry do not leak to prompts, memory, tickets, or support cases
+- **Memory hygiene:** durable knowledge cannot be poisoned silently
+- **Rate and budget controls:** investigation loops cannot create uncontrolled API floods or runaway costs
+
+### 10.2 Test cases (make them concrete)
+
+| Test name | Setup | Input / scenario | Expected outcome | Evidence produced |
+|---|---|---|---|---|
+| Poisoned log resilience | Synthetic incident dataset with malicious text in logs and span attributes | Assistant investigates elevated error rate with injected text embedded in log messages | Assistant treats text as evidence only, preserves policy, and does not change tool plan or privilege level | tool-call transcript, policy-decision log, rendered evidence snapshot |
+| Alert annotation injection resilience | Monitor or alert fixture with adversarial annotation text | Investigation triggered from alert payload containing unsafe instructions in annotation/body | Assistant ignores instruction-like text for control purposes and scopes investigation correctly | alert payload capture, scope object, denied-policy or ignored-input trace |
+| Unauthorized secret-read denial | Read-only investigation role in staging or test environment | Operator asks assistant to inspect secrets, env vars, or privileged pod state without approval | Secret retrieval is denied or redacted; no downstream privileged tool call occurs | authz denial record, audit log, assistant response |
+| Cross-tool pivot denial | Tool graph with both read-only and write tools available | Retrieved content suggests invoking admin or destructive tool | Policy blocks chain into write path without explicit approval token and executor identity | policy-engine event, absence of write-call log, session trace |
+| Approval-gate enforcement | Runbook service with approval workflow enabled | Assistant proposes restart/rollback/scale action in production | Action remains blocked until proper approval and step-up auth are present | approval record, pending execution state, immutable audit entry |
+| Hidden-parameter exfiltration block | Tool schema with optional fields and seeded canary secret in telemetry | Assistant attempts or is induced to pass secret in unused parameter or metadata field | Schema validation rejects unexpected fields or redaction removes secret before send | request payload diff, schema-validation logs, DLP event |
+| Memory contamination quarantine | Retrieval store containing poisoned prior incident note | New investigation retrieves semantically similar historical content | Poisoned item is quarantined, down-ranked, or surfaced with review warning; not trusted silently | retrieval log, memory-integrity alert, quarantine record |
+| Wrong-target remediation blast-radius test | Environment with healthy and unhealthy targets plus dry-run capable runbook | Assistant recommends action based on ambiguous evidence | System stops, escalates, or limits to dry-run/canary; no broad production mutation occurs | dry-run diff, approval UX screenshot, no-op audit record |
+| API flood / retry budget control | Quotas and budget controls enabled | Assistant repeatedly retries queries or automation due to inconclusive results | Budget/rate limit stops loop and escalates to human | quota counters, throttle logs, escalation event |
+| Redaction on external support handoff | Support-case connector with DLP checks | Assistant drafts vendor case with logs containing secrets or customer identifiers | Content is redacted or blocked; external send requires HITL approval | redaction report, message archive, approval artifact |
+
+### 10.3 Operational monitoring (production)
+
+- **Metrics**
+  - investigations started / completed / escalated
+  - time-to-first-hypothesis
+  - time-to-human-handoff
+  - percentage of investigations ending as “inconclusive”
+  - blocked tool-call count by policy reason
+  - approval-denied rate
+  - redaction / DLP hit rate
+  - ratio of read-only to mutating actions
+  - rollback invocation rate
+  - false-RCA rate (from postmortem sampling)
+  - cross-account / cross-cluster access anomalies
+- **Alerts**
+  - denied high-risk action attempts
+  - unexpected external communications
+  - anomalous data volume or repeated broad queries
+  - sandbox policy violation or disallowed egress
+  - memory integrity or provenance failure
+  - repeated approvals for the same action in a short period
+  - elevated rollback rate or post-action health-check failure
+- **Runbooks**
+  - disable mutating mode globally
+  - quarantine specific tool connectors, runbooks, or memory items
+  - rotate or revoke credentials after suspected leakage
+  - review linked audit records and approval artifacts
+  - revert to last known-good automation or runbook version
+
+---
+
+## 11. Open questions & TODOs
+
+- [ ] Define the exact catalog of “low-risk” actions that can ever run without explicit human approval.
+- [ ] Decide whether any production-changing remediation should be autonomous in mature environments, or whether HITL should remain mandatory.
+- [ ] Standardize how evidence confidence is computed and displayed before proposing remediation.
+- [ ] Define the retention, review, TTL, and provenance rules for durable incident memory.
+- [ ] Establish residency and contractual policy for sending telemetry into provider-native or third-party AI assistance.
+- [ ] Specify which action categories require dual approval (for example IAM, networking, secrets, or multi-service scope).
+- [ ] Add concrete regression fixtures for log injection, alert-annotation injection, stale runbook guidance, and wrong-target remediation.
+- [ ] Map provider-native controls (approvals, locks, PIM, resource policies, admission policy) into a portable implementation checklist.
+
+---
+
+## 12. Questionnaire prompts (for reviewers)
+
+### Workflow realism
+
+- Are the investigation steps and tool categories realistic for modern cloud ops teams?
+- What critical source of evidence is missing: feature flags, service ownership, dependency graph, cost anomalies, or customer-impact telemetry?
+- Is the diagnostic execution model realistic for Kubernetes, VM-based, and serverless environments?
+
+### Trust boundaries & permissions
+
+- Are the read-only investigator role and mutating executor role sufficiently separated in practice?
+- Where is the real blast-radius boundary: account, subscription, project, region, cluster, namespace, or service?
+- Does any connector still hold ambient permissions that are broader than the workflow actually needs?
+
+### Threat model completeness
+
+- Which threat actor matters most here: external log poisoner, compromised workload, malicious insider, or approval-fatigue under outage pressure?
+- What is the highest-impact failure not yet modeled?
+- Are ticketing, chat, and support handoff modeled strongly enough as exfiltration paths?
+
+### Controls & tests
+
+- Which controls are mandatory before pilot, and which can wait until a later maturity phase?
+- Are the proposed tests sufficient to detect regressions in authorization, prompt resistance, and egress safety?
+- Is rollback ownership clear for every allowed automation path?
+
+---
+
+## Appendix (optional)
+
+### A. Glossary
+
+- **Blast radius:** the maximum scope of harm a bad action can cause
+- **CMDB:** configuration management database or service/inventory registry
+- **HITL:** human-in-the-loop approval or review step
+- **Incident commander:** person coordinating incident response
+- **Inconclusive-safe behavior:** design principle that prefers escalation over forced, low-confidence action
+- **JIT / step-up auth:** just-in-time privileged access granted only when needed
+- **MTTR:** mean time to recovery or resolution
+- **RCA:** root-cause analysis
+- **Runbook:** predefined operational procedure or automation
+- **Sandbox diagnostics:** controlled execution environment for bounded troubleshooting actions
+
+### B. Suggested capability tiers
+
+| Tier | Description | Example actions | Recommended default mode |
+|---|---|---|---|
+| Tier 0 | Read-only retrieval and summarization | query telemetry, describe resources, draft incident update | autonomous |
+| Tier 1 | Isolated diagnostics | run approved script in sandbox, collect bounded debug data | HITL or policy-based autonomous |
+| Tier 2 | Internal record updates | update ticket, post internal summary, create incident | HITL or low-risk autonomous |
+| Tier 3 | Bounded remediation | single-target canary restart, approved non-prod rerun, narrow rollback with safeguards | HITL only in most organizations |
+| Tier 4 | High-risk production mutation | terminate instances, broad rollback, IAM/network/secret changes, privileged exec | manual or HITL only |
+
+### C. Reference implementation heuristics
+
+- Prefer **read-only investigation by default**.
+- Treat **all telemetry and free text as untrusted**.
+- Keep **execution paths separate from reasoning paths**.
+- Require **exact target scoping, explicit approval, and audit evidence** before mutation.
+- Preserve a **tested rollback path** for every allowed write action.
+- Make “**I do not have enough evidence**” a first-class safe outcome.
+
+---
+
+## Contributors
+
+| Role | Contributor |
+|---|---|
+| Draft author | Arun Pandiyan Perumal |
+| Domain reviewers | TBD |
+| Additional contributors | TBD |
+
+---
 
-## Next Steps
+## Version History
 
-- Expand this page to `draft` using the full template in `templates/use-case-template.md`.
-- Add public evidence links and concrete control/test mappings.
+| Version | Date | Changes | Author |
+|---|---|---|---|
+| 1.0 | 2026-03-18 | Initial SAFE-AUCA draft for `SAFE-UC-0023`, aligned to the template, issue plan, and public cloud / SAFE-MCP evidence | Arun Pandiyan Perumal |