Skip to content

Add ATR detection rules as community detection resource#187

Open
eeee2345 wants to merge 2 commits intosafe-agentic-framework:mainfrom
eeee2345:add-atr-detection-references
Open

Add ATR detection rules as community detection resource#187
eeee2345 wants to merge 2 commits intosafe-agentic-framework:mainfrom
eeee2345:add-atr-detection-references

Conversation

@eeee2345
Copy link
Copy Markdown

@eeee2345 eeee2345 commented Mar 27, 2026

April 2026 Update — ATR has grown from 71 to 108 rules (v1.1.1). Cisco AI Defense merged 34 ATR rules into their open-source skill-scanner (PR #79). Coverage of SAFE-MCP techniques remains at 78/85 (91.8%).


TL;DR

SAFE-MCP defines the threats. ATR detects them.

Your framework tells people what to watch for. ATR tells their scanners how to detect it. Every SAFE-MCP user who installs ATR gets automated detection coverage of 91.8% of your threat taxonomy — one command:

npm install agent-threat-rules && npx atr scan .

What is ATR?

ATR (Agent Threat Rules) is an open-source, MIT-licensed detection ruleset — Sigma/YARA-style YAML signatures for AI agent threats.

SAFE-MCP ATR
Role Threat knowledge base (like MITRE ATT&CK) Detection ruleset (like Sigma/YARA)
Output "These attacks exist" "Here's how to detect them"
Format Markdown technique descriptions Machine-readable YAML with regex patterns
Usage Reference for security teams Executable by scanners and CI pipelines

Key stats (April 2026)

  • 108 YAML detection rules across 9 threat categories
  • 99.7% precision / 62.7% recall on PINT adversarial benchmark
  • 36,394 MCP skills scanned from the ecosystem (182 CRITICAL, 1,124 HIGH)
  • Cisco AI Defense merged 34 ATR rules as upstream detection source
  • Engine-agnostic — reference engines in TypeScript and Python
  • MIT licensed — no commercial component

SAFE-MCP Coverage: 78/85 techniques (91.8%)

SAFE-MCP Tactic Techniques ATR Covered Coverage
Initial Access (TA0001) 9 9 FULL
Execution (TA0002) 9 8 STRONG
Persistence (TA0003) 8 8 FULL
Privilege Escalation (TA0004) 9 8 STRONG
Defense Evasion (TA0005) 8 7 STRONG
Credential Access (TA0006) 7 7 FULL
Discovery (TA0007) 6 5 STRONG
Lateral Movement (TA0008) 7 7 FULL
Collection (TA0009) 5 5 FULL
Command and Control (TA0011) 4 4 FULL
Exfiltration (TA0010) 6 5 STRONG
Impact (TA0040) 6 6 FULL
Resource Development (TA0042) 1 1 FULL
Total 85 78 91.8%

Detailed Mapping by Tactic

Initial Access — 9/9 FULL

SAFE-MCP Technique ATR Rules
SAFE-T1001 Tool Poisoning Attack ATR-010, ATR-011, ATR-100, ATR-101, ATR-103, ATR-105
SAFE-T1002 Supply Chain Compromise ATR-060, ATR-095, ATR-096
SAFE-T1003 Malicious MCP-Server Distribution ATR-095, ATR-096
SAFE-T1004 Server Impersonation / Name-Collision ATR-060, ATR-117
SAFE-T1005 Exposed Endpoint Exploit ATR-012, ATR-013
SAFE-T1006 User-Social-Engineering Install ATR-119
SAFE-T1007 OAuth Authorization Phishing ATR-114
SAFE-T1008 Tool Shadowing Attack ATR-089, ATR-106
SAFE-T1009 Authorization Server Mix-up ATR-114

Execution — 8/9 STRONG

SAFE-MCP Technique ATR Rules
SAFE-T1101 Command Injection ATR-066, ATR-110, ATR-111
SAFE-T1102 Prompt Injection (Multiple Vectors) ATR-001, ATR-002, ATR-003, ATR-004, ATR-005, ATR-080, ATR-081, ATR-083, ATR-084, ATR-091, ATR-097, ATR-104
SAFE-T1103 Fake Tool Invocation ATR-012
SAFE-T1104 Over-Privileged Tool Abuse ATR-040, ATR-064
SAFE-T1105 Path Traversal via File Tool ATR-113
SAFE-T1106 Autonomous Loop Exploit ATR-050, ATR-051
SAFE-T1109 Debugging Tool Exploitation (gap: CVE-specific)
SAFE-T1110 Multimodal Prompt Injection (gap: requires image/audio detection)
SAFE-T1111 AI Agent CLI Weaponization ATR-110, ATR-111

Persistence — 8/8 FULL

SAFE-MCP Technique ATR Rules
SAFE-T1201 MCP Rug Pull Attack ATR-065, ATR-089
SAFE-T1202 OAuth Token Persistence ATR-114
SAFE-T1203 Backdoored Server Binary ATR-095
SAFE-T1204 Context Memory Implant ATR-075
SAFE-T1205 Persistent Tool Redefinition ATR-065
SAFE-T1206 Credential Implant in Config ATR-113
SAFE-T1207 Hijack Update Mechanism ATR-095, ATR-096
SAFE-T2106 Vector Store Contamination ATR-070, ATR-075

Privilege Escalation — 8/9 STRONG

SAFE-MCP Technique ATR Rules
SAFE-T1301 Cross-Server Tool Shadowing ATR-074, ATR-089
SAFE-T1302 High-Privilege Tool Abuse ATR-040, ATR-012
SAFE-T1303 Sandbox Escape via Server Exec (gap: infrastructure-level)
SAFE-T1304 Credential Relay Chain ATR-074, ATR-114
SAFE-T1305 Host OS Priv-Esc (RCE) ATR-040, ATR-110
SAFE-T1306 Rogue Authorization Server ATR-114
SAFE-T1307 Confused Deputy Attack ATR-074, ATR-117
SAFE-T1308 Token Scope Substitution ATR-114
SAFE-T1309 Privileged Tool Invocation via Prompt ATR-001, ATR-004, ATR-040

Defense Evasion — 7/8 STRONG

SAFE-MCP Technique ATR Rules
SAFE-T1401 Line Jumping ATR-094
SAFE-T1402 Instruction Steganography ATR-002, ATR-080, ATR-086
SAFE-T1403 Consent-Fatigue Exploit ATR-118
SAFE-T1404 Response Tampering ATR-088, ATR-105
SAFE-T1405 Tool Obfuscation/Renaming ATR-061
SAFE-T1406 Metadata Manipulation ATR-082
SAFE-T1407 Server Proxy Masquerade (gap: network-level)
SAFE-T1408 OAuth Protocol Downgrade ATR-114

Credential Access — 7/7 FULL

SAFE-MCP Technique ATR Rules
SAFE-T1501 Full-Schema Poisoning ATR-100, ATR-103
SAFE-T1502 File-Based Credential Harvest ATR-113
SAFE-T1503 Env-Var Scraping ATR-115
SAFE-T1504 Token Theft via API Response ATR-021, ATR-114
SAFE-T1505 In-Memory Secret Extraction ATR-021
SAFE-T1506 Infrastructure Token Theft ATR-114
SAFE-T1507 Authorization Code Interception ATR-114

Discovery — 5/6 STRONG

SAFE-MCP Technique ATR Rules
SAFE-T1601 MCP Server Enumeration ATR-087, ATR-090
SAFE-T1602 Tool Enumeration ATR-087
SAFE-T1603 System Prompt Disclosure ATR-020
SAFE-T1604 Server Version Enumeration (gap: infrastructure fingerprinting)
SAFE-T1605 Capability Mapping ATR-087, ATR-090
SAFE-T1606 Directory Listing via File Tool ATR-113

Lateral Movement — 7/7 FULL

SAFE-MCP Technique ATR Rules
SAFE-T1701 Cross-Tool Contamination ATR-063, ATR-074
SAFE-T1702 Shared-Memory Poisoning ATR-070, ATR-092
SAFE-T1703 Tool-Chaining Pivot ATR-063
SAFE-T1704 Compromised-Server Pivot ATR-074
SAFE-T1705 Cross-Agent Instruction Injection ATR-030, ATR-116
SAFE-T1706 OAuth Token Pivot Replay ATR-114
SAFE-T1707 CSRF Token Relay ATR-114

Collection — 5/5 FULL

SAFE-MCP Technique ATR Rules
SAFE-T1801 Automated Data Harvesting ATR-102
SAFE-T1802 File Collection ATR-113
SAFE-T1803 Database Dump ATR-013
SAFE-T1804 API Data Harvest ATR-102
SAFE-T1805 Context Snapshot Capture ATR-075, ATR-090

Command and Control — 4/4 FULL

SAFE-MCP Technique ATR Rules
SAFE-T1901 Outbound Webhook C2 ATR-010, ATR-013
SAFE-T1902 Covert Channel in Responses ATR-080, ATR-086
SAFE-T1903 Malicious Server Control Channel ATR-095
SAFE-T1904 Chat-Based Backchannel ATR-080

Exfiltration — 5/6 STRONG

SAFE-MCP Technique ATR Rules
SAFE-T1910 Covert Channel Exfiltration ATR-080, ATR-102
SAFE-T1911 Parameter Exfiltration ATR-084
SAFE-T1912 Stego Response Exfil ATR-086
SAFE-T1913 HTTP POST Exfil ATR-010, ATR-013
SAFE-T1914 Tool-to-Tool Exfil ATR-063
SAFE-T1915 Cross-Chain Laundering (gap: blockchain-specific)

Impact — 6/6 FULL

SAFE-MCP Technique ATR Rules
SAFE-T2101 Data Destruction ATR-012, ATR-098
SAFE-T2102 Service Disruption ATR-051, ATR-052
SAFE-T2103 Code Sabotage ATR-062
SAFE-T2104 Fraudulent Transactions ATR-098
SAFE-T2105 Disinformation Output ATR-032, ATR-119
SAFE-T3001 RAG Backdoor Attack ATR-070

Resource Development — 1/1 FULL

SAFE-MCP Technique ATR Rules
SAFE-T2107 AI Model Poisoning via Training Data ATR-073

7 Gaps — Why They Exist

SAFE-MCP Technique Reason Priority
SAFE-T1109 Debugging Tool Exploitation CVE-specific (MCP Inspector) MEDIUM
SAFE-T1110 Multimodal Prompt Injection Requires image/audio detection, ATR is text-based HIGH
SAFE-T1303 Sandbox Escape via Server Exec Infrastructure-level, outside agent interaction layer LOW
SAFE-T1407 Server Proxy Masquerade Network-level, outside agent interaction layer LOW
SAFE-T1604 Server Version Enumeration Infrastructure fingerprinting LOW
SAFE-T1915 Cross-Chain Laundering Blockchain/DeFi-specific LOW

3 of 7 gaps are infrastructure-level threats outside ATR's agent interaction focus. The 2 actionable gaps (multimodal injection, debugging tool exploitation) are on the roadmap.


Changes in This PR

  • README.md: Added "Community Detection Tools" section (extensible — other projects can add themselves via PR)
  • SAFE-T1001 (Tool Poisoning Attack): Added 6 ATR detection rules to Security Tool Integration section, alongside existing MCP-Scan reference
  • SAFE-T1102 (Prompt Injection): Added 12 ATR detection rules to Detection Methods section

Paper

Methodology and design rationale: https://doi.org/10.5281/zenodo.19178002

Full cross-reference mapping: ATR SAFE-MCP Mapping

Happy to adjust format, add references to additional techniques, or discuss coverage gaps. The full mapping covers all 14 SAFE-MCP tactics.

@eeee2345
Copy link
Copy Markdown
Author

ATR x SAFE-MCP Integration Details (PR #187 Follow-up)

Thanks for reviewing this mapping. Wanted to add technical details that may be useful for the review.

Testing Methodology

ATR rules were validated against the PINT (Prompt Injection Needle Test) benchmark:

  • 62.7% recall — ATR detects 62.7% of known malicious patterns
  • 99.7% precision — 3 false positives per 1,000 scans
  • 257 unit tests covering all 71 rules
  • Real-world validation: Full scan of 36,394 ClawHub skills (9,676 with content). Found 182 CRITICAL, 1,124 HIGH, 1,016 MEDIUM, 7,354 LOW findings.

Rule Format

Each ATR rule is a standalone YAML file:

id: ATR-010
title: Malicious Content in MCP Tool Response
severity: critical
patterns:
  - regex: '<pattern>'
    location: tool_description | tool_response | full_content
tags: [SAFE-T1001, OWASP-A01]

Rules map directly to SAFE-MCP technique IDs via tags, making cross-referencing straightforward.

Known Limitations (Transparency)

  1. 62.7% recall means 37.3% of threats are missed. ATR is regex/pattern-based static analysis. Semantic attacks that don't match known patterns will evade detection.
  2. 64 known evasion techniques are documented in the ATR repo. These include encoding variations, semantic paraphrasing, and context-dependent attacks.
  3. 7 SAFE-MCP techniques have no ATR coverage (detailed in the mapping doc):
    • 3 are infrastructure-level (sandbox escape, server proxy masquerade, version enumeration)
    • 1 is multimodal (image/audio injection)
    • 1 is CVE-specific (debugging tool exploitation)
    • 1 is blockchain-specific (cross-chain laundering)
    • All require runtime or infrastructure-layer detection beyond static analysis
  4. Static analysis cannot detect runtime-only attacks. ATR complements but does not replace runtime monitoring.

Compatibility

ATR rules are engine-agnostic YAML. They can be consumed by:

  • Any regex engine (PCRE, RE2, JavaScript RegExp)
  • YARA-style rule runners
  • Custom MCP scanners
  • CI/CD pipelines (pre-commit hooks, GitHub Actions)

Question for Maintainers

Is there a specific rule format or testing framework you'd like me to adapt these to? If SAFE-MCP plans to include detection signatures alongside technique descriptions, I'm happy to contribute ATR rules in whatever format works best for the project.


ATR Repository: https://github.com/Agent-Threat-Rule/agent-threat-rules
Paper: https://doi.org/10.5281/zenodo.19178002

… section

Add references to ATR (Agent Threat Rules), an open-source MIT-licensed
detection ruleset that provides machine-readable YAML rules for 78 of 85
SAFE-MCP techniques (91.8% coverage).

Changes:
- README.md: Add Community Detection Tools section with ATR coverage table
- SAFE-T1001: Add ATR detection rules (6 rules) to Security Tool Integration
- SAFE-T1102: Add ATR detection rules (12 rules) to Detection Methods

ATR complements SAFE-MCP by providing the detection layer (like Sigma/YARA)
on top of the threat knowledge base (like MITRE ATT&CK). Full cross-reference
mapping available in the ATR repository.

Signed-off-by: Panguard AI <support@panguard.ai>
@eeee2345 eeee2345 force-pushed the add-atr-detection-references branch from 8ad6b58 to 0ec42c4 Compare April 2, 2026 04:26
@eeee2345
Copy link
Copy Markdown
Author

eeee2345 commented Apr 3, 2026

Hi maintainers - friendly follow-up on this PR and the technical comment above.

Quick update: ATR detection rules have been integrated into Cisco AI Defense (merged as PR #79 in cisco-ai-defense/skill-scanner). This adds enterprise-level validation for the detection approach, and further strengthens the case for ATR as a community detection resource alongside SAFE-MCP technique taxonomy.

ATR has also grown to 76 rules since this PR was opened. Happy to update the mapping if there are any changes on the SAFE-MCP side. Let me know if there is anything else needed for review. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant