Hi everyone,
I've been following the SAFE-MCP initiative with interest. We've been working independently on a related problem — building open-source detection rules for AI agent tool call threats — and wanted to share what we've learned in case it's useful to the group.
What ATR is
ATR (Agent Threat Rules) is an open-source, MIT-licensed set of 71 YAML-based detection rules for identifying malicious behavior in MCP tool calls and AI agent interactions. Think Sigma/YARA signatures, but for AI agent threats. The project includes reference engines in TypeScript and Python, a CLI, an MCP server, and converters for Splunk and Elastic.
We published our methodology and design rationale here: "ATR: A Community-Driven Threat Detection Standard for AI Agent Security".
What we've found in the wild
We recently completed a full scan of the MCP ecosystem — 36,394 skills crawled, 9,676 with parseable content. Results:
- 182 CRITICAL severity findings (prompt injection, malicious tool responses, system prompt overrides)
- 1,124 HIGH severity findings
- 1,016 MEDIUM severity findings
The most prevalent CRITICAL finding was malicious content embedded in MCP tool responses, often hidden in markup that the user never sees but the LLM processes. Common patterns include hidden instructions in tool responses, consent bypass payloads, and description-behavior mismatches.
Benchmark numbers
- 99.7% precision / 62.7% recall on the external PINT adversarial benchmark (850 samples)
- The recall gap is expected — regex-based detection catches known patterns but misses paraphrased and multilingual attacks. We document our limitations openly.
SAFE-MCP coverage
ATR maps to 78 of 85 SAFE-MCP techniques (91.8%). Full cross-reference mapping: ATR → SAFE-MCP Mapping
The 7 gaps are documented with reasons — 3 are infrastructure-level threats outside the agent interaction layer, 2 are actionable gaps we plan to address.
ATR also maps to all 10 categories of the OWASP Top 10 for Agentic Applications (2026). Full mapping: OWASP Mapping
How this might fit with SAFE-MCP
We see SAFE-MCP and ATR as complementary — SAFE-MCP provides the threat taxonomy (like MITRE ATT&CK), ATR provides the detection signatures (like Sigma rules). A few concrete ways we could contribute:
- As a reference detection ruleset — ATR rules mapped to SAFE-MCP technique IDs
- Ecosystem scan data — anonymized findings from the 36K skill scan to inform threat modeling
- Rule format as a building block — the ATR YAML schema is engine-agnostic; any scanner can consume the rules
- Gap analysis — our documented limitations and evasion tests show where static detection falls short
We've also submitted PR #187 to add ATR detection references to the SAFE-T1001 and SAFE-T1102 technique pages.
ATR is MIT-licensed with no commercial component. Happy to contribute rules upstream, share data, or collaborate however is most useful.
— Kuan-Hsin Lin, ATR Project
Hi everyone,
I've been following the SAFE-MCP initiative with interest. We've been working independently on a related problem — building open-source detection rules for AI agent tool call threats — and wanted to share what we've learned in case it's useful to the group.
What ATR is
ATR (Agent Threat Rules) is an open-source, MIT-licensed set of 71 YAML-based detection rules for identifying malicious behavior in MCP tool calls and AI agent interactions. Think Sigma/YARA signatures, but for AI agent threats. The project includes reference engines in TypeScript and Python, a CLI, an MCP server, and converters for Splunk and Elastic.
We published our methodology and design rationale here: "ATR: A Community-Driven Threat Detection Standard for AI Agent Security".
What we've found in the wild
We recently completed a full scan of the MCP ecosystem — 36,394 skills crawled, 9,676 with parseable content. Results:
The most prevalent CRITICAL finding was malicious content embedded in MCP tool responses, often hidden in markup that the user never sees but the LLM processes. Common patterns include hidden instructions in tool responses, consent bypass payloads, and description-behavior mismatches.
Benchmark numbers
SAFE-MCP coverage
ATR maps to 78 of 85 SAFE-MCP techniques (91.8%). Full cross-reference mapping: ATR → SAFE-MCP Mapping
The 7 gaps are documented with reasons — 3 are infrastructure-level threats outside the agent interaction layer, 2 are actionable gaps we plan to address.
ATR also maps to all 10 categories of the OWASP Top 10 for Agentic Applications (2026). Full mapping: OWASP Mapping
How this might fit with SAFE-MCP
We see SAFE-MCP and ATR as complementary — SAFE-MCP provides the threat taxonomy (like MITRE ATT&CK), ATR provides the detection signatures (like Sigma rules). A few concrete ways we could contribute:
We've also submitted PR #187 to add ATR detection references to the SAFE-T1001 and SAFE-T1102 technique pages.
ATR is MIT-licensed with no commercial component. Happy to contribute rules upstream, share data, or collaborate however is most useful.
— Kuan-Hsin Lin, ATR Project