ATR: Open-source detection ruleset covering 91.8% of SAFE-MCP techniques

Hi everyone,

I've been following the SAFE-MCP initiative with interest. We've been working independently on a related problem — building open-source detection rules for AI agent tool call threats — and wanted to share what we've learned in case it's useful to the group.

## What ATR is

[ATR (Agent Threat Rules)](https://github.com/Agent-Threat-Rule/agent-threat-rules) is an open-source, MIT-licensed set of 71 YAML-based detection rules for identifying malicious behavior in MCP tool calls and AI agent interactions. Think Sigma/YARA signatures, but for AI agent threats. The project includes reference engines in TypeScript and Python, a CLI, an MCP server, and converters for Splunk and Elastic.

We published our methodology and design rationale here: ["ATR: A Community-Driven Threat Detection Standard for AI Agent Security"](https://doi.org/10.5281/zenodo.19178002).

## What we've found in the wild

We recently completed a full scan of the MCP ecosystem — 36,394 skills crawled, 9,676 with parseable content. Results:

- 182 CRITICAL severity findings (prompt injection, malicious tool responses, system prompt overrides)
- 1,124 HIGH severity findings
- 1,016 MEDIUM severity findings

The most prevalent CRITICAL finding was malicious content embedded in MCP tool responses, often hidden in markup that the user never sees but the LLM processes. Common patterns include hidden instructions in tool responses, consent bypass payloads, and description-behavior mismatches.

## Benchmark numbers

- 99.7% precision / 62.7% recall on the external PINT adversarial benchmark (850 samples)
- The recall gap is expected — regex-based detection catches known patterns but misses paraphrased and multilingual attacks. We document our limitations openly.

## SAFE-MCP coverage

ATR maps to 78 of 85 SAFE-MCP techniques (91.8%). Full cross-reference mapping: [ATR → SAFE-MCP Mapping](https://github.com/Agent-Threat-Rule/agent-threat-rules/blob/main/docs/SAFE-MCP-MAPPING.md)

The 7 gaps are documented with reasons — 3 are infrastructure-level threats outside the agent interaction layer, 2 are actionable gaps we plan to address.

ATR also maps to all 10 categories of the OWASP Top 10 for Agentic Applications (2026). Full mapping: [OWASP Mapping](https://github.com/Agent-Threat-Rule/agent-threat-rules/blob/main/docs/OWASP-MAPPING.md)

## How this might fit with SAFE-MCP

We see SAFE-MCP and ATR as complementary — SAFE-MCP provides the threat taxonomy (like MITRE ATT&CK), ATR provides the detection signatures (like Sigma rules). A few concrete ways we could contribute:

1. As a reference detection ruleset — ATR rules mapped to SAFE-MCP technique IDs
2. Ecosystem scan data — anonymized findings from the 36K skill scan to inform threat modeling
3. Rule format as a building block — the ATR YAML schema is engine-agnostic; any scanner can consume the rules
4. Gap analysis — our documented limitations and evasion tests show where static detection falls short

We've also submitted PR #187 to add ATR detection references to the SAFE-T1001 and SAFE-T1102 technique pages.

ATR is MIT-licensed with no commercial component. Happy to contribute rules upstream, share data, or collaborate however is most useful.

— Kuan-Hsin Lin, ATR Project

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ATR: Open-source detection ruleset covering 91.8% of SAFE-MCP techniques #188

What ATR is

What we've found in the wild

Benchmark numbers

SAFE-MCP coverage

How this might fit with SAFE-MCP

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ATR: Open-source detection ruleset covering 91.8% of SAFE-MCP techniques #188

Description

What ATR is

What we've found in the wild

Benchmark numbers

SAFE-MCP coverage

How this might fit with SAFE-MCP

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions