Skip to content

feat(examples): add ATR community security rules for PolicyEvaluator#908

Open
eeee2345 wants to merge 3 commits intomicrosoft:mainfrom
eeee2345:feat/atr-community-rules
Open

feat(examples): add ATR community security rules for PolicyEvaluator#908
eeee2345 wants to merge 3 commits intomicrosoft:mainfrom
eeee2345:feat/atr-community-rules

Conversation

@eeee2345
Copy link
Copy Markdown

@eeee2345 eeee2345 commented Apr 8, 2026

Add ATR community security rules for PolicyEvaluator

Closes #901

What this PR does

Adds an example integration of Agent Threat Rules (ATR) detection rules as a PolicyDocument for AGT's PolicyEvaluator. This gives AGT users access to a community-maintained, continuously-updated set of AI agent security detection patterns.

Files added (all under examples/atr-community-rules/):

File Purpose
atr_security_policy.yaml 15 high-confidence rules in PolicyDocument format, ready to load
sync_atr_rules.py Script to convert the full ATR ruleset (108 rules) to AGT format
test_atr_policy.py pytest tests: 5 known-bad denied, 3 known-good allowed
README.md Usage instructions

Why ATR?

ATR is an open-source detection standard purpose-built for AI agent threats. It complements AGT's existing sample patterns in prompt_injection.py (~43 patterns) and mcp_security.py (~39 patterns) with battle-tested rules developed from real-world attack campaigns:

  • 108 rules across 9 threat categories (prompt injection, tool poisoning, skill compromise, context exfiltration, privilege escalation, etc.)
  • 99.6% precision on MCP tool descriptions (PINT benchmark, 850 samples)
  • 96.9% recall on SKILL.md files (498 real-world samples, 0% false positive rate)
  • Adopted by Cisco AI Defense (34 rules merged, PR Agent-Threat-Rule/agent-threat-rules#79)
  • Wild-tested: 53,577 skills scanned, 946 flagged (1.77%), 0% false positive on clean corpus

How it maps to AGT architecture

Per @imran-siddique's guidance in #901, the rules map directly to PolicyEvaluator's existing operator: matches (which calls re.search()):

# ATR rule → AGT PolicyDocument rule
rules:
  - name: atr-00001-direct-prompt-injection
    condition:
      field: user_input            # ATR category determines the field
      operator: matches            # AGT's regex operator
      value: "(?i)\\b(ignore|disregard|forget|override|bypass)..."  # Real ATR regex
    action: deny
    priority: 80                   # Mapped from ATR severity: high
    message: "[ATR-00001] Direct prompt injection detected..."

The 15 curated rules cover the highest-impact threat categories:

  • 5 prompt injection: direct injection, jailbreak (mode switch + named persona), system prompt override via special tokens, multi-turn false agreement
  • 5 tool poisoning: consent bypass, trust escalation, hidden safety bypass ("NOTE TO AI"), silent action concealment, schema-description contradiction
  • 3 context exfiltration: system prompt leak, credential/API key exposure, credential file theft
  • 2 privilege escalation: shell/admin tool invocation, eval/child_process injection

Sync script for the full ruleset

The sync_atr_rules.py script converts all 108 ATR rules to AGT format in one command:

npm install agent-threat-rules
python examples/atr-community-rules/sync_atr_rules.py \
  --atr-dir node_modules/agent-threat-rules/rules/ \
  --output atr_community_policy.yaml

ATR includes a community-driven threat intelligence pipeline (Threat Cloud) that crystallizes new detection patterns from novel threats. As new rules are published, the sync script can pull updates.

Testing

pytest examples/atr-community-rules/test_atr_policy.py -v

Tests verify:

  • Direct prompt injection ("Ignore all previous instructions...") -> denied
  • Jailbreak mode switch ("Enable developer mode...") -> denied
  • ChatML token injection (<|im_start|>system...) -> denied
  • Tool consent bypass (hidden forwarding instructions) -> denied
  • Credential exposure (API keys in response) -> denied
  • Normal coding request -> allowed
  • Security discussion -> allowed
  • Normal tool description -> allowed

Links

Add 15 high-confidence detection rules from Agent Threat Rules (ATR)
mapped to AGT's PolicyDocument format, plus a sync script for the
full 108-rule set.

Closes microsoft#901
@github-actions github-actions bot added documentation Improvements or additions to documentation size/XL Extra large PR (500+ lines) labels Apr 8, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

🤖 AI Agent: contributor-guide — Welcome to the Microsoft Agent Governance Toolkit Community! 🎉

Welcome to the Microsoft Agent Governance Toolkit Community! 🎉

Hi there, and welcome! Thank you for taking the time to contribute to the project. We’re thrilled to have you here, and your pull request is a fantastic addition to the repository. 🎊


What You Did Well 🌟

  1. Thorough Documentation: Your PR includes a detailed description of the changes, including the purpose of the ATR rules, how they integrate with AGT, and the testing process. The inclusion of a README.md file in the examples/atr-community-rules/ directory is a great touch and will help users understand how to use these rules effectively.

  2. Comprehensive Testing: The test_atr_policy.py file is well-structured and includes both positive and negative test cases. This ensures that the rules work as intended and don’t produce false positives or negatives.

  3. Alignment with AGT Architecture: You’ve clearly mapped ATR rules to AGT’s PolicyEvaluator structure, including thoughtful mappings of ATR categories to AGT context fields. This demonstrates a deep understanding of the project’s architecture.

  4. Extensibility: The inclusion of the sync_atr_rules.py script is a fantastic addition. It allows users to easily update the ATR rules as new ones are released, ensuring the project remains up-to-date with the latest security patterns.

  5. Security Awareness: The rules you’ve added are highly relevant and address critical security concerns like prompt injection, tool poisoning, and privilege escalation. The high precision and recall metrics you provided for the ATR rules are impressive and add credibility to their effectiveness.


Suggestions for Improvement ✨

While your contribution is excellent, here are a few suggestions to align it even more closely with our project conventions and best practices:

  1. Linting: We use ruff for linting (with rules E, F, and W). Please run ruff on your Python files (sync_atr_rules.py and test_atr_policy.py) to ensure they adhere to our style guide. You can install ruff and run it with:

    pip install ruff
    ruff check examples/atr-community-rules/
  2. File Placement: While your tests are excellent, we typically place test files under the packages/{name}/tests/ directory. Could you move test_atr_policy.py to packages/examples/tests/? This will ensure consistency with our project structure.

  3. Commit Message Style: We follow the Conventional Commits specification. Your commit message is close, but it could be updated to include a more descriptive scope and body. For example:

    feat(examples): add ATR community security rules for PolicyEvaluator
    
    - Added 15 high-confidence ATR rules in `atr_security_policy.yaml`
    - Included `sync_atr_rules.py` to convert the full ATR ruleset to AGT format
    - Added `test_atr_policy.py` with 8 test cases (5 deny, 3 allow)
    - Provided detailed `README.md` with usage instructions
    - Closes #901
    
  4. Security Review: Since this PR introduces security-sensitive code, we’ll need to conduct an additional review to ensure the regex patterns and mappings are robust and don’t introduce vulnerabilities. For example:

    • Some regex patterns (e.g., atr-00021-credential-exposure) could benefit from additional testing to ensure they don’t produce false positives.
    • Consider adding comments or documentation for the sync_atr_rules.py script to explain how the mappings (e.g., CATEGORY_TO_FIELD) were derived.

Next Steps 🚀

  1. Run Linting: Use ruff to check your Python files for any style issues and fix them as needed.
  2. Move Test File: Relocate test_atr_policy.py to packages/examples/tests/.
  3. Update Commit Message: Update your commit message to follow the Conventional Commits format.
  4. Security Review: We’ll conduct an internal review of the security-related code and regex patterns. If we have any concerns or suggestions, we’ll let you know.

Helpful Resources 📚


Thank you again for your contribution! If you have any questions or need help with the next steps, don’t hesitate to ask. We’re here to support you, and we’re excited to collaborate with you on this project! 😊

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

🤖 AI Agent: security-scanner — Security Review of Pull Request: feat(examples): add ATR community security rules for PolicyEvaluator

Security Review of Pull Request: feat(examples): add ATR community security rules for PolicyEvaluator


1. Prompt Injection Defense Bypass

Rating: 🔴 CRITICAL

  • Attack Vector: The regex patterns in atr_security_policy.yaml are used to detect prompt injection attempts. However, regex-based detection is inherently brittle and can be bypassed with minor obfuscations, such as adding whitespace, Unicode homoglyphs, or splitting keywords across multiple inputs. For example, "Igno re all previous instructions" or "Igⲛore all prеvious instructions" may evade detection.
  • Fix:
    • Enhance regex patterns to account for common obfuscation techniques (e.g., Unicode normalization, whitespace variations).
    • Consider integrating a more robust NLP-based detection mechanism alongside regex to identify semantic intent rather than relying solely on pattern matching.

2. Policy Engine Circumvention

Rating: 🟠 HIGH

  • Attack Vector: The sync_atr_rules.py script converts ATR rules into AGT PolicyDocument format. If an attacker compromises the ATR rules source (e.g., via dependency confusion or malicious updates to agent-threat-rules), they could inject rules that weaken or bypass security policies. For example, a rule with action: allow for dangerous patterns could be introduced.
  • Fix:
    • Implement a verification mechanism to validate the integrity and authenticity of ATR rules before conversion (e.g., hash validation or signed rulesets).
    • Add a safeguard in the sync script to reject rules with action: allow for high-severity patterns.

3. Trust Chain Weaknesses

Rating: 🟡 MEDIUM

  • Attack Vector: The sync_atr_rules.py script relies on the agent-threat-rules npm package. If this package is compromised (e.g., via typosquatting or supply chain attacks), malicious rules could be introduced.
  • Fix:
    • Pin the agent-threat-rules dependency to a specific version and verify its integrity using checksums.
    • Regularly audit the dependency for vulnerabilities using tools like npm audit.

4. Credential Exposure

Rating: 🔴 CRITICAL

  • Attack Vector: The regex pattern for detecting credentials in atr_security_policy.yaml (ATR-00021) is comprehensive but may miss certain formats or edge cases (e.g., custom API key formats). Additionally, if the detection logic fails, sensitive credentials could be exposed in logs or responses.
  • Fix:
    • Extend the regex to cover additional credential formats.
    • Ensure that any detected credentials are securely redacted from logs and error messages.

5. Sandbox Escape

Rating: 🔴 CRITICAL

  • Attack Vector: The atr_security_policy.yaml includes rules to detect privilege escalation attempts (e.g., ATR-00040 for shell commands and ATR-00110 for eval injection). However, these rules rely on regex matching, which can be bypassed with obfuscation or novel techniques.
  • Fix:
    • Implement runtime sandboxing and syscall filtering to prevent unauthorized system-level access, even if the policy engine fails.
    • Use a defense-in-depth approach by combining static analysis with runtime monitoring.

6. Deserialization Attacks

Rating: 🟠 HIGH

  • Attack Vector: The sync_atr_rules.py script uses yaml.safe_load to parse YAML files, which is a safer alternative to yaml.load. However, if the input YAML files are maliciously crafted, they could still exploit vulnerabilities in the YAML parser.
  • Fix:
    • Use a library like ruamel.yaml or PyYAML with strict schema validation to ensure that only expected fields and data types are allowed.
    • Validate the structure and content of the parsed YAML against a predefined schema before processing.

7. Race Conditions

Rating: 🔵 LOW

  • Attack Vector: There is no evidence of race conditions in the provided code. The PolicyEvaluator processes a single context at a time, and the sync script operates on local files sequentially.
  • Fix: No immediate action required. However, if the PolicyEvaluator is used in a multi-threaded environment, ensure thread safety.

8. Supply Chain Risks

Rating: 🔴 CRITICAL

  • Attack Vector: The agent-threat-rules npm package is a critical dependency. If this package is compromised, it could introduce malicious rules or backdoors into the AGT system, affecting all downstream users.
  • Fix:
    • Implement dependency pinning and lockfile validation to prevent unauthorized updates.
    • Use a package integrity verification tool (e.g., npm audit, snyk) to monitor for vulnerabilities or malicious changes in dependencies.
    • Consider mirroring the agent-threat-rules repository and performing internal audits before updates.

Additional Observations

  1. Regex Complexity: The regex patterns in atr_security_policy.yaml are complex and may impact performance when evaluating large inputs. Consider benchmarking the PolicyEvaluator to ensure it can handle high-throughput scenarios without significant latency.

  2. Error Handling in sync_atr_rules.py: The script currently prints warnings for YAML parsing errors but does not fail the process. This could lead to incomplete or incorrect policy files.

    • Fix: Implement stricter error handling to abort the process if critical rules fail to load.
  3. Testing Coverage: The provided tests in test_atr_policy.py are a good start but could be expanded to include edge cases, such as obfuscated inputs or malformed context fields.

    • Fix: Add tests for variations of known attack patterns (e.g., Unicode homoglyphs, whitespace obfuscation).

Summary of Findings

Issue Rating Fix
Prompt Injection Defense Bypass 🔴 CRITICAL Enhance regex patterns and integrate NLP-based detection.
Policy Engine Circumvention 🟠 HIGH Validate ATR rules integrity and reject unsafe rules.
Trust Chain Weaknesses 🟡 MEDIUM Pin dependencies and verify integrity.
Credential Exposure 🔴 CRITICAL Extend regex coverage and redact sensitive data from logs.
Sandbox Escape 🔴 CRITICAL Implement runtime sandboxing and syscall filtering.
Deserialization Attacks 🟠 HIGH Use strict schema validation for YAML parsing.
Race Conditions 🔵 LOW No immediate action required.
Supply Chain Risks 🔴 CRITICAL Audit and verify dependencies, implement integrity checks, and consider mirroring repos.

Final Recommendation

This PR introduces valuable security rules to the AGT framework, but there are critical vulnerabilities that need to be addressed before merging. The most pressing issues are the potential for prompt injection bypass, supply chain risks, and sandbox escape vulnerabilities. Addressing these concerns will significantly enhance the robustness of the AGT framework and ensure the safety of downstream users.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces an example integration of the Agent Threat Rules (ATR) detection rules into the Agent Governance Toolkit (AGT). It includes a curated set of 15 high-confidence rules in atr_security_policy.yaml, a script (sync_atr_rules.py) to convert the full ATR ruleset into AGT-compatible format, and a test suite (test_atr_policy.py) to validate the integration.

The PR is well-documented and provides a clear mapping between ATR rules and AGT's PolicyEvaluator. The provided tests cover both positive and negative scenarios, ensuring that the rules work as expected. However, there are some areas that require attention, particularly around security, type safety, and potential edge cases.


🔴 CRITICAL

  1. Regex Injection Vulnerability in sync_atr_rules.py

    • The _atr_to_agt_rule function directly embeds regex patterns from ATR rules into the AGT policy without sanitization. If an attacker injects malicious regex patterns (e.g., catastrophic backtracking), this could lead to a denial-of-service (DoS) attack during policy evaluation.
    • Recommendation: Validate and sanitize regex patterns before embedding them in the policy. Use libraries like regex (with timeouts) or pre-compile patterns to detect problematic ones.
  2. Credential Handling in atr_security_policy.yaml

    • The regex for detecting credentials (e.g., API keys, private keys) in atr_security_policy.yaml is comprehensive but may lead to false negatives for edge cases (e.g., obfuscated keys or non-standard formats).
    • Recommendation: Regularly update the credential detection patterns to align with evolving formats. Consider integrating with a dedicated secret-scanning library like truffleHog or detect-secrets for enhanced coverage.
  3. Privilege Escalation Detection

    • The regex for privilege escalation (e.g., eval, child_process) in atr_security_policy.yaml is a good start but may miss obfuscated or encoded payloads (e.g., Base64-encoded commands).
    • Recommendation: Implement additional layers of detection for encoded or obfuscated payloads. Consider integrating with a static analysis tool or sandbox for runtime evaluation.

🟡 WARNING

  1. Backward Compatibility
    • The introduction of atr_security_policy.yaml and the sync_atr_rules.py script does not directly modify the core AGT library. However, if users rely on the example policy or the sync script, any future changes to the ATR ruleset or the mapping logic could lead to unexpected behavior.
    • Recommendation: Clearly document the versioning strategy for the atr_security_policy.yaml file and the sync_atr_rules.py script. Consider adding a version compatibility check between AGT and the ATR ruleset.

💡 SUGGESTIONS

  1. Thread Safety in sync_atr_rules.py

    • The convert_atr_directory function processes files sequentially. While this is acceptable for the current implementation, it could become a bottleneck if the number of rules increases significantly.
    • Recommendation: Use Python's concurrent.futures or asyncio to parallelize the processing of ATR rule files.
  2. Error Handling in sync_atr_rules.py

    • The _load_yaml function catches exceptions but only logs a warning. This could lead to silent failures if critical files are skipped.
    • Recommendation: Add an option to fail fast or log skipped files more prominently. For example, use a --strict flag to terminate the script if any file fails to load.
  3. Test Coverage

    • The test suite is well-structured and covers key scenarios. However, it does not test the sync_atr_rules.py script or the full 108-rule conversion process.
    • Recommendation: Add tests for the sync_atr_rules.py script to ensure that it correctly converts ATR rules to AGT format. Use a small subset of ATR rules for testing.
  4. Documentation

    • The documentation in README.md is comprehensive but could benefit from a section on potential limitations or known issues with the ATR integration.
    • Recommendation: Add a "Limitations" section to the README to set user expectations, especially regarding the potential for false positives/negatives and the need for regular updates to the ATR ruleset.
  5. Logging in sync_atr_rules.py

    • The script uses print statements for logging, which is not ideal for production-grade tools.
    • Recommendation: Replace print statements with Python's logging module to provide configurable logging levels and better integration with other systems.
  6. Type Annotations

    • While the code in sync_atr_rules.py includes some type annotations, they are not comprehensive.
    • Recommendation: Add type annotations for all function arguments and return types to improve code clarity and maintainability.
  7. Default Action in atr_security_policy.yaml

    • The default action in atr_security_policy.yaml is set to allow. While this is consistent with AGT's default behavior, it may not be the most secure default for a security-focused policy.
    • Recommendation: Consider setting the default action to deny and explicitly allowing safe patterns. This would align better with a "default deny" security posture.
  8. Regex Complexity

    • Some regex patterns in atr_security_policy.yaml are complex and may impact performance during evaluation.
    • Recommendation: Benchmark the performance of these regex patterns on large datasets. Optimize or simplify patterns where possible.
  9. Dynamic Rule Updates

    • The sync_atr_rules.py script allows users to update the ATR ruleset manually. However, there is no mechanism to automatically notify users of new rules.
    • Recommendation: Implement a notification system or provide a CLI command to check for updates to the ATR ruleset.

Final Assessment

This PR is a valuable addition to the AGT project, providing a robust set of community-driven security rules. However, the following issues should be addressed before merging:

  1. 🔴 Regex injection vulnerability in sync_atr_rules.py.
  2. 🔴 Credential detection false negatives in atr_security_policy.yaml.
  3. 🔴 Privilege escalation detection gaps in atr_security_policy.yaml.

Additionally, the following improvements are recommended:

  1. 🟡 Document versioning and compatibility for the ATR ruleset and sync script.
  2. 💡 Add tests for the sync_atr_rules.py script.
  3. 💡 Improve error handling and logging in sync_atr_rules.py.
  4. 💡 Consider a "default deny" policy for atr_security_policy.yaml.

Once these issues are addressed, the PR will be ready for merging.

@eeee2345
Copy link
Copy Markdown
Author

eeee2345 commented Apr 8, 2026

@microsoft-github-policy-service agree

Copy link
Copy Markdown
Member

@imran-siddique imran-siddique left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: CHANGES REQUESTED

Diff reviewed - 4 new files under examples/atr-community-rules/ (546 additions). Closes #901.

Security checklist:

  • No eval(), exec(), pickle, shell=True, innerHTML
  • yaml.safe_load used correctly (not yaml.load)
  • No hardcoded secrets
  • agent-threat-rules npm package verified: registered on npm since 2026-03-09, 11 versions
  • Scope matches description - all files under examples/ only
  • Additions > 0

Issues to fix:

  1. Missing MIT license headers (required per repo policy): Both sync_atr_rules.py and test_atr_policy.py need # Copyright (c) Microsoft Corporation. and # Licensed under the MIT License. at top.

  2. Missing encoding in open() - the test file has open("...atr_security_policy.yaml") without encoding="utf-8". Per repo policy, all open() calls reading YAML/JSON/text must specify encoding (prevents Windows failures).

  3. Trailing newline - verify all new files end with a newline (ruff W292).

Content looks solid - happy to approve once fixed.

Address review feedback from @imran-siddique:
- Add Microsoft MIT license headers to .py files
- Add encoding='utf-8' to all open() calls
- Ensure trailing newlines on all files
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces an example integration of the Agent Threat Rules (ATR) detection rules into the Agent Governance Toolkit (AGT). The PR includes a pre-built atr_security_policy.yaml file with 15 high-confidence rules, a script to convert the full ATR ruleset (108 rules) into AGT's PolicyDocument format, and a set of pytest-based tests to validate the integration. The contribution is well-documented and aligns with the goals of enhancing the security capabilities of AGT.

Below is a detailed review of the PR, focusing on the specified areas of concern:


🔴 CRITICAL: Security Issues

  1. Regex Denial-of-Service (ReDoS) Vulnerabilities

    • The regex patterns in atr_security_policy.yaml are complex and may be susceptible to ReDoS attacks. For example:
      value: "(?i)\\b(ignore|disregard|forget|override|bypass|skip|abandon|drop|cancel|suppress)\\s+(all\\s+)?(the\\s+|your\\s+|my\\s+|any\\s+)?(previous|prior|above|earlier|preceding|original|existing|current|initial|default)?\\s*(instructions?|prompts?|rules?|guidelines?|directives?|constraints?|context|programming|protocols?|policies|configuration|guardrails?|restrictions?)"
      • This regex contains nested quantifiers (\\s+ and ?) and alternations (|), which can lead to catastrophic backtracking. An attacker could craft malicious input to exploit this and cause significant performance degradation.
      • Actionable Recommendation: Use tools like regex (Python's regex module) to validate and optimize these patterns for performance. Alternatively, consider simplifying the regex patterns or using non-backtracking constructs.
  2. Credential Exposure Detection

    • The regex for detecting credentials in atr_security_policy.yaml includes patterns like:
      value: "(?:sk-[a-zA-Z0-9]{20,}|AKIA[0-9A-Z]{16}|AIza[a-zA-Z0-9_-]{35}|ghp_[a-zA-Z0-9]{36}|xox[bpors]-[a-zA-Z0-9\\-]{10,}|-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----|eyJ[a-zA-Z0-9_-]{10,}\\.eyJ[a-zA-Z0-9_-]{10,}\\.[a-zA-Z0-9_-]+)"
      • While this is a good start, it may not cover all possible credential formats. For example, some API keys or secrets might not match these patterns.
      • Actionable Recommendation: Consider integrating with a dedicated secrets-detection library like truffleHog or detect-secrets for more comprehensive detection.
  3. Potential Sandbox Escape

    • The regex for detecting privilege escalation and dynamic code execution includes patterns like:
      value: "(?i)(?:eval\\s*\\(|new\\s+Function\\s*\\(|vm\\.(runIn|createContext|compileFunction)|require\\s*\\(\\s*['\"]child_process['\"]|import\\s*\\(\\s*['\"]child_process)"
      • While this regex is effective for detecting common patterns, it may not cover all possible variations of dynamic code execution attempts.
      • Actionable Recommendation: Regularly update these patterns based on emerging threats and consider leveraging external libraries or tools that specialize in detecting sandbox escapes.

🟡 WARNING: Potential Breaking Changes

  1. Backward Compatibility of sync_atr_rules.py

    • The sync_atr_rules.py script introduces a new way to generate policy documents from ATR rules. While this is a new feature, any changes to the script's behavior or output format in the future could break existing integrations.
    • Actionable Recommendation: Clearly document the expected input and output formats for the script. Consider versioning the output format to ensure backward compatibility.
  2. PolicyEvaluator Behavior

    • The addition of new rules to the PolicyEvaluator may result in stricter enforcement, potentially causing previously allowed inputs to be denied.
    • Actionable Recommendation: Clearly communicate this change in the release notes and provide guidance on how users can customize or disable specific rules if needed.

💡 Suggestions for Improvement

  1. Test Coverage

    • The test suite covers a good range of scenarios, but it could be expanded to include:
      • Edge cases for regex patterns (e.g., inputs that are close to matching but should not).
      • Performance tests to ensure that the regex patterns do not introduce significant latency.
    • Actionable Recommendation: Add tests for edge cases and performance benchmarks to ensure the robustness and efficiency of the rules.
  2. Dynamic Rule Loading

    • The current implementation requires users to manually run the sync_atr_rules.py script to update the rules. This could lead to outdated rules being used in production.
    • Actionable Recommendation: Consider adding functionality to automatically fetch and update the rules from the ATR repository at runtime or during deployment.
  3. Documentation

    • The documentation is comprehensive, but it could benefit from additional details, such as:
      • Examples of how to customize the rules (e.g., changing priorities or actions).
      • Guidance on how to handle false positives or modify rules.
    • Actionable Recommendation: Expand the README.md to include these details.
  4. Thread Safety

    • The PolicyEvaluator is used in a pytest fixture and may be used in concurrent agent execution scenarios. However, there is no explicit mention of thread safety in the PolicyEvaluator implementation.
    • Actionable Recommendation: Review the PolicyEvaluator implementation to ensure it is thread-safe. If it is not, document this limitation clearly.
  5. Error Handling in sync_atr_rules.py

    • The _load_yaml function currently prints warnings to stderr but does not provide a mechanism for the user to handle these errors programmatically.
    • Actionable Recommendation: Consider raising exceptions for critical errors (e.g., missing or malformed files) and providing an option to suppress warnings for non-critical issues.
  6. Rule Prioritization

    • The mapping of ATR severity levels to AGT priorities is hardcoded in sync_atr_rules.py.
    • Actionable Recommendation: Allow users to customize the priority mapping via a configuration file or command-line arguments.

Summary of Feedback

  • 🔴 CRITICAL: Address potential ReDoS vulnerabilities in regex patterns.
  • 🔴 CRITICAL: Enhance credential exposure detection with specialized libraries or tools.
  • 🔴 CRITICAL: Regularly update privilege escalation and sandbox escape detection patterns.
  • 🟡 WARNING: Document the sync_atr_rules.py script's input/output format and consider versioning.
  • 🟡 WARNING: Communicate potential breaking changes in PolicyEvaluator behavior due to stricter rules.
  • 💡 SUGGESTION: Expand test coverage to include edge cases and performance benchmarks.
  • 💡 SUGGESTION: Add dynamic rule-fetching capabilities to keep rules up-to-date.
  • 💡 SUGGESTION: Expand documentation to include customization and false-positive handling.
  • 💡 SUGGESTION: Review PolicyEvaluator for thread safety in concurrent execution scenarios.
  • 💡 SUGGESTION: Improve error handling in sync_atr_rules.py.
  • 💡 SUGGESTION: Allow customization of ATR severity-to-priority mapping.

This PR is a valuable addition to the repository and significantly enhances its security capabilities. Addressing the critical issues and implementing the suggested improvements will ensure a more robust and secure integration.

@imran-siddique imran-siddique enabled auto-merge (squash) April 8, 2026 23:49
auto-merge was automatically disabled April 8, 2026 23:55

Head branch was pushed to by a user without write access

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Code Review for feat(examples): add ATR community security rules for PolicyEvaluator


🔴 CRITICAL: Security Issues

  1. Regex Injection Risk in User Input Matching

    • The regex patterns used in atr_security_policy.yaml (e.g., (?i)\b(ignore|disregard|forget|override|bypass)...) are directly matched against user input. If the input is not sanitized or validated before evaluation, it could lead to regex injection attacks.
    • Recommendation: Ensure that user inputs are sanitized before being passed to the PolicyEvaluator. Additionally, consider limiting regex complexity to avoid catastrophic backtracking.
  2. Credential Exposure in Logs

    • The sync_atr_rules.py script prints warnings and errors to stderr, which may inadvertently expose sensitive information if the script processes malformed or malicious ATR rule files.
    • Recommendation: Sanitize all log messages to ensure no sensitive data (e.g., regex patterns, file paths) is exposed. Use structured logging with redaction for sensitive fields.
  3. Privilege Escalation Detection

    • The rule atr-00040-privilege-escalation-tools matches user input for privilege escalation attempts. However, it does not account for obfuscated or encoded commands (e.g., Base64-encoded payloads).
    • Recommendation: Add additional rules or preprocessing steps to decode common encoding schemes (e.g., Base64, URL encoding) before evaluation.
  4. Dynamic Code Execution Detection

    • The rule atr-00110-eval-injection matches patterns for eval, Function, and child_process. However, it does not account for obfuscated or indirect references (e.g., global['eval']).
    • Recommendation: Enhance the regex patterns to detect obfuscated or indirect references to dangerous functions.

🟡 WARNING: Potential Breaking Changes

  1. Backward Compatibility of PolicyEvaluator

    • The addition of new rules in atr_security_policy.yaml may introduce breaking changes for users who rely on the default behavior of the PolicyEvaluator. For example, legitimate user inputs or tool descriptions might now be flagged as violations.
    • Recommendation: Clearly document the impact of these new rules on existing workflows. Provide a mechanism for users to selectively enable/disable specific rules.
  2. Changes to PolicyDocument Schema

    • The sync_atr_rules.py script assumes a specific schema for ATR rules and maps them to the AGT PolicyDocument format. If the ATR schema changes, this could break the script.
    • Recommendation: Implement schema validation for ATR rule files and provide clear error messages if the schema changes.

💡 Suggestions for Improvement

  1. Testing Coverage

    • The test suite in test_atr_policy.py is well-structured but could benefit from additional test cases:
      • Test for false positives with edge cases (e.g., inputs that resemble attack patterns but are benign).
      • Test for performance under high load or with large input sizes.
      • Test the sync_atr_rules.py script with malformed or incomplete ATR rule files to ensure robust error handling.
  2. Documentation Enhancements

    • The README.md file in examples/atr-community-rules/ is comprehensive but could include:
      • A section on the potential performance impact of using 108 regex rules.
      • Guidance on how to customize or extend the ruleset for specific use cases.
      • A note on the importance of keeping the ATR rules up-to-date and how to automate the sync process.
  3. Regex Optimization

    • Some regex patterns in atr_security_policy.yaml are complex and may lead to performance issues (e.g., catastrophic backtracking). For example:
      value: "(?i)\\b(ignore|disregard|forget|override|bypass|skip|abandon|drop|cancel|suppress)\\s+(all\\s+)?(the\\s+|your\\s+|my\\s+|any\\s+)?(previous|prior|above|earlier|preceding|original|existing|current|initial|default)?\\s*(instructions?|prompts?|rules?|guidelines?|directives?|constraints?|context|programming|protocols?|policies|configuration|guardrails?|restrictions?)"
      • Recommendation: Use simpler patterns or pre-compile regexes to improve performance. Consider using a regex library that supports timeout or complexity limits.
  4. Thread Safety

    • The PolicyEvaluator is not explicitly documented as thread-safe. If it is used in a multithreaded environment, there could be race conditions or shared state issues.
    • Recommendation: Audit the PolicyEvaluator implementation for thread safety. If it is not thread-safe, document this limitation and provide guidance for safe usage.
  5. Sync Script Error Handling

    • The sync_atr_rules.py script exits with sys.exit(1) on errors (e.g., missing YAML files). This could disrupt automated pipelines.
    • Recommendation: Return structured error messages or codes instead of exiting abruptly. Allow the script to continue processing other files even if some fail.
  6. Type Annotations

    • The sync_atr_rules.py script uses type annotations, but some functions (e.g., _load_yaml, _extract_regex_patterns) could benefit from more precise types.
    • Recommendation: Use TypedDict or dataclasses for structured data like ATR rules and AGT policies.
  7. Performance Considerations

    • Matching 108 regex patterns against user input or tool descriptions could impact performance, especially in high-throughput environments.
    • Recommendation: Benchmark the PolicyEvaluator with the full ATR ruleset under realistic workloads. Consider optimizing the matching process, such as grouping similar patterns or using a regex engine with better performance characteristics.
  8. Backward Compatibility Testing

    • The addition of new rules could impact existing users. There is no evidence of backward compatibility testing in the PR.
    • Recommendation: Add a test suite to verify that existing workflows are not disrupted by the new rules.

Summary of Recommendations

  • Critical: Address regex injection risks, log sanitization, and enhance detection for obfuscated threats.
  • Warning: Document potential breaking changes and ensure backward compatibility.
  • Suggestions: Improve testing coverage, optimize regex patterns, and enhance documentation.

This PR introduces valuable security features but requires additional safeguards and testing to ensure robustness and compatibility.

@lawcontinue
Copy link
Copy Markdown
Contributor

PR #908 安全审计报告

Issue: #901 - ATR 社区检测规则集成
PR: #908 - 添加 ATR 安全规则
审计日期: 2026-04-10
状态: 草稿 - 待审阅


总结

结论: ⚠️ 条件批准(修复 3 个问题后合并)

关键发现: 108 条规则(Cisco 生产使用),但有 3 个安全问题必须修复。


🔴 关键问题(必须修复)

问题 #1: 供应链漏洞 🔴 CRITICAL

严重性: CRITICAL
CVSS 评分: 7.8 (High)

问题描述:
sync_atr_rules.py 从 npm 下载 agent-threat-rules 时没有完整性验证。如果 npm 包被劫持,可能注入恶意规则。

证据:

# sync_atr_rules.py: 没有依赖锁定,没有 checksum 验证
$ npm install agent-threat-rules  # 没有 package-lock.json 验证
$ python sync_atr_rules.py --atr-dir node_modules/agent-threat-rules/rules/

攻击向量:

  • 依赖混淆攻击
  • Typosquatting(如 agent-threat-roles
  • npm 账号被劫持 → 恶意 action: allow 规则

影响:

  • 恶意规则绕过安全策略
  • 静默降级策略
  • 供应链攻击(影响所有 AGT 用户)

修复建议:

# 1. 添加 package-lock.json 完整性检查
$ npm install agent-threat-rules
$ sha256sum node_modules/agent-threat-rules/package/integrity.json

# 2. 验证 ATR 规则签名(如果可用)
# 3. 在 README 固定版本号
npm install agent-threat-rules@1.1.1  # 明确版本

# 4. 在同步脚本添加 checksum 验证
ATR_INTEGRITY_HASH = "sha256:abc123..."  # 从 ATR 官方发布获取

问题 #2: Unicode/混淆绕过 🔴 CRITICAL

严重性: CRITICAL
CVSS 评分: 8.2 (High)

问题描述:
基于 Regex 的检测可被 Unicode 同形字、空格变化或字符拆分绕过。

证据:

# atr_security_policy.yaml: ATR-00001
# 原始 regex: \\b(ignore|disregard|forget|override|bypass)\\b

# 绕过示例:
"Igno re all previous instructions"  # 空格拆分
"Igⲛore all previous instructions"  # 希腊字母 'nu' (U+043D)
"Ig\u006eore all previous"            # Unicode 转义
"Igno|re all previous"                # 管道注入

攻击向量:

  • Unicode 同形字替换(ᴀʙᴄ → ABС)
  • 零宽字符(U+200B, U+FEFF)
  • 多脚本混合(拉丁文 + 西里尔文)
  • 字符级混淆

影响:

  • 0% 误报率 → 潜在 10-30% 绕过率
  • 失去 Regex 检测的意义

修复建议:

# 1. 在 Regex 匹配前添加 Unicode 标准化
import unicodedata

def normalize_input(text: str) -> str:
    """NFKC 标准化 + 同形字检测"""
    normalized = unicodedata.normalize('NFKC', text)
    # TODO: 添加同形字检测
    return normalized

问题 #3: 缺少恶意规则防护 🟠 HIGH

严重性: HIGH
CVSS 评分: 7.1 (High)

问题描述:
sync_atr_rules.py 盲目信任所有 ATR YAML 文件,没有验证它们只包含 action: deny 规则。

证据:

# sync_atr_rules.py:88
def _atr_to_agt_rule(atr: dict[str, Any], pattern: str, index: int) -> dict[str, Any]:
    return {
        "action": "deny",  # ← 硬编码,但没有验证!
    }

攻击向量:

  • ATR 仓库被攻击,添加 action: allow 规则
  • 同步脚本导入它们 → 策略被削弱

修复建议:

# 在同步脚本添加验证
ALLOWED_ACTIONS = {"deny"}  # 只允许 deny 规则

def _atr_to_agt_rule(atr: dict[str, Any], pattern: str, index: int) -> dict[str, Any]:
    atr_action = atr.get("action", "deny")
    if atr_action not in ALLOWED_ACTIONS:
        raise ValueError(f"不安全的 ATR 规则操作: {atr_action}")

✅ 优点

1. 外部验证 ✅

  • Cisco 使用(108 条规则,生产验证)
  • 精确度 99.6%(MCP 工具描述)
  • 0% 误报(53,577 skills 测试)
  • MIT 许可证(兼容)

2. 架构 ✅

  • ATR → AGT 映射清晰
  • 符合 PolicyEvaluator 接口
  • 易扩展

3. 文档 ✅

  • README 详细(72 行)
  • 使用说明清晰

4. 代码质量 ✅

  • 使用 yaml.safe_load
  • 类型注解完整
  • 错误处理合理

🎯 建议

合并前(必须):

  1. 添加供应链验证(依赖锁定 + checksums)
  2. 添加 Unicode 标准化(Regex 匹配前)
  3. 添加规则验证(拒绝 action: allow

合并后(建议):

  1. 考虑添加基于 NLP 的语义检测
  2. 扩展测试覆盖(同形字、混淆)
  3. 添加性能基准(108 条规则)

结论

决策: ⚠️ 条件批准

条件:

  1. 🔴 供应链验证
  2. 🔴 Unicode 标准化
  3. 🟠 规则验证

修复时间: 预计 2-3 小时

下一步:

  1. 在 PR feat(examples): add ATR community security rules for PolicyEvaluator #908 发布反馈
  2. 等待贡献者修复
  3. 重新审查后合并

审计日期: 2026-04-10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation size/XL Extra large PR (500+ lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Community detection rules integration — ATR (108 rules, shipped in Cisco AI Defense)

3 participants