
docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)#916

Merged
imran-siddique merged 4 commits into microsoft:main from
harinarayansrivatsan:docs/706-chapters-6-7
Apr 11, 2026

Conversation

@harinarayansrivatsan
Contributor

Summary

  • Adds chapter 6 (policy testing): structural validation with Pydantic, declarative YAML test scenarios, cross-policy test matrices, and regression detection
  • Adds chapter 7 (policy versioning): side-by-side version comparison, structural diffing, behavioral regression detection, and deploy gates
  • Normalizes MIT license headers across chapters 1-4 markdown and YAML files for consistency with chapters 5-7
  • Updates README to link chapters 6-7 and removes "coming soon" notice

New files

  • docs/tutorials/policy-as-code/06-policy-testing.md
  • docs/tutorials/policy-as-code/07-policy-versioning.md
  • docs/tutorials/policy-as-code/examples/06_policy_testing.py
  • docs/tutorials/policy-as-code/examples/06_test_policy.yaml
  • docs/tutorials/policy-as-code/examples/06_test_scenarios.yaml
  • docs/tutorials/policy-as-code/examples/07_policy_versioning.py
  • docs/tutorials/policy-as-code/examples/07_policy_v1.yaml
  • docs/tutorials/policy-as-code/examples/07_policy_v2.yaml

Test plan

  • Run python docs/tutorials/policy-as-code/examples/06_policy_testing.py — all 4 parts pass, 8/8 scenarios pass, matrix finds expected surprise
  • Run python docs/tutorials/policy-as-code/examples/07_policy_versioning.py — diff accurate, regression correctly identified
  • Verify chapters 1-4 examples still run after adding license headers to YAML files
  • Verify all markdown links between chapters resolve correctly

Ref #706

Note: Stacked on #911 (chapter 5). Diff currently includes ch5 changes — once #911 merges and this branch is rebased onto main, the diff will show only chapters 6-7 and license header normalization.

Copilot AI review requested due to automatic review settings April 10, 2026 13:45
@github-actions

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

@github-actions bot added the documentation and size/XL labels on Apr 10, 2026

@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Feedback on Pull Request: docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)


🔴 CRITICAL: Security Issues

  1. Escalation Timeout Default Action

    • The tutorial suggests setting DefaultTimeoutAction.ALLOW for less critical actions in certain scenarios. This is a dangerous recommendation, as it could lead to silent approvals of sensitive actions if the timeout expires. Even if the action is deemed "less critical," allowing it by default without human review introduces a potential security bypass.
    • Action: Remove or strongly discourage the use of DefaultTimeoutAction.ALLOW in the documentation. Defaulting to DENY is the safer and more appropriate choice for all escalation scenarios.
  2. Policy Rule Priority Conflicts

    • The tutorial does not address potential priority conflicts between rules. For example, if two rules apply to the same tool and have overlapping conditions, the higher-priority rule will take precedence. However, if the priority order is misconfigured, this could lead to unintended decisions (e.g., escalation bypassed by a lower-priority rule).
    • Action: Add a section to the tutorial explaining how to handle rule priority conflicts and the importance of testing for such scenarios.
  3. Escalation Request Spoofing

    • The tutorial does not mention any mechanisms to prevent spoofing of escalation requests. For example, an attacker could potentially craft a fake escalation request with a forged agent_id or action field to bypass security checks.
    • Action: Document the need for cryptographic signing or authentication mechanisms for escalation requests to ensure their integrity and authenticity.
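The fail-closed timeout behavior recommended in item 1 can be sketched without the toolkit. Everything below is hypothetical: `Decision`, `wait_for_decision`, and the queue-based hand-off are illustrative stand-ins, not the library's `EscalationHandler` or `DefaultTimeoutAction` API.

```python
import queue
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


def wait_for_decision(inbox: "queue.Queue[Decision]", timeout_seconds: float) -> Decision:
    """Return the reviewer's decision, or DENY if none arrives in time."""
    try:
        return inbox.get(timeout=timeout_seconds)
    except queue.Empty:
        # Fail closed: an unanswered escalation never becomes an approval.
        return Decision.DENY


# A reviewer answered before the deadline.
answered: "queue.Queue[Decision]" = queue.Queue()
answered.put(Decision.ALLOW)
approved = wait_for_decision(answered, timeout_seconds=0.1)

# Nobody answered, so the request is denied.
unanswered: "queue.Queue[Decision]" = queue.Queue()
timed_out = wait_for_decision(unanswered, timeout_seconds=0.1)
```

The design point is that the timeout path and the explicit-deny path return the same value, so silence cannot be mistaken for consent.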

🟡 WARNING: Potential Breaking Changes

  1. Backward Compatibility of Policy Schema
    • The addition of new rules and features (e.g., escalation handling) may break existing policies that do not conform to the updated schema. For example, older policies without the message field or escalation rules might fail validation.
    • Action: Ensure backward compatibility by providing migration tools or fallback mechanisms for older policy versions. Update the tutorial to include instructions for migrating legacy policies.

💡 Suggestions for Improvement

  1. Automated Policy Testing

    • The tutorial introduces policy testing but does not mention integration with CI/CD pipelines. Automated policy tests should be run as part of the CI/CD process to catch regressions early.
    • Action: Add a section on integrating policy tests with CI/CD pipelines, including examples of how to run tests using pytest or similar frameworks.
  2. Thread Safety in Escalation Handling

    • The tutorial uses InMemoryApprovalQueue, which may not be thread-safe in concurrent environments. While this is acceptable for demonstration purposes, production systems should use thread-safe or distributed backends.
    • Action: Add a note in the tutorial recommending thread-safe or distributed backends (e.g., Redis, RabbitMQ) for production use.
  3. Policy Diffing and Versioning

    • Chapter 7 introduces policy versioning but does not provide examples of how to handle structural diffs in YAML files. This could be useful for detecting unintended changes in policies.
    • Action: Expand Chapter 7 to include examples of YAML diffing tools or libraries that can be used to compare policy versions.
  4. Error Handling in Policy Validation

    • The tutorial demonstrates how to catch validation errors but does not provide guidance on how to handle them effectively (e.g., logging, notifying developers).
    • Action: Add best practices for handling validation errors, including logging and alerting mechanisms.
  5. OWASP Agentic Top 10 Compliance

    • The tutorial does not explicitly address compliance with OWASP Agentic Top 10 guidelines, such as ensuring audit trails for all policy decisions and escalation requests.
    • Action: Add a section mapping the tutorial's features to OWASP Agentic Top 10 compliance, emphasizing auditability and traceability.
  6. Improved Documentation Navigation

    • The tutorial chapters are linked sequentially, but navigating between sections could be improved with a table of contents or sidebar navigation.
    • Action: Add a table of contents or sidebar navigation to improve usability.
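For suggestion 3, a structural diff can be approximated in a few lines once both policy versions are parsed into plain dictionaries. This is a minimal sketch under assumptions: the `name`, `action`, `priority`, and `message` fields and the sample rules are illustrative and may not match the tutorial's schema or the logic in `07_policy_versioning.py`.

```python
def diff_rules(old_rules, new_rules):
    """Compare two rule lists keyed by rule name (assumed unique)."""
    old = {r["name"]: r for r in old_rules}
    new = {r["name"]: r for r in new_rules}
    changed = {}
    for name in sorted(old.keys() & new.keys()):
        fields = sorted(set(old[name]) | set(new[name]))
        # Record (old_value, new_value) for every field that differs.
        delta = {
            f: (old[name].get(f), new[name].get(f))
            for f in fields
            if old[name].get(f) != new[name].get(f)
        }
        if delta:
            changed[name] = delta
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }


# Illustrative rules only; not copied from the example YAML files.
v1 = [
    {"name": "block-delete", "action": "deny", "priority": 100},
    {"name": "transfer-funds", "action": "deny", "priority": 90,
     "message": "requires human approval"},
]
v2 = [
    {"name": "transfer-funds", "action": "deny", "priority": 90,
     "message": "transfers are blocked"},
    {"name": "allow-search", "action": "allow", "priority": 50},
]
report = diff_rules(v1, v2)
```

A diff like this reports what changed in the file, not whether decisions changed, so it complements a behavioral comparison rather than replacing it.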

Summary

This pull request introduces valuable documentation for policy testing and versioning. However, it raises critical security concerns, particularly around escalation timeout defaults and spoofing risks. Addressing these issues is essential to ensure the library remains secure and compliant with best practices. Additionally, improving backward compatibility, thread safety, and documentation usability will enhance the overall quality of the tutorial.

Priority Actions:

  • Remove or discourage DefaultTimeoutAction.ALLOW.
  • Document cryptographic signing for escalation requests.
  • Address rule priority conflicts in the tutorial.

Suggested Enhancements:

  • Integrate policy testing into CI/CD pipelines.
  • Recommend thread-safe backends for production.
  • Expand policy versioning examples to include YAML diffing.
  • Map features to OWASP Agentic Top 10 compliance.

@github-actions

github-actions bot commented Apr 10, 2026

🤖 AI Agent: security-scanner — Security Analysis of the Pull Request


This pull request primarily adds documentation and examples for chapters 6 and 7 of the "Policy-as-Code" tutorial. While the changes are mainly educational and do not directly alter the core functionality of the microsoft/agent-governance-toolkit, it is still important to evaluate the provided examples and documentation for potential security issues, as they could influence how users implement and test policies.


Findings

1. Prompt Injection Defense Bypass

  • Risk: None identified in this PR. The changes are limited to documentation and examples, and there is no direct interaction with user-provided prompts or inputs in the core library.
  • Rating: 🔵 LOW
  • Recommendation: No changes needed.

2. Policy Engine Circumvention

  • Risk: The examples provided in 06_policy_testing.py and 07_policy_versioning.py demonstrate how to test policies using the PolicyEvaluator and PolicyDocument classes. There is no indication of any mechanism to bypass the policy engine in these examples.
  • Rating: 🔵 LOW
  • Recommendation: No changes needed.

3. Trust Chain Weaknesses

  • Risk: None identified. The changes do not involve SPIFFE/SVID validation, certificate pinning, or other trust chain mechanisms.
  • Rating: 🔵 LOW
  • Recommendation: No changes needed.

4. Credential Exposure

  • Risk: None identified. The examples and documentation do not include any hardcoded credentials or sensitive information.
  • Rating: 🔵 LOW
  • Recommendation: No changes needed.

5. Sandbox Escape

  • Risk: None identified. The examples do not involve any execution of untrusted code or system-level operations that could lead to sandbox escapes.
  • Rating: 🔵 LOW
  • Recommendation: No changes needed.

6. Deserialization Attacks

  • Risk: The examples use yaml.safe_load() for loading YAML files, which is a safe method for deserialization. However, it is important to ensure that users do not replace this with yaml.load() in their own implementations, as it could lead to deserialization vulnerabilities.
  • Rating: 🔵 LOW
  • Recommendation: Add a note in the documentation explicitly warning users to always use yaml.safe_load() for deserialization to prevent potential security risks.

7. Race Conditions

  • Risk: None identified. The examples and documentation do not involve concurrent execution or time-of-check-to-time-of-use (TOCTOU) scenarios.
  • Rating: 🔵 LOW
  • Recommendation: No changes needed.

8. Supply Chain

  • Risk: The examples rely on the pydantic and pyyaml libraries. While these are widely used and generally considered secure, it is important to ensure that the versions used are up-to-date and free from known vulnerabilities.
  • Rating: 🟡 MEDIUM
  • Recommendation: Add a note in the documentation to encourage users to keep dependencies updated. Additionally, consider adding dependency scanning to the CI/CD pipeline to detect outdated or vulnerable dependencies.

Summary of Findings

| Finding | Risk | Recommendation |
|---|---|---|
| Prompt injection defense bypass | 🔵 LOW | No changes needed. |
| Policy engine circumvention | 🔵 LOW | No changes needed. |
| Trust chain weaknesses | 🔵 LOW | No changes needed. |
| Credential exposure | 🔵 LOW | No changes needed. |
| Sandbox escape | 🔵 LOW | No changes needed. |
| Deserialization attacks | 🔵 LOW | Add a note warning users to always use yaml.safe_load() for deserialization. |
| Race conditions | 🔵 LOW | No changes needed. |
| Supply chain | 🟡 MEDIUM | Add a note about keeping dependencies updated and consider adding dependency scanning. |

Additional Recommendations

  1. Testing for Edge Cases: While the examples provided are comprehensive, it would be beneficial to include edge cases in the test scenarios, such as:

    • Empty or malformed context inputs.
    • Policies with conflicting rules of the same priority.
    • Policies with missing or extra fields.
  2. Error Handling: Ensure that the examples demonstrate robust error handling for scenarios like missing files, invalid YAML, or unexpected evaluation results.

  3. Security Best Practices: Include a section in the documentation emphasizing security best practices when writing policies, such as avoiding overly permissive defaults and carefully reviewing changes to escalation rules.

  4. Versioning Guidance: Since Chapter 7 introduces policy versioning, it would be helpful to include guidance on securely managing policy versions, such as using cryptographic signatures to verify the integrity of policy files.


Final Assessment

This pull request primarily focuses on documentation and examples, with no changes to the core library. The examples provided are well-structured and demonstrate good practices for policy testing and versioning. However, there are minor recommendations to improve the security posture of the documentation and examples, particularly around deserialization and dependency management.

Overall Risk Rating for this PR: 🔵 LOW


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Pull Request Review: docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)


🔴 CRITICAL Issues

1. Escalation Timeout Default Action

  • The tutorial suggests using DefaultTimeoutAction.ALLOW as an alternative to DENY. While this is presented as a rare case, it introduces a potential security bypass. If an escalation request times out and defaults to ALLOW, malicious or unintended actions could proceed without human review.
  • Recommendation: Clearly document that DefaultTimeoutAction.ALLOW should only be used for non-critical actions. Add safeguards in the library to warn or prevent its use for high-risk actions.

2. Policy Rule Priority Conflicts

  • The example policies rely heavily on rule priorities to determine outcomes. If two rules have overlapping conditions and the same priority, the behavior is undefined. This could lead to security vulnerabilities if a lower-priority rule inadvertently overrides a higher-priority rule.
  • Recommendation: Add validation logic to detect and prevent overlapping rules with identical priorities. This should be highlighted in the tutorial as a potential pitfall.

3. EscalationHandler Timeout Behavior

  • The EscalationHandler timeout mechanism relies on a default action, but the tutorial does not address what happens if the timeout value is set to None or an extremely high value (e.g., timeout_seconds=999999). This could lead to agents waiting indefinitely, potentially causing denial-of-service scenarios.
  • Recommendation: Enforce a reasonable maximum timeout value in the library and document this limitation in the tutorial.
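The identical-priority check recommended under issue 2 could look roughly like the sketch below. It assumes rules are plain dictionaries with `name`, `tool`, `priority`, and `action` keys, and it treats "same tool, same priority, different action" as a conflict; both the field names and that definition of overlap are assumptions, not the toolkit's schema or validation logic.

```python
from collections import defaultdict


def find_priority_ties(rules):
    """Return (tool, priority, rule_names) for rules that tie but disagree on action."""
    groups = defaultdict(list)
    for rule in rules:
        groups[(rule["tool"], rule["priority"])].append(rule)
    ties = []
    for (tool, priority), members in sorted(groups.items()):
        actions = {m["action"] for m in members}
        # Only flag groups where the tie would actually change the outcome.
        if len(members) > 1 and len(actions) > 1:
            ties.append((tool, priority, [m["name"] for m in members]))
    return ties


# Illustrative rules only.
rules = [
    {"name": "allow-internal", "tool": "send_email", "priority": 50, "action": "allow"},
    {"name": "deny-external", "tool": "send_email", "priority": 50, "action": "deny"},
    {"name": "block-delete", "tool": "delete_database", "priority": 100, "action": "deny"},
]
ties = find_priority_ties(rules)
```

A check like this could run alongside structural validation so that ambiguous policies are rejected before any scenario test executes.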

🟡 WARNING: Potential Breaking Changes

1. License Header Normalization

  • Adding MIT license headers to existing files (chapters 1-4) changes the file content. If any external systems or scripts rely on exact file hashes or content, this could cause issues.
  • Recommendation: Ensure that all downstream dependencies are updated to account for these changes. Communicate this update clearly in the release notes.

2. Policy Versioning Diff Behavior

  • The tutorial introduces structural diffing for policy versioning. If users rely on the current behavior of policy evaluation without versioning, this could lead to unexpected results when deploying new policies.
  • Recommendation: Provide clear migration guidance for users transitioning to policy versioning. Include examples of how to handle backward compatibility.

💡 Suggestions for Improvement

1. Thread Safety in EscalationHandler

  • The InMemoryApprovalQueue is used in the examples, but it is unclear whether it is thread-safe. If multiple agents are running concurrently, race conditions could occur when accessing or modifying the queue.
  • Recommendation: Ensure that InMemoryApprovalQueue is thread-safe or explicitly document that it is not suitable for concurrent use. Consider providing examples with a thread-safe backend, such as a database or message broker.

2. Policy Testing Coverage

  • The tutorial introduces policy testing but does not emphasize edge cases, such as testing for unintended rule overlaps or missing escalation tags. These are critical for security-focused applications.
  • Recommendation: Expand the tutorial to include examples of edge case testing, such as overlapping rules, missing escalation tags, or malformed policies.

3. SPIFFE/SVID Integration

  • The tutorial does not mention how policies interact with SPIFFE/SVID for agent identity verification. This is a missed opportunity to highlight the library's security features.
  • Recommendation: Add a section to the tutorial explaining how SPIFFE/SVID can be used to authenticate agents and enforce identity-based policies.

4. Behavioral Regression Detection

  • The tutorial mentions behavioral regression detection but does not provide concrete examples of how to implement it.
  • Recommendation: Include a detailed example of a regression test that compares policy behavior across versions, highlighting potential issues like changes in escalation behavior or default actions.

5. Markdown Link Verification

  • The test plan includes verifying markdown links between chapters but does not automate this step.
  • Recommendation: Add a script to the CI pipeline that checks for broken links in markdown files.

6. Type Safety and Pydantic Validation

  • The tutorial demonstrates Pydantic validation for policies but does not emphasize type safety for test scenarios.
  • Recommendation: Extend the tutorial to include type-safe definitions for test scenarios, ensuring that all inputs are validated before execution.
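As a rough illustration of the regression test requested in suggestion 4, the sketch below replays the same scenarios against two policy versions and reports every decision that changed. `evaluate` is a toy first-match evaluator written for this example, and the rule fields are assumptions; it does not reproduce `PolicyEvaluator` or the logic in `07_policy_versioning.py`.

```python
def evaluate(rules, context, default="allow"):
    """Toy evaluator: the highest-priority rule whose tool matches wins."""
    for rule in sorted(rules, key=lambda r: -r["priority"]):
        if rule["tool"] == context["tool"]:
            return rule["action"]
    return default


def find_regressions(old_rules, new_rules, scenarios):
    """List (tool, before, after) for scenarios whose decision differs between versions."""
    regressions = []
    for scenario in scenarios:
        before = evaluate(old_rules, scenario)
        after = evaluate(new_rules, scenario)
        if before != after:
            regressions.append((scenario["tool"], before, after))
    return regressions


# Illustrative versions: v2 turns an escalation into a hard deny.
v1 = [{"tool": "transfer_funds", "priority": 90, "action": "escalate"}]
v2 = [{"tool": "transfer_funds", "priority": 90, "action": "deny"}]
scenarios = [{"tool": "transfer_funds"}, {"tool": "search"}]

regressions = find_regressions(v1, v2, scenarios)
```

A deploy gate could then fail whenever `regressions` is non-empty and the change has not been explicitly approved.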

Summary

This pull request introduces valuable documentation for policy testing and versioning, but it raises critical security concerns and potential backward compatibility issues. Addressing these concerns will improve the library's robustness and ensure compliance with security best practices.

Actions Required:

  1. Address 🔴 CRITICAL issues related to escalation timeout behavior and rule priority conflicts.
  2. Mitigate 🟡 WARNING risks by documenting changes and providing migration guidance.
  3. Implement 💡 SUGGESTIONS to enhance thread safety, testing coverage, and SPIFFE/SVID integration.

Once these issues are resolved, the documentation will be a strong addition to the repository.


Copilot AI left a comment


Pull request overview

Adds new “Policy-as-Code” tutorial content covering automated policy testing (chapter 6) and policy versioning/regression detection (chapter 7), plus updates navigation and normalizes MIT license headers across earlier chapters/examples.

Changes:

  • Add chapters 6–7 markdown + runnable Python/YAML examples for scenario testing, test matrices, and v1/v2 comparisons.
  • Update tutorial README + chapter navigation links to include the new chapters (and chapter 5 due to stack).
  • Normalize MIT license headers across chapters 1–4 markdown and YAML example files.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| docs/tutorials/policy-as-code/README.md | Links chapters 5–7 (removes “coming soon”). |
| docs/tutorials/policy-as-code/examples/07_policy_versioning.py | New runnable example to diff/test v1 vs v2 and flag regressions. |
| docs/tutorials/policy-as-code/examples/07_policy_v2.yaml | New v2 policy used for structural/behavioral comparison demo. |
| docs/tutorials/policy-as-code/examples/07_policy_v1.yaml | New v1 baseline policy used for comparison demo. |
| docs/tutorials/policy-as-code/examples/06_test_scenarios.yaml | New declarative scenario set for CLI-based policy testing. |
| docs/tutorials/policy-as-code/examples/06_test_policy.yaml | New combined test policy used by scenario runner/matrix. |
| docs/tutorials/policy-as-code/examples/06_policy_testing.py | New runnable example covering validation, scenarios, matrix, regression check. |
| docs/tutorials/policy-as-code/examples/05_approval_workflows.py | (Stacked) Runnable example for human-in-the-loop escalation. |
| docs/tutorials/policy-as-code/examples/05_approval_policy.yaml | (Stacked) YAML policy for approval workflow chapter. |
| docs/tutorials/policy-as-code/examples/04_support_team_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/04_global_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/04_env_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/03_rate_limit_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/02_reader_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/02_admin_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/examples/01_first_policy.yaml | Add MIT header to existing example YAML. |
| docs/tutorials/policy-as-code/07-policy-versioning.md | New chapter 7 tutorial doc (diffing + regression gates). |
| docs/tutorials/policy-as-code/06-policy-testing.md | New chapter 6 tutorial doc (validation, scenarios, matrices). |
| docs/tutorials/policy-as-code/05-approval-workflows.md | (Stacked) New chapter 5 tutorial doc (escalation workflows). |
| docs/tutorials/policy-as-code/04-conditional-policies.md | Add MIT header + update “Next” link to chapter 5. |
| docs/tutorials/policy-as-code/03-rate-limiting.md | Add MIT header. |
| docs/tutorials/policy-as-code/02-capability-scoping.md | Add MIT header. |
| docs/tutorials/policy-as-code/01-your-first-policy.md | Add MIT header. |

Comment thread docs/tutorials/policy-as-code/examples/06_policy_testing.py
Comment thread docs/tutorials/policy-as-code/examples/06_test_scenarios.yaml
Comment thread docs/tutorials/policy-as-code/06-policy-testing.md
Comment thread docs/tutorials/policy-as-code/07-policy-versioning.md
Comment thread docs/tutorials/policy-as-code/examples/07_policy_v2.yaml
Comment thread docs/tutorials/policy-as-code/examples/07_policy_versioning.py
Comment thread docs/tutorials/policy-as-code/06-policy-testing.md
@imran-siddique
Member

@harinarayansrivatsan This PR has merge conflicts. Chapter 5 (#911) has been merged. Please rebase onto latest main so ch6-7 diff is clean. You can run: git fetch upstream && git rebase upstream/main then force-push. Once rebased, we will merge promptly.

Normalize license headers across all tutorial chapters.
Chapters 5-7 already had them; this adds them to chapters
1-4 markdown and YAML example files for consistency.

Covers structural validation with Pydantic, declarative YAML
test scenarios, cross-policy test matrices, and regression
detection. Includes runnable Python example and test fixtures.

Covers side-by-side version comparison, structural diffing,
behavioral regression detection, and deploy gates. Updates
README to link chapters 6-7 and fixes chapter 5 nav link.

@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces chapters 6 and 7 of the "Policy-as-Code" tutorial, focusing on policy testing and versioning. The additions include detailed explanations, examples, and Python scripts for validating, testing, and comparing policies. The changes are primarily documentation and example files, with no modifications to the core library code. The tutorial content is well-structured and demonstrates best practices for policy testing and versioning.

Below are the detailed observations and recommendations:


🔴 CRITICAL: Security Issues

  1. Policy Escalation Keyword Dependency

    • The tutorial highlights a critical issue where the removal of the "requires human approval" keyword from the transfer_funds rule silently changes the behavior from "escalate" to "deny." This dependency on a specific string for escalation detection is fragile and prone to errors.
    • Recommendation: Refactor the escalation logic to use a dedicated field (e.g., action: escalate) instead of relying on string matching in the message field. This will make the policy schema more robust and reduce the risk of accidental security bypasses.
  2. Default Allow Action

    • The default action in the example policy (06_test_policy.yaml) is set to allow. This is risky, as it could lead to unintended access if no specific rule matches.
    • Recommendation: Default actions should be set to deny unless explicitly required. If allow is necessary, ensure that the tutorial emphasizes the risks and provides guidance on how to mitigate them.
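The fragility described in item 1 can be shown with a small sketch. Both functions are hypothetical: `decision_from_message` imitates keyword-based escalation detection as this review describes it, and `decision_from_action` shows the dedicated-field alternative. Neither is the toolkit's actual implementation.

```python
ESCALATION_PHRASE = "requires human approval"


def decision_from_message(rule):
    """Fragile: a deny counts as an escalation only if the message contains a phrase."""
    if rule["action"] == "deny" and ESCALATION_PHRASE in rule.get("message", "").lower():
        return "escalate"
    return rule["action"]


def decision_from_action(rule):
    """More robust: the action field states the intent directly."""
    return rule["action"]


# Illustrative rules only.
original = {"action": "deny", "message": "This transfer requires human approval"}
reworded = {"action": "deny", "message": "This transfer needs sign-off"}
explicit = {"action": "escalate", "message": "This transfer needs sign-off"}

before = decision_from_message(original)   # escalation detected via the phrase
after = decision_from_message(reworded)    # same intent, but the phrase is gone
stable = decision_from_action(explicit)    # wording no longer affects the outcome
```

With the keyword approach, editing a human-readable message changes the decision; with an explicit action value, the message can be reworded freely.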

🟡 WARNING: Potential Breaking Changes

  1. License Header Normalization
    • Adding license headers to existing files (chapters 1-4) could potentially break workflows or scripts that rely on specific file formats or content. While this is unlikely, it is worth verifying that these changes do not affect any downstream processes.
    • Recommendation: Confirm that the added license headers do not interfere with any automated parsing or processing of these files.

💡 Suggestions for Improvement

  1. Thread Safety in Policy Evaluation

    • The examples use the PolicyEvaluator class to evaluate policies. If this class is used in a concurrent environment (e.g., multiple agents evaluating policies simultaneously), ensure that it is thread-safe.
    • Recommendation: Add explicit documentation about thread safety or include tests to verify concurrent usage.
  2. Type Safety and Pydantic Validation

    • The tutorial demonstrates the use of PolicyDocument.model_validate() for validating policy structures. While this is a good practice, consider adding examples of custom validators for complex fields (e.g., condition or action) to ensure type safety and prevent invalid configurations.
    • Recommendation: Extend the tutorial to include advanced validation techniques using Pydantic.
  3. Backward Compatibility

    • The tutorial introduces new CLI commands (validate and test) and Python APIs (PolicyEvaluator, PolicyDocument.from_yaml). Ensure these additions are backward-compatible with existing workflows.
    • Recommendation: Add explicit notes in the documentation about compatibility with earlier versions of the library.
  4. Sandbox Escape Vectors

    • While the tutorial does not directly address sandboxing, policies often interact with external systems. Ensure that the PolicyEvaluator and related components are designed to prevent sandbox escapes (e.g., by validating inputs rigorously and avoiding unsafe operations).
    • Recommendation: Add a section in the tutorial about securing policy evaluation against malicious inputs.
  5. OWASP Agentic Top 10 Compliance

    • The tutorial does not explicitly address OWASP Agentic Top 10 risks, such as "Policy Injection" or "Unintended Policy Interactions."
    • Recommendation: Add a dedicated section in the tutorial to discuss these risks and how the library mitigates them.
  6. Testing Coverage

    • The tutorial provides examples for testing policies, but it does not include tests for edge cases (e.g., malformed YAML files, conflicting rules, or extreme priority values).
    • Recommendation: Add more comprehensive test scenarios to cover edge cases and unexpected interactions.
  7. Behavioral Regression Detection

    • The tutorial introduces regression detection but does not provide guidance on integrating this into CI/CD pipelines.
    • Recommendation: Include a section on how to automate regression detection using GitHub Actions or other CI/CD tools.

Conclusion

The pull request is a valuable addition to the repository, providing detailed guidance on policy testing and versioning. However, it introduces critical security risks related to escalation keyword dependency and default allow actions. Addressing these issues will significantly improve the robustness and security of the library.

Once the recommendations are implemented, the tutorial will be a strong resource for users adopting policy-as-code practices.


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Summary

This pull request adds two new chapters (6 and 7) to the "Policy-as-Code" tutorial, focusing on policy testing and versioning. It also normalizes license headers across existing chapters and updates the README. The tutorial content is well-written and provides actionable examples for validating, testing, and versioning policies. However, there are areas that require attention to ensure correctness, security, and maintainability.


🔴 CRITICAL Issues

  1. Validation Error Handling in PolicyDocument:

    • The PolicyDocument.from_yaml() method uses Pydantic for validation, which is good for catching structural errors. However, there is no mention of how the system handles malicious inputs or unexpected edge cases (e.g., deeply nested YAML files, extremely large files, or invalid types). Ensure that the validation logic includes safeguards against potential denial-of-service (DoS) attacks caused by maliciously crafted YAML files.
    • Action: Add tests for edge cases such as deeply nested YAML, large files, and invalid types to ensure the validation mechanism is robust.
  2. PolicyEvaluator Rule Merging:

    • The merging of policies in PolicyEvaluator is based on priority. However, there is no mention of how conflicting rules are resolved when they have the same priority. This could lead to unpredictable behavior in production.
    • Action: Implement deterministic tie-breaking logic for rules with the same priority. Document this behavior clearly in the tutorial.
  3. Escalation Keyword Dependency:

    • The escalation system relies on the presence of the exact phrase "requires human approval" in the message field to distinguish between escalation and hard deny. This is fragile and prone to errors during policy updates.
    • Action: Replace the reliance on string matching with a dedicated field in the schema, such as escalation: true. Update the tutorial and examples accordingly.
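One possible shape for the deterministic tie-breaking requested in item 2 is sketched below: sort by priority, and let rules with equal priority keep the order in which they were declared. The `priority` and `name` fields are assumptions, and this is not a description of how `PolicyEvaluator` currently merges policies.

```python
def order_rules(rules):
    """Sort by priority (highest first); ties keep their original declaration order."""
    indexed = list(enumerate(rules))
    # The explicit index makes the tie-break rule visible instead of
    # relying on the reader knowing that Python's sort is stable.
    indexed.sort(key=lambda pair: (-pair[1]["priority"], pair[0]))
    return [rule for _, rule in indexed]


# Illustrative rules only.
rules = [
    {"name": "allow-internal", "priority": 50},
    {"name": "block-delete", "priority": 100},
    {"name": "deny-external", "priority": 50},
]
ordered_names = [r["name"] for r in order_rules(rules)]
```

Whatever rule is chosen, documenting it matters more than the specific choice, since authors need to predict which rule wins.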

🟡 WARNING Issues

  1. Backward Compatibility:

    • Adding the escalation: true field (as suggested above) would break backward compatibility for existing policies that rely on the "requires human approval" message.
    • Action: Provide a migration guide or fallback mechanism to handle older policies gracefully.
  2. CLI Exit Codes:

    • The CLI commands (validate, test) use exit codes for success/failure. Ensure that these exit codes are consistent across all tools and documented clearly. Any changes to exit code behavior could break CI pipelines relying on them.
    • Action: Add explicit tests for CLI exit codes and document them in the tutorial.
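A test for the exit-code contract in item 2 might be structured like this sketch. `run_scenarios` and `toy_evaluate` are hypothetical helpers, and the convention of 0 for success and 1 for any failure is an assumption to be checked against the real `validate` and `test` commands.

```python
def run_scenarios(scenarios, evaluate):
    """Return (exit_code, failed_names): 0 when every scenario matches, 1 otherwise."""
    failed = [s["name"] for s in scenarios if evaluate(s["context"]) != s["expected"]]
    return (1 if failed else 0), failed


def toy_evaluate(context):
    # Stand-in for a real policy evaluation call.
    return "deny" if context["tool"] == "delete_database" else "allow"


# Illustrative scenarios; the last one carries a deliberately wrong expectation.
scenarios = [
    {"name": "block-delete", "context": {"tool": "delete_database"}, "expected": "deny"},
    {"name": "allow-search", "context": {"tool": "search"}, "expected": "allow"},
    {"name": "wrong-expectation", "context": {"tool": "search"}, "expected": "deny"},
]
exit_code, failed = run_scenarios(scenarios, toy_evaluate)
# A CLI wrapper would pass exit_code to sys.exit() so CI can fail the job.
```

Keeping the exit code as a return value, separate from the process exit, makes the contract easy to assert on in a unit test.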

💡 Suggestions for Improvement

  1. Thread Safety in PolicyEvaluator:

    • The PolicyEvaluator merges policies and evaluates decisions, but there is no mention of thread safety. If agents execute concurrently, ensure that shared resources (e.g., policy objects) are not modified during evaluation.
    • Action: Add thread-safety tests and document whether PolicyEvaluator is safe for concurrent use.
  2. OWASP Agentic Top 10 Compliance:

    • The tutorial does not address OWASP Agentic Top 10 risks explicitly (e.g., sandbox escape vectors, privilege escalation). For example, the "allow-development" rule could inadvertently enable unsafe actions.
    • Action: Add a section in the tutorial to discuss security best practices for policy design, including OWASP compliance.
  3. Type Safety and Pydantic Models:

    • The tutorial demonstrates Pydantic validation but does not specify whether strict type enforcement is enabled (e.g., strict=True in Pydantic models). This could lead to silent type coercion.
    • Action: Enable strict type enforcement in Pydantic models and update the tutorial examples.
  4. Policy Diffing in Chapter 7:

    • The diffing mechanism in Chapter 7 is not fully shown in the truncated diff. Ensure that the diffing logic accounts for semantic changes (e.g., escalation vs hard deny) and not just structural differences.
    • Action: Add examples of semantic diffing and regression detection in Chapter 7.
  5. Sandbox Escape Prevention:

    • Policies that allow actions in development environments (e.g., allow-development) could inadvertently enable sandbox escapes. Ensure that policies explicitly restrict actions that could compromise the sandbox.
    • Action: Add examples of sandbox escape prevention in the tutorial.
  6. Testing Coverage:

    • The tutorial provides excellent examples for testing policies but does not mention coverage metrics. Ensure that the test suite covers all possible decision paths.
    • Action: Add a section on measuring test coverage for policies.
  7. Documentation Links:

    • The PR mentions verifying markdown links between chapters but does not confirm whether this was completed. Broken links could confuse users.
    • Action: Run a link checker on the documentation and fix any broken links.
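The link check in item 7 could be automated with a small standard-library script along these lines. This is a minimal sketch: it only handles inline `[text](target)` links to relative paths, and `08-missing.md` in the demo is a made-up placeholder target, not a file in this PR.

```python
import re
import tempfile
from pathlib import Path

# Matches inline markdown links and captures the target.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_links(markdown_file: Path):
    """Return relative link targets in one markdown file that do not exist on disk."""
    broken = []
    text = markdown_file.read_text(encoding="utf-8")
    for target in LINK_PATTERN.findall(text):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue  # external links and in-page anchors are out of scope
        path_part = target.split("#", 1)[0]
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken


# Demo in a temporary directory: one valid link, one broken, two skipped.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "07-policy-versioning.md").write_text("# Chapter 7\n", encoding="utf-8")
    chapter6 = root / "06-policy-testing.md"
    chapter6.write_text(
        "[Next](07-policy-versioning.md) [Gone](08-missing.md) "
        "[Site](https://example.com) [Top](#summary)\n",
        encoding="utf-8",
    )
    result = broken_relative_links(chapter6)
```

Running the function over every file under `docs/tutorials/policy-as-code/` in CI would catch broken chapter navigation before merge.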

Summary of Actions

Critical

  • Improve validation error handling in PolicyDocument.
  • Add deterministic tie-breaking logic for rules with the same priority.
  • Replace escalation keyword dependency with a dedicated schema field.

Warning

  • Address backward compatibility for schema changes.
  • Test and document CLI exit codes.

Suggestions

  • Ensure thread safety in PolicyEvaluator.
  • Address OWASP Agentic Top 10 risks in the tutorial.
  • Enable strict type enforcement in Pydantic models.
  • Enhance semantic diffing in Chapter 7.
  • Add sandbox escape prevention examples.
  • Include test coverage metrics.
  • Verify markdown links between chapters.

Final Notes

This PR significantly improves the tutorial and provides valuable guidance for policy testing and versioning. Addressing the critical and warning issues will ensure the robustness and security of the library, while the suggestions will enhance its usability and compliance with best practices.

@harinarayansrivatsan
Contributor Author

> @harinarayansrivatsan This PR has merge conflicts. Chapter 5 (#911) has been merged. Please rebase onto latest main so ch6-7 diff is clean. You can run: git fetch upstream && git rebase upstream/main then force-push. Once rebased, we will merge promptly.

Thanks Imran, this is done.

@imran-siddique imran-siddique merged commit 89f0206 into microsoft:main Apr 11, 2026
6 of 7 checks passed

Labels

documentation Improvements or additions to documentation size/XL Extra large PR (500+ lines)


3 participants