docs: add policy-as-code tutorial chapters 6-7 (testing & versioning) by harinarayansrivatsan · Pull Request #916 · microsoft/agent-governance-toolkit

harinarayansrivatsan · 2026-04-10T13:45:14Z

Summary

Adds chapter 6 (policy testing): structural validation with Pydantic, declarative YAML test scenarios, cross-policy test matrices, and regression detection
Adds chapter 7 (policy versioning): side-by-side version comparison, structural diffing, behavioral regression detection, and deploy gates
Normalizes MIT license headers across chapters 1-4 markdown and YAML files for consistency with chapters 5-7
Updates README to link chapters 6-7 and removes "coming soon" notice
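The cross-policy test matrix idea can be sketched as follows. Every tool is evaluated against every policy, and each cell is compared with an expected decision, so a cell where a permissive default wins shows up as a "surprise". This is a minimal illustration under stated assumptions: the dict-based policy shape, the `evaluate` and `run_matrix` helpers, and the tool names are hypothetical and are not the toolkit's `PolicyEvaluator` API.

```python
from typing import Dict, List, Tuple

Policy = Dict[str, object]  # hypothetical shape: {"default": str, "rules": [rule, ...]}


def evaluate(policy: Policy, tool: str) -> str:
    """Highest-priority rule matching `tool` wins; otherwise use the policy default."""
    matches = [r for r in policy["rules"] if r["tool"] == tool]
    if not matches:
        return policy["default"]
    return max(matches, key=lambda r: r["priority"])["action"]


def run_matrix(
    policies: Dict[str, Policy],
    tools: List[str],
    expected: Dict[Tuple[str, str], str],
) -> List[Tuple[str, str, str, str]]:
    """Evaluate every tool against every policy; return (policy, tool, expected, actual) mismatches."""
    surprises = []
    for policy_name in sorted(policies):
        for tool in tools:
            got = evaluate(policies[policy_name], tool)
            want = expected[(policy_name, tool)]
            if got != want:
                surprises.append((policy_name, tool, want, got))
    return surprises


policies = {
    "reader": {"default": "deny", "rules": [
        {"tool": "read_file", "priority": 10, "action": "allow"},
    ]},
    "admin": {"default": "allow", "rules": [
        {"tool": "delete_file", "priority": 100, "action": "escalate"},
    ]},
}
tools = ["read_file", "delete_file", "drop_table"]
expected = {
    ("reader", "read_file"): "allow",
    ("reader", "delete_file"): "deny",
    ("reader", "drop_table"): "deny",
    ("admin", "read_file"): "allow",
    ("admin", "delete_file"): "escalate",
    ("admin", "drop_table"): "deny",  # the author assumed deny, but admin defaults to allow
}

for cell in run_matrix(policies, tools, expected):
    print("surprise:", cell)
```

The matrix is useful precisely because each policy passes its own obvious cases; the mismatch only appears when a tool nobody wrote a rule for falls through to a permissive default.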

New files

docs/tutorials/policy-as-code/06-policy-testing.md
docs/tutorials/policy-as-code/07-policy-versioning.md
docs/tutorials/policy-as-code/examples/06_policy_testing.py
docs/tutorials/policy-as-code/examples/06_test_policy.yaml
docs/tutorials/policy-as-code/examples/06_test_scenarios.yaml
docs/tutorials/policy-as-code/examples/07_policy_versioning.py
docs/tutorials/policy-as-code/examples/07_policy_v1.yaml
docs/tutorials/policy-as-code/examples/07_policy_v2.yaml

Test plan

Run python docs/tutorials/policy-as-code/examples/06_policy_testing.py — all 4 parts pass, 8/8 scenarios pass, matrix finds expected surprise
Run python docs/tutorials/policy-as-code/examples/07_policy_versioning.py — diff accurate, regression correctly identified
Verify chapters 1-4 examples still run after adding license headers to YAML files
Verify all markdown links between chapters resolve correctly
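A declarative scenario run reduces to "evaluate, compare, count". The sketch below assumes the scenarios have already been parsed (for example with `yaml.safe_load`) into dicts with hypothetical `name`, `tool`, and `expect` keys, and it uses a simplified highest-priority-wins `evaluate` helper in place of the toolkit's evaluator.

```python
from typing import Dict, List


def evaluate(policy: Dict, tool: str) -> str:
    # Highest-priority matching rule wins; otherwise fall back to the policy default.
    matches = [r for r in policy["rules"] if r["tool"] == tool]
    return max(matches, key=lambda r: r["priority"])["action"] if matches else policy["default"]


def run_scenarios(policy: Dict, scenarios: List[Dict]) -> List[str]:
    """Print an N/M summary and return the names of failing scenarios."""
    failures = [s["name"] for s in scenarios if evaluate(policy, s["tool"]) != s["expect"]]
    print(f"{len(scenarios) - len(failures)}/{len(scenarios)} scenarios passed")
    return failures


# What a scenarios file might look like after parsing (hypothetical keys: name, tool, expect).
policy = {"default": "deny", "rules": [
    {"tool": "read_file", "priority": 10, "action": "allow"},
    {"tool": "transfer_funds", "priority": 90, "action": "escalate"},
]}
scenarios = [
    {"name": "reads are allowed", "tool": "read_file", "expect": "allow"},
    {"name": "transfers escalate", "tool": "transfer_funds", "expect": "escalate"},
    {"name": "unknown tools are denied", "tool": "drop_table", "expect": "deny"},
]

failures = run_scenarios(policy, scenarios)  # prints "3/3 scenarios passed"
```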

Note: Stacked on #911 (chapter 5). Diff currently includes ch5 changes — once #911 merges and this branch is rebased onto main, the diff will show only chapters 6-7 and license header normalization.

github-actions · 2026-04-10T13:46:32Z

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

github-actions

🤖 AI Agent: code-reviewer

Feedback on Pull Request: docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)

🔴 CRITICAL: Security Issues

Escalation Timeout Default Action
- The tutorial suggests setting DefaultTimeoutAction.ALLOW for less critical actions in certain scenarios. This is a dangerous recommendation, as it could lead to silent approvals of sensitive actions if the timeout expires. Even if the action is deemed "less critical," allowing it by default without human review introduces a potential security bypass.
- Action: Remove or strongly discourage the use of DefaultTimeoutAction.ALLOW in the documentation. Defaulting to DENY is the safer and more appropriate choice for all escalation scenarios.
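A fail-closed timeout can be sketched with the standard library: wait on a queue for a reviewer's decision and return DENY if nothing arrives in time. `Decision` and `await_approval` are hypothetical names for illustration; they are not the toolkit's `EscalationHandler` or `DefaultTimeoutAction`.

```python
import queue
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


def await_approval(responses: "queue.Queue[Decision]", timeout_seconds: float) -> Decision:
    """Block until a reviewer posts a decision; fail closed with DENY on timeout."""
    try:
        return responses.get(timeout=timeout_seconds)
    except queue.Empty:
        # Nobody answered in time: never turn silence into an approval.
        return Decision.DENY


pending = queue.Queue()
print(await_approval(pending, timeout_seconds=0.1))  # Decision.DENY
pending.put(Decision.ALLOW)
print(await_approval(pending, timeout_seconds=0.1))  # Decision.ALLOW
```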
Policy Rule Priority Conflicts
- The tutorial does not address potential priority conflicts between rules. For example, if two rules apply to the same tool and have overlapping conditions, the higher-priority rule will take precedence. However, if the priority order is misconfigured, this could lead to unintended decisions (e.g., escalation bypassed by a lower-priority rule).
- Action: Add a section to the tutorial explaining how to handle rule priority conflicts and the importance of testing for such scenarios.
Escalation Request Spoofing
- The tutorial does not mention any mechanisms to prevent spoofing of escalation requests. For example, an attacker could potentially craft a fake escalation request with a forged agent_id or action field to bypass security checks.
- Action: Document the need for cryptographic signing or authentication mechanisms for escalation requests to ensure their integrity and authenticity.
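One way to make escalation requests tamper-evident is an HMAC over a canonical serialization of the request, verified before any approval is acted on. This sketch assumes a shared secret between the requester and the approval service; the request fields (`agent_id`, `action`, `amount`) and helper names are hypothetical. Asymmetric signatures would be the alternative when the verifier must not hold the signing key.

```python
import hashlib
import hmac
import json


def sign_request(request: dict, key: bytes) -> str:
    # Canonical JSON (sorted keys, no whitespace) so signer and verifier hash identical bytes.
    payload = json.dumps(request, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_request(request: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking, via timing, how many leading characters matched.
    return hmac.compare_digest(sign_request(request, key), signature)


key = b"demo-shared-secret"  # illustration only; load real keys from a secret store
req = {"agent_id": "agent-42", "action": "transfer_funds", "amount": 500}
sig = sign_request(req, key)
forged = dict(req, agent_id="agent-admin")  # attacker swaps the identity, keeps the signature

print(verify_request(req, sig, key))     # True
print(verify_request(forged, sig, key))  # False
```

An HMAC alone does not stop replay of a genuine request; including a nonce or expiry timestamp in the signed payload is the usual complement.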

🟡 WARNING: Potential Breaking Changes

Backward Compatibility of Policy Schema
- The addition of new rules and features (e.g., escalation handling) may break existing policies that do not conform to the updated schema. For example, older policies without the message field or escalation rules might fail validation.
- Action: Ensure backward compatibility by providing migration tools or fallback mechanisms for older policy versions. Update the tutorial to include instructions for migrating legacy policies.

💡 Suggestions for Improvement

Automated Policy Testing
- The tutorial introduces policy testing but does not mention integration with CI/CD pipelines. Automated policy tests should be run as part of the CI/CD process to catch regressions early.
- Action: Add a section on integrating policy tests with CI/CD pipelines, including examples of how to run tests using pytest or similar frameworks.
Thread Safety in Escalation Handling
- The tutorial uses InMemoryApprovalQueue, which may not be thread-safe in concurrent environments. While this is acceptable for demonstration purposes, production systems should use thread-safe or distributed backends.
- Action: Add a note in the tutorial recommending thread-safe or distributed backends (e.g., Redis, RabbitMQ) for production use.
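Within a single process, a lock around the check-and-update is enough to stop two reviewers from both deciding the same request. `LockedApprovalQueue` is a hypothetical class, not the toolkit's `InMemoryApprovalQueue`, and it gives no protection across processes or hosts, which is the case the distributed backends above are meant for.

```python
import threading
from typing import Dict, Optional


class LockedApprovalQueue:
    """Hypothetical in-process approval store; one lock guards all shared state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Dict[str, dict] = {}
        self._decisions: Dict[str, str] = {}

    def submit(self, request_id: str, request: dict) -> None:
        with self._lock:
            self._pending[request_id] = request

    def decide(self, request_id: str, decision: str) -> bool:
        # Check-and-remove happens under the lock, so only one reviewer can win.
        with self._lock:
            if request_id not in self._pending:
                return False
            del self._pending[request_id]
            self._decisions[request_id] = decision
            return True

    def decision(self, request_id: str) -> Optional[str]:
        with self._lock:
            return self._decisions.get(request_id)


approvals = LockedApprovalQueue()
approvals.submit("req-1", {"action": "transfer_funds"})
results = []


def reviewer(decision: str) -> None:
    results.append(approvals.decide("req-1", decision))


threads = [threading.Thread(target=reviewer, args=(d,)) for d in ("approve", "deny") * 4]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True))  # 1: exactly one reviewer's decision is recorded
```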
Policy Diffing and Versioning
- Chapter 7 introduces policy versioning but does not provide examples of how to handle structural diffs in YAML files. This could be useful for detecting unintended changes in policies.
- Action: Expand Chapter 7 to include examples of YAML diffing tools or libraries that can be used to compare policy versions.
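A structural diff can be as simple as keying rules by name and comparing their fields. The sketch assumes each rule has a unique `name` and works on parsed dicts rather than raw YAML text, so key reordering and whitespace changes do not show up as differences; the rule names and fields are hypothetical.

```python
from typing import Dict, List


def diff_rules(old: List[dict], new: List[dict]) -> Dict[str, List[str]]:
    """Compare two rule lists keyed by rule name; report added, removed, and changed rules."""
    old_by_name = {r["name"]: r for r in old}
    new_by_name = {r["name"]: r for r in new}
    changed = []
    for name in sorted(old_by_name.keys() & new_by_name.keys()):
        before, after = old_by_name[name], new_by_name[name]
        for field in sorted(before.keys() | after.keys()):
            if before.get(field) != after.get(field):
                changed.append(f"{name}.{field}: {before.get(field)!r} -> {after.get(field)!r}")
    return {
        "added": sorted(new_by_name.keys() - old_by_name.keys()),
        "removed": sorted(old_by_name.keys() - new_by_name.keys()),
        "changed": changed,
    }


v1 = [
    {"name": "block-deletes", "priority": 100, "action": "deny"},
    {"name": "transfer-funds", "priority": 90, "action": "deny",
     "message": "Transfer requires human approval"},
]
v2 = [
    {"name": "block-deletes", "priority": 100, "action": "deny"},
    {"name": "transfer-funds", "priority": 90, "action": "deny",
     "message": "Transfer blocked"},
    {"name": "allow-reads", "priority": 10, "action": "allow"},
]
print(diff_rules(v1, v2))
```

A structural diff like this reports that a message changed but not what that change does to decisions, which is why it is usually paired with a behavioral comparison.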
Error Handling in Policy Validation
- The tutorial demonstrates how to catch validation errors but does not provide guidance on how to handle them effectively (e.g., logging, notifying developers).
- Action: Add best practices for handling validation errors, including logging and alerting mechanisms.
OWASP Agentic Top 10 Compliance
- The tutorial does not explicitly address compliance with OWASP Agentic Top 10 guidelines, such as ensuring audit trails for all policy decisions and escalation requests.
- Action: Add a section mapping the tutorial's features to OWASP Agentic Top 10 compliance, emphasizing auditability and traceability.
Improved Documentation Navigation
- The tutorial chapters are linked sequentially, but navigating between sections could be improved with a table of contents or sidebar navigation.
- Action: Add a table of contents or sidebar navigation to improve usability.

Summary

This pull request introduces valuable documentation for policy testing and versioning. However, it raises critical security concerns, particularly around escalation timeout defaults and spoofing risks. Addressing these issues is essential to ensure the library remains secure and compliant with best practices. Additionally, improving backward compatibility, thread safety, and documentation usability will enhance the overall quality of the tutorial.

Priority Actions:

Remove or discourage DefaultTimeoutAction.ALLOW.
Document cryptographic signing for escalation requests.
Address rule priority conflicts in the tutorial.

Suggested Enhancements:

Integrate policy testing into CI/CD pipelines.
Recommend thread-safe backends for production.
Expand policy versioning examples to include YAML diffing.
Map features to OWASP Agentic Top 10 compliance.

github-actions · 2026-04-10T13:46:53Z

🤖 AI Agent: security-scanner — Security Analysis of the Pull Request

Security Analysis of the Pull Request

This pull request primarily adds documentation and examples for chapters 6 and 7 of the "Policy-as-Code" tutorial. While the changes are mainly educational and do not directly alter the core functionality of the microsoft/agent-governance-toolkit, it is still important to evaluate the provided examples and documentation for potential security issues, as they could influence how users implement and test policies.

Findings

1. Prompt Injection Defense Bypass

Risk: None identified in this PR. The changes are limited to documentation and examples, and there is no direct interaction with user-provided prompts or inputs in the core library.
Rating: 🔵 LOW
Recommendation: No changes needed.

2. Policy Engine Circumvention

Risk: The examples provided in 06_policy_testing.py and 07_policy_versioning.py demonstrate how to test policies using the PolicyEvaluator and PolicyDocument classes. There is no indication of any mechanism to bypass the policy engine in these examples.
Rating: 🔵 LOW
Recommendation: No changes needed.

3. Trust Chain Weaknesses

Risk: None identified. The changes do not involve SPIFFE/SVID validation, certificate pinning, or other trust chain mechanisms.
Rating: 🔵 LOW
Recommendation: No changes needed.

4. Credential Exposure

Risk: None identified. The examples and documentation do not include any hardcoded credentials or sensitive information.
Rating: 🔵 LOW
Recommendation: No changes needed.

5. Sandbox Escape

Risk: None identified. The examples do not involve any execution of untrusted code or system-level operations that could lead to sandbox escapes.
Rating: 🔵 LOW
Recommendation: No changes needed.

6. Deserialization Attacks

Risk: The examples use yaml.safe_load() for loading YAML files, which is a safe method for deserialization. However, it is important to ensure that users do not replace this with yaml.load() in their own implementations, as it could lead to deserialization vulnerabilities.
Rating: 🔵 LOW
Recommendation: Add a note in the documentation explicitly warning users to always use yaml.safe_load() for deserialization to prevent potential security risks.

7. Race Conditions

Risk: None identified. The examples and documentation do not involve concurrent execution or time-of-check-to-time-of-use (TOCTOU) scenarios.
Rating: 🔵 LOW
Recommendation: No changes needed.

8. Supply Chain

Risk: The examples rely on the pydantic and pyyaml libraries. While these are widely used and generally considered secure, it is important to ensure that the versions used are up-to-date and free from known vulnerabilities.
Rating: 🟡 MEDIUM
Recommendation: Add a note in the documentation to encourage users to keep dependencies updated. Additionally, consider adding dependency scanning to the CI/CD pipeline to detect outdated or vulnerable dependencies.

Summary of Findings

Finding	Risk	Recommendation
Prompt injection defense bypass	🔵 LOW	No changes needed.
Policy engine circumvention	🔵 LOW	No changes needed.
Trust chain weaknesses	🔵 LOW	No changes needed.
Credential exposure	🔵 LOW	No changes needed.
Sandbox escape	🔵 LOW	No changes needed.
Deserialization attacks	🔵 LOW	Add a note warning users to always use `yaml.safe_load()` for deserialization.
Race conditions	🔵 LOW	No changes needed.
Supply chain	🟡 MEDIUM	Add a note about keeping dependencies updated and consider adding dependency scanning.

Additional Recommendations

Testing for Edge Cases: While the examples provided are comprehensive, it would be beneficial to include edge cases in the test scenarios, such as:
- Empty or malformed context inputs.
- Policies with conflicting rules of the same priority.
- Policies with missing or extra fields.
Error Handling: Ensure that the examples demonstrate robust error handling for scenarios like missing files, invalid YAML, or unexpected evaluation results.
Security Best Practices: Include a section in the documentation emphasizing security best practices when writing policies, such as avoiding overly permissive defaults and carefully reviewing changes to escalation rules.
Versioning Guidance: Since Chapter 7 introduces policy versioning, it would be helpful to include guidance on securely managing policy versions, such as using cryptographic signatures to verify the integrity of policy files.
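A lightweight integrity check is to pin the SHA-256 digest of each reviewed policy file and refuse to load anything that does not match. Note that this is digest pinning, not a signature: it detects modification but does not prove who approved the file. The file names, contents, and `load_pinned_policy` helper are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def load_pinned_policy(path: Path, expected_sha256: str) -> str:
    """Return the file's text only if its SHA-256 digest matches the pinned value."""
    data = path.read_bytes()  # read once, so the bytes we verify are the bytes we return
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"{path.name}: digest mismatch, refusing to load")
    return data.decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    policy_file = Path(tmp) / "policy_v2.yaml"
    policy_file.write_text("default_action: deny\n", encoding="utf-8")
    pin = hashlib.sha256(policy_file.read_bytes()).hexdigest()  # recorded at review time

    ok_text = load_pinned_policy(policy_file, pin)

    policy_file.write_text("default_action: allow\n", encoding="utf-8")  # simulated tampering
    try:
        load_pinned_policy(policy_file, pin)
        tampered_error = ""
    except ValueError as exc:
        tampered_error = str(exc)

print(tampered_error)
```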

Final Assessment

This pull request primarily focuses on documentation and examples, with no changes to the core library. The examples provided are well-structured and demonstrate good practices for policy testing and versioning. However, there are minor recommendations to improve the security posture of the documentation and examples, particularly around deserialization and dependency management.

Overall Risk Rating for this PR: 🔵 LOW

github-actions

🤖 AI Agent: code-reviewer

Pull Request Review: docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)

🔴 CRITICAL Issues

1. Escalation Timeout Default Action

The tutorial suggests using DefaultTimeoutAction.ALLOW as an alternative to DENY. While this is presented as a rare case, it introduces a potential security bypass. If an escalation request times out and defaults to ALLOW, malicious or unintended actions could proceed without human review.
Recommendation: Clearly document that DefaultTimeoutAction.ALLOW should only be used for non-critical actions. Add safeguards in the library to warn or prevent its use for high-risk actions.

2. Policy Rule Priority Conflicts

The example policies rely heavily on rule priorities to determine outcomes. If two rules have overlapping conditions and the same priority, the behavior is undefined. This could lead to security vulnerabilities if a more permissive rule inadvertently takes precedence over a stricter one.
Recommendation: Add validation logic to detect and prevent overlapping rules with identical priorities. This should be highlighted in the tutorial as a potential pitfall.
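A coarse version of that validation groups rules by tool and priority and reports every group with more than one rule. This sketch treats "same tool" as "overlapping conditions", which over-reports when the conditions are actually disjoint; the rule fields and names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_priority_ties(rules: List[dict]) -> List[Tuple[str, int, List[str]]]:
    """Report (tool, priority, rule names) wherever two or more rules share both."""
    groups: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for rule in rules:
        groups[(rule["tool"], rule["priority"])].append(rule["name"])
    return [
        (tool, priority, sorted(names))
        for (tool, priority), names in sorted(groups.items())
        if len(names) > 1
    ]


rules = [
    {"name": "escalate-transfers", "tool": "transfer_funds", "priority": 50},
    {"name": "allow-small-transfers", "tool": "transfer_funds", "priority": 50},
    {"name": "allow-reads", "tool": "read_file", "priority": 50},
    {"name": "block-deletes", "tool": "delete_file", "priority": 100},
]
print(find_priority_ties(rules))
```

Run as a validation step, a non-empty result can fail the build before an ambiguous policy reaches an evaluator.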

3. EscalationHandler Timeout Behavior

The EscalationHandler timeout mechanism relies on a default action, but the tutorial does not address what happens if the timeout value is set to None or an extremely high value (e.g., timeout_seconds=999999). This could lead to agents waiting indefinitely, potentially causing denial-of-service scenarios.
Recommendation: Enforce a reasonable maximum timeout value in the library and document this limitation in the tutorial.
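A bounded timeout can be enforced when the configuration is loaded. The sketch rejects out-of-range values rather than silently clamping them; `validated_timeout` and the 3600-second ceiling are hypothetical choices, not library behavior.

```python
MAX_TIMEOUT_SECONDS = 3600.0  # hypothetical ceiling; choose one that matches your review SLA


def validated_timeout(timeout_seconds: object) -> float:
    """Reject missing, non-numeric, non-positive, or excessive escalation timeouts."""
    if timeout_seconds is None:
        raise ValueError("timeout_seconds is required; an unbounded wait can stall the agent")
    if isinstance(timeout_seconds, bool) or not isinstance(timeout_seconds, (int, float)):
        raise TypeError(f"timeout_seconds must be a number, got {type(timeout_seconds).__name__}")
    if not 0 < timeout_seconds <= MAX_TIMEOUT_SECONDS:
        # Also rejects inf, and NaN (every comparison with NaN is False).
        raise ValueError(f"timeout_seconds must be in (0, {MAX_TIMEOUT_SECONDS:g}], got {timeout_seconds}")
    return float(timeout_seconds)


print(validated_timeout(300))  # 300.0
try:
    validated_timeout(999999)
except ValueError as exc:
    print(exc)
```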

🟡 WARNING: Potential Breaking Changes

1. License Header Normalization

Adding MIT license headers to existing files (chapters 1-4) changes the file content. If any external systems or scripts rely on exact file hashes or content, this could cause issues.
Recommendation: Ensure that all downstream dependencies are updated to account for these changes. Communicate this update clearly in the release notes.

2. Policy Versioning Diff Behavior

The tutorial introduces structural diffing for policy versioning. If users rely on the current behavior of policy evaluation without versioning, this could lead to unexpected results when deploying new policies.
Recommendation: Provide clear migration guidance for users transitioning to policy versioning. Include examples of how to handle backward compatibility.

💡 Suggestions for Improvement

1. Thread Safety in EscalationHandler

The InMemoryApprovalQueue is used in the examples, but it is unclear whether it is thread-safe. If multiple agents are running concurrently, race conditions could occur when accessing or modifying the queue.
Recommendation: Ensure that InMemoryApprovalQueue is thread-safe or explicitly document that it is not suitable for concurrent use. Consider providing examples with a thread-safe backend, such as a database or message broker.

2. Policy Testing Coverage

The tutorial introduces policy testing but does not emphasize edge cases, such as testing for unintended rule overlaps or missing escalation tags. These are critical for security-focused applications.
Recommendation: Expand the tutorial to include examples of edge case testing, such as overlapping rules, missing escalation tags, or malformed policies.

3. SPIFFE/SVID Integration

The tutorial does not mention how policies interact with SPIFFE/SVID for agent identity verification. This is a missed opportunity to highlight the library's security features.
Recommendation: Add a section to the tutorial explaining how SPIFFE/SVID can be used to authenticate agents and enforce identity-based policies.

4. Behavioral Regression Detection

The tutorial mentions behavioral regression detection but does not provide concrete examples of how to implement it.
Recommendation: Include a detailed example of a regression test that compares policy behavior across versions, highlighting potential issues like changes in escalation behavior or default actions.
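A behavioral regression test can evaluate the same inputs under both versions and fail when a decision changes without being explicitly approved. The policies, tools, and helper names are hypothetical and use a simplified highest-priority-wins evaluator; the example reproduces the kind of escalate-to-deny flip discussed elsewhere in this thread.

```python
from typing import Dict, List, Set, Tuple


def evaluate(policy: Dict, tool: str) -> str:
    # Highest-priority matching rule wins; otherwise fall back to the policy default.
    matches = [r for r in policy["rules"] if r["tool"] == tool]
    return max(matches, key=lambda r: r["priority"])["action"] if matches else policy["default"]


def behavioral_diff(v1: Dict, v2: Dict, tools: List[str]) -> List[Tuple[str, str, str]]:
    """Return (tool, v1_decision, v2_decision) for every tool whose decision changed."""
    changes = []
    for tool in tools:
        before, after = evaluate(v1, tool), evaluate(v2, tool)
        if before != after:
            changes.append((tool, before, after))
    return changes


def deploy_gate(v1: Dict, v2: Dict, tools: List[str], approved: Set[str]) -> int:
    """Return an exit code: 0 if every behavior change is in `approved`, else 1."""
    unexpected = [c for c in behavioral_diff(v1, v2, tools) if c[0] not in approved]
    for tool, before, after in unexpected:
        print(f"REGRESSION {tool}: {before} -> {after}")
    return 1 if unexpected else 0


v1 = {"default": "deny", "rules": [
    {"tool": "transfer_funds", "priority": 90, "action": "escalate"},
]}
v2 = {"default": "deny", "rules": [
    {"tool": "transfer_funds", "priority": 90, "action": "deny"},  # escalation silently became deny
    {"tool": "read_file", "priority": 10, "action": "allow"},      # intended new capability
]}
tools = ["read_file", "transfer_funds", "delete_file"]

exit_code = deploy_gate(v1, v2, tools, approved={"read_file"})
print(exit_code)  # 1, after printing "REGRESSION transfer_funds: escalate -> deny"
# In CI this would typically end with: raise SystemExit(exit_code)
```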

5. Markdown Link Verification

The test plan includes verifying markdown links between chapters but does not automate this step.
Recommendation: Add a script to the CI pipeline that checks for broken links in markdown files.
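A minimal link check needs only the standard library: extract inline link targets, skip external URLs and same-page anchors, and confirm that each relative target exists next to the file. The regex is a simplification (it ignores reference-style links and link titles), and the file names are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Inline links and images: [text](target). Reference-style links are ignored for brevity.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SKIP_PREFIXES = ("http://", "https://", "mailto:", "#")


def broken_relative_links(md_file: Path) -> List[str]:
    """Return relative link targets in `md_file` whose files do not exist on disk."""
    broken = []
    for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
        if target.startswith(SKIP_PREFIXES):
            continue  # external URLs and same-page anchors are out of scope for this check
        file_part = target.split("#", 1)[0]  # drop any "#section" fragment
        if not (md_file.parent / file_part).exists():
            broken.append(target)
    return broken


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "07-policy-versioning.md").write_text("# Chapter 7\n", encoding="utf-8")
    chapter6 = docs / "06-policy-testing.md"
    chapter6.write_text(
        "[Next](07-policy-versioning.md#deploy-gates) [Typo](08-missing.md) [Web](https://example.com)\n",
        encoding="utf-8",
    )
    broken = broken_relative_links(chapter6)

print(broken)  # ['08-missing.md']
```

In a CI job, looping this over every chapter and exiting non-zero when any list is non-empty turns the manual test-plan step into a check.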

6. Type Safety and Pydantic Validation

The tutorial demonstrates Pydantic validation for policies but does not emphasize type safety for test scenarios.
Recommendation: Extend the tutorial to include type-safe definitions for test scenarios, ensuring that all inputs are validated before execution.

Summary

This pull request introduces valuable documentation for policy testing and versioning, but it raises critical security concerns and potential backward compatibility issues. Addressing these concerns will improve the library's robustness and ensure compliance with security best practices.

Actions Required:

Address 🔴 CRITICAL issues related to escalation timeout behavior and rule priority conflicts.
Mitigate 🟡 WARNING risks by documenting changes and providing migration guidance.
Implement 💡 SUGGESTIONS to enhance thread safety, testing coverage, and SPIFFE/SVID integration.

Once these issues are resolved, the documentation will be a strong addition to the repository.

Copilot

Pull request overview

Adds new “Policy-as-Code” tutorial content covering automated policy testing (chapter 6) and policy versioning/regression detection (chapter 7), plus updates navigation and normalizes MIT license headers across earlier chapters/examples.

Changes:

Add chapters 6–7 markdown + runnable Python/YAML examples for scenario testing, test matrices, and v1/v2 comparisons.
Update tutorial README + chapter navigation links to include the new chapters (and chapter 5 due to stack).
Normalize MIT license headers across chapters 1–4 markdown and YAML example files.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

File	Description
docs/tutorials/policy-as-code/README.md	Links chapters 5–7 (removes “coming soon”).
docs/tutorials/policy-as-code/examples/07_policy_versioning.py	New runnable example to diff/test v1 vs v2 and flag regressions.
docs/tutorials/policy-as-code/examples/07_policy_v2.yaml	New v2 policy used for structural/behavioral comparison demo.
docs/tutorials/policy-as-code/examples/07_policy_v1.yaml	New v1 baseline policy used for comparison demo.
docs/tutorials/policy-as-code/examples/06_test_scenarios.yaml	New declarative scenario set for CLI-based policy testing.
docs/tutorials/policy-as-code/examples/06_test_policy.yaml	New combined test policy used by scenario runner/matrix.
docs/tutorials/policy-as-code/examples/06_policy_testing.py	New runnable example covering validation, scenarios, matrix, regression check.
docs/tutorials/policy-as-code/examples/05_approval_workflows.py	(Stacked) Runnable example for human-in-the-loop escalation.
docs/tutorials/policy-as-code/examples/05_approval_policy.yaml	(Stacked) YAML policy for approval workflow chapter.
docs/tutorials/policy-as-code/examples/04_support_team_policy.yaml	Add MIT header to existing example YAML.
docs/tutorials/policy-as-code/examples/04_global_policy.yaml	Add MIT header to existing example YAML.
docs/tutorials/policy-as-code/examples/04_env_policy.yaml	Add MIT header to existing example YAML.
docs/tutorials/policy-as-code/examples/03_rate_limit_policy.yaml	Add MIT header to existing example YAML.
docs/tutorials/policy-as-code/examples/02_reader_policy.yaml	Add MIT header to existing example YAML.
docs/tutorials/policy-as-code/examples/02_admin_policy.yaml	Add MIT header to existing example YAML.
docs/tutorials/policy-as-code/examples/01_first_policy.yaml	Add MIT header to existing example YAML.
docs/tutorials/policy-as-code/07-policy-versioning.md	New chapter 7 tutorial doc (diffing + regression gates).
docs/tutorials/policy-as-code/06-policy-testing.md	New chapter 6 tutorial doc (validation, scenarios, matrices).
docs/tutorials/policy-as-code/05-approval-workflows.md	(Stacked) New chapter 5 tutorial doc (escalation workflows).
docs/tutorials/policy-as-code/04-conditional-policies.md	Add MIT header + update “Next” link to chapter 5.
docs/tutorials/policy-as-code/03-rate-limiting.md	Add MIT header.
docs/tutorials/policy-as-code/02-capability-scoping.md	Add MIT header.
docs/tutorials/policy-as-code/01-your-first-policy.md	Add MIT header.

imran-siddique · 2026-04-11T15:38:29Z

@harinarayansrivatsan This PR has merge conflicts. Chapter 5 (#911) has been merged. Please rebase onto latest main so ch6-7 diff is clean. You can run: git fetch upstream && git rebase upstream/main then force-push. Once rebased, we will merge promptly.

github-actions

🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces chapters 6 and 7 of the "Policy-as-Code" tutorial, focusing on policy testing and versioning. The additions include detailed explanations, examples, and Python scripts for validating, testing, and comparing policies. The changes are primarily documentation and example files, with no modifications to the core library code. The tutorial content is well-structured and demonstrates best practices for policy testing and versioning.

Below are the detailed observations and recommendations:

🔴 CRITICAL: Security Issues

Policy Escalation Keyword Dependency
- The tutorial highlights a critical issue where the removal of the "requires human approval" keyword from the transfer_funds rule silently changes the behavior from "escalate" to "deny." This dependency on a specific string for escalation detection is fragile and prone to errors.
- Recommendation: Refactor the escalation logic to use a dedicated field (e.g., action: escalate) instead of relying on string matching in the message field. This will make the policy schema more robust and reduce the risk of accidental security bypasses.
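The difference between the two designs is easiest to see side by side. Both functions are hypothetical sketches and do not show the toolkit's actual schema; the keyword variant mirrors the string-matching behavior described in this comment.

```python
from typing import Dict

VALID_ACTIONS = {"allow", "deny", "escalate"}


def decision_from_keyword(rule: Dict) -> str:
    # Fragile: escalation is inferred from the wording of a human-readable message.
    if rule["action"] == "deny" and "requires human approval" in rule.get("message", ""):
        return "escalate"
    return rule["action"]


def decision_from_field(rule: Dict) -> str:
    # Explicit: the outcome is a validated value; the message is display text only.
    action = rule["action"]
    if action not in VALID_ACTIONS:
        raise ValueError(f"rule {rule['name']!r}: unknown action {action!r}")
    return action


v1_rule = {"name": "transfer-funds", "action": "deny", "message": "Transfer requires human approval"}
v2_rule = {"name": "transfer-funds", "action": "deny", "message": "Transfers need manager sign-off"}
explicit = {"name": "transfer-funds", "action": "escalate", "message": "Transfers need manager sign-off"}

print(decision_from_keyword(v1_rule), decision_from_keyword(v2_rule))  # escalate deny
print(decision_from_field(explicit))  # escalate
```

Rewording the message flips the keyword-based decision from escalate to deny, while the explicit field survives any edit to the display text.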
Default Allow Action
- The default action in the example policy (06_test_policy.yaml) is set to allow. This is risky, as it could lead to unintended access if no specific rule matches.
- Recommendation: Default actions should be set to deny unless explicitly required. If allow is necessary, ensure that the tutorial emphasizes the risks and provides guidance on how to mitigate them.

🟡 WARNING: Potential Breaking Changes

License Header Normalization
- Adding license headers to existing files (chapters 1-4) could potentially break workflows or scripts that rely on specific file formats or content. While this is unlikely, it is worth verifying that these changes do not affect any downstream processes.
- Recommendation: Confirm that the added license headers do not interfere with any automated parsing or processing of these files.

💡 Suggestions for Improvement

Thread Safety in Policy Evaluation
- The examples use the PolicyEvaluator class to evaluate policies. If this class is used in a concurrent environment (e.g., multiple agents evaluating policies simultaneously), ensure that it is thread-safe.
- Recommendation: Add explicit documentation about thread safety or include tests to verify concurrent usage.
Type Safety and Pydantic Validation
- The tutorial demonstrates the use of PolicyDocument.model_validate() for validating policy structures. While this is a good practice, consider adding examples of custom validators for complex fields (e.g., condition or action) to ensure type safety and prevent invalid configurations.
- Recommendation: Extend the tutorial to include advanced validation techniques using Pydantic.
Backward Compatibility
- The tutorial introduces new CLI commands (validate and test) and Python APIs (PolicyEvaluator, PolicyDocument.from_yaml). Ensure these additions are backward-compatible with existing workflows.
- Recommendation: Add explicit notes in the documentation about compatibility with earlier versions of the library.
Sandbox Escape Vectors
- While the tutorial does not directly address sandboxing, policies often interact with external systems. Ensure that the PolicyEvaluator and related components are designed to prevent sandbox escapes (e.g., by validating inputs rigorously and avoiding unsafe operations).
- Recommendation: Add a section in the tutorial about securing policy evaluation against malicious inputs.
OWASP Agentic Top 10 Compliance
- The tutorial does not explicitly address OWASP Agentic Top 10 risks, such as "Policy Injection" or "Unintended Policy Interactions."
- Recommendation: Add a dedicated section in the tutorial to discuss these risks and how the library mitigates them.
Testing Coverage
- The tutorial provides examples for testing policies, but it does not include tests for edge cases (e.g., malformed YAML files, conflicting rules, or extreme priority values).
- Recommendation: Add more comprehensive test scenarios to cover edge cases and unexpected interactions.
Behavioral Regression Detection
- The tutorial introduces regression detection but does not provide guidance on integrating this into CI/CD pipelines.
- Recommendation: Include a section on how to automate regression detection using GitHub Actions or other CI/CD tools.

Conclusion

The pull request is a valuable addition to the repository, providing detailed guidance on policy testing and versioning. However, it introduces critical security risks related to escalation keyword dependency and default allow actions. Addressing these issues will significantly improve the robustness and security of the library.

Once the recommendations are implemented, the tutorial will be a strong resource for users adopting policy-as-code practices.

github-actions

🤖 AI Agent: code-reviewer

Review Summary

This pull request adds two new chapters (6 and 7) to the "Policy-as-Code" tutorial, focusing on policy testing and versioning. It also normalizes license headers across existing chapters and updates the README. The tutorial content is well-written and provides actionable examples for validating, testing, and versioning policies. However, there are areas that require attention to ensure correctness, security, and maintainability.

🔴 CRITICAL Issues

Validation Error Handling in PolicyDocument:
- The PolicyDocument.from_yaml() method uses Pydantic for validation, which is good for catching structural errors. However, there is no mention of how the system handles malicious inputs or unexpected edge cases (e.g., deeply nested YAML files, extremely large files, or invalid types). Ensure that the validation logic includes safeguards against potential denial-of-service (DoS) attacks caused by maliciously crafted YAML files.
- Action: Add tests for edge cases such as deeply nested YAML, large files, and invalid types to ensure the validation mechanism is robust.
PolicyEvaluator Rule Merging:
- The merging of policies in PolicyEvaluator is based on priority. However, there is no mention of how conflicting rules are resolved when they have the same priority. This could lead to unpredictable behavior in production.
- Action: Implement deterministic tie-breaking logic for rules with the same priority. Document this behavior clearly in the tutorial.
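Deterministic tie-breaking amounts to a total ordering over candidate rules. The sketch orders by priority, then by restrictiveness (deny before escalate before allow), then by rule name. The restrictiveness ranking is an assumed design choice, since the comment only asks for determinism, and the rule names are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed ranking used only to break priority ties: lower means more restrictive.
RESTRICTIVENESS = {"deny": 0, "escalate": 1, "allow": 2}


def pick_rule(matching: List[Dict]) -> Optional[Dict]:
    """Highest priority wins; ties go to the more restrictive action, then to the smaller name."""
    if not matching:
        return None
    return min(matching, key=lambda r: (-r["priority"], RESTRICTIVENESS[r["action"]], r["name"]))


rules = [
    {"name": "allow-small-transfers", "priority": 50, "action": "allow"},
    {"name": "escalate-transfers", "priority": 50, "action": "escalate"},
]
print(pick_rule(rules)["name"])                  # escalate-transfers
print(pick_rule(list(reversed(rules)))["name"])  # escalate-transfers (order-independent)
```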
Escalation Keyword Dependency:
- The escalation system relies on the presence of the exact phrase "requires human approval" in the message field to distinguish between escalation and hard deny. This is fragile and prone to errors during policy updates.
- Action: Replace the reliance on string matching with a dedicated field in the schema, such as escalation: true. Update the tutorial and examples accordingly.

🟡 WARNING Issues

Backward Compatibility:
- Adding the escalation: true field (as suggested above) would break backward compatibility for existing policies that rely on the "requires human approval" message.
- Action: Provide a migration guide or fallback mechanism to handle older policies gracefully.
CLI Exit Codes:
- The CLI commands (validate, test) use exit codes for success/failure. Ensure that these exit codes are consistent across all tools and documented clearly. Any changes to exit code behavior could break CI pipelines relying on them.
- Action: Add explicit tests for CLI exit codes and document them in the tutorial.

💡 Suggestions for Improvement

Thread Safety in PolicyEvaluator:
- The PolicyEvaluator merges policies and evaluates decisions, but there is no mention of thread safety. If agents execute concurrently, ensure that shared resources (e.g., policy objects) are not modified during evaluation.
- Action: Add thread-safety tests and document whether PolicyEvaluator is safe for concurrent use.
OWASP Agentic Top 10 Compliance:
- The tutorial does not address OWASP Agentic Top 10 risks explicitly (e.g., sandbox escape vectors, privilege escalation). For example, the "allow-development" rule could inadvertently enable unsafe actions.
- Action: Add a section in the tutorial to discuss security best practices for policy design, including OWASP compliance.
Type Safety and Pydantic Models:
- The tutorial demonstrates Pydantic validation but does not specify whether strict type enforcement is enabled (e.g., strict=True in Pydantic models). This could lead to silent type coercion.
- Action: Enable strict type enforcement in Pydantic models and update the tutorial examples.
Policy Diffing in Chapter 7:
- The diffing mechanism in Chapter 7 is not fully shown in the truncated diff. Ensure that the diffing logic accounts for semantic changes (e.g., escalation vs hard deny) and not just structural differences.
- Action: Add examples of semantic diffing and regression detection in Chapter 7.
Sandbox Escape Prevention:
- Policies that allow actions in development environments (e.g., allow-development) could inadvertently enable sandbox escapes. Ensure that policies explicitly restrict actions that could compromise the sandbox.
- Action: Add examples of sandbox escape prevention in the tutorial.
Testing Coverage:
- The tutorial provides excellent examples for testing policies but does not mention coverage metrics. Ensure that the test suite covers all possible decision paths.
- Action: Add a section on measuring test coverage for policies.
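Rule coverage can be measured by recording which rule, or the policy default, decided each scenario. The helper names and policy shape are hypothetical, and the evaluator is a simplified highest-priority-wins model.

```python
from typing import Dict, List


def winning_rule(policy: Dict, tool: str) -> str:
    # Name of the highest-priority matching rule, or "<default>" when nothing matches.
    matches = [r for r in policy["rules"] if r["tool"] == tool]
    return max(matches, key=lambda r: r["priority"])["name"] if matches else "<default>"


def decision_coverage(policy: Dict, scenario_tools: List[str]) -> Dict[str, object]:
    """Report which decision paths (each rule plus the default) at least one scenario exercised."""
    paths = {r["name"] for r in policy["rules"]} | {"<default>"}
    hit = {winning_rule(policy, tool) for tool in scenario_tools}
    return {
        "covered": sorted(hit),
        "uncovered": sorted(paths - hit),
        "ratio": len(hit) / len(paths),
    }


policy = {"default": "deny", "rules": [
    {"name": "allow-reads", "tool": "read_file", "priority": 10, "action": "allow"},
    {"name": "block-deletes", "tool": "delete_file", "priority": 100, "action": "deny"},
    {"name": "escalate-transfers", "tool": "transfer_funds", "priority": 90, "action": "escalate"},
]}
report = decision_coverage(policy, ["read_file", "delete_file"])
print(report)
```

A rule that is shadowed by a higher-priority rule for the same tool can never be covered, so a persistent entry in `uncovered` is itself worth investigating.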
Documentation Links:
- The PR mentions verifying markdown links between chapters but does not confirm whether this was completed. Broken links could confuse users.
- Action: Run a link checker on the documentation and fix any broken links.

Summary of Actions

Critical

Improve validation error handling in PolicyDocument.
Add deterministic tie-breaking logic for rules with the same priority.
Replace escalation keyword dependency with a dedicated schema field.

Warning

Address backward compatibility for schema changes.
Test and document CLI exit codes.

Suggestions

Ensure thread safety in PolicyEvaluator.
Address OWASP Agentic Top 10 risks in the tutorial.
Enable strict type enforcement in Pydantic models.
Enhance semantic diffing in Chapter 7.
Add sandbox escape prevention examples.
Include test coverage metrics.
Verify markdown links between chapters.

Final Notes

This PR significantly improves the tutorial and provides valuable guidance for policy testing and versioning. Addressing the critical and warning issues will ensure the robustness and security of the library, while the suggestions will enhance its usability and compliance with best practices.

harinarayansrivatsan · 2026-04-11T15:58:16Z

Thanks Imran, this is done.

Copilot AI review requested due to automatic review settings April 10, 2026 13:45

github-actions bot added documentation Improvements or additions to documentation size/XL Extra large PR (500+ lines) labels Apr 10, 2026

github-actions bot reviewed Apr 10, 2026

Copilot started reviewing on behalf of harinarayansrivatsan April 10, 2026 13:47

github-actions bot reviewed Apr 10, 2026

Copilot AI reviewed Apr 10, 2026

harinarayansrivatsan added 3 commits April 11, 2026 10:44

docs: add MIT license headers to chapters 1-4 files

2f10dd1

Normalize license headers across all tutorial chapters. Chapters 5-7 already had them; this adds them to chapters 1-4 markdown and YAML example files for consistency.

docs: add policy-as-code tutorial chapter 6 (policy testing)

6c7512f

Covers structural validation with Pydantic, declarative YAML test scenarios, cross-policy test matrices, and regression detection. Includes runnable Python example and test fixtures.

docs: add policy-as-code tutorial chapter 7 (policy versioning)

2bc1ee1

Covers side-by-side version comparison, structural diffing, behavioral regression detection, and deploy gates. Updates README to link chapters 6-7 and fixes chapter 5 nav link.

harinarayansrivatsan force-pushed the docs/706-chapters-6-7 branch from 7e761e0 to 2bc1ee1 April 11, 2026 15:44

github-actions bot reviewed Apr 11, 2026

Merge branch 'main' into docs/706-chapters-6-7

10e030d

github-actions bot reviewed Apr 11, 2026

imran-siddique merged commit 89f0206 into microsoft:main Apr 11, 2026
6 of 7 checks passed

Conversation

harinarayansrivatsan commented Apr 10, 2026

Summary

New files

Test plan

github-actions bot commented Apr 10, 2026

github-actions bot left a comment

🤖 AI Agent: code-reviewer

Feedback on Pull Request: docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)

🔴 CRITICAL: Security Issues

🟡 WARNING: Potential Breaking Changes

💡 Suggestions for Improvement

Summary

github-actions bot commented Apr 10, 2026 • edited

Security Analysis of the Pull Request

Findings

1. Prompt Injection Defense Bypass

2. Policy Engine Circumvention

3. Trust Chain Weaknesses

4. Credential Exposure

5. Sandbox Escape

6. Deserialization Attacks

7. Race Conditions

8. Supply Chain

Summary of Findings

Additional Recommendations

Final Assessment

github-actions bot left a comment

🤖 AI Agent: code-reviewer

Pull Request Review: docs: add policy-as-code tutorial chapters 6-7 (testing & versioning)

🔴 CRITICAL Issues

1. Escalation Timeout Default Action

2. Policy Rule Priority Conflicts

3. EscalationHandler Timeout Behavior

🟡 WARNING: Potential Breaking Changes

1. License Header Normalization

2. Policy Versioning Diff Behavior

💡 Suggestions for Improvement

1. Thread Safety in EscalationHandler

2. Policy Testing Coverage

3. SPIFFE/SVID Integration

4. Behavioral Regression Detection

5. Markdown Link Verification

6. Type Safety and Pydantic Validation

Summary

Actions Required:

Copilot AI left a comment

Pull request overview

Reviewed changes

imran-siddique commented Apr 11, 2026

github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review Summary

🔴 CRITICAL: Security Issues

🟡 WARNING: Potential Breaking Changes

💡 Suggestions for Improvement

Conclusion

github-actions bot left a comment

🤖 AI Agent: code-reviewer

github-actions bot commented Apr 10, 2026 • edited