feat(examples): defense-in-depth security analyzer #2472

Open
Fieldnote-Echo wants to merge 5 commits into OpenHands:main from Fieldnote-Echo:feat/defense-in-depth-security-analyzer

@Fieldnote-Echo Fieldnote-Echo commented Mar 16, 2026

Summary

Single-file custom SecurityAnalyzerBase example with four layers: whitelisted extraction preserving field boundaries, Unicode normalization (NFKC + invisible/bidi stripping), segment-aware deterministic policy rails, and regex pattern scanning with max-severity ensemble fusion.
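As an illustration of the normalization layer described above, here is a minimal stdlib-only sketch. It is not the PR's code: the function name `normalize_text` is hypothetical, and dropping every character in Unicode category `Cf` (which covers zero-width and bidi control characters) is an assumed way to implement "invisible/bidi stripping".

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Fold compatibility forms with NFKC, then drop format (Cf) characters.

    Category Cf includes zero-width spaces/joiners and the bidi
    embedding, override, and isolate controls.
    """
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")


# Fullwidth letters fold to ASCII; invisible characters disappear.
fullwidth = normalize_text("\uff52\uff4d -rf /")
zero_width = normalize_text("r\u200bm -rf /")
bidi = normalize_text("\u202erm -rf /")

# A Cyrillic "а" (U+0430) survives unchanged: NFKC does not map
# cross-script confusables, which is the TR39 gap noted in this PR.
cyrillic = normalize_text("\u0430")
```

Stripping all of `Cf` also removes the zero-width joiner used in some emoji sequences, which is usually acceptable for text that is only being scanned, not displayed.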

Two design decisions intentionally diverge from #2067:

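  • Max-severity fusion instead of noisy-OR: the analyzers all scan the same input, so their verdicts are correlated and noisy-OR's independence assumption does not hold; the SDK boundary is also categorical rather than probabilistic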
  • Stdlib-only Unicode normalization instead of homoglyph replacement — TR39 confusable detection is documented as a known limitation via strict xfails, not silently omitted
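The difference between the two fusion rules can be sketched as follows. This is an illustrative comparison under assumptions, not the PR's implementation: the `Risk` enum, `fuse_max`, and `noisy_or` are hypothetical names, and returning `LOW` for an empty result list is an assumed default.

```python
import math
from enum import IntEnum


class Risk(IntEnum):
    # Hypothetical ordered risk levels; not the SDK's own type.
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def fuse_max(results: list[Risk]) -> Risk:
    """Max-severity fusion: the ensemble is as severe as its worst member."""
    return max(results, default=Risk.LOW)  # assumed default for no results


def noisy_or(probs: list[float]) -> float:
    """Noisy-OR: treats each analyzer as an independent chance of danger."""
    return 1.0 - math.prod(1.0 - p for p in probs)


fused = fuse_max([Risk.LOW, Risk.HIGH, Risk.MEDIUM])
combined = noisy_or([0.5, 0.5])
```

Two analyzers that each report 0.5 combine to 0.75 under noisy-OR. If both fired on the same substring of the same input, that increase reflects double counting rather than new evidence, whereas max-severity fusion stays at the single worst verdict.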

Test coverage: baseline suite (93 tests) covering each pipeline layer, adversarial suite (32 tests) with TDD bug fixes, strict xfails for irreducible limitations, and hostile-input stress tests.

Closes #2067

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

…al tests

Single-file example implementing layered security analysis for agent tool
calls: whitelisted extraction with field boundaries, Unicode normalization
(NFKC + invisible/bidi stripping), segment-aware deterministic policy rails,
regex pattern scanning, and max-severity ensemble fusion.

Includes baseline test suite (93 tests) covering extraction, normalization,
rails, pattern classification, ensemble fusion, and confirmation policy.
Adversarial test suite (32 tests) with TDD-driven bug fixes, strict xfails
documenting irreducible limitations, and hostile input stress tests.

Closes OpenHands#2067
Fieldnote-Echo and others added 3 commits March 16, 2026 17:15
The 30k-char extraction cap silently drops content past the limit.
Integrators who don't read the adversarial test suite won't know
they're exposed if their agent processes large inputs.
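A small sketch of why a silent cap is risky, and one way an integrator could surface it. The helper `extract_capped` and its returned `truncated` flag are hypothetical, not part of the example in this PR; only the 30k-character limit comes from the commit message.

```python
MAX_CHARS = 30_000  # cap taken from the commit message above


def extract_capped(fields: list[str], limit: int = MAX_CHARS) -> tuple[str, bool]:
    """Join fields and cut at `limit` chars; also report whether content was dropped."""
    joined = "\n".join(fields)
    return joined[:limit], len(joined) > limit


# Padding pushes the dangerous command past the cap, so a scanner
# that only sees `text` never encounters it.
text, truncated = extract_capped(["a" * 30_000, "rm -rf /"])
```

A caller that checks the flag could escalate or reject truncated inputs instead of treating the scanned prefix as the whole action.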
The digit-prefixed example filename (45_...) requires importlib to
import. Move the hack into conftest.py so it runs once at collection
time; both test files now just reference sys.modules.
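The import workaround described above can be sketched with `importlib.util`. This is a self-contained illustration, not the PR's `conftest.py`: the file name `45_demo_example.py`, the alias `demo_example`, and the helper `load_module_from_path` are all hypothetical.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(alias: str, path: Path):
    """Load a file whose name is not a valid identifier under a chosen alias."""
    spec = importlib.util.spec_from_file_location(alias, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so code that looks itself up in
    # sys.modules during import can find the module.
    sys.modules[alias] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # "import 45_demo_example" would be a SyntaxError, hence the loader.
    path = Path(tmp) / "45_demo_example.py"
    path.write_text("ANSWER = 42\n")
    mod = load_module_from_path("demo_example", path)
```

Doing this once at collection time means each test file can fetch the module from `sys.modules` by its alias rather than repeating the loader code.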

enyst commented Mar 16, 2026

@OpenHands Do a /codereview on this PR. Investigate deeply the relevant code in the codebase.

openhands-ai bot commented Mar 16, 2026

I'm on it! enyst can track my progress at all-hands.dev

@enyst enyst left a comment

I did a deeper pass on the example + tests and found two important correctness issues that look worth fixing before merge. Current targeted tests/examples pass, but both of the cases below reproduce locally with small ActionEvent probes.

[examples/01_standalone_sdk/45_defense_in_depth_security.py, Lines 153-161 and 563-577] 🔍 Correctness: _extract_segments() feeds thought, reasoning_content, and summary into the same corpus that shell-destructive rails/patterns scan. That means non-executable reasoning text can flip an otherwise safe action to HIGH. Repro: an action whose actual command is ls /tmp but whose thought says I should avoid rm -rf / currently returns HIGH in both PatternSecurityAnalyzer and EnsembleSecurityAnalyzer. That seems to undercut the stated goal of avoiding reasoning-related false positives. I think the safest fix is to keep shell/permission detection scoped to executable fields (tool_call.arguments, maybe tool metadata) and, if you still want thought/summary coverage, handle that in a separate prompt-injection/text-only path.

[examples/01_standalone_sdk/45_defense_in_depth_security.py, Lines 345-346 and 446] 🛠️ Correctness: the raw-disk dd detection only matches if=... of=/dev/.... dd operands are order-independent, so common destructive forms like dd of=/dev/sda if=/dev/zero and dd bs=1M of=/dev/sda if=/dev/zero currently fall through as LOW end-to-end (rail PASS, pattern LOW, ensemble LOW). Please make this check order-independent and add a regression test for of= before if=.

@openhands-ai

This comment was marked as duplicate.

Fieldnote-Echo (Author) commented

Thank you @enyst for the feedback. I am drafting fixes and will push again when they are ready.

Finding 1: shell-destructive patterns now scan only executable fields
(tool_name, tool_call.name, tool_call.arguments). Thought, reasoning,
and summary are scanned only for injection/social-engineering patterns.
Prevents false positives when an agent reasons about dangerous commands
it chose not to run.
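The split described in Finding 1 can be sketched roughly as below. This is a simplified illustration under assumptions, not the code in the PR: the action is modeled as a plain dict, the two regexes are toy stand-ins for the real pattern sets, and rating an injection hit as `HIGH` is an assumed choice.

```python
import re
from dataclasses import dataclass

# Toy stand-ins for the real pattern sets.
SHELL_DESTRUCTIVE = re.compile(r"\brm\s+-rf\s+/")
INJECTION = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)

EXECUTABLE_FIELDS = ("tool_name", "arguments")
TEXT_FIELDS = ("thought", "reasoning_content", "summary")


@dataclass
class Corpora:
    executable: str  # what the agent will actually run
    text: str        # what the agent merely said or reasoned


def split_corpora(action: dict) -> Corpora:
    def join(keys):
        return "\n".join(str(action.get(k, "")) for k in keys)
    return Corpora(join(EXECUTABLE_FIELDS), join(TEXT_FIELDS))


def classify(action: dict) -> str:
    corpora = split_corpora(action)
    if SHELL_DESTRUCTIVE.search(corpora.executable):
        return "HIGH"
    if INJECTION.search(corpora.text):
        return "HIGH"  # assumed severity for an injection hit
    return "LOW"


# The reviewer's repro: a safe command with a cautious thought.
safe = classify({"tool_name": "terminal", "arguments": "ls /tmp",
                 "thought": "I should avoid rm -rf /"})
dangerous = classify({"tool_name": "terminal", "arguments": "rm -rf /"})
```

Because the shell pattern never sees the `thought` field, reasoning about a dangerous command no longer raises the verdict on its own.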

Finding 2: dd raw-disk detection now matches of=/dev/ regardless of
operand order. Adds regression tests for reversed and interleaved
operands.
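One way to express an order-independent check is to anchor on the `dd` command and then look for `of=/dev/` anywhere in the same command segment. The regex below is a hypothetical sketch, not the pattern used in the PR; stopping at `;`, `|`, `&`, and newlines is an assumed approximation of a command boundary.

```python
import re

# Match "dd", then any operands within the same command segment,
# then an output operand that targets a device node.
DD_RAW_DISK = re.compile(r"\bdd\b[^\n;|&]*\bof=/dev/")


def is_raw_disk_write(command: str) -> bool:
    return DD_RAW_DISK.search(command) is not None


canonical = is_raw_disk_write("dd if=/dev/zero of=/dev/sda")
reversed_order = is_raw_disk_write("dd of=/dev/sda if=/dev/zero")
interleaved = is_raw_disk_write("dd bs=1M of=/dev/sda if=/dev/zero")
to_file = is_raw_disk_write("dd if=/dev/zero of=/tmp/out.img")
```

A pattern this broad also flags benign targets such as `of=/dev/null`, so a real rail may want an allowlist for harmless device nodes.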
Fieldnote-Echo (Author) commented

@enyst Thanks OpenHands for the thorough review. Both findings are fixed in c398cf2: extraction now splits into executable and text corpora so shell patterns never see reasoning content, and the dd rail matches of=/dev/ regardless of operand order.

Development

Successfully merging this pull request may close these issues.

docs(examples): Custom SecurityAnalyzerBase with defense-in-depth patterns
