feat(examples): defense-in-depth security analyzer #2472
Fieldnote-Echo wants to merge 5 commits into OpenHands:main from
…al tests

Single-file example implementing layered security analysis for agent tool calls: whitelisted extraction with field boundaries, Unicode normalization (NFKC + invisible/bidi stripping), segment-aware deterministic policy rails, regex pattern scanning, and max-severity ensemble fusion.

Includes baseline test suite (93 tests) covering extraction, normalization, rails, pattern classification, ensemble fusion, and confirmation policy. Adversarial test suite (32 tests) with TDD-driven bug fixes, strict xfails documenting irreducible limitations, and hostile input stress tests.

Closes OpenHands#2067
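To make the normalization layer concrete, here is a minimal sketch of NFKC folding combined with invisible/bidi stripping. The function name `normalize_text` and the exact character set in `_INVISIBLE_AND_BIDI` are assumptions for illustration; the example file in this PR may strip a different set.

```python
import unicodedata

# Hypothetical strip list: zero-width characters plus bidirectional controls.
# The set used by the actual example may differ.
_INVISIBLE_AND_BIDI = frozenset(
    "\u200b\u200c\u200d\u2060\ufeff"  # zero-width space/joiners, word joiner, BOM
    "\u200e\u200f"                    # left-to-right / right-to-left marks
    "\u202a\u202b\u202c\u202d\u202e"  # bidi embeddings and overrides
    "\u2066\u2067\u2068\u2069"        # bidi isolates
)


def normalize_text(text: str) -> str:
    """Fold compatibility forms with NFKC, then drop invisible/bidi characters."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if ch not in _INVISIBLE_AND_BIDI)
```

With this sketch, a command hidden behind a zero-width space (`"r\u200bm -rf /"`) or written in fullwidth letters (`"\uff52\uff4d -rf /"`) normalizes to `rm -rf /` before any rail or pattern sees it.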
The 30k-char extraction cap silently drops content past the limit. Integrators who don't read the adversarial test suite won't know they're exposed if their agent processes large inputs.
The digit-prefixed example filename (45_...) requires importlib to import. Move the hack into conftest.py so it runs once at collection time; both test files now just reference sys.modules.
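A module whose filename starts with a digit cannot be named in an `import` statement, so it has to be loaded by path. The following is a rough sketch of a helper that could live in conftest.py; the helper name `load_module_from_path` and the alias it registers are hypothetical, not taken from the PR.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from_path(path: Path, alias: str) -> ModuleType:
    """Load a module by file path and register it in sys.modules under `alias`."""
    if alias in sys.modules:
        # Already loaded during this session; reuse it.
        return sys.modules[alias]
    spec = importlib.util.spec_from_file_location(alias, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so code that looks itself up by module name works.
    sys.modules[alias] = module
    spec.loader.exec_module(module)
    return module
```

Calling this once from conftest.py means each test file can read the module from `sys.modules[alias]` instead of repeating the loader code.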
@OpenHands Do a /codereview on this PR. Investigate deeply the relevant code in the codebase.
I'm on it! enyst can track my progress at all-hands.dev
enyst left a comment
I did a deeper pass on the example + tests and found two important correctness issues that look worth fixing before merge. Current targeted tests/examples pass, but both of the cases below reproduce locally with small ActionEvent probes.
[examples/01_standalone_sdk/45_defense_in_depth_security.py, Lines 153-161 and 563-577] 🔍 Correctness: _extract_segments() feeds thought, reasoning_content, and summary into the same corpus that shell-destructive rails/patterns scan. That means non-executable reasoning text can flip an otherwise safe action to HIGH. Repro: an action whose actual command is ls /tmp but whose thought says I should avoid rm -rf / currently returns HIGH in both PatternSecurityAnalyzer and EnsembleSecurityAnalyzer. That seems to undercut the stated goal of avoiding reasoning-related false positives. I think the safest fix is to keep shell/permission detection scoped to executable fields (tool_call.arguments, maybe tool metadata) and, if you still want thought/summary coverage, handle that in a separate prompt-injection/text-only path.
[examples/01_standalone_sdk/45_defense_in_depth_security.py, Lines 345-346 and 446] 🛠️ Correctness: the raw-disk dd detection only matches if=... of=/dev/.... dd operands are order-independent, so common destructive forms like dd of=/dev/sda if=/dev/zero and dd bs=1M of=/dev/sda if=/dev/zero currently fall through as LOW end-to-end (rail PASS, pattern LOW, ensemble LOW). Please make this check order-independent and add a regression test for of= before if=.
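One way to make the `dd` check order-independent is to stop requiring `if=` before `of=` and look only for an `of=/dev/` operand within the same command. This is a sketch under that assumption; the names `DD_RAW_DISK` and `is_raw_disk_dd` are illustrative and not the identifiers used in the example file.

```python
import re

# Match `dd` followed, within the same command (no ';', '&', '|' or newline
# in between), by an `of=/dev/...` operand in any position.
DD_RAW_DISK = re.compile(r"\bdd\b[^;&|\n]*\bof=/dev/")


def is_raw_disk_dd(command: str) -> bool:
    """Return True if a dd invocation writes to a /dev/ target."""
    return DD_RAW_DISK.search(command) is not None
```

This sketch also flags `of=/dev/null`, which is harmless; a production rule would likely want an allowlist for such targets.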
Thank you @enyst for the feedback. I am drafting fixes and will push again when they are ready.
Finding 1: shell-destructive patterns now scan only executable fields (tool_name, tool_call.name, tool_call.arguments). Thought, reasoning, and summary are scanned only for injection/social-engineering patterns. Prevents false positives when an agent reasons about dangerous commands it chose not to run.

Finding 2: dd raw-disk detection now matches of=/dev/ regardless of operand order. Adds regression tests for reversed and interleaved operands.
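The field scoping described in Finding 1 can be illustrated with a small sketch. Everything here is a simplified assumption: the `Segment` type, the two regexes, and the MEDIUM level for injection hits are hypothetical stand-ins, and only the three executable field names come from the commit message.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    field: str  # e.g. "tool_call.arguments" or "thought"
    text: str


# Field names taken from the commit message above.
EXECUTABLE_FIELDS = frozenset({"tool_name", "tool_call.name", "tool_call.arguments"})

# Deliberately tiny, hypothetical patterns for illustration only.
SHELL_DESTRUCTIVE = re.compile(r"\brm\s+-(?:rf|fr)\b")
INJECTION = re.compile(r"ignore (?:all )?previous instructions", re.IGNORECASE)


def classify(segments: list[Segment]) -> str:
    """Scan shell patterns on executable fields only; injection on every field."""
    for seg in segments:
        if seg.field in EXECUTABLE_FIELDS and SHELL_DESTRUCTIVE.search(seg.text):
            return "HIGH"
    for seg in segments:
        if INJECTION.search(seg.text):
            return "MEDIUM"  # assumed level; the PR does not state it
    return "LOW"
```

Under this sketch, the reviewer's repro (command `ls /tmp`, thought `I should avoid rm -rf /`) classifies as LOW, while the same `rm -rf /` text inside `tool_call.arguments` still classifies as HIGH.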
Summary
Single-file custom SecurityAnalyzerBase example with four layers: whitelisted extraction preserving field boundaries, Unicode normalization (NFKC + invisible/bidi stripping), segment-aware deterministic policy rails, and regex pattern scanning with max-severity ensemble fusion.

Two design decisions intentionally diverge from #2067:
Test coverage: baseline suite (93 tests) covering each pipeline layer, adversarial suite (32 tests) with TDD bug fixes, strict xfails for irreducible limitations, and hostile-input stress tests.
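For readers unfamiliar with the max-severity ensemble fusion named in the summary, a minimal sketch follows. The `Risk` enum and `fuse_max_severity` function are hypothetical names, and the three-level ordering is an assumption; the SDK's own risk type may carry additional levels.

```python
from enum import IntEnum
from typing import Iterable


class Risk(IntEnum):
    # Assumed ordering for illustration: higher value means higher severity.
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def fuse_max_severity(results: Iterable[Risk]) -> Risk:
    """Return the most severe verdict among the individual analyzers.

    Raises ValueError when no results are given, so an empty ensemble
    never reads as a safe verdict.
    """
    return max(results)
```

The design choice is conservative: a single HIGH verdict from any layer decides the outcome, and no number of LOW verdicts can outvote it.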
Closes #2067
Checklist