test: adversarial payload library for pipeline validation by VibhorGautam · Pull Request #5 · c2siorg/acf-sdk

VibhorGautam · 2026-03-17T06:42:07Z

Adds the initial adversarial payload library from issue #2

This is test data and coverage mapping, not implementation - wanted to get the attack taxonomy locked down so it can be used to validate whichever architecture and policy language the team converges on

What's included

14 attack payloads across 4 pipeline layers (prompt, context, normalization, memory/provenance)
Each payload specifies the expected detection layer and enforcement action
Coverage matrix tracking which attack categories map to which layers
Multi-turn stateful attack pattern (PI-004) for testing temporal context
Realistic encoding evasion payloads (Unicode homoglyphs, Base64, zero-width chars, leetspeak)

Payload schema

Every payload has: id, name, description, payload, expected_detection_layer, expected_action, severity, tags

The schema is framework-agnostic so payloads work regardless of whether the pipeline uses LangGraph, LangChain, or a custom loop

Closes #2

Initial set of 14 attack payloads organized by pipeline layer (prompt, context, normalization, memory/provenance). Each payload specifies which detection layer should catch it and the expected enforcement action, so we can track coverage gaps as modules get built. Ref: c2siorg#2

eddymontana · 2026-03-17T07:10:54Z

Great work on this taxonomy, @VibhorGautam . This is exactly the 'Ground Truth' we need for the PDP Evaluation Pipeline.

I’ve particularly noted the Memory Layer payloads; these will be the primary test cases for the Stateful Aggregator logic we’ve defined in the Phase-1 Architecture Contract.

I’ll be referencing this library in my GSoC proposal as the baseline for our Phase 2 validation. This ensures that every scanner we build has a clear target to hit.

AdityaCJaiswal · 2026-03-17T11:01:04Z

Excellent engineering, @VibhorGautam . This taxonomy is exactly what we need to ensure our pipeline is mathematically sound.

While this will definitely serve as the ground truth for the Phase 2 scanners, we actually need to use this immediately in Phase 1.

Before these payloads even reach the PDP Evaluation Pipeline, they have to survive the UDS IPC handshake. Your inclusion of encoding evasion payloads (Unicode homoglyphs, zero-width chars, Base64) is critical here. We need to guarantee that the binary packing in the PEP SDK interceptor (PR #4) does not corrupt or strip these obfuscations during the socket transfer before the scanners can evaluate them.

I am pulling this branch down locally today. I will pipe these exact 14 payloads through the /tmp/acf.sock UDS transport we are building to stress-test the byte-offsets and ensure lossless transmission to the sidecar.

Outstanding work providing the exact telemetry exhaust we need to validate the OTel audit plane.

Ananya44444 · 2026-03-19T14:09:54Z

This is a really strong baseline, especially the coverage of RAG poisoning, tool reinjection, and multi-turn drift.

While going through the payloads, I noticed a few potential extensions that could improve real-world robustness:

Cross-layer attacks (e.g., encoded payloads that only become malicious after normalization)
False positive cases to validate precision (benign inputs containing trigger phrases)
State/control-plane injection attempts (overriding flags like is_safe)
Combined obfuscation techniques (homoglyph + zero-width)

Happy to contribute a small set of payloads covering these if that aligns with the direction.

VibhorGautam · 2026-03-19T17:26:40Z

Good suggestions @Ananya44444 - cross-layer attacks are definitely a gap right now. Payloads that look benign pre-normalization but become malicious after decoding would catch a whole class of bugs where stage ordering matters

False positive cases are a good call too, if we only test with malicious inputs we have no idea what the precision looks like

Happy to have you contribute those, open a PR against the same payloads directory and i can review, or push them into this branch directly - either way works

Also going to restructure the existing payloads to align with Tharindu's v0.2 pipeline stages (Validate, Normalise, Scan, Aggregate) so everything maps to the canonical architecture

Ananya44444 · 2026-03-19T18:49:05Z

@VibhorGautam Thanks! I’ve added cross-layer payloads and a false positive case in a follow-up PR #11 , built on top of this branch.
Would love your feedback . Happy to iterate, especially with the upcoming v0.2 pipeline restructuring.

VibhorGautam · 2026-05-26T12:47:24Z

superseded by #26

Ananya44444 mentioned this pull request Mar 19, 2026

test: add cross-layer, false positive, and mixed obfuscation adversarial payloads #11

Open

Pranjal0410 mentioned this pull request Mar 20, 2026

feat: adversarial benchmark harness: 50 payloads, 27 tests, policy matrix coverage #12

Open

4 tasks

VibhorGautam mentioned this pull request Mar 28, 2026

test(policies): adversarial test fixtures for v1 hook policies #26

Merged

VibhorGautam closed this May 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: adversarial payload library for pipeline validation#5

test: adversarial payload library for pipeline validation#5
VibhorGautam wants to merge 1 commit into
c2siorg:mainfrom
VibhorGautam:adversarial-test-taxonomy

VibhorGautam commented Mar 17, 2026

Uh oh!

eddymontana commented Mar 17, 2026

Uh oh!

AdityaCJaiswal commented Mar 17, 2026

Uh oh!

Ananya44444 commented Mar 19, 2026

Uh oh!

VibhorGautam commented Mar 19, 2026 •

edited

Loading

Uh oh!

Ananya44444 commented Mar 19, 2026

Uh oh!

VibhorGautam commented May 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

VibhorGautam commented Mar 17, 2026

What's included

Payload schema

Uh oh!

eddymontana commented Mar 17, 2026

Uh oh!

AdityaCJaiswal commented Mar 17, 2026

Uh oh!

Ananya44444 commented Mar 19, 2026

Uh oh!

VibhorGautam commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Ananya44444 commented Mar 19, 2026

Uh oh!

VibhorGautam commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

VibhorGautam commented Mar 19, 2026 •

edited

Loading

VibhorGautam commented May 26, 2026 •

edited

Loading