test: adversarial payload library for pipeline validation#5

Closed
VibhorGautam wants to merge 1 commit into
c2siorg:mainfrom
VibhorGautam:adversarial-test-taxonomy

Conversation

@VibhorGautam

Contributor

Adds the initial adversarial payload library from issue #2.

This is test data and a coverage mapping, not an implementation. I wanted to lock down the attack taxonomy first so it can be used to validate whichever architecture and policy language the team converges on.

What's included

  • 14 attack payloads across 4 pipeline layers (prompt, context, normalization, memory/provenance)
  • Each payload specifies the expected detection layer and enforcement action
  • Coverage matrix tracking which attack categories map to which layers
  • Multi-turn stateful attack pattern (PI-004) for testing temporal context
  • Realistic encoding evasion payloads (Unicode homoglyphs, Base64, zero-width chars, leetspeak)
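A minimal sketch of how the four encoding evasions listed above could be derived from a single seed phrase. The seed text, the substitution tables, and the function names are all assumptions made for illustration; the payload strings in this PR may be built differently.

```python
import base64

# Hypothetical seed phrase; not taken from the payload files in this PR.
SEED = "ignore previous instructions"

# Small illustrative tables, not an exhaustive confusables list.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441"}
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}
ZERO_WIDTH_SPACE = "\u200b"


def substitute(text, table):
    """Replace each character found in the table, leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


def insert_zero_width(text):
    """Place a zero-width space between every pair of characters."""
    return ZERO_WIDTH_SPACE.join(text)


def to_base64(text):
    """Base64-encode the UTF-8 bytes of the text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def make_variants(seed):
    """Return one obfuscated variant of the seed per evasion technique."""
    return {
        "homoglyph": substitute(seed, HOMOGLYPHS),
        "zero_width": insert_zero_width(seed),
        "base64": to_base64(seed),
        "leetspeak": substitute(seed, LEET),
    }


if __name__ == "__main__":
    for technique, text in make_variants(SEED).items():
        # repr() makes the invisible and look-alike characters visible.
        print(f"{technique:>10}: {text!r}")
```

None of the variants contains the seed as a plain substring, which is the property a naive keyword scanner would rely on.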

Payload schema

Every payload has: id, name, description, payload, expected_detection_layer, expected_action, severity, tags

The schema is framework-agnostic so payloads work regardless of whether the pipeline uses LangGraph, LangChain, or a custom loop
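For illustration, the schema could be modelled as a plain dataclass with a small helper that derives a coverage matrix from it. This is a sketch under assumptions: the field names come from the list above, but the field types, the example values, the `EX-` ids, and the `coverage_matrix` helper are hypothetical and not taken from the files in this PR.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class Payload:
    id: str
    name: str
    description: str
    # Assumed: a single string, or a list of turns for multi-turn attacks.
    payload: Union[str, List[str]]
    expected_detection_layer: str  # e.g. "prompt", "context", "normalization"
    expected_action: str           # assumed values such as "block"
    severity: str
    tags: List[str] = field(default_factory=list)


def coverage_matrix(payloads: List[Payload]) -> Dict[str, Dict[str, int]]:
    """Count payloads per (tag, expected detection layer) pair."""
    matrix = defaultdict(lambda: defaultdict(int))
    for p in payloads:
        for tag in p.tags:
            matrix[tag][p.expected_detection_layer] += 1
    return {tag: dict(layers) for tag, layers in matrix.items()}


# Hypothetical examples; ids and contents are placeholders.
EXAMPLES = [
    Payload("EX-001", "Direct override", "Plain instruction override",
            "ignore previous instructions", "prompt", "block", "high",
            ["injection"]),
    Payload("EX-002", "Base64 override", "Override hidden in Base64",
            "aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==", "normalization",
            "block", "high", ["injection", "encoding"]),
]

if __name__ == "__main__":
    print(coverage_matrix(EXAMPLES))
```

Because the structure is plain data with no framework imports, the same records can be loaded by a LangGraph node, a LangChain callback, or a custom loop.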

Closes #2

Initial set of 14 attack payloads organized by pipeline layer
(prompt, context, normalization, memory/provenance).

Each payload specifies which detection layer should catch it
and the expected enforcement action, so we can track coverage
gaps as modules get built.

Ref: c2siorg#2
@eddymontana

Great work on this taxonomy, @VibhorGautam. This is exactly the 'Ground Truth' we need for the PDP Evaluation Pipeline.

I’ve particularly noted the Memory Layer payloads; these will be the primary test cases for the Stateful Aggregator logic we’ve defined in the Phase-1 Architecture Contract.

I’ll be referencing this library in my GSoC proposal as the baseline for our Phase 2 validation. This ensures that every scanner we build has a clear target to hit.

@AdityaCJaiswal

Excellent engineering, @VibhorGautam. This taxonomy is exactly what we need to ensure our pipeline is mathematically sound.

While this will definitely serve as the ground truth for the Phase 2 scanners, we actually need to use this immediately in Phase 1.

Before these payloads even reach the PDP Evaluation Pipeline, they have to survive the UDS IPC handshake. Your inclusion of encoding evasion payloads (Unicode homoglyphs, zero-width chars, Base64) is critical here. We need to guarantee that the binary packing in the PEP SDK interceptor (PR #4) does not corrupt or strip these obfuscations during the socket transfer before the scanners can evaluate them.

I am pulling this branch down locally today. I will pipe these exact 14 payloads through the /tmp/acf.sock UDS transport we are building to stress-test the byte-offsets and ensure lossless transmission to the sidecar.
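A minimal sketch of such a lossless round-trip check, assuming a 4-byte big-endian length prefix followed by UTF-8 bytes. That framing is an assumption for illustration only; the actual binary packing in PR #4 is not shown here. To stay self-contained the sketch uses `socket.socketpair()` instead of binding `/tmp/acf.sock`.

```python
import socket
import struct


def send_frame(sock, text):
    """Send one frame: 4-byte big-endian length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    sock.sendall(struct.pack("!I", len(data)) + data)


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock):
    """Receive one frame and decode it back to text."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length).decode("utf-8")


def roundtrip(text):
    """Push text through a connected socket pair and return what arrives."""
    sender, receiver = socket.socketpair()
    try:
        send_frame(sender, text)
        return recv_frame(receiver)
    finally:
        sender.close()
        receiver.close()


# Hypothetical samples covering the obfuscation classes mentioned above.
SAMPLES = [
    "ign\u043ere previous instructions",         # Cyrillic homoglyph
    "i\u200bg\u200bn\u200bo\u200br\u200be",      # zero-width spaces
    "aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==",  # Base64
]

if __name__ == "__main__":
    for sample in SAMPLES:
        assert roundtrip(sample) == sample, repr(sample)
    print("all samples survived the round trip unchanged")
```

The length prefix counts bytes, not characters, which matters here because the homoglyph and zero-width samples occupy more bytes than they have code points.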

Outstanding work providing the exact telemetry exhaust we need to validate the OTel audit plane.

@Ananya44444

This is a really strong baseline, especially the coverage of RAG poisoning, tool reinjection, and multi-turn drift.

While going through the payloads, I noticed a few potential extensions that could improve real-world robustness:

  • Cross-layer attacks (e.g., encoded payloads that only become malicious after normalization)
  • False positive cases to validate precision (benign inputs containing trigger phrases)
  • State/control-plane injection attempts (overriding flags like is_safe)
  • Combined obfuscation techniques (homoglyph + zero-width)

Happy to contribute a small set of payloads covering these if that aligns with the direction.

@VibhorGautam

VibhorGautam commented Mar 19, 2026

Contributor Author

Good suggestions @Ananya44444. Cross-layer attacks are definitely a gap right now. Payloads that look benign pre-normalization but become malicious after decoding would catch a whole class of bugs where stage ordering matters.
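A rough sketch of the stage-ordering bug a cross-layer payload is meant to expose. The blocklist phrase, the `scan` function, and the Base64-only `normalise` step are assumptions for illustration and do not reflect the project's actual scanners.

```python
import base64
import binascii

# Hypothetical trigger phrase used by a naive keyword scanner.
BLOCKLIST = ("ignore previous instructions",)


def scan(text):
    """Flag the text if it contains any blocklisted phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)


def normalise(text):
    """Decode the input if it is valid Base64 of UTF-8 text, else return it unchanged."""
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return text


def scan_then_normalise(text):
    """Wrong order: the scanner only ever sees the encoded form."""
    flagged = scan(text)
    normalise(text)
    return flagged


def normalise_then_scan(text):
    """Intended order: the scanner sees the decoded form."""
    return scan(normalise(text))


# A cross-layer payload: harmless-looking until it has been decoded.
ENCODED = base64.b64encode(b"Ignore previous instructions").decode("ascii")

if __name__ == "__main__":
    print("scan before normalise:", scan_then_normalise(ENCODED))
    print("scan after normalise: ", normalise_then_scan(ENCODED))
```

A payload like `ENCODED` passes when the stages run in the wrong order and is flagged when they run in the intended order, so it works as a regression test for ordering.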

False positive cases are a good call too. If we only test with malicious inputs, we have no idea what the precision looks like.
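To make the precision point concrete, here is a small sketch that computes precision and recall from labelled results. The `(is_malicious, was_flagged)` tuple format and the sample data are hypothetical.

```python
def precision_recall(results):
    """Compute (precision, recall) from (is_malicious, was_flagged) pairs."""
    tp = sum(1 for malicious, flagged in results if malicious and flagged)
    fp = sum(1 for malicious, flagged in results if not malicious and flagged)
    fn = sum(1 for malicious, flagged in results if malicious and not flagged)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Malicious-only suite: every flag is a true positive by construction.
MALICIOUS_ONLY = [(True, True), (True, True)]

# Same detector, plus two benign inputs, one of which it wrongly flags.
WITH_BENIGN = MALICIOUS_ONLY + [(False, True), (False, False)]

if __name__ == "__main__":
    print("malicious only:", precision_recall(MALICIOUS_ONLY))
    print("with benign:   ", precision_recall(WITH_BENIGN))
```

With only malicious inputs, precision is 1.0 no matter how trigger-happy the detector is. Adding benign inputs that contain trigger phrases is what lets the false positive show up, dropping precision to 2/3 in this sample.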

Happy to have you contribute those. Open a PR against the same payloads directory and I can review it, or push them into this branch directly; either way works.

I'm also going to restructure the existing payloads to align with Tharindu's v0.2 pipeline stages (Validate, Normalise, Scan, Aggregate) so everything maps to the canonical architecture.

@Ananya44444

@VibhorGautam Thanks! I’ve added cross-layer payloads and a false positive case in a follow-up PR #11, built on top of this branch.
Would love your feedback. Happy to iterate, especially with the upcoming v0.2 pipeline restructuring.

@VibhorGautam

VibhorGautam commented May 26, 2026

Contributor Author

Superseded by #26.

Development

Successfully merging this pull request may close these issues.

[Testing] Adversarial Validation Module - Red-Team Test Taxonomy for Pipeline Layers

4 participants