Proposal: Add Puppetry Policy Detector as Optional Heuristic Module

Hello Rebuff team,
I have developed a small open-source tool called Puppetry Detector. It uses regular expressions to detect policy puppetry and prompt injection attempts in LLM prompts. The tool is modular and already includes an integration with Rebuff, so it could be adapted as an optional heuristic module.
If somebody could take a look and confirm whether integration is feasible, I'd be grateful.
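
To give a rough idea of what a regex-based heuristic of this kind looks like, here is a minimal sketch. The pattern list, `HeuristicResult`, and `detect_policy_puppetry` are hypothetical names made up for illustration; they are not the actual rule set or API of Puppetry Detector or Rebuff.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical patterns, for illustration only (not the tool's real rule set).
POLICY_PATTERNS = [
    # Config-like tags that imitate a policy file
    re.compile(r"<\s*(interaction-config|policy|system-override)\b", re.IGNORECASE),
    # Keys that try to declare allowed or blocked behavior
    re.compile(r"\b(allowed|blocked)[-_ ]?(modes|responses|strings)\b", re.IGNORECASE),
    # Classic instruction-override phrasing
    re.compile(r"\bignore (all )?(previous|prior) instructions\b", re.IGNORECASE),
]


@dataclass
class HeuristicResult:
    flagged: bool
    matches: List[str]  # source text of every pattern that matched


def detect_policy_puppetry(prompt: str) -> HeuristicResult:
    """Flag the prompt if any of the assumed patterns matches."""
    matches = [p.pattern for p in POLICY_PATTERNS if p.search(prompt)]
    return HeuristicResult(flagged=bool(matches), matches=matches)


if __name__ == "__main__":
    suspicious = "<interaction-config> allowed-modes: DR HOUSE </interaction-config>"
    print(detect_policy_puppetry(suspicious).flagged)
    print(detect_policy_puppetry("What is the capital of France?").flagged)
```

A check like this is cheap and runs without calling a model, which is why it seems a reasonable fit as an optional, opt-in heuristic rather than a replacement for existing checks.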

I would be happy to prepare a pull request to add this as an optional feature, if you think it could be useful. Please let me know if you are open to this idea or if you have any suggestions before I make a PR.
Thank you for your time and for your great work on Rebuff!

The tool itself is here: https://github.com/metawake/puppetry-detector

Proposal: Add Puppetry Policy Detector as Optional Heuristic Module #118
