Add kp-detect-v23 submission #157

Open
kprofundis wants to merge 2 commits into liamdugan:main from kprofundis:kp-detect-v23-submission
@kprofundis
Submission

Detector: kp-detect-v23
Author: Kareem Elsamadicy (Independent Researcher)
Contact: kelsamadicy@gmail.com

Ensemble of a transformer-based semantic classifier (DeBERTa-v3-base, 768-D mean-pooled embeddings + logistic regression) and an attack-feature gradient-boosting detector (41-D engineered features, 5-seed ensemble, global temperature calibration). Scores are routed by predicted attack type: the semantic classifier is weighted more heavily for paraphrase attacks, and the feature-based detector is used alone for character-manipulation attacks.
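
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the submitted code: the function names (`calibrate`, `route_score`), the attack labels, and the weight values in `SEMANTIC_WEIGHT` are all hypothetical, since the description does not state them. The only properties taken from the description are that the semantic classifier gets more weight for paraphrase attacks, that the feature-based detector is used alone for character-manipulation attacks, and that a single global temperature is used for calibration.

```python
import math
from typing import Literal

AttackType = Literal["none", "paraphrase", "character"]

# Hypothetical weights on the semantic classifier per predicted attack type.
# The PR description gives no numbers; only the ordering is taken from it:
# more semantic weight for paraphrase, none for character manipulation.
SEMANTIC_WEIGHT = {"none": 0.5, "paraphrase": 0.8, "character": 0.0}


def calibrate(logit: float, temperature: float) -> float:
    """Temperature-scale a raw logit, then map it to a probability."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))


def route_score(p_semantic: float, p_feature: float, attack: AttackType) -> float:
    """Blend the two detector probabilities according to the predicted attack."""
    w = SEMANTIC_WEIGHT[attack]
    return w * p_semantic + (1.0 - w) * p_feature


# Character-manipulation attack: the feature-based score is used alone.
score = route_score(p_semantic=0.9, p_feature=0.2, attack="character")
```

With a semantic weight of zero, the blended score for a character-manipulation attack is exactly the feature-based detector's score, which matches the "used alone" behavior in the description.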

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
github-actions Bot commented Jun 5, 2026

Eval run succeeded! Link to run: link

Here are the results of the submission(s):

Kareem Elsamadicy (Independent Researcher)

Release date: 2026-05-23

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved an AUROC of 93.37 and a TPR of 89.76% at FPR=5% and 79.46% at FPR=1%.
Without adversarial attacks, it achieved an AUROC of 94.62 and a TPR of 92.14% at FPR=5% and 82.30% at FPR=1%.
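
For readers unfamiliar with the metric, a TPR at a fixed FPR can be computed roughly as in the sketch below. It is an illustration only and not the RAID evaluation code: the function name `tpr_at_fpr` and the toy scores are hypothetical, and it assumes that higher scores mean "more likely machine-generated" and that the threshold is chosen from scores on human-written text.

```python
import math
from typing import Sequence, Tuple


def tpr_at_fpr(
    human_scores: Sequence[float],
    machine_scores: Sequence[float],
    target_fpr: float,
) -> Tuple[float, float, float]:
    """Return (threshold, achieved_fpr, tpr) for a target false-positive rate.

    A text is flagged as machine-generated when its score is strictly above
    the threshold. The threshold is the lowest one that keeps the share of
    flagged human texts at or below target_fpr.
    """
    ranked = sorted(human_scores, reverse=True)
    # Maximum number of human texts that may be flagged (small epsilon guards
    # against floating-point products such as 0.999999 instead of 1.0).
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")

    false_pos = sum(s > threshold for s in human_scores)
    true_pos = sum(s > threshold for s in machine_scores)
    return threshold, false_pos / len(human_scores), true_pos / len(machine_scores)


# Toy data: 10 human-written texts and 4 machine-generated texts.
human = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
machine = [0.85, 0.92, 0.97, 0.99]
threshold, fpr, tpr = tpr_at_fpr(human, machine, target_fpr=0.10)
```

On the toy data, one of the ten human texts scores above the chosen threshold of 0.9 (an FPR of 10%), and three of the four machine texts do (a TPR of 75%).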

If all looks well, a maintainer will come by soon to merge this PR and your entry/entries will appear on the leaderboard. If you need to make any changes, feel free to push new commits to this PR. Thanks for submitting to RAID!
