generated from guardrails-ai/validator-template
Open
Description
At the time of writing, a single threshold decides whether a zero-shot topic counts as 'found'. Separate thresholds for the valid and invalid topics would let us do more nuanced filtering, like: "It might not be about sports, but it's definitely not about travel."
Consider the case where our threshold is 0.5, the default. If we assume the false-positive rate here is 4%[1] and that each topic is flagged independently, then adding ten negative topics means our odds of accidentally flagging something are 1-(1-0.04)^10, or about 33.5%.
It would be nice to be able to tune that.
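The arithmetic above can be checked with a short sketch. It assumes every negative topic fires falsely at the same rate and independently of the others, and the 4% figure is the unsourced estimate from this issue, not a measured value. The function name is made up for illustration.

```python
def p_any_false_positive(rate: float, n_topics: int) -> float:
    """Chance that at least one of n_topics independent checks fires falsely."""
    # Probability that none fire is (1 - rate) ** n_topics; take the complement.
    return 1.0 - (1.0 - rate) ** n_topics


if __name__ == "__main__":
    # Show how the combined false-positive odds grow with the number of negative topics.
    for n in (1, 5, 10, 20):
        print(f"{n:>2} negative topics: {p_any_false_positive(0.04, n):.1%}")
```

A stricter threshold on the invalid side would lower the per-topic rate, which is the knob this issue asks for.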
I imagine the change would be something akin to:
    valid_topics = model_input["valid_topics"]
    invalid_topics = model_input["invalid_topics"]
    candidate_topics = valid_topics + invalid_topics
    # Look thresholds up by topic name. If self._classifier is a Hugging Face
    # zero-shot pipeline, labels come back sorted by score, not in input order,
    # so zipping them against a positional threshold list would misalign them.
    thresholds = {
        **{topic: self._zero_shot_threshold_valid for topic in valid_topics},
        **{topic: self._zero_shot_threshold_invalid for topic in invalid_topics},
    }
    result = self._classifier(text, candidate_topics)
    found_topics = [
        topic
        for topic, score in zip(result["labels"], result["scores"])
        if score > thresholds[topic]
    ]
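For trying the idea outside the validator, here is a self-contained sketch of per-topic thresholds keyed by topic name rather than by position. `find_topics` and `fake_classifier` are hypothetical names, not part of the validator. The stub only mimics the result shape of a Hugging Face zero-shot pipeline (a dict with `labels` and `scores`, sorted by descending score), and the canned scores are invented.

```python
from typing import Callable, Dict, List, Sequence

# Assumed classifier shape: (text, candidate_labels) -> {"labels": [...], "scores": [...]}
Classifier = Callable[[str, Sequence[str]], Dict[str, list]]


def find_topics(
    text: str,
    valid_topics: Sequence[str],
    invalid_topics: Sequence[str],
    classifier: Classifier,
    valid_threshold: float = 0.5,
    invalid_threshold: float = 0.5,
) -> List[str]:
    """Return topics whose score exceeds the threshold for their own group."""
    # Key thresholds by topic name so the result order does not matter.
    thresholds = {topic: valid_threshold for topic in valid_topics}
    thresholds.update({topic: invalid_threshold for topic in invalid_topics})
    result = classifier(text, list(valid_topics) + list(invalid_topics))
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score > thresholds[label]
    ]


def fake_classifier(text: str, candidate_labels: Sequence[str]) -> Dict[str, list]:
    """Stand-in for a real model: canned scores, returned in descending order."""
    canned = {"sports": 0.45, "travel": 0.62, "politics": 0.10}
    ranked = sorted(candidate_labels, key=lambda label: canned.get(label, 0.0), reverse=True)
    return {"sequence": text, "labels": ranked, "scores": [canned.get(label, 0.0) for label in ranked]}


if __name__ == "__main__":
    # Lenient on the valid side, strict on the invalid side.
    print(find_topics("some text", ["sports"], ["travel", "politics"],
                      fake_classifier, valid_threshold=0.4, invalid_threshold=0.7))
```

If a topic appears in both lists, this sketch lets the invalid threshold win. Whether that is the right tie-break is an open design question.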
[1] Source: lost the original link so the new source is 'trust me, friendo'.