FIX: Use `underlying_model_name` for evaluation identifier and add `param_fallbacks` by jsong468 · Pull Request #1647 · microsoft/PyRIT

jsong468 · 2026-04-23T20:05:04Z

There existed a bug where ScorerEvaluationIdentifier used model_name instead of underlying_model_name as originally intended. Thus, the eval hash in our JSONL scorer metrics registries is computed incorrectly using model_name (which is often a deployment name).

This PR addresses the bug by using underlying_model_name in ScorerEvaluationIdentifier while also introducing param_fallbacks field to ChildEvalRule to use model_name as a fallback for underlying_model_name when it is None or "" (for cases where the user doesn't provide a separate underlying_model_name to the LLM prompt target used in a scorer since it now needs to be passed in explicitly and not automatically from environment variables as a result of #1590)

Description

Eval hash now uses underlying_model_name instead of model_name to group scorer evaluations by the actual model (e.g., "gpt-4o") rather than the deployment name (e.g., "my-azure-deploy")
Added param_fallbacks field to ChildEvalRule to handle cases where underlying_model_name is empty — falls back to model_name
- This is necessary because users don't often explicitly set underlying_model, so without fallback, all targets would hash to the same empty string
Fallback logic uses explicit is None / == "" checks instead of truthiness to avoid incorrectly triggering on valid values like 0 or False
Updated ScorerEvaluationIdentifier and AtomicAttackEvaluationIdentifier rules to use underlying_model_name with model_name fallback
Change is contained entirely in the eval layer; there is no impact on full ComponentIdentifier

Tests and Documentation

Added TestParamFallbacks class in test_evaluation_identifier.py covering: primary param used when present, fallback on empty string, fallback on missing key, and no fallback when param_fallbacks is None
Added TestScorerEvalHashFallback class in test_scorer_evaluation_identifier.py covering: underlying model used when present, fallback to model_name when empty/missing, and different models produce different hashes
Updated existing tests in test_scorer_evaluation_identifier.py and test_atomic_attack_identifier.py to reflect the new underlying_model_name + param_fallbacks rules

TODO

re-run evaluate_scorers.py script and re-populate all JSONL registries since every eval hash will be different.

adrian-gavrila

Other than the one comment this looks great

add param_fallbacks

f66cef4

adrian-gavrila reviewed Apr 23, 2026

View reviewed changes

Comment thread pyrit/identifiers/evaluation_identifier.py

rlundeen2 approved these changes Apr 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: Use `underlying_model_name` for evaluation identifier and add `param_fallbacks`#1647

FIX: Use `underlying_model_name` for evaluation identifier and add `param_fallbacks`#1647
jsong468 wants to merge 1 commit intomicrosoft:mainfrom
jsong468:underlying_model_fix

jsong468 commented Apr 23, 2026 •

edited

Loading

Uh oh!

adrian-gavrila left a comment •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jsong468 commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Tests and Documentation

TODO

Uh oh!

adrian-gavrila left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jsong468 commented Apr 23, 2026 •

edited

Loading

adrian-gavrila left a comment •

edited

Loading