FIX: Use underlying_model_name for evaluation identifier and add param_fallbacks #1647
Open
jsong468 wants to merge 1 commit into microsoft:main from
Conversation
rlundeen2
approved these changes
Apr 23, 2026
There existed a bug where `ScorerEvaluationIdentifier` used `model_name` instead of `underlying_model_name` as originally intended. Thus, the eval hash in our JSONL scorer metrics registries is computed incorrectly using `model_name` (which is often a deployment name).

This PR addresses the bug by using `underlying_model_name` in `ScorerEvaluationIdentifier`, while also introducing a `param_fallbacks` field to `ChildEvalRule` to use `model_name` as a fallback for `underlying_model_name` when it is None or "" (for cases where the user doesn't provide a separate `underlying_model_name` to the LLM prompt target used in a scorer, since it now needs to be passed in explicitly and not automatically from environment variables as a result of #1590).

Description
- Use `underlying_model_name` instead of `model_name` to group scorer evaluations by the actual model (e.g., `"gpt-4o"`) rather than the deployment name (e.g., `"my-azure-deploy"`)
- Add a `param_fallbacks` field to `ChildEvalRule` to handle cases where `underlying_model_name` is empty; it falls back to `model_name`
- Targets often leave `underlying_model` unset, so without the fallback, all targets would hash to the same empty string
- Use `is None` / `== ""` checks instead of truthiness to avoid incorrectly triggering the fallback on valid values like `0` or `False`
- Update the `ScorerEvaluationIdentifier` and `AtomicAttackEvaluationIdentifier` rules to use `underlying_model_name` with a `model_name` fallback in `ComponentIdentifier`

Tests and Documentation
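The fallback rule described above can be sketched as a small helper. This is a hypothetical illustration, not PyRIT's actual implementation: the function name `resolve_param` and its signature are assumptions; only the behavior (explicit `is None` / `== ""` checks, fallback key list, no fallback when `param_fallbacks` is `None`) mirrors the PR description.

```python
# Hypothetical sketch of the param_fallbacks resolution rule; the real
# PyRIT code in ChildEvalRule may be structured differently.
from typing import Optional


def resolve_param(params: dict, primary: str, fallbacks: Optional[list] = None):
    """Return params[primary], or the first fallback key whose value is
    neither None nor "" when the primary value is missing/None/""."""
    value = params.get(primary)
    # Explicit checks rather than truthiness: `if value:` would wrongly
    # trigger the fallback on valid values like 0 or False.
    if value is not None and value != "":
        return value
    for key in fallbacks or []:  # no fallback at all when fallbacks is None
        candidate = params.get(key)
        if candidate is not None and candidate != "":
            return candidate
    return value


# underlying_model_name is empty, so the hash input falls back to model_name
print(resolve_param(
    {"underlying_model_name": "", "model_name": "my-azure-deploy"},
    "underlying_model_name",
    ["model_name"],
))  # -> my-azure-deploy
```

With `fallbacks=None` the empty value is returned unchanged, matching the "no fallback when `param_fallbacks` is `None`" case covered by the tests below.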
- Added a `TestParamFallbacks` class in `test_evaluation_identifier.py` covering: primary param used when present, fallback on empty string, fallback on missing key, and no fallback when `param_fallbacks` is `None`
- Added a `TestScorerEvalHashFallback` class in `test_scorer_evaluation_identifier.py` covering: underlying model used when present, fallback to `model_name` when empty/missing, and different models producing different hashes
- Updated `test_scorer_evaluation_identifier.py` and `test_atomic_attack_identifier.py` to reflect the new `underlying_model_name` + `param_fallbacks` rules

TODO