Fix GroupBy().Select(entity) followed by a second projection#38436
Open
ajcvickers wants to merge 1 commit into
Open
Fix GroupBy().Select(entity) followed by a second projection#38436ajcvickers wants to merge 1 commit into
ajcvickers wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes EF Core query translation/shaper issues related to selecting entire entities from GroupBy queries (Issue #31209), and enables previously skipped coverage with provider-specific SQL baselines.
Changes:
- Enabled previously skipped
GroupBy“entire entity” tests in the specification test base and added new regression tests for additional shapes. - Added/updated SQL Server functional test baselines for the newly-enabled/added scenarios.
- Fixed
SelectExpression.ReplaceProjectionto clear stale client projections when switching back to member-based projection mapping.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindGroupByQuerySqlServerTest.cs | Adds concrete AssertSql baselines for the now-enabled GroupBy entity-projection scenarios. |
| test/EFCore.Specification.Tests/Query/NorthwindGroupByQueryTestBase.cs | Unskips two tests and adds three new regression tests covering additional GroupBy + entity projection shapes (#31209). |
| test/EFCore.InMemory.FunctionalTests/Query/NorthwindGroupByQueryInMemoryTest.cs | Explicitly skips the newly-enabled/added scenarios for InMemory provider (still not supported per #31209). |
| src/EFCore.Relational/Query/SqlExpressions/SelectExpression.cs | Clears stale client projections/aliases when replacing projection mapping to avoid shaper remapping failures (#31209). |
…31209, dotnet#26748) - Member-mode ReplaceProjection left stale _clientProjections, producing an invalid mixed projection state. - Made the member-mode overload clear _clientProjections, mirroring the index-mode overload. - Fixes member Select / second GroupBy composed on top of GroupBy(k).Select(g => g.First()). - Enables 4 previously-skipped tests and adds 4 regression tests; InMemory's separate limitation stays skipped. - Verified across ~27k query tests (SQLite + SqlServer) with zero regressions. A SelectExpression holds its projection as either a projection-member mapping or a client-projection (index) list, never both. The two ReplaceProjection overloads were asymmetric: the index-mode overload cleared both representations, but the member-mode overload cleared only the member mapping. So a member-mode Select following an index-mode Select left stale client projections behind, and ApplyProjection then took the index path over a member-based shaper and threw. This change makes the member-mode overload clear the client-projection list too, restoring the invariant. This is a partial follow-up to dotnet#38140. It covers the member-Select / second-GroupBy family on top of the "entity per group" idiom. InMemory has an independent limitation in the same area and remains skipped. ## The original issue dotnet#31209 (and the older, more general dotnet#26748) collect a large family of GroupBy translation failures, all sharing one shape: a GroupBy whose Select projects an entire entity per group (the "latest row per group" idiom), with a further operator composed on top: db.Set<Order>() .GroupBy(o => o.CustomerID) .Select(g => g.OrderByDescending(x => x.Date).First()) // one whole entity per group .<further operator> // the trigger Reporters hit it through many surface forms - a second Select projecting a member, a final Where/OrderBy, a result-selector GroupBy, a second GroupBy, FirstOrDefault instead of First. The original repros ended in a second GroupBy (dotnet#31209) or in Count/Select/OrderBy over a composite (anonymous-type) key (dotnet#26748). The symptom was a confusing KeyNotFoundException: The given key 'EmptyProjectionMember' was not present in the dictionary, thrown deep in projection post-processing rather than at the offending operator, which is why the threads accumulated years of dead-end workarounds (FromSqlRaw round-trips, query-tag interceptors, ToList().AsQueryable()). dotnet#38140 fixed part of this by cloning the subquery SelectExpression in RelationalSqlTranslatingExpressionVisitor.BindProperty, which resolved the Where/OrderBy/ anonymous-type and Count cases. It explicitly left the member-Select-after case unsolved, and after that fix the remaining cases surfaced a different error, InvalidOperationException: Nullable object must have a value. ## What is actually being fixed Root cause is an invariant violation in SelectExpression. A projection is represented as exactly one of: - _projectionMapping - keyed by ProjectionMember (member mode), or - _clientProjections - an index list (client/index mode) GetProjection reads one or the other depending on whether the binding carries a ProjectionMember or an Index; the two are never meant to coexist. ReplaceProjection has two overloads, and they were asymmetric: - ReplaceProjection(IReadOnlyList<Expression>) clears _projectionMapping AND _clientProjections - ReplaceProjection(IReadOnlyDictionary<...>) cleared ONLY _projectionMapping <-- bug Sequence that breaks: 1. .Select(g => g.First()) falls back to index-based binding -> populates _clientProjections. 2. .Select(r => r.EmployeeID) translates in member mode -> repopulates _projectionMapping via the Dictionary overload, but leaves _clientProjections stale and non-empty. 3. ApplyProjection sees _clientProjections.Count > 0, takes the index path, and the ClientProjectionRemappingExpressionVisitor walks the member-based shaper (Index == null) -> "Nullable object must have a value" (was EmptyProjectionMember before dotnet#38140). The fix makes the Dictionary overload also clear _clientProjections and _aliasForClientProjections, making it symmetric with the List overload and restoring the one-representation invariant. ## Challenges - Misleading symptom. The exception is thrown in post-processing (ApplyProjection), far from the Select that creates the bad state, and it had already mutated once (EmptyProjectionMember -> Nullable-must-have-a-value) across dotnet#38140, so the stack trace points away from the cause. - Two plausible fix sites. The contributor on dotnet#38140 pursued the upstream NavigationExpandingExpressionVisitor.ProcessSelect path (making the GroupBy Select return a pending selector) and found it cascades into broad regressions requiring a redesign of pending- selector handling. The actual defect is downstream, in the relational SelectExpression, where a one-line symmetry restoration suffices. - Shared, hot code path. ReplaceProjection(IReadOnlyDictionary) is called from ~7 sites across the relational, SqlServer, and Sqlite providers. The change had to be proven safe everywhere, not just for the reported query. ## Outcome Production change: clear _clientProjections / _aliasForClientProjections in the member-mode ReplaceProjection overload. The clear is a no-op whenever the list is already empty (the normal case), so the only behavioral change is in the previously-invalid mixed state. Tests: - Un-skipped GroupBy_Select_Entire_Entity_Select and _Where_Select. - Added regression tests: GroupBy_Select_Entire_Entity_FirstOrDefault_Where, GroupBy_ResultSelector_Entire_Entity_Where (result-selector entry point), GroupBy_Select_Entire_Entity_GroupBy (the literal dotnet#31209 repro), GroupBy_Select_Entire_Entity_composite_key_Select (the literal dotnet#26748 repro). - Added SqlServer AssertSql baselines for all of the above (clean correlated-subquery / nested GROUP BY SQL). - InMemory keeps these skipped - it has a separate limitation in the same area. ## Review and validation Three independent reviewers analysed the change for blast radius, cross-shape consequences, and coverage gaps; their proposed experiments were then run. - Blast radius: all 7 callers of the member-mode overload traced. Six provably have an empty _clientProjections at call time (fresh select, just-cleared, or member-mode by construction), so the clear is a literal no-op; the seventh is the bug's target. - One genuine behavioral delta - the set-operation guard (SetOperationsNotAllowedAfterClientEvaluation). Verified empirically: * GroupBy(k).Select(g => g.First()).Select(o => o.OrderID).Union(...) previously threw spuriously and now returns correct results - corrective. * GroupBy(k).Select(g => g.First()).Union(...) (no intervening member Select) still correctly throws - the fix does not suppress the legitimate guard. - Structural invariant confirmed: when a kept value depends on a collection / correlated subquery / JSON node / entity held in _clientProjections, member-mode translation bails to the index fallback (List overload), so the Dictionary clear never fires destructively. No silent drop of collections, includes, JSON, or split-query metadata. - Pre-existing limitation, not a regression: projecting a navigation off the picked entity (GroupBy(k).Select(g => g.First()).Select(r => r.Customer.City)) fails as "ProjectionBindingExpression: 0 could not be translated". Confirmed by reverting the fix that it fails identically with and without this change. Separate, deeper issue. - dotnet#26748 cross-check: its Count and OrderBy repros were already resolved by dotnet#38140; its remaining member-Select repro (composite/anonymous key) failed with the exact "Nullable object must have a value" error and is resolved by this change. Verified by removing the fix and re-running. - Regression suites: ~12.1k SQLite and ~14.9k SqlServer query tests pass with zero failures; the NorthwindGroupBy class is green on SQLite, SqlServer, and InMemory. Fixes dotnet#31209. Fixes dotnet#26748. Follow-up to dotnet#38140.
53191ab to
356b677
Compare
Contributor
Author
|
Also fixes #26748. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #31209
A SelectExpression holds its projection as either a projection-member mapping or a client-projection (index) list, never both. The two ReplaceProjection overloads were asymmetric: the index-mode overload cleared both representations, but the member-mode overload cleared only the member mapping. So a member-mode Select following an index-mode Select left stale client projections behind, and ApplyProjection then took the index path over a member-based shaper and threw. This change makes the member-mode overload clear the client-projection list too, restoring the invariant.
This is a partial follow-up to #38140. It covers the member-Select / second-GroupBy family on top of the "entity per group" idiom. InMemory has an independent limitation in the same area and remains skipped.
The original issue
#31209 collects a large family of GroupBy translation failures, all sharing one shape: a GroupBy whose Select projects an entire entity per group (the "latest row per group" idiom), with a further operator composed on top:
Reporters hit it through many surface forms - a second Select projecting a member, a final Where/OrderBy, a result-selector GroupBy, a second GroupBy, FirstOrDefault instead of First. The original repro ended in a second GroupBy. The symptom was a confusing KeyNotFoundException: The given key 'EmptyProjectionMember' was not present in the dictionary, thrown deep in projection post-processing rather than at the offending operator, which is why the thread accumulated years of dead-end workarounds (FromSqlRaw round-trips, query-tag interceptors, ToList().AsQueryable()).
#38140 fixed part of this by cloning the subquery SelectExpression in RelationalSqlTranslatingExpressionVisitor.BindProperty, which resolved the Where/OrderBy/ anonymous-type cases. It explicitly left the member-Select-after case unsolved, and after that fix the remaining cases surfaced a different error, InvalidOperationException: Nullable object must have a value.
What is actually being fixed
Root cause is an invariant violation in SelectExpression. A projection is represented as exactly one of:
GetProjection reads one or the other depending on whether the binding carries a ProjectionMember or an Index; the two are never meant to coexist.
ReplaceProjection has two overloads, and they were asymmetric:
Sequence that breaks:
The fix makes the Dictionary overload also clear _clientProjections and _aliasForClientProjections, making it symmetric with the List overload and restoring the one-representation invariant.
Challenges
Misleading symptom. The exception is thrown in post-processing (ApplyProjection), far from the Select that creates the bad state, and it had already mutated once (EmptyProjectionMember -> Nullable-must-have-a-value) across Fix GroupBy.Select EmptyProjectionMember #38140, so the stack trace points away from the cause.
Two plausible fix sites. The contributor on Fix GroupBy.Select EmptyProjectionMember #38140 pursued the upstream NavigationExpandingExpressionVisitor.ProcessSelect path (making the GroupBy Select return a pending selector) and found it cascades into broad regressions requiring a redesign of pending- selector handling. The actual defect is downstream, in the relational SelectExpression, where a one-line symmetry restoration suffices.
Shared, hot code path. ReplaceProjection(IReadOnlyDictionary) is called from ~7 sites across the relational, SqlServer, and Sqlite providers. The change had to be proven safe everywhere, not just for the reported query.
Outcome
Production change: clear _clientProjections / _aliasForClientProjections in the member-mode ReplaceProjection overload. The clear is a no-op whenever the list is already empty (the normal case), so the only behavioral change is in the previously-invalid mixed state.
Tests:
Review and validation
Three independent reviewers analysed the change for blast radius, cross-shape consequences, and coverage gaps; their proposed experiments were then run.
Blast radius: all 7 callers of the member-mode overload traced. Six provably have an empty _clientProjections at call time (fresh select, just-cleared, or member-mode by construction), so the clear is a literal no-op; the seventh is the bug's target.
One genuine behavioral delta - the set-operation guard (SetOperationsNotAllowedAfterClientEvaluation). Verified empirically:
Structural invariant confirmed: when a kept value depends on a collection / correlated subquery / JSON node / entity held in _clientProjections, member-mode translation bails to the index fallback (List overload), so the Dictionary clear never fires destructively. No silent drop of collections, includes, JSON, or split-query metadata.
Pre-existing limitation, not a regression: projecting a navigation off the picked entity (GroupBy(k).Select(g => g.First()).Select(r => r.Customer.City)) fails as "ProjectionBindingExpression: 0 could not be translated". Confirmed by reverting the fix that it fails identically with and without this change. Separate, deeper issue.
Regression suites: ~12.1k SQLite and ~14.9k SqlServer query tests pass with zero failures; the NorthwindGroupBy class is green on SQLite, SqlServer, and InMemory.
Fixes part of #31209. Follow-up to #38140.