Skip to content

Fix GroupBy().Select(entity) followed by a second projection#38436

Open
ajcvickers wants to merge 1 commit into
dotnet:mainfrom
ajcvickers:fix/31209-groupby-select-entity-select
Open

Fix GroupBy().Select(entity) followed by a second projection#38436
ajcvickers wants to merge 1 commit into
dotnet:mainfrom
ajcvickers:fix/31209-groupby-select-entity-select

Conversation

@ajcvickers

Copy link
Copy Markdown
Contributor

Fixes #31209

  • Member-mode ReplaceProjection left stale _clientProjections, producing an invalid mixed projection state.
  • Made the member-mode overload clear _clientProjections, mirroring the index-mode overload.
  • Fixes member Select / second GroupBy composed on top of GroupBy(k).Select(g => g.First()).
  • Enables 4 previously-skipped tests and adds 3 regression tests; InMemory's separate limitation stays skipped.
  • Verified across ~27k query tests (SQLite + SqlServer) with zero regressions.

A SelectExpression holds its projection as either a projection-member mapping or a client-projection (index) list, never both. The two ReplaceProjection overloads were asymmetric: the index-mode overload cleared both representations, but the member-mode overload cleared only the member mapping. So a member-mode Select following an index-mode Select left stale client projections behind, and ApplyProjection then took the index path over a member-based shaper and threw. This change makes the member-mode overload clear the client-projection list too, restoring the invariant.

This is a partial follow-up to #38140. It covers the member-Select / second-GroupBy family on top of the "entity per group" idiom. InMemory has an independent limitation in the same area and remains skipped.


The original issue

#31209 collects a large family of GroupBy translation failures, all sharing one shape: a GroupBy whose Select projects an entire entity per group (the "latest row per group" idiom), with a further operator composed on top:

    db.Set<Order>()
      .GroupBy(o => o.CustomerID)
      .Select(g => g.OrderByDescending(x => x.Date).First())  // one whole entity per group
      .<further operator>                                       // the trigger

Reporters hit it through many surface forms - a second Select projecting a member, a final Where/OrderBy, a result-selector GroupBy, a second GroupBy, FirstOrDefault instead of First. The original repro ended in a second GroupBy. The symptom was a confusing KeyNotFoundException: The given key 'EmptyProjectionMember' was not present in the dictionary, thrown deep in projection post-processing rather than at the offending operator, which is why the thread accumulated years of dead-end workarounds (FromSqlRaw round-trips, query-tag interceptors, ToList().AsQueryable()).

#38140 fixed part of this by cloning the subquery SelectExpression in RelationalSqlTranslatingExpressionVisitor.BindProperty, which resolved the Where/OrderBy/ anonymous-type cases. It explicitly left the member-Select-after case unsolved, and after that fix the remaining cases surfaced a different error, InvalidOperationException: Nullable object must have a value.

What is actually being fixed

Root cause is an invariant violation in SelectExpression. A projection is represented as exactly one of:

  • _projectionMapping - keyed by ProjectionMember (member mode), or
  • _clientProjections - an index list (client/index mode)

GetProjection reads one or the other depending on whether the binding carries a ProjectionMember or an Index; the two are never meant to coexist.

ReplaceProjection has two overloads, and they were asymmetric:

  • ReplaceProjection(IReadOnlyList) clears _projectionMapping AND _clientProjections
  • ReplaceProjection(IReadOnlyDictionary<...>) cleared ONLY _projectionMapping <-- bug

Sequence that breaks:

  1. .Select(g => g.First()) falls back to index-based binding -> populates _clientProjections.
  2. .Select(r => r.EmployeeID) translates in member mode -> repopulates _projectionMapping via the Dictionary overload, but leaves _clientProjections stale and non-empty.
  3. ApplyProjection sees _clientProjections.Count > 0, takes the index path, and the ClientProjectionRemappingExpressionVisitor walks the member-based shaper (Index == null) -> "Nullable object must have a value" (was EmptyProjectionMember before Fix GroupBy.Select EmptyProjectionMember #38140).

The fix makes the Dictionary overload also clear _clientProjections and _aliasForClientProjections, making it symmetric with the List overload and restoring the one-representation invariant.

Challenges

  • Misleading symptom. The exception is thrown in post-processing (ApplyProjection), far from the Select that creates the bad state, and it had already mutated once (EmptyProjectionMember -> Nullable-must-have-a-value) across Fix GroupBy.Select EmptyProjectionMember #38140, so the stack trace points away from the cause.

  • Two plausible fix sites. The contributor on Fix GroupBy.Select EmptyProjectionMember #38140 pursued the upstream NavigationExpandingExpressionVisitor.ProcessSelect path (making the GroupBy Select return a pending selector) and found it cascades into broad regressions requiring a redesign of pending- selector handling. The actual defect is downstream, in the relational SelectExpression, where a one-line symmetry restoration suffices.

  • Shared, hot code path. ReplaceProjection(IReadOnlyDictionary) is called from ~7 sites across the relational, SqlServer, and Sqlite providers. The change had to be proven safe everywhere, not just for the reported query.

Outcome

Production change: clear _clientProjections / _aliasForClientProjections in the member-mode ReplaceProjection overload. The clear is a no-op whenever the list is already empty (the normal case), so the only behavioral change is in the previously-invalid mixed state.

Tests:

  • Un-skipped GroupBy_Select_Entire_Entity_Select and _Where_Select.
  • Added regression tests: GroupBy_Select_Entire_Entity_FirstOrDefault_Where, GroupBy_ResultSelector_Entire_Entity_Where (result-selector entry point), GroupBy_Select_Entire_Entity_GroupBy (the literal original-issue repro).
  • Added SqlServer AssertSql baselines for all of the above (clean correlated-subquery / nested GROUP BY SQL).
  • InMemory keeps these skipped - it has a separate limitation in the same area.

Review and validation

Three independent reviewers analysed the change for blast radius, cross-shape consequences, and coverage gaps; their proposed experiments were then run.

  • Blast radius: all 7 callers of the member-mode overload traced. Six provably have an empty _clientProjections at call time (fresh select, just-cleared, or member-mode by construction), so the clear is a literal no-op; the seventh is the bug's target.

  • One genuine behavioral delta - the set-operation guard (SetOperationsNotAllowedAfterClientEvaluation). Verified empirically:

    • GroupBy(k).Select(g => g.First()).Select(o => o.OrderID).Union(...) previously threw spuriously and now returns correct results - corrective.
    • GroupBy(k).Select(g => g.First()).Union(...) (no intervening member Select) still correctly throws - the fix does not suppress the legitimate guard.
  • Structural invariant confirmed: when a kept value depends on a collection / correlated subquery / JSON node / entity held in _clientProjections, member-mode translation bails to the index fallback (List overload), so the Dictionary clear never fires destructively. No silent drop of collections, includes, JSON, or split-query metadata.

  • Pre-existing limitation, not a regression: projecting a navigation off the picked entity (GroupBy(k).Select(g => g.First()).Select(r => r.Customer.City)) fails as "ProjectionBindingExpression: 0 could not be translated". Confirmed by reverting the fix that it fails identically with and without this change. Separate, deeper issue.

  • Regression suites: ~12.1k SQLite and ~14.9k SqlServer query tests pass with zero failures; the NorthwindGroupBy class is green on SQLite, SqlServer, and InMemory.

Fixes part of #31209. Follow-up to #38140.

Copilot AI review requested due to automatic review settings June 16, 2026 13:39
@ajcvickers ajcvickers requested a review from a team as a code owner June 16, 2026 13:39

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes EF Core query translation/shaper issues related to selecting entire entities from GroupBy queries (Issue #31209), and enables previously skipped coverage with provider-specific SQL baselines.

Changes:

  • Enabled previously skipped GroupBy “entire entity” tests in the specification test base and added new regression tests for additional shapes.
  • Added/updated SQL Server functional test baselines for the newly-enabled/added scenarios.
  • Fixed SelectExpression.ReplaceProjection to clear stale client projections when switching back to member-based projection mapping.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
test/EFCore.SqlServer.FunctionalTests/Query/NorthwindGroupByQuerySqlServerTest.cs Adds concrete AssertSql baselines for the now-enabled GroupBy entity-projection scenarios.
test/EFCore.Specification.Tests/Query/NorthwindGroupByQueryTestBase.cs Unskips two tests and adds three new regression tests covering additional GroupBy + entity projection shapes (#31209).
test/EFCore.InMemory.FunctionalTests/Query/NorthwindGroupByQueryInMemoryTest.cs Explicitly skips the newly-enabled/added scenarios for InMemory provider (still not supported per #31209).
src/EFCore.Relational/Query/SqlExpressions/SelectExpression.cs Clears stale client projections/aliases when replacing projection mapping to avoid shaper remapping failures (#31209).

…31209, dotnet#26748)

- Member-mode ReplaceProjection left stale _clientProjections, producing an invalid mixed projection state.
- Made the member-mode overload clear _clientProjections, mirroring the index-mode overload.
- Fixes member Select / second GroupBy composed on top of GroupBy(k).Select(g => g.First()).
- Enables 4 previously-skipped tests and adds 4 regression tests; InMemory's separate limitation stays skipped.
- Verified across ~27k query tests (SQLite + SqlServer) with zero regressions.

A SelectExpression holds its projection as either a projection-member mapping or a
client-projection (index) list, never both. The two ReplaceProjection overloads were
asymmetric: the index-mode overload cleared both representations, but the member-mode
overload cleared only the member mapping. So a member-mode Select following an index-mode
Select left stale client projections behind, and ApplyProjection then took the index path
over a member-based shaper and threw. This change makes the member-mode overload clear the
client-projection list too, restoring the invariant.

This is a partial follow-up to dotnet#38140. It covers the member-Select / second-GroupBy family
on top of the "entity per group" idiom. InMemory has an independent limitation in the same
area and remains skipped.

## The original issue

dotnet#31209 (and the older, more general dotnet#26748) collect a large family of GroupBy translation
failures, all sharing one shape: a GroupBy whose Select projects an entire entity per group
(the "latest row per group" idiom), with a further operator composed on top:

    db.Set<Order>()
      .GroupBy(o => o.CustomerID)
      .Select(g => g.OrderByDescending(x => x.Date).First())  // one whole entity per group
      .<further operator>                                       // the trigger

Reporters hit it through many surface forms - a second Select projecting a member, a final
Where/OrderBy, a result-selector GroupBy, a second GroupBy, FirstOrDefault instead of First.
The original repros ended in a second GroupBy (dotnet#31209) or in Count/Select/OrderBy over a
composite (anonymous-type) key (dotnet#26748). The symptom was a confusing
KeyNotFoundException: The given key 'EmptyProjectionMember' was not present in the dictionary,
thrown deep in projection post-processing rather than at the offending operator, which is why
the threads accumulated years of dead-end workarounds (FromSqlRaw round-trips, query-tag
interceptors, ToList().AsQueryable()).

dotnet#38140 fixed part of this by cloning the subquery SelectExpression in
RelationalSqlTranslatingExpressionVisitor.BindProperty, which resolved the Where/OrderBy/
anonymous-type and Count cases. It explicitly left the member-Select-after case unsolved, and
after that fix the remaining cases surfaced a different error,
InvalidOperationException: Nullable object must have a value.

## What is actually being fixed

Root cause is an invariant violation in SelectExpression. A projection is represented as
exactly one of:

  - _projectionMapping  - keyed by ProjectionMember (member mode), or
  - _clientProjections  - an index list (client/index mode)

GetProjection reads one or the other depending on whether the binding carries a ProjectionMember
or an Index; the two are never meant to coexist.

ReplaceProjection has two overloads, and they were asymmetric:

  - ReplaceProjection(IReadOnlyList<Expression>)   clears _projectionMapping AND _clientProjections
  - ReplaceProjection(IReadOnlyDictionary<...>)     cleared ONLY _projectionMapping   <-- bug

Sequence that breaks:

  1. .Select(g => g.First())  falls back to index-based binding -> populates _clientProjections.
  2. .Select(r => r.EmployeeID)  translates in member mode -> repopulates _projectionMapping via the
     Dictionary overload, but leaves _clientProjections stale and non-empty.
  3. ApplyProjection sees _clientProjections.Count > 0, takes the index path, and the
     ClientProjectionRemappingExpressionVisitor walks the member-based shaper (Index == null) ->
     "Nullable object must have a value" (was EmptyProjectionMember before dotnet#38140).

The fix makes the Dictionary overload also clear _clientProjections and _aliasForClientProjections,
making it symmetric with the List overload and restoring the one-representation invariant.

## Challenges

- Misleading symptom. The exception is thrown in post-processing (ApplyProjection), far from the
  Select that creates the bad state, and it had already mutated once (EmptyProjectionMember ->
  Nullable-must-have-a-value) across dotnet#38140, so the stack trace points away from the cause.

- Two plausible fix sites. The contributor on dotnet#38140 pursued the upstream
  NavigationExpandingExpressionVisitor.ProcessSelect path (making the GroupBy Select return a
  pending selector) and found it cascades into broad regressions requiring a redesign of pending-
  selector handling. The actual defect is downstream, in the relational SelectExpression, where a
  one-line symmetry restoration suffices.

- Shared, hot code path. ReplaceProjection(IReadOnlyDictionary) is called from ~7 sites across the
  relational, SqlServer, and Sqlite providers. The change had to be proven safe everywhere, not just
  for the reported query.

## Outcome

Production change: clear _clientProjections / _aliasForClientProjections in the member-mode
ReplaceProjection overload. The clear is a no-op whenever the list is already empty (the normal
case), so the only behavioral change is in the previously-invalid mixed state.

Tests:
  - Un-skipped GroupBy_Select_Entire_Entity_Select and _Where_Select.
  - Added regression tests: GroupBy_Select_Entire_Entity_FirstOrDefault_Where,
    GroupBy_ResultSelector_Entire_Entity_Where (result-selector entry point),
    GroupBy_Select_Entire_Entity_GroupBy (the literal dotnet#31209 repro),
    GroupBy_Select_Entire_Entity_composite_key_Select (the literal dotnet#26748 repro).
  - Added SqlServer AssertSql baselines for all of the above (clean correlated-subquery / nested
    GROUP BY SQL).
  - InMemory keeps these skipped - it has a separate limitation in the same area.

## Review and validation

Three independent reviewers analysed the change for blast radius, cross-shape consequences, and
coverage gaps; their proposed experiments were then run.

  - Blast radius: all 7 callers of the member-mode overload traced. Six provably have an empty
    _clientProjections at call time (fresh select, just-cleared, or member-mode by construction), so
    the clear is a literal no-op; the seventh is the bug's target.

  - One genuine behavioral delta - the set-operation guard
    (SetOperationsNotAllowedAfterClientEvaluation). Verified empirically:
      * GroupBy(k).Select(g => g.First()).Select(o => o.OrderID).Union(...) previously threw
        spuriously and now returns correct results - corrective.
      * GroupBy(k).Select(g => g.First()).Union(...) (no intervening member Select) still correctly
        throws - the fix does not suppress the legitimate guard.

  - Structural invariant confirmed: when a kept value depends on a collection / correlated subquery /
    JSON node / entity held in _clientProjections, member-mode translation bails to the index
    fallback (List overload), so the Dictionary clear never fires destructively. No silent drop of
    collections, includes, JSON, or split-query metadata.

  - Pre-existing limitation, not a regression: projecting a navigation off the picked entity
    (GroupBy(k).Select(g => g.First()).Select(r => r.Customer.City)) fails as
    "ProjectionBindingExpression: 0 could not be translated". Confirmed by reverting the fix that it
    fails identically with and without this change. Separate, deeper issue.

  - dotnet#26748 cross-check: its Count and OrderBy repros were already resolved by dotnet#38140; its remaining
    member-Select repro (composite/anonymous key) failed with the exact "Nullable object must have a
    value" error and is resolved by this change. Verified by removing the fix and re-running.

  - Regression suites: ~12.1k SQLite and ~14.9k SqlServer query tests pass with zero failures; the
    NorthwindGroupBy class is green on SQLite, SqlServer, and InMemory.

Fixes dotnet#31209. Fixes dotnet#26748. Follow-up to dotnet#38140.
@ajcvickers ajcvickers force-pushed the fix/31209-groupby-select-entity-select branch from 53191ab to 356b677 Compare June 16, 2026 14:37
@ajcvickers

Copy link
Copy Markdown
Contributor Author

Also fixes #26748.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

When using GroupBy, getting "the given key 'EmptyProjectionMember' was not present in the dictionary"

2 participants