
Conversation


@hazefully hazefully commented Oct 16, 2025

This PR improves the rewriting cost model by making it consider the number of predicates in each expression at each level of the expression graph, instead of comparing graphs based on the height of the highest expression in the graph that contains any predicates.

The reasoning for this is to make the cost model prefer, among expressions with the same number of predicates, those in which some predicates have been pushed from higher levels of the QGM down to lower levels, which can lead to plans with more specific index key comparisons in the planning phase.

Example: the rewriting cost model should consider the expression:

SELECT sq1.a FROM (SELECT a, b FROM T WHERE a = 42) sq1,  (SELECT a FROM T2) sq2

to have a lower cost than:

SELECT a FROM (SELECT a, b, d FROM T) WHERE a = 42 AND EXISTS (SELECT a FROM T2)

The rewriting cost model is improved by considering two ExpressionProperty instances when comparing two expressions: the existing NormalizedResidualPredicateProperty, which calculates the total number of conjuncts in the combined query predicate across the entire expression tree at all levels, and a new PredicateCountByLevelProperty, which calculates the number of predicates at each level of the expression graph. In addition, considering PredicateComplexityProperty is no longer necessary, as what it checks for (the most complex predicate across the entire expression tree) is already checked indirectly by NormalizedResidualPredicateProperty.
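
Roughly, the resulting comparison can be pictured with the following simplified sketch (hypothetical names and types, not the actual record-layer API; the real code operates on RelationalExpression via the properties named above, and the exact per-level tie-breaking direction is a detail of the actual implementation):

import java.util.SortedMap;
import java.util.TreeMap;

// Simplified sketch: prefer the candidate whose combined, normalized query predicate has
// fewer conjuncts; only on a tie, compare how the predicates are distributed over the
// levels of the expression graph.
final class Candidate {
    final int normalizedConjunctCount;                       // NormalizedResidualPredicateProperty
    final SortedMap<Integer, Integer> predicateCountByLevel; // PredicateCountByLevelProperty (level -> count)

    Candidate(final int normalizedConjunctCount,
              final SortedMap<Integer, Integer> predicateCountByLevel) {
        this.normalizedConjunctCount = normalizedConjunctCount;
        this.predicateCountByLevel = predicateCountByLevel;
    }

    static int compare(final Candidate a, final Candidate b) {
        // Fewer conjuncts in the normalized residual predicate wins outright.
        final int byConjuncts = Integer.compare(a.normalizedConjunctCount, b.normalizedConjunctCount);
        if (byConjuncts != 0) {
            return byConjuncts;
        }
        // Tie: walk the union of levels in ascending order and compare counts level by level.
        // The level numbering and the preferred direction are assumptions made for this sketch.
        final TreeMap<Integer, Integer> levels = new TreeMap<>();
        a.predicateCountByLevel.forEach(levels::put);
        b.predicateCountByLevel.forEach(levels::put);
        for (final int level : levels.keySet()) {
            final int byCount = Integer.compare(a.predicateCountByLevel.getOrDefault(level, 0),
                    b.predicateCountByLevel.getOrDefault(level, 0));
            if (byCount != 0) {
                return byCount;
            }
        }
        return 0;
    }
}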

Performance impact

Some basic analysis of the performance impact of considering those properties can be found in #3681 (comment).

@hazefully hazefully requested a review from normen662 October 16, 2025 14:57
@hazefully hazefully added the enhancement New feature or request label Oct 16, 2025
@hazefully hazefully force-pushed the improve-rewriting-cost-model branch from 3b65482 to 65694f8 Compare October 16, 2025 15:09
@hazefully hazefully force-pushed the improve-rewriting-cost-model branch from 65694f8 to ca8d8c9 Compare December 9, 2025 23:33
Comment on lines 117 to 124
task_count: 2476
task_total_time_ms: 212
transform_count: 467
transform_time_ms: 67
transform_yield_count: 143
insert_time_ms: 21
insert_new_count: 315
insert_reused_count: 44
task_count: 1325
task_total_time_ms: 267
transform_count: 270
transform_time_ms: 153
transform_yield_count: 83
insert_time_ms: 12
insert_new_count: 165
insert_reused_count: 24
Contributor Author

I think the planning metrics for the two queries here have improved because the rewriting cost model now prefers expressions with a lower total number of predicates, so when the predicates in those two queries are simplified to remove the duplicated predicate, we end up with a smaller search space. Before this change, the two expressions (the original expression with the duplicated predicate and the simplified one) were considered equal, and the semantic hashcode tie-breaking resulted in the original expression being chosen.

Contributor

It looks almost suspicious that the time spent is higher even though all the counts are lower. Can you do a test for me? Remove all metrics files (metrics.binpb and metrics.yaml) and try to run the entire suite in correction mode. Just glance over the time spent to see if there is a trend upwards or downwards. In general this sort of thing happens; we see confusing time durations because everyone uses a different set-up, may have had their Mac throttled or not, etc. -- so a sample size of one is pretty naive to comment on, but let's still run this just to rule out the case that a rule has degraded for some reason. Also, you may want to do this for downstream.

Contributor Author

See #3681 (comment), I did multiple runs of correcting the entire test suite (including downstream) and there doesn't seem to be any consistent downtrend/uptrend in the time metrics.

@hazefully hazefully force-pushed the improve-rewriting-cost-model branch from ca8d8c9 to cd4e175 Compare December 9, 2025 23:50
@hazefully hazefully marked this pull request as ready for review December 10, 2025 00:30
@normen662 normen662 left a comment

Hi, good stuff! I left detailed (I hope) comments on your changes. In general I would say that the following should be focused on:

  • take the new properties out of the expression property map, i.e. make them untracked
  • investigate the interaction with predicate complexity (would it be better to reorder and maybe get rid of predicate height)
  • evaluate if there is a planner performance regression

 * deeper level or if {@code a} has a higher number of levels with predicates.
 */
public static int compare(final PredicateCountByLevelInfo a, final PredicateCountByLevelInfo b) {
    final int highestLevel = Integer.max(a.getHighestLevel(), b.getHighestLevel());
Contributor

Wouldn't it be more concise to use two SortedMaps (TreeMap)? You can then pop them off smallest-first on both sides. That way you can get rid of the highestLevel field, which is redundant in your data structure.

Contributor Author

Thanks for the suggestion, that is indeed better! I modified the code to do that and removed the highestLevel field from the data structure. There is a way to do this with the Java Streams API, but I found it less readable than a simple for-loop; let me know if you think otherwise or if there is a better way to implement this.

I did have to keep the getHighestLevel method on the data structure because ImmutableSortedMap throws an exception if lastKey is called on an empty map, so having the method is a neat way to wrap that check.
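
For reference, a rough sketch of the shape described above (names assumed, not the actual class):

import com.google.common.collect.ImmutableSortedMap;

// Sketch only: ImmutableSortedMap.lastKey() (inherited from SortedMap) throws
// NoSuchElementException on an empty map, so the highest level is exposed through a small
// accessor that handles the empty case instead of keeping a redundant highestLevel field.
final class PredicateCountByLevelInfoSketch {
    private final ImmutableSortedMap<Integer, Integer> predicateCountByLevel;

    PredicateCountByLevelInfoSketch(final ImmutableSortedMap<Integer, Integer> predicateCountByLevel) {
        this.predicateCountByLevel = predicateCountByLevel;
    }

    int getHighestLevel() {
        // Returning 0 for the empty map is an assumption made for this sketch.
        return predicateCountByLevel.isEmpty() ? 0 : predicateCountByLevel.lastKey();
    }
}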

ImmutableSet.of(),
ImmutableSet.of(ExpressionCountProperty.selectCount(), ExpressionCountProperty.tableFunctionCount(),
PredicateComplexityProperty.predicateComplexity()),
ImmutableSet.of(
Contributor

Ah, I think this should really have selectCount and tableFunctionCount in there. I think predicateComplexity should not be in here, but 🤷. I think the new ones you are introducing should not be here.

Contributor Author

Just a note that I had to leave predicateComplexity here because it is used in the SelectMergeRule in a code path that expects it to be tracked (

public static <E extends RelationalExpression> BiFunction<ExpressionPartition<E>, ? super E, Tuple> comparisonByPropertyList(@Nonnull ExpressionProperty<?>... expressionProperties) {
    return (partition, expression) ->
            Tuple.fromItems(Arrays.stream(expressionProperties)
                    .map(property -> partition.getNonPartitioningPropertyValue(expression, property))
                    .collect(Collectors.toList()));
).

int bPredicateHeight = predicateHeight().evaluate(b);
if (aPredicateHeight != bPredicateHeight) {
    return Integer.compare(aPredicateHeight, bPredicateHeight);
int aPredicateCount = predicateCount().evaluate(a);
Contributor

How does all of this (not picking out this particular line but this entire added block here) interact with the predicate complexity? Wouldn't predicate complexity be subsuming the predicate height property you are introducing? Could we put the predicate complexity in front of the count-by-layer property here?

Contributor Author

As discussed offline, I changed this to use the NormalizedResidualPredicateProperty as the first thing to check. This works nicely because it takes care of the predicate-count comparison (in a better way, since it naturally prefers simpler predicates), and it also makes it unnecessary to consider the PredicateComplexityProperty.

The reason we no longer need to consider the PredicateComplexityProperty is that if an expression has a query predicate with worse predicate complexity (i.e. a larger tree diameter), that complexity contributes correspondingly to the number of conjuncts in the normalized form of the combined query predicate. Otherwise a predicate with a higher tree diameter (i.e. lots of nested predicates) would have fewer conjuncts in the normal form than another predicate with a smaller tree diameter, which doesn't make sense.

I also confirmed downstream that removing the check for PredicateComplexityProperty doesn't change anything.
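
As a rough worked illustration (assuming the normal form is essentially a conjunction of simpler terms, which is an assumption made here for illustration): the nested predicate (a AND b) OR (c AND d) normalizes to (a OR c) AND (a OR d) AND (b OR c) AND (b OR d), i.e. four conjuncts, while the flat predicate a AND b contributes only two, so the additional structural complexity already shows up in the conjunct count that NormalizedResidualPredicateProperty measures.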

@Test
void compareReturnsInfoWithMoreLevelsInCaseOfEquality() {
    final PredicateCountByLevelProperty.PredicateCountByLevelInfo aInfo = new PredicateCountByLevelProperty.PredicateCountByLevelInfo(
            Map.of(1, 1, 2, 3, 3, 1), 3);
Contributor

Another weird idiosyncrasy: all maps in the record layer have to be ImmutableMap (Guava) instead of java.util.Map. The reason is that their copy-constructors avoid a copy if the source is already an ImmutableMap. java.util.Map.copyOf does that too, but only if the source is itself such an immutable map. So while it would be better to have the entire codebase on Map rather than on ImmutableMap, someone would have to change everything first. The same applies to lists and sets as well.
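
A minimal demonstration of that behavior (the exact conditions under which Guava skips the copy are an implementation detail, so treat this as illustrative):

import com.google.common.collect.ImmutableMap;

import java.util.HashMap;
import java.util.Map;

public final class CopyOfDemo {
    public static void main(final String[] args) {
        final ImmutableMap<Integer, Integer> immutable = ImmutableMap.of(1, 1, 2, 3);
        final Map<Integer, Integer> mutable = new HashMap<>(immutable);

        // copyOf() may return the very same instance when the source is already an ImmutableMap.
        System.out.println(ImmutableMap.copyOf(immutable) == immutable); // typically true
        // Copying from a plain HashMap always materializes a new ImmutableMap.
        System.out.println(ImmutableMap.copyOf(mutable) == mutable);     // false
    }
}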

Contributor Author

I replaced all usages with that, thanks for the clarification!

Using this property is better for multiple reasons:

1) It would naturally prefer simpler query predicates over complex ones,
   as the simpler predicates would result in a simpler combined
   normalized query predicate for the entire QGM.
2) It takes into consideration predicates that were simplified to be a
   tautology, making sure these predicates are preferred over
   unsimplified ones. A simpler predicate count is not able to do this.

By using the NormalizedResidualPredicateProperty, it is no longer
necessary to use the PredicateComplexityProperty in the
RewritingCostModel, as the NormalizedResidualPredicateProperty takes
care of choosing the expression that has the least maximal query
predicate across its QGM (as that would lead to fewer conjuncts in the
normalized form of the combined query predicate).
After testing against existing yaml-tests, this leads to slightly worse
performance with no gain, due to having to calculate this property for
many expressions that will end up being pruned by other properties
considered in the rewriting cost model earlier.
For consistency with the rest of the codebase.
@github-actions

📊 Metrics Diff Analysis Report

Summary

  • New queries: 1
  • Dropped queries: 0
  • Plan changed + metrics changed: 2
  • Plan unchanged + metrics changed: 2
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Metrics only changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

  • yaml-tests/src/test/resources/subquery-tests.metrics.yaml: 1

Plan and Metrics Changed

These queries experienced both plan and metrics changes. This generally indicates that there was some planner change
that means the planning for this query may be substantially different. Some amount of query plan metrics change is expected,
but the reviewer should still validate that these changes are not excessive.

Total: 2 queries

Statistical Summary (Plan and Metrics Changed)

task_count:

  • Average change: -14.0
  • Median change: -14
  • Standard deviation: 0.0
  • Range: -14 to -14
  • Queries changed: 2
  • No regressions! 🎉

insert_new_count:

  • Average change: -2.0
  • Median change: -2
  • Standard deviation: 0.0
  • Range: -2 to -2
  • Queries changed: 2
  • No regressions! 🎉

Significant Regressions (Plan and Metrics Changed)

There were 2 outliers detected. Outlier queries have a significant regression in at least one field. Statistically, this represents either an increase of more than two standard deviations above the mean or a large absolute increase (e.g., 100).

Only Metrics Changed

These queries experienced only metrics changes without any plan changes. If these metrics have substantially changed,
then a planner change has been made which affects planner performance but does not correlate with any new outcomes,
which could indicate a regression.

Total: 2 queries

Statistical Summary (Only Metrics Changed)

task_count:

  • Average change: -1088.5
  • Median change: -1026
  • Standard deviation: 62.5
  • Range: -1151 to -1026
  • Queries changed: 2
  • No regressions! 🎉

transform_count:

  • Average change: -187.0
  • Median change: -177
  • Standard deviation: 10.0
  • Range: -197 to -177
  • Queries changed: 2
  • No regressions! 🎉

transform_yield_count:

  • Average change: -50.5
  • Median change: -41
  • Standard deviation: 9.5
  • Range: -60 to -41
  • Queries changed: 2
  • No regressions! 🎉

insert_new_count:

  • Average change: -132.5
  • Median change: -115
  • Standard deviation: 17.5
  • Range: -150 to -115
  • Queries changed: 2
  • No regressions! 🎉

insert_reused_count:

  • Average change: -20.5
  • Median change: -20
  • Standard deviation: 0.5
  • Range: -21 to -20
  • Queries changed: 2
  • No regressions! 🎉

Significant Regressions (Only Metrics Changed)

There were 2 outliers detected. Outlier queries have a significant regression in at least one field. Statistically, this represents either an increase of more than two standard deviations above the mean or a large absolute increase (e.g., 100).

  • yaml-tests/src/test/resources/standard-tests.metrics.yaml:112: EXPLAIN select * from T1 where (COL1 = 20 OR COL1 = 10) AND (COL1 = 20 OR COL1 = 10)
    • explain: COVERING(I1 [EQUALS promote(@c9 AS LONG)] -> [COL1: KEY[0], ID: KEY[2]]) ⊎ COVERING(I1 [EQUALS promote(@c13 AS LONG)] -> [COL1: KEY[0], ID: KEY[2]]) | DISTINCT BY PK | FETCH
    • task_count: 2476 -> 1325 (-1151)
    • transform_count: 467 -> 270 (-197)
    • transform_yield_count: 143 -> 83 (-60)
    • insert_new_count: 315 -> 165 (-150)
    • insert_reused_count: 44 -> 24 (-20)
  • yaml-tests/src/test/resources/standard-tests.metrics.yaml:125: EXPLAIN select * from T1 where (COL1 = 20 OR COL1 = 10) AND (COL1 = 20 OR COL1 = 10) ORDER BY COL1
    • explain: ISCAN(I1 [EQUALS promote(@c9 AS LONG)]) ∪ ISCAN(I1 [EQUALS promote(@c13 AS LONG)]) COMPARE BY (_.COL1, recordType(_), _.ID)
    • task_count: 2218 -> 1192 (-1026)
    • transform_count: 418 -> 241 (-177)
    • transform_yield_count: 105 -> 64 (-41)
    • insert_new_count: 247 -> 132 (-115)
    • insert_reused_count: 42 -> 21 (-21)

@hazefully hazefully requested a review from normen662 December 18, 2025 18:29