SOLR-17319 : Combined Query Feature for Multi Query Execution #3418

ercsonusharma · 2025-07-04T09:19:21Z

https://issues.apache.org/jira/browse/SOLR-17319

Description

This feature executes multiple queries of different kinds across the shards of a collection and combines their results using an algorithm such as Reciprocal Rank Fusion (RRF). It also helps resolve the issues discussed in the previous PR, mainly around merging documents across shards. It extends the JSON Query DSL to give more flexibility in querying, ultimately enabling Hybrid Search while addressing those shortcomings.

Note: This feature currently does not support non-distributed or grouping queries.

Solution

Extended QueryComponent to create the new CombinedQueryComponent, and ResponseBuilder to create the new CombinedQueryResponseBuilder, which supports multiple response builders to hold the state of and execute multiple queries.
In the JSON Query DSL, a parameter is added to identify a Combined Query request; based on it, the new CombinedQueryComponent is invoked.
CombinedQueryComponent has one response builder assigned to each query. The queries are first executed at the SolrIndexSearcher level and their results are then combined, using RRF for now.
At the shard level, the responses for the multiple queries are also merged.
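
For readers unfamiliar with RRF: each document's fused score is the sum of 1 / (k + rank) over every ranked list that contains it, with ranks starting at 1. The following is a minimal standalone sketch of that formula, not the PR's ReciprocalRankFusion class; the class name RrfSketch, the method fuse, and the choice of k = 60 in the usage note are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative Reciprocal Rank Fusion over lists of document ids (hypothetical class). */
public class RrfSketch {

  /**
   * Fuses several ranked lists of document ids. Each document scores
   * sum(1 / (k + rank)) over the lists that contain it, rank starting at 1.
   */
  public static List<String> fuse(List<List<String>> rankedLists, int k) {
    Map<String, Double> scores = new LinkedHashMap<>();
    for (List<String> ranked : rankedLists) {
      for (int i = 0; i < ranked.size(); i++) {
        double contribution = 1.0 / (k + i + 1); // i + 1 is the 1-based rank
        scores.merge(ranked.get(i), contribution, Double::sum);
      }
    }
    List<String> fused = new ArrayList<>(scores.keySet());
    // Descending fused score; List.sort is stable, so ties keep first-seen order.
    fused.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
    return fused;
  }
}
```

With a lexical ranking of a, b, c and a vector ranking of c, a, d, calling fuse with k = 60 orders the documents a, c, b, d: a and c appear in both lists, and a's ranks (1 and 2) beat c's (3 and 1).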

Tests

Added tests for the RRF logic in isolation.
Added tests covering both search-index-level and distributed requests.
Added tests to assert the existing behaviour of the search handler's QueryComponent as well as that of the newly added CombinedQueryComponent, based on the flag in the JSON Query DSL.

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.
I have added documentation for the Reference Guide

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

ercsonusharma · 2025-07-09T17:21:12Z

@alessandrobenedetti @dsmiley, please help review it whenever you can. Thanks!

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java

solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java

solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

dsmiley

Really glad to see this work began by acknowledging the existing work and trying to address the pitfalls!

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java

solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java

solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java

alessandrobenedetti · 2025-07-10T09:54:16Z

Hi @ercsonusharma , thanks for resurrecting this, didn't have time to dedicate to the feature in the last few months, good to see some movement!

In the next couple of weeks, I should be able to give it a go and review it!

solr/core/src/java/org/apache/solr/handler/component/CombinedQuerySearchHandler.java

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java

dsmiley · 2025-09-02T13:28:26Z

Can you please "resolve" any conversation you think were addressed? This is a long PR with many conversations, making it hard to catch up with the current state.

solr/solr-ref-guide/modules/query-guide/pages/json-combined-query-dsl.adoc

solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java

solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

dsmiley · 2025-09-04T05:50:02Z

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

+        final var unparsedQuery = params.get(queryKey);
+        ResponseBuilder rbNew = new ResponseBuilder(rb.req, new SolrQueryResponse(), rb.components);
+        rbNew.setQueryString(unparsedQuery);
+        super.prepare(rbNew);


Wouldn't we want to manipulate the sort spec so that we get all docs up to offset (AKA the "start" param) + rows, since the RRF/combiner is going to want to see all docs/rankings up to offset+rows? Otherwise our combiner is blind to the "offset" docs. Assuming you agree, we then need to apply paging at this layer (our component) instead of letting the subquery do it.

It's already happening here anyway.

That's for distributed-search but not single-core search.

I think user-managed/standalone vs SolrCloud is orthogonal. This is about a single shard working correctly (in whatever Solr mode). IMO it's not optional for basic paging parameters to work correctly with one shard.

I could imagine we'd prefer a mechanism for a SearchComponent to force "shortCircuit"=false, thereby ensuring there's always a distributed phase. Maybe that could be done by re-ordering SearchHandler's call to getAndPrepShardHandler to come after prepareComponents (swap adjacent lines)? Then the prepare method of this component could force distrib and add shortCircuit=false or something like that. And/or maybe a component should have a more elegant callback to communicate that it forces distributed search (even when there's one shard/core). This would simplify this component overall: it would no longer need to handle paging in process(), and would instead do so for distributed search only.
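
The paging concern raised in this thread can be sketched as follows: each subquery is asked for its top offset + rows documents starting at 0, and the user's offset is applied only to the combined ranking. This is a simplified illustration under those assumptions, not code from the PR; the class CombinedPagingSketch and its method names are hypothetical.

```java
import java.util.List;

/** Illustrative paging for a combined (fused) ranking (hypothetical class). */
public class CombinedPagingSketch {

  /** Rows each subquery must return (with start=0) so the combiner sees every candidate. */
  public static int subQueryRows(int offset, int rows) {
    return offset + rows;
  }

  /** Applies the user's start/rows to the already-combined ranking. */
  public static <T> List<T> page(List<T> combined, int offset, int rows) {
    int from = Math.min(offset, combined.size());
    // Widen to long so a very large rows value cannot overflow the upper bound.
    int to = (int) Math.min((long) from + rows, combined.size());
    return combined.subList(from, to);
  }
}
```

If each subquery applied start itself, the documents it skipped would never reach the combiner, so their ranks could not contribute to the fused score.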

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java

solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java

solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java

solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

dsmiley

The beauty/wisdom of BaseDistributedSearchTestCase is that it tests consistency between single shard and multi-shard. I think it's brilliant; that is the point of this base class. Doing so requires that you use the correct utility methods it provides. I noticed your test calls queryServer instead of query. If you look at their impls, you'll see what I'm getting at. You'll see other subclass tests using the various methods to do these tests.

I suspect there's a single-shard pagination bug. If so, then correct usage of this base class would surface it without you having to write more tests.

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java

dsmiley · 2025-09-04T16:33:50Z

The beauty/wisdom of BaseDistributedSearchTestCase is that it tests consistency between single shard and multi-shard. I think it's brilliant; that is the point of this base class.

Yet this PR/approach will not be able to comply since unlike most (all?) components, its results are affected substantially by distributed-search. The (unsaid?) vision of sharding / distributed-search was getting the same results as a single shard, and Solr does the work to pull off that trick, with plenty of tests demonstrating it does. In fact I'd say, with great disappointment, that the observed (by a user) results of this component will not be RRF when there's distributed search over shards.
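
One way to see the concern is a toy example, assuming fusion is done per shard from shard-local ranks and the shards are then merged by fused score (a simplification; the PR's actual merge logic may differ). The class ShardRrfSketch, the document ids, and k = 60 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy comparison of global RRF and per-shard RRF (hypothetical class). */
public class ShardRrfSketch {

  /** RRF scores: sum of 1 / (k + rank) over the ranked lists, rank starting at 1. */
  static Map<String, Double> rrfScores(List<List<String>> rankedLists, int k) {
    Map<String, Double> scores = new HashMap<>();
    for (List<String> ranked : rankedLists) {
      for (int i = 0; i < ranked.size(); i++) {
        scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
      }
    }
    return scores;
  }

  /** Doc ids ordered by descending score, ties broken by id for determinism. */
  static List<String> byScoreDesc(Map<String, Double> scores) {
    List<String> ids = new ArrayList<>(scores.keySet());
    ids.sort(
        Comparator.comparingDouble((String id) -> scores.get(id))
            .reversed()
            .thenComparing(Comparator.naturalOrder()));
    return ids;
  }

  /** RRF over each query's globally merged ranking. */
  public static List<String> globalRrf() {
    // Both queries rank d1 > d2 > d3 once shard results are merged by original score.
    List<String> queryA = List.of("d1", "d2", "d3");
    List<String> queryB = List.of("d1", "d2", "d3");
    return byScoreDesc(rrfScores(List.of(queryA, queryB), 60));
  }

  /** RRF computed per shard from local ranks, then shards merged by fused score. */
  public static List<String> perShardRrf() {
    // Shard 1 holds d1 and d2; shard 2 holds only d3, which is therefore rank 1 locally.
    Map<String, Double> shard1 =
        rrfScores(List.of(List.of("d1", "d2"), List.of("d1", "d2")), 60);
    Map<String, Double> shard2 = rrfScores(List.of(List.of("d3"), List.of("d3")), 60);
    Map<String, Double> merged = new HashMap<>(shard1);
    merged.putAll(shard2);
    return byScoreDesc(merged);
  }
}
```

Here globalRrf() yields d1, d2, d3, while perShardRrf() yields d1, d3, d2: the weakest document overall is promoted because it ranks first on its own shard.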

ercsonusharma · 2025-09-08T02:57:56Z

Yet this PR/approach will not be able to comply since unlike most (all?) components, its results are affected substantially by distributed-search. The (unsaid?) vision of sharding / distributed-search was getting the same results as a single shard, and Solr does the work to pull off that trick, with plenty of tests demonstrating it does. In fact I'd say, with great disappointment, that the observed (by a user) results of this component will not be RRF when there's distributed search over shards.

Pushed a change to the PR that adds an option for the user to choose which combiner method to use: Way 1 (pre) or Way 2 (post). Please help with reviewing.

solr/core/src/java/org/apache/solr/handler/component/CombinedQuerySearchHandler.java

dsmiley · 2025-09-09T04:16:21Z

solr/core/src/java/org/apache/solr/handler/component/CombinedQuerySearchHandler.java

+      solrParams.set(ShardParams.SHARDS, localShardUrl);
+      req.setParams(solrParams);


I think it's a bit sneaky that a predicate-looking method has a side effect. This is a hack to work around the need for something proper: a way for a component or handler to communicate that we need the distributed search algorithm (no so-called short-circuit).

I followed what you said here, and I agree it's a bit sneaky. However, apologies, I didn't follow the part below:

for a component or handler to communicate, we need the distributed search algorithm (no so-called short-circuit).

AFAIK, in standalone distributed search, the shards value needs to be passed by the user.

It's not just standalone; a single shard collection will "short-circuit". I was going to link to my same comment. I understand the need/desire. Interestingly, this is the first component to want to prevent the short circuit, but I could see it being useful for any search component author who doesn't want the extra development cost of an optimized single-shard algorithm.

Perhaps if SearchHandler called SearchComponent.prepare before it initialized ShardHandler (that latter part needs to know if short-circuit), then a component could add the short-circuit param.

Makes sense. Have changed the logic in such a way that it can be useful for other search components as well where a distributed request can be forced depending on the component requirement.

dsmiley · 2025-09-09T04:33:49Z

solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java

Please update this test to not call queryServer, since it doesn't compare against the control client. I wish that method came with a big fat disclaimer and had a different name that didn't look so normal. There are probably more callers of it than there should be. I did #3639 just now.

In order for this test to compare both single shard & multi-shard with score relevancy, you could do a couple different things. One is to use the ExactStatsCache for distributed-IDF. Or, write queries that set a constant/fixed score. That would actually be most clear.

With such a test, you don't need to test non-distributed since the test infrastructure here allows you to do both as one.

True, but I also wanted to show that the results are not the same as what was expected in a single shard, as I described in the Jira issue. Asserting only ideal queries (like constant/fixed score) or one type of query (lexical with ExactStatsCache) may not give a clear picture of the algorithm, IMHO.

ercsonusharma · 2025-09-16T04:14:38Z

@dsmiley, Really appreciate your time and effort so far on the PR. When you have a moment, would you mind taking a look at the latest commit changes addressing the concerns raised previously? Thanks!

dsmiley

I was away at a conference; I'm back now. I just did a partial review. I have doubts that it makes sense to include "Way 1" if, I imagine, it increases the complexity / documentation matters and more fundamentally... why would someone choose it.

dsmiley · 2025-09-16T14:00:08Z

solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java

+    if (req.getHttpSolrCall() != null
+        && StringUtils.isEmpty(req.getParams().get(ShardParams.SHARDS))) {
+      String scheme = req.getHttpSolrCall().getReq().getScheme();
+      String host = req.getHttpSolrCall().getReq().getServerName();
+      int port = req.getHttpSolrCall().getReq().getServerPort();
+      String context = req.getHttpSolrCall().getReq().getContextPath();
+      String core = req.getCore().getName();
+      String localShardUrl =
+          String.format(Locale.ROOT, "%s://%s:%d%s/%s", scheme, host, port, context, core);
+      solrParams.set(ShardParams.SHARDS, localShardUrl);
+      req.setParams(solrParams);
+      return;
+    }


This part looks suspicious to me; so shortCircuit=false isn't enough? Did you construct this based on seeing similar code elsewhere (where?) to create a shards URL?

No, shortCircuit=false is not enough, as it helps make rb.isDistrib=false and forces the method not to go through the shardHandler requests but rather to stick to node-local indexes through the SearchComponent.process method.
I couldn't find this code anywhere else; I wrote it myself.

dsmiley · 2025-09-16T14:01:00Z

solr/core/src/java/org/apache/solr/handler/component/ResponseBuilder.java

@@ -141,6 +141,15 @@ public ResponseBuilder(
  public List<ShardRequest> outgoing; // requests to be sent
  public List<ShardRequest> finished; // requests that have received responses from all shards
  public String shortCircuitedURL;
+  private boolean forcedDistrib = false;


I thought of this as well, yet it feels weird to put a request aspect into the response data holder. It doesn't affect the response. Anyway... it's really minor. This is pragmatic.

dsmiley · 2025-09-16T14:05:37Z

solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java

 */
-public class CombinedQueryComponentTest extends SolrTestCaseJ4 {
+public class CombinedQueryComponentTest extends BaseDistributedSearchTestCase {


I'm now confused on how to differentiate the testing approach between this class and DistributedCombinedQueryComponentTest. That one is named appropriately (consistently with other distributed tests) and extends BaseDistributedSearchTestCase. I'm not sure this test has a role/purpose if that one can be comprehensive.

Since the algorithm no longer supports non-distributed requests, it didn't make sense to add test cases by extending SolrTestCaseJ4, so I have changed it. This class could be merged with the other CombinedQueryTest, but the key idea was that it verifies the functionality of the component by performing a few basic queries in single-shard mode and validating the responses, including the limitations and the combiner plugin.

If you feel it should be merged into the other test classes, I am open to making those changes when we finalise this one.

ercsonusharma · 2025-09-16T15:37:19Z

I have doubts that it makes sense to include "Way 1" if, I imagine, it increases the complexity/documentation matters and more fundamentally... why would someone choose it.

I don't think it introduces any additional complexity in terms of code or documentation (it adds one parameter, combiner.method); rather, it reuses several pieces of code.
IMO, if the data is independently sharded and shard-level scoring doesn't matter for overall query relevance, it may not make sense to first combine shard results based on their original scores and then apply RRF per query (as done in Way 2). In such cases, users may prefer Way 1.

Sonu Sharma added 4 commits July 4, 2025 14:24

Combined Query Feature for Multi Query Execution

bf3cd5d

Tests: Combined Query Feature for Multi Query Execution

182bec9

Tests: Combined Query Feature for Multi Query Execution

b884f0e

Tests: Combined Query Feature for Multi Query Execution

29e8aea

github-actions bot added client:solrj tests cat:search module:clustering labels Jul 4, 2025

Improve: Fix typo

c113799

cpoerschke reviewed Jul 4, 2025

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

cpoerschke reviewed Jul 4, 2025

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java (outdated)

cpoerschke reviewed Jul 4, 2025

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java (outdated)

ercsonusharma added 2 commits July 4, 2025 22:58

Tests: Fix errors

3600ed3

Review comments: implementation

9b0c76e

atris requested changes Jul 9, 2025

dsmiley reviewed Jul 9, 2025

ercsonusharma added 3 commits July 12, 2025 14:23

Code review changes

a841bc7

Code review changes

91f8e09

Code review changes

cace1f7

github-actions bot removed the module:clustering label Jul 12, 2025

ercsonusharma added 3 commits July 13, 2025 21:35

Code review changes

299db43

Code review changes

840070e

Improvement and fixes

d2feefc

ercsonusharma requested a review from atris July 16, 2025 18:45

cpoerschke reviewed Jul 25, 2025

solr/core/src/java/org/apache/solr/handler/component/CombinedQuerySearchHandler.java (outdated)

cpoerschke reviewed Jul 25, 2025

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java (outdated)

cpoerschke reviewed Jul 25, 2025

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java (outdated)

dsmiley reviewed Sep 2, 2025

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java (outdated)

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java (outdated)

cpoerschke reviewed Sep 2, 2025

solr/solr-ref-guide/modules/query-guide/pages/json-combined-query-dsl.adoc

cpoerschke reviewed Sep 2, 2025

solr/solr-ref-guide/modules/query-guide/pages/json-combined-query-dsl.adoc

ercsonusharma and others added 5 commits September 3, 2025 10:30

review comment fix

ac85d2f

review comment fix

7b0593c

review comment enhancement

c03c0f7

simplification/consolidation: protected QueryComponent.newShardDocQue…

a52dd22

…ue instead of implementMergeIds-taking-ShardDocQueueFactory

factor out protected QueryComponent.setResultIdsAndResponseDocs method

195f3f1

dsmiley reviewed Sep 3, 2025

solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java (outdated)

solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java (outdated)

ercsonusharma added 3 commits September 3, 2025 19:57

review comment enhancement

c1f5501

Merge branch 'feat_combined_query' of https://github.com/ercsonusharm…

3649d3e

…a/solr into feat_combined_query

refactor to reduce cyclometric complexity

4eedbed

dsmiley reviewed Sep 4, 2025

review comment fixes

0990e7f

dsmiley reviewed Sep 4, 2025

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java (outdated)

dsmiley reviewed Sep 4, 2025

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java (outdated)

solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java (outdated)

debug params fix and rrf shard sort order

14ff5e1

ercsonusharma added 2 commits September 5, 2025 13:58

test cases fix and rrf shard sort order

bd637b7

introducing combiner methods as pre and post

2958599

dsmiley reviewed Sep 9, 2025

ercsonusharma added 4 commits September 11, 2025 00:51

distrib forced and doc update

c3e44c3

distrib forced fix

e2dfcef

distrib forced fix

d4b34fc

test fix

3fe93b8

dsmiley reviewed Sep 16, 2025
