SOLR-17319 : Combined Query Feature for Multi Query Execution #3418

ercsonusharma · 2025-07-04T09:19:21Z

https://issues.apache.org/jira/browse/SOLR-17319

Description

This feature executes multiple queries of different kinds across multiple shards of a collection and combines their results using an algorithm (such as Reciprocal Rank Fusion). It also helps resolve the issues discussed in the previous PR, mainly around merging documents across shards. It extends the JSON Query DSL to give more flexibility in querying, ultimately enabling Hybrid Search in a pure way and addressing those shortcomings.

This feature is currently not supported for non-distributed or grouping queries.

Solution

Extended QueryComponent to create a new CombinedQueryComponent, and ResponseBuilder to create a new CombinedQueryResponseBuilder that holds multiple response builders to keep the state and execute multiple queries.
In the JSON Query DSL, a parameter is added to identify a Combined Query request; based on it, the new CombinedQueryComponent is invoked.
CombinedQueryComponent has one response builder assigned to each query. The queries are first executed at the SolrIndexSearcher level and then combined, using RRF for now.
At the shard level, the responses for the multiple queries are also merged.

Tests

Added tests for the RRF logic in isolation.
Added tests covering both the search-index level and distributed requests.
Added tests to assert the existing behaviour of the search handler's QueryComponent as well as the newly added CombinedQueryComponent, based on the flag in the JSON Query DSL.

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.
I have added documentation for the Reference Guide

cpoerschke · 2025-07-04T16:56:56Z

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

+   */
+  @Override
+  public void prepare(ResponseBuilder rb) throws IOException {
+    if (rb instanceof CombinedQueryResponseBuilder crb) {


Not seen this (newer) java pattern matching with instanceof before, nice!
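For readers who have not met the feature either, here is a minimal sketch of pattern matching for instanceof (standard since Java 16). The classes BaseBuilder and CombinedBuilder are hypothetical stand-ins, not Solr's ResponseBuilder types.

```java
// Stand-in types for illustration only; Solr's real builders are not reproduced here.
class BaseBuilder {}

class CombinedBuilder extends BaseBuilder {
  final int subQueryCount;

  CombinedBuilder(int subQueryCount) {
    this.subQueryCount = subQueryCount;
  }
}

class InstanceofPatternSketch {
  /** Returns the number of sub-queries, or 1 for a plain builder. */
  static int subQueries(BaseBuilder rb) {
    // The type test and the cast happen in one step; 'crb' is only
    // in scope where the test is known to have succeeded.
    if (rb instanceof CombinedBuilder crb) {
      return crb.subQueryCount;
    }
    return 1;
  }
}
```

The binding variable replaces the separate cast line that older code needed after the instanceof check.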

cpoerschke · 2025-07-04T17:10:56Z

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java

+      for (int i = resultSize - 1; i >= 0; i--) {
+        ShardDoc shardDoc = queue.pop();
+        shardDoc.positionInResponse = i;
+        // Need the toString() for correlation with other lists that must
+        // be strings (like keys in highlighting, explain, etc)
+        resultIds.put(shardDoc.id.toString(), shardDoc);
+      }


Wondering if factoring out a protected QueryComponent method for this block (and the resultSize and resultIds above) would allow the CombinedQueryComponent to override the method, avoiding the need for rb instanceof CombinedQueryResponseBuilder above e.g.

Suggested change

-      for (int i = resultSize - 1; i >= 0; i--) {
-        ShardDoc shardDoc = queue.pop();
-        shardDoc.positionInResponse = i;
-        // Need the toString() for correlation with other lists that must
-        // be strings (like keys in highlighting, explain, etc)
-        resultIds.put(shardDoc.id.toString(), shardDoc);
-      }
+      Map<Object, ShardDoc> resultIds = createResultIds(queue, ss.getOffset());

Thanks for the input. This makes sense to me, and I have refactored out the method to leverage overriding.
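The shape of that refactor can be sketched as follows. This is an illustration under assumptions, not the PR's code: DocStub stands in for ShardDoc, the offset argument is left out for brevity, and the two component classes are hypothetical.

```java
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for ShardDoc; only the fields mentioned in the thread are modelled.
class DocStub {
  final Object id;
  int positionInResponse;

  DocStub(Object id) {
    this.id = id;
  }
}

class BaseComponentSketch {
  // Hook extracted from the merge loop: the queue yields the lowest-ranked
  // doc first, so positions are assigned from the end backwards.
  protected Map<Object, DocStub> createResultIds(Deque<DocStub> queue) {
    int resultSize = queue.size();
    Map<Object, DocStub> resultIds = new LinkedHashMap<>();
    for (int i = resultSize - 1; i >= 0; i--) {
      DocStub doc = queue.pop();
      doc.positionInResponse = i;
      resultIds.put(doc.id.toString(), doc);
    }
    return resultIds;
  }
}

class CombinedComponentSketch extends BaseComponentSketch {
  private final List<DocStub> combined; // assumed already fused, best first

  CombinedComponentSketch(List<DocStub> combined) {
    this.combined = combined;
  }

  // The subclass swaps in its own ordering, so the caller needs no instanceof check.
  @Override
  protected Map<Object, DocStub> createResultIds(Deque<DocStub> queue) {
    Map<Object, DocStub> resultIds = new LinkedHashMap<>();
    for (int i = 0; i < combined.size(); i++) {
      DocStub doc = combined.get(i);
      doc.positionInResponse = i;
      resultIds.put(doc.id.toString(), doc);
    }
    return resultIds;
  }
}
```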

cpoerschke · 2025-07-04T17:25:10Z

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

+      boolean partialResults = false;
+      boolean segmentTerminatedEarly = false;
+      List<QueryResult> queryResults = new ArrayList<>();


Am not familiar with RRF on partial results, if that is a concept? But wondering if conceptually it's up to the combiner to decide e.g.

Suggested change

-      boolean partialResults = false;
-      boolean segmentTerminatedEarly = false;
-      List<QueryResult> queryResults = new ArrayList<>();
+      List<Boolean> partialResults = new ArrayList<>(crb.responseBuilders.size());
+      List<Boolean> segmentTerminatedEarly = new ArrayList<>(crb.responseBuilders.size());
+      List<QueryResult> queryResults = new ArrayList<>(crb.responseBuilders.size());

and then later (pseudo code)

combinedPartialResults, combinedSegmentTerminatedEarly, combinedQueryResult = combinerStrategy.combine(partialResults, segmentTerminatedEarly, queryResults);

RRF should just merge multiple doc results irrespective of whether they are partial or complete IMHO. If any of the ResponseBuilder QueryResults contain partial results, the whole merged QueryResults should be marked as partialResults. Same should be the case with segmentTerminatedEarly.
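The "any sub-query sets the flag" rule described in this reply could look roughly like the sketch below. FlagMergeSketch and anySet are hypothetical names; the per-query flags are assumed to be collected into a list of non-null Booleans, as in the suggestion above.

```java
import java.util.List;

class FlagMergeSketch {
  /** True if any sub-query reported the flag; an empty list yields false. */
  static boolean anySet(List<Boolean> perQueryFlags) {
    // Assumes no null elements; a null would throw NullPointerException here.
    return perQueryFlags.stream().anyMatch(Boolean::booleanValue);
  }
}
```

The same helper would serve both partialResults and segmentTerminatedEarly, since the thread proposes identical treatment for the two flags.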

ercsonusharma · 2025-07-09T17:21:12Z

@alessandrobenedetti @dsmiley, please help review it whenever you can. Thanks!

atris · 2025-07-09T18:42:34Z

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

+ * The CombinedQueryComponent class extends QueryComponent and provides support for executing
+ * multiple queries and combining their results.
+ */
+public class CombinedQueryComponent extends QueryComponent {


QueryComponent is specifically designed for Solr's distributed search processing. We override the prepare method, but then invoke super.prepare with the sub response. This could quickly get out of control for a query with a large number of clauses.

I would suggest overriding SearchComponent and defining explicit subBuilder.process and subBuilder.prepare methods.

atris · 2025-07-09T18:43:37Z

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

+   * @throws IOException if an I/O error occurs during preparation
+   */
+  @Override
+  public void prepare(ResponseBuilder rb) throws IOException {


How does this work with grouping, highlighting and faceting? Those methods from QueryComponent are not overridden here, so updated ResponseBuilders are not propagated there.

Highlighting and Faceting are separate components, so not affected, but as far as grouping is concerned, merge logic has to be there. Adding...

... so updated ResponseBuilders are not propagated there. ...

So to perhaps illustrate with an example, https://github.com/apache/solr/blob/releases/solr/9.8.1/solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java#L97 sets the rb.doHighlights flag, and this would be set on the (CombinedQuery)ResponseBuilder builder but not on the CombinedQueryResponseBuilder.responseBuilders builders.

yes, and (CombinedQuery)ResponseBuilder builder is already populated with all the parameters including highlights query here.

atris · 2025-07-09T18:44:13Z

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

+   * QueryAndResponseCombiner strategy, and sets the appropriate results and metadata in the
+   * CombinedQueryResponseBuilder.
+   *
+   * @param rb the ResponseBuilder object to process


I wonder if we can abstract the subquery propagated execution in a separate class:

public class SubQueryExecutor {
  private final SolrQueryRequest sharedReq;
  private final List<SearchComponent> components;

  public SubQueryExecutor(SolrQueryRequest req, List<SearchComponent> components) {
    this.sharedReq = req;
    this.components = components;
  }

  public void execute(List<ResponseBuilder> builders) throws IOException {
    for (ResponseBuilder rb : builders) {
      for (SearchComponent c : components) {
        c.prepare(rb); // or distributedPrepare
      }
    }
    for (ResponseBuilder rb : builders) {
      for (SearchComponent c : components) {
        c.process(rb); // or distributedProcess
      }
    }
  }
}

This will avoid nesting like super.prepare(rbNew);

This kind of orchestration is already happening in SearchHandler - IMO, iterating through each ResponseBuilder is not needed for every SearchComponent. Only the CombinedQueryComponent needs the multiple queries ResponseBuilder for multi-query execution.

... Only the CombinedQueryComponent needs the multiple queries ResponseBuilder for multi-query execution.

From code reading it appears that if one wanted to have highlighting for the results being combined then the highlighting component would also need access.

But then again, perhaps that and various things could be initially deferred as out-of-scope (and documented as such) e.g. no combining with highlighting or faceting or cursor mark functionality.

The multiple queries (just the queries) set inside the queries field of the JSON Query DSL are executed in CombinedQueryComponent. After that, all the other components, like faceting and highlighting, operate on the (CombinedQuery)ResponseBuilder builder.

From code reading it appears that if one wanted to have highlighting for the results being combined then the highlighting component would also need access.

Not exactly; the highlighting component works by highlighting the rb.getResults() set, which is already created in the CombinedQueryComponent.

atris · 2025-07-09T18:46:15Z

solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java

+    for (int i = 0; i < resultSize; i++) {
+      ShardDoc shardDoc = combinedShardDocs.get(i);
+      shardDoc.positionInResponse = i;
+      maxScore = Math.max(maxScore, shardDoc.score);


How does this normalise across different query types (KNN, BM25, filters)?

Normalisation is not applicable to RRF, which uses only each document's rank in each list, not its raw score.

atris · 2025-07-09T18:50:02Z

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

+      String docId = scoredDoc.getKey();
+      Float score = scoredDoc.getValue();
+      ShardDoc shardDoc = docIdToShardDoc.get(docId);
+      shardDoc.score = score;


This is dangerous - this is mutating the original ShardDoc object. It might be referred to by another component, and is a bad idea to modify in place.

ShardDoc is local to the mergeIds method and not shared in any other object available to other component. Also, SolrDocumentList is created later using ShardDoc. Please help me understand if it's being shared anywhere else.

atris · 2025-07-09T18:51:09Z

solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java

+  public static QueryAndResponseCombiner getImplementation(SolrParams requestParams) {
+    String algorithm =
+        requestParams.get(CombinerParams.COMBINER_ALGORITHM, CombinerParams.RECIPROCAL_RANK_FUSION);
+    if (algorithm.equals(CombinerParams.RECIPROCAL_RANK_FUSION)) {


This is hardcoded. Why not have a plugin interface here, allowing combiners to be loaded dynamically?
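One way to picture the suggestion is a name-to-factory registry, sketched below. Everything here is hypothetical (CombinerSketch, CombinerRegistrySketch, register, resolve); it does not use Solr's own plugin-loading machinery and is not the PR's QueryAndResponseCombiner API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical combiner contract, reduced to a name for illustration.
interface CombinerSketch {
  String name();
}

class CombinerRegistrySketch {
  private final Map<String, Supplier<CombinerSketch>> factories = new ConcurrentHashMap<>();

  /** Registers a factory under an algorithm name, replacing any earlier one. */
  void register(String algorithm, Supplier<CombinerSketch> factory) {
    factories.put(algorithm, factory);
  }

  /** Looks the algorithm up by name instead of branching on hardcoded constants. */
  CombinerSketch resolve(String algorithm) {
    Supplier<CombinerSketch> factory = factories.get(algorithm);
    if (factory == null) {
      throw new IllegalArgumentException("Unknown combiner algorithm: " + algorithm);
    }
    return factory.get();
  }
}
```

With such a registry, adding a new algorithm means registering one more factory rather than editing the if/else chain in getImplementation.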

atris · 2025-07-09T18:51:35Z

solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java

+   * @return a list of explanations for the given queries and results
+   * @throws IOException if an I/O error occurs during the explanation retrieval process
+   */
+  public abstract NamedList<Explanation> getExplanations(


Please implement support for debug as well

atris · 2025-07-09T19:09:48Z

solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java

@@ -240,7 +241,7 @@ public void changed(SolrPackageLoader.SolrPackage pkg, Ctx ctx) {
  }

  @SuppressWarnings({"unchecked"})
-  private void initComponents() {
+  private void initComponents(boolean isCombinedQuery) {


This is a code smell - initComponents should not be changing behaviour based on a flag specific to a component.

This would be solved if we inherited from SearchHandler or dynamically injected CombinedQueryComponent using a factory pattern

Strongly agree with the smell!

atris · 2025-07-09T19:10:36Z

solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java

+import org.junit.BeforeClass;
+
+/**
+ * The CombinedQueryComponentTest class is a unit test suite for the CombinedQueryComponent in Solr.


We should add tests for queries returning no results and score ties

Going to add user input for ordering the docs across multiple queries in case of tie.

atris · 2025-07-09T19:11:07Z

solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java

+    // cosine distance vector1= 0.970
+    docs.get(6).addField(vectorField, Arrays.asList(5f, 10f, 20f, 40f));
+    // cosine distance vector1= 0.515
+    docs.get(7).addField(vectorField, Arrays.asList(120f, 60f, 30f, 15f));


Please add test for the RRF score calculation explainability

atris · 2025-07-09T19:15:05Z

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

+      while (docs.hasNext() && ranking <= upTo) {
+        int docId = docs.nextDoc();
+        float rrfScore = 1f / (k + ranking);
+        docIdToScore.compute(docId, (id, score) -> (score == null) ? rrfScore : score + rrfScore);


This assumes that each query returns upTo documents. What happens when a query returns fewer documents?

Then, only docs.size() number of documents are ranked.
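A minimal, self-contained sketch of the scoring loop under discussion, assuming plain integer doc ids and best-first lists. RrfSketch and fuse are hypothetical names, and the tie behaviour shown (first-seen order) is an arbitrary choice of this sketch, not something the PR specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RrfSketch {
  /**
   * Fuses ranked lists of doc ids (best first). Each list contributes
   * 1 / (k + rank) for at most its first 'upTo' entries; a shorter list
   * simply contributes fewer terms.
   */
  static List<Map.Entry<Integer, Float>> fuse(List<List<Integer>> rankedLists, int k, int upTo) {
    Map<Integer, Float> scores = new LinkedHashMap<>();
    for (List<Integer> ranked : rankedLists) {
      int limit = Math.min(upTo, ranked.size());
      for (int rank = 1; rank <= limit; rank++) {
        float rrfScore = 1f / (k + rank);
        // Sum the contributions of every list in which the doc appears.
        scores.merge(ranked.get(rank - 1), rrfScore, Float::sum);
      }
    }
    List<Map.Entry<Integer, Float>> sorted = new ArrayList<>(scores.entrySet());
    // List.sort is stable, so equal scores keep first-seen order.
    sorted.sort(Map.Entry.<Integer, Float>comparingByValue().reversed());
    return sorted;
  }
}
```

For the lists [1, 2, 3] and [2, 4] with k = 60, document 2 appears in both lists and so receives two contributions, which places it ahead of document 1.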

atris · 2025-07-09T19:17:05Z

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

+      totalMatches = Math.max(totalMatches, rankedList.matches());
+      int ranking = 1;
+      while (docs.hasNext() && ranking <= upTo) {
+        int docId = docs.nextDoc();


The upTo limit is per query, not a global top N. In the fusion part, we return all unique docs across all subqueries. Where are we enforcing the user-specified top N limit?

As usual, the user-specified top N is enforced at the shard level here, and at the search-index level in SolrIndexSearcher by setting the number of results and the offset in SortSpec.
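Conceptually, enforcing a user-specified window over a fused list comes down to the slicing shown below. This is only an illustration with hypothetical names (PageSketch, page); it is not how SortSpec or SolrIndexSearcher implement it.

```java
import java.util.List;

class PageSketch {
  /** Returns the window [offset, offset + rows) of a fused, best-first list. */
  static <T> List<T> page(List<T> fused, int offset, int rows) {
    // Clamp both ends so an offset past the end yields an empty page.
    int from = Math.min(Math.max(offset, 0), fused.size());
    // Use long arithmetic so a very large 'rows' cannot overflow.
    int to = (int) Math.min((long) from + Math.max(rows, 0), (long) fused.size());
    return fused.subList(from, to);
  }
}
```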

atris · 2025-07-09T19:18:59Z

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

+      }
+    }
+    List<Map.Entry<Integer, Float>> sortedByScoreDescending =
+        docIdToScore.entrySet().stream()


This is essentially number of queries * upTo. Have we scale tested this?

We need to enforce some limit on the number of queries to avoid a burst of queries. Adding...

atris · 2025-07-09T19:19:35Z

solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java

+
+    int combinedResultsLength = docIdToScore.size();
+    int[] combinedResultsDocIds = new int[combinedResultsLength];
+    float[] combinedResultScores = new float[combinedResultsLength];


How about early termination for non competitive iterators?

Early termination is not applicable in this context, as the complete set of documents is required for the RRF (Reciprocal Rank Fusion) algorithm to function correctly.

dsmiley

Really glad to see this work began by acknowledging the existing work and trying to address the pitfalls!

dsmiley · 2025-07-09T21:52:25Z

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java

+   * @return a map of shard documents, where the keys are the shard IDs as strings, and the values
+   *     are the corresponding ShardDoc objects
+   */
+  protected Map<Object, ShardDoc> createShardResult(


Shouldn't the key be a String and not an Object?

It should be String, but ResponseBuilder uses the type Object, so it had to stay Object.

dsmiley · 2025-07-09T21:53:25Z

solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java

@@ -240,7 +241,7 @@ public void changed(SolrPackageLoader.SolrPackage pkg, Ctx ctx) {
  }

  @SuppressWarnings({"unchecked"})
-  private void initComponents() {
+  private void initComponents(boolean isCombinedQuery) {


Strongly agree with the smell!

dsmiley · 2025-07-09T22:01:28Z

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java

+      ShardFieldSortedHitQueue queue,
+      Map<String, List<ShardDoc>> shardDocMap,
+      SolrDocumentList responseDocs) {
+    Map<Object, ShardDoc> resultIds = new HashMap<>();


See org.apache.solr.common.util.CollectionUtil#newHashMap and pre-size
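The general idea behind pre-sizing can be sketched with the plain JDK as below. This does not reproduce CollectionUtil#newHashMap; PresizeSketch and newPresizedMap are hypothetical names, and the calculation assumes HashMap's default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeSketch {
  /** Creates a HashMap sized to hold 'expected' entries without an intermediate resize. */
  static <K, V> Map<K, V> newPresizedMap(int expected) {
    // A HashMap resizes once size exceeds capacity * loadFactor, so the
    // initial capacity must be the expected size divided by the load factor.
    int capacity = (int) Math.ceil(expected / 0.75d);
    return new HashMap<>(Math.max(capacity, 1));
  }
}
```

Passing the expected size directly to new HashMap<>(n) would still trigger a resize before n entries are stored, which is why the division is needed.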

dsmiley · 2025-07-09T22:03:15Z

solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java

+ * The QueryAndResponseCombiner class is an abstract base class for combining query results and
+ * responses. It provides a framework for different algorithms to be implemented for merging ranked


results & responses -- seem synonymous to me.

alessandrobenedetti · 2025-07-10T09:54:16Z

Hi @ercsonusharma , thanks for resurrecting this, didn't have time to dedicate to the feature in the last few months, good to see some movement!

In the next couple of weeks, I should be able to give it a go and review it!

Sonu Sharma added 4 commits July 4, 2025 14:24

Combined Query Feature for Multi Query Execution

bf3cd5d

Tests: Combined Query Feature for Multi Query Execution

182bec9

Tests: Combined Query Feature for Multi Query Execution

b884f0e

Tests: Combined Query Feature for Multi Query Execution

29e8aea

github-actions bot added client:solrj tests cat:search module:clustering labels Jul 4, 2025

Improve: Fix typo

c113799

cpoerschke reviewed Jul 4, 2025


ercsonusharma added 2 commits July 4, 2025 22:58

Tests: Fix errors

3600ed3

Review comments: implementation

9b0c76e

atris requested changes Jul 9, 2025


dsmiley reviewed Jul 9, 2025


ercsonusharma added 3 commits July 12, 2025 14:23

Code review changes

a841bc7

Code review changes

91f8e09

Code review changes

cace1f7

github-actions bot removed the module:clustering label Jul 12, 2025

ercsonusharma added 2 commits July 13, 2025 21:35

Code review changes

299db43

Code review changes

840070e
