Skip to content

Conversation

ercsonusharma
Copy link

@ercsonusharma ercsonusharma commented Jul 4, 2025

https://issues.apache.org/jira/browse/SOLR-17319

Description

This feature aims to execute multiple queries of multiple kinds across multiple shards of a collection and combine their result basis an algorithm (like Reciprocal Rank Fusion). It also help resolve the issues being discussed w.r.t the previous PR, mainly around across shard documents merging. It provides more flexibility in terms of querying extending JSON Query DSL ultimately enabling Hybrid Search in a pure way solving the shortcomings.

Note: This feature is currently unsupported for non-distributed and grouping query.

Solution

  • Extended the QueryComponent to create new CombinedQueryComponent and ResponseBuilder to create new CombinedQueryResponseBuilder supports multiple response builders to hold the state and execute multiple queries.
  • In JSON Query DSL, a parameter is added to identity Combined Query request and basis that it invokes the new CombinedQueryComponent
  • CombinedQueryComponent have multiple response builders assigned for each query. These queries are first executed at the SolrSearchIndexer level and combined them using RRF for now.
  • At Shard level also, the responses for the multiple queries are merged.

Tests

  • Added tests for testing the RRF logic independently.
  • Added tests for across search index and distributed requests.
  • Added tests to assert existing behaviour of search handler's QueryComponent as well as for the newly added CombinedQueryComponent basis the flag in json query DSL.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@ercsonusharma
Copy link
Author

@alessandrobenedetti @dsmiley, please help review it whenever you can. Thanks!

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really glad to see this work began by acknowledging the existing work and trying to address the pitfalls!

@alessandrobenedetti
Copy link
Contributor

Hi @ercsonusharma , thanks for resurrecting this, didn't have time to dedicate to the feature in the last few months, good to see some movement!

In the next couple of weeks, I should be able to give it a go and review it!

@ercsonusharma ercsonusharma requested a review from atris July 16, 2025 18:45
@dsmiley
Copy link
Contributor

dsmiley commented Sep 2, 2025

Can you please "resolve" any conversation you think were addressed? This is a long PR with many conversations, making it hard to catch up with the current state.

final var unparsedQuery = params.get(queryKey);
ResponseBuilder rbNew = new ResponseBuilder(rb.req, new SolrQueryResponse(), rb.components);
rbNew.setQueryString(unparsedQuery);
super.prepare(rbNew);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't we want to manipulate the sort spec so that we get all docs up to offset (AKA "start" param) + rows since RRF/combiner is going to want to see all docs/rankings up to offset+rows? Otherwise our combiner is blind to the "offset" docs. Assuming you agree, then we need to basically apply paging at this layer (our component) instead of letting the subquery do it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It anyways happening here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's for distributed-search but not single-core search.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think user-managed/standalone vs SolrCloud is orthogonal. This is about a single shard working correctly (in whatever Solr mode). IMO it's not optional for basic paging parameters to work correctly with one shard.

I could imagine we'd prefer a mechanism for a SearchComponent to force the "shortCircuit"=false thereby ensuring there's always a distributed phase. Maybe that could be done by re-ordering SearchHandler's call to getAndPrepShardHandler to be after prepareComponents (swap adjacent lines)? Then the prepare method of this component could force distrib and add the shortCircuit=false or something like that. And/or maybe a component should have a more elegant callback to communicate that it forces distributed search (even when there's one shard/core). This would overall simplify this component, no longer needing to handle paging in process(); instead do for distributed-search only.

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The beauty/wisdom of BaseDistributedSearchTestCase is that it tests consistency between single shard and multi-shard. I think it's brilliant; that is the point of this base class. Doing so requires that you use the correct utility methods it provides. I noticed your test calls queryServer instead of query. If you look at their impls, you'll see what I'm getting at. You'll see other subclass tests using the various methods to do these tests.

I suspect there's a single-shard pagination bug. If so, then correct usage of this base class would surface it without you having to write more tests.

@dsmiley
Copy link
Contributor

dsmiley commented Sep 4, 2025

The beauty/wisdom of BaseDistributedSearchTestCase is that it tests consistency between single shard and multi-shard. I think it's brilliant; that is the point of this base class.

Yet this PR/approach will not be able to comply since unlike most (all?) components, its results are affected substantially by distributed-search. The (unsaid?) vision of sharding / distributed-search was getting the same results as a single shard, and Solr does the work to pull off that trick, with plenty of tests demonstrating it does. In fact I'd say, with great disappointment, that the observed (by a user) results of this component will not be RRF when there's distributed search over shards.

@ercsonusharma
Copy link
Author

ercsonusharma commented Sep 8, 2025

Yet this PR/approach will not be able to comply since unlike most (all?) components, its results are affected substantially by distributed-search. The (unsaid?) vision of sharding / distributed-search was getting the same results as a single shard, and Solr does the work to pull off that trick, with plenty of tests demonstrating it does. In fact I'd say, with great disappointment, that the observed (by a user) results of this component will not be RRF when there's distributed search over shards.

pushed a change to the PR that adds an option for the user to choose which Combiner method to use — Way 1 (pre) or Way 2 (post). Please help with reviewing.

Comment on lines 114 to 115
solrParams.set(ShardParams.SHARDS, localShardUrl);
req.setParams(solrParams);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit sneaky that a predicate looking method has a side-effect. This is a hack to work around a need for something proper -- for a component or handler to communicate we need the distributed search algorithm (no so-called short-circuit).

Copy link
Author

@ercsonusharma ercsonusharma Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed what you said here:
I agree it's a bit sneaky. However, Apologies, but yet I didn't follow the part below:

for a component or handler to communicate, we need the distributed search algorithm (no so-called short-circuit).

afaik, in standalone distributed search, we need the shards value to be passed by the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just standalone; a single shard collection will "short-circuit". I was going to link to my same comment. I understand the need/desire. Interestingly, this is the first component to want to prevent the short circuit, but I could see it being useful for any search component author who doesn't want the extra development cost of an optimized single-shard algorithm.

Perhaps if SearchHandler called SearchComponent.prepare before it initialized ShardHandler (that latter part needs to know if short-circuit), then a component could add the short-circuit param.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Have changed the logic in such a way that it can be useful for other search components as well where a distributed request can be forced depending on the component requirement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this test to not call queryServer since it doesn't compare against the control client. I wish that method came with a big fat disclaimer and was named differently that didn't look so normal. There are probably more callers of it than there should be. I did #3639 just now.

In order for this test to compare both single shard & multi-shard with score relevancy, you could do a couple different things. One is to use the ExactStatsCache for distributed-IDF. Or, write queries that set a constant/fixed score. That would actually be most clear.

With such a test, you don't need to test non-distributed since the test infrastructure here allows you to do both as one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I also wanted to show the results are not the same as what was expected in a single shard, as I described in the jira issue. Asserting only ideal queries (like constant/fixed score) or a type of queries (lexical with ExactStatsCache) may not give a clear picture to the algorithm IMHO.

@ercsonusharma
Copy link
Author

@dsmiley, Really appreciate your time and effort so far on the PR. When you have a moment, would you mind taking a look at the latest commit changes addressing the concerns raised previously? Thanks!

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was away at a conference; I'm back now. I just did a partial review. I have doubts that it makes sense to include "Way 1" if, I imagine, it increases the complexity / documentation matters and more fundamentally... why would someone choose it.

Comment on lines +513 to +525
if (req.getHttpSolrCall() != null
&& StringUtils.isEmpty(req.getParams().get(ShardParams.SHARDS))) {
String scheme = req.getHttpSolrCall().getReq().getScheme();
String host = req.getHttpSolrCall().getReq().getServerName();
int port = req.getHttpSolrCall().getReq().getServerPort();
String context = req.getHttpSolrCall().getReq().getContextPath();
String core = req.getCore().getName();
String localShardUrl =
String.format(Locale.ROOT, "%s://%s:%d%s/%s", scheme, host, port, context, core);
solrParams.set(ShardParams.SHARDS, localShardUrl);
req.setParams(solrParams);
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part looks suspicious to me; so shortCircuit=false isn't enough? Did you construct this based on seeing similar code elsewhere (where?) to create a shards URL?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, shortCircuit=false is not enough as it helps make the rb.isDistrib=false and force the method to not go through the shardHandler requests, rather stick to node local indexes through SearchComponent.process method.
I couldn't find this code anywhere but created myself.

@@ -141,6 +141,15 @@ public ResponseBuilder(
public List<ShardRequest> outgoing; // requests to be sent
public List<ShardRequest> finished; // requests that have received responses from all shards
public String shortCircuitedURL;
private boolean forcedDistrib = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of this as well, yet it feels weird to put a request aspect into the response data holder. It doesn't affect the response. Any way... it's really minor. This is pragmatic.

*/
public class CombinedQueryComponentTest extends SolrTestCaseJ4 {
public class CombinedQueryComponentTest extends BaseDistributedSearchTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now confused on how to differentiate the testing approach between this class and DistributedCombinedQueryComponentTest. That one is named appropriately (consistently with other distributed tests) and extends BaseDistributedSearchTestCase. I'm not sure this test has a role/purpose if that one can be comprehensive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since, the algorithm doesn't support non-distributed request anymore, it didn't make sense to add test cases by extending SolrTestCaseJ4 so I have changed it. However, this class can be merged with other CombinedQueryTest but the key idea was: The class verifies the functionality of the component by performing few basic queries in single sharded mode and validating the responses including limitations and combiner plugin.

If you feel it should be merged to other test classes, I am open to making those changes when we finalise this one

@ercsonusharma
Copy link
Author

ercsonusharma commented Sep 16, 2025

I have doubts that it makes sense to include "Way 1" if, I imagine, it increases the complexity/documentation matters and more fundamentally... why would someone choose it.

I don't think if it introduces any additional complexity in terms of code or documentation (adding a parameter combiner.method), rather it reuses several piece of code.
IMO, If the data is independently sharded and shard-level scoring doesn't matter in overall query relevance, it may not make sense to first combine shard results based on their original scores and then apply RRF per query (as done in Way 2). In such cases, users may prefer Way 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:search client:solrj documentation Improvements or additions to documentation tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants