-
Notifications
You must be signed in to change notification settings - Fork 762
SOLR-17319 : Combined Query Feature for Multi Query Execution #3418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
@alessandrobenedetti @dsmiley, please help review it whenever you can. Thanks! |
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really glad to see this work began by acknowledging the existing work and trying to address the pitfalls!
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java
Outdated
Show resolved
Hide resolved
Hi @ercsonusharma , thanks for resurrecting this, didn't have time to dedicate to the feature in the last few months, good to see some movement! In the next couple of weeks, I should be able to give it a go and review it! |
solr/core/src/java/org/apache/solr/handler/component/CombinedQuerySearchHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
Outdated
Show resolved
Hide resolved
Can you please "resolve" any conversation you think were addressed? This is a long PR with many conversations, making it hard to catch up with the current state. |
solr/solr-ref-guide/modules/query-guide/pages/json-combined-query-dsl.adoc
Show resolved
Hide resolved
solr/solr-ref-guide/modules/query-guide/pages/json-combined-query-dsl.adoc
Show resolved
Hide resolved
…ue instead of implementMergeIds-taking-ShardDocQueueFactory
solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
final var unparsedQuery = params.get(queryKey); | ||
ResponseBuilder rbNew = new ResponseBuilder(rb.req, new SolrQueryResponse(), rb.components); | ||
rbNew.setQueryString(unparsedQuery); | ||
super.prepare(rbNew); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't we want to manipulate the sort spec so that we get all docs up to offset (AKA "start" param) + rows since RRF/combiner is going to want to see all docs/rankings up to offset+rows? Otherwise our combiner is blind to the "offset" docs. Assuming you agree, then we need to basically apply paging at this layer (our component) instead of letting the subquery do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It anyways happening here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's for distributed-search but not single-core search.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think user-managed/standalone vs SolrCloud is orthogonal. This is about a single shard working correctly (in whatever Solr mode). IMO it's not optional for basic paging parameters to work correctly with one shard.
I could imagine we'd prefer a mechanism for a SearchComponent to force the "shortCircuit"=false thereby ensuring there's always a distributed phase. Maybe that could be done by re-ordering SearchHandler's call to getAndPrepShardHandler
to be after prepareComponents
(swap adjacent lines)? Then the prepare method of this component could force distrib and add the shortCircuit=false or something like that. And/or maybe a component should have a more elegant callback to communicate that it forces distributed search (even when there's one shard/core). This would overall simplify this component, no longer needing to handle paging in process(); instead do for distributed-search only.
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The beauty/wisdom of BaseDistributedSearchTestCase is that it tests consistency between single shard and multi-shard. I think it's brilliant; that is the point of this base class. Doing so requires that you use the correct utility methods it provides. I noticed your test calls queryServer
instead of query
. If you look at their impls, you'll see what I'm getting at. You'll see other subclass tests using the various methods to do these tests.
I suspect there's a single-shard pagination bug. If so, then correct usage of this base class would surface it without you having to write more tests.
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java
Outdated
Show resolved
Hide resolved
Yet this PR/approach will not be able to comply since unlike most (all?) components, its results are affected substantially by distributed-search. The (unsaid?) vision of sharding / distributed-search was getting the same results as a single shard, and Solr does the work to pull off that trick, with plenty of tests demonstrating it does. In fact I'd say, with great disappointment, that the observed (by a user) results of this component will not be RRF when there's distributed search over shards. |
pushed a change to the PR that adds an option for the user to choose which Combiner method to use — Way 1 (pre) or Way 2 (post). Please help with reviewing. |
solr/core/src/java/org/apache/solr/handler/component/CombinedQuerySearchHandler.java
Outdated
Show resolved
Hide resolved
solrParams.set(ShardParams.SHARDS, localShardUrl); | ||
req.setParams(solrParams); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a bit sneaky that a predicate looking method has a side-effect. This is a hack to work around a need for something proper -- for a component or handler to communicate we need the distributed search algorithm (no so-called short-circuit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed what you said here:
I agree it's a bit sneaky. However, Apologies, but yet I didn't follow the part below:
for a component or handler to communicate, we need the distributed search algorithm (no so-called short-circuit).
afaik, in standalone distributed search, we need the shards value to be passed by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just standalone; a single shard collection will "short-circuit". I was going to link to my same comment. I understand the need/desire. Interestingly, this is the first component to want to prevent the short circuit, but I could see it being useful for any search component author who doesn't want the extra development cost of an optimized single-shard algorithm.
Perhaps if SearchHandler called SearchComponent.prepare before it initialized ShardHandler (that latter part needs to know if short-circuit), then a component could add the short-circuit param.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Have changed the logic in such a way that it can be useful for other search components as well where a distributed request can be forced depending on the component requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update this test to not call queryServer
since it doesn't compare against the control client. I wish that method came with a big fat disclaimer and was named differently that didn't look so normal. There are probably more callers of it than there should be. I did #3639 just now.
In order for this test to compare both single shard & multi-shard with score relevancy, you could do a couple different things. One is to use the ExactStatsCache for distributed-IDF. Or, write queries that set a constant/fixed score. That would actually be most clear.
With such a test, you don't need to test non-distributed since the test infrastructure here allows you to do both as one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I also wanted to show the results are not the same as what was expected in a single shard, as I described in the jira issue. Asserting only ideal queries (like constant/fixed score) or a type of queries (lexical with ExactStatsCache) may not give a clear picture to the algorithm IMHO.
@dsmiley, Really appreciate your time and effort so far on the PR. When you have a moment, would you mind taking a look at the latest commit changes addressing the concerns raised previously? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was away at a conference; I'm back now. I just did a partial review. I have doubts that it makes sense to include "Way 1" if, I imagine, it increases the complexity / documentation matters and more fundamentally... why would someone choose it.
if (req.getHttpSolrCall() != null | ||
&& StringUtils.isEmpty(req.getParams().get(ShardParams.SHARDS))) { | ||
String scheme = req.getHttpSolrCall().getReq().getScheme(); | ||
String host = req.getHttpSolrCall().getReq().getServerName(); | ||
int port = req.getHttpSolrCall().getReq().getServerPort(); | ||
String context = req.getHttpSolrCall().getReq().getContextPath(); | ||
String core = req.getCore().getName(); | ||
String localShardUrl = | ||
String.format(Locale.ROOT, "%s://%s:%d%s/%s", scheme, host, port, context, core); | ||
solrParams.set(ShardParams.SHARDS, localShardUrl); | ||
req.setParams(solrParams); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part looks suspicious to me; so shortCircuit=false isn't enough? Did you construct this based on seeing similar code elsewhere (where?) to create a shards URL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, shortCircuit=false is not enough as it helps make the rb.isDistrib=false and force the method to not go through the shardHandler requests, rather stick to node local indexes through SearchComponent.process method.
I couldn't find this code anywhere but created myself.
@@ -141,6 +141,15 @@ public ResponseBuilder( | |||
public List<ShardRequest> outgoing; // requests to be sent | |||
public List<ShardRequest> finished; // requests that have received responses from all shards | |||
public String shortCircuitedURL; | |||
private boolean forcedDistrib = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of this as well, yet it feels weird to put a request aspect into the response data holder. It doesn't affect the response. Any way... it's really minor. This is pragmatic.
*/ | ||
public class CombinedQueryComponentTest extends SolrTestCaseJ4 { | ||
public class CombinedQueryComponentTest extends BaseDistributedSearchTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now confused on how to differentiate the testing approach between this class and DistributedCombinedQueryComponentTest. That one is named appropriately (consistently with other distributed tests) and extends BaseDistributedSearchTestCase. I'm not sure this test has a role/purpose if that one can be comprehensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, the algorithm doesn't support non-distributed request anymore, it didn't make sense to add test cases by extending SolrTestCaseJ4 so I have changed it. However, this class can be merged with other CombinedQueryTest but the key idea was: The class verifies the functionality of the component by performing few basic queries in single sharded mode and validating the responses including limitations and combiner plugin.
If you feel it should be merged to other test classes, I am open to making those changes when we finalise this one
I don't think if it introduces any additional complexity in terms of code or documentation (adding a parameter combiner.method), rather it reuses several piece of code. |
https://issues.apache.org/jira/browse/SOLR-17319
Description
This feature aims to execute multiple queries of multiple kinds across multiple shards of a collection and combine their result basis an algorithm (like Reciprocal Rank Fusion). It also help resolve the issues being discussed w.r.t the previous PR, mainly around across shard documents merging. It provides more flexibility in terms of querying extending JSON Query DSL ultimately enabling Hybrid Search in a pure way solving the shortcomings.
Note: This feature is currently unsupported for non-distributed and grouping query.
Solution
Tests
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.