-
Notifications
You must be signed in to change notification settings - Fork 741
SOLR-5707: Lucene Expressions in Solr #1244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Not looking at this anymore |
I commented in JIRA about my interest in furthering this. I'm sure there are some nice-to-have's (score function access, documentation), but simply working code & tested is enough to merge. I can brush the dust off get this PR mergeable; what do you say @risdenk ? This functionality plugs in nicely, showcasing Solr's great abstractions without needing to hack on any Solr plumbing. I would later expand the usage to support access to text fields via a DocTransformer use case but I think another PR can do that on top of the fine work already done here. |
Go for it @dsmiley I ended up not needing it but it did work when I was trying it |
# Conflicts: # solr/core/src/java/org/apache/solr/response/transform/ValueSourceAugmenter.java
Updating the PR to main was easy. Note that there is another PR #3340 that will overlap with this one which makes it possible for a DocTransformer to access the score. The failing tests relate to sorting by an expression that makes use of the score. If we mark those tests with an Addressing the score issue is complicated by the fact that Lucene's |
The ability to reference scores in a custom ValueSource based on DoubleValuesSource won't work with sorting. I filed this PR apache/lucene#14654 to Lucene to trivially fix that issue. But since Lucene 9.x is EOL; I don't know when we might expect that if ever in Solr 9. |
# Conflicts: # solr/core/src/java/org/apache/solr/response/transform/ValueSourceAugmenter.java
@hossman as the original author, maybe you'd like to review this? |
I cherry-picked the Lucene 9.12.2 upgrade (thanks @HoustonPutman ), removed the @houston not sure if you have concerns on this getting into 9.9. |
@AndreyBozhko you might be a good reviewer here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
|
||
The `ExpressionValueSourceParser` allows you to implement a custom valueSource merely by adding a concise JavaScript expression to your `solrconfig.xml`. | ||
The expression is precompiled, and offers competitive performance to those written in Java. | ||
The syntax is a limited subset of JavaScript that is purely numerical oriented, and only certain built-in fuctions can be called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo "fuctions"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great - and the Lucene expressions definitely cover the use case I was working on in #3340.
I have a few tests in my PR - mostly around accessing score in function queries in various contexts, and I'm happy to adapt the tests and contribute them here.
A couple of other things that I worked on in #3340 and that would be good to consolidate:
- making sure functions can access scores in distributed queries,
- exposing score to functions that act as post-filters.
final MutableScorable scorable; // stored in fcontext (when not null) | ||
final IntFloatHashMap docToScoreMap; | ||
if (context.wantsScores()) { // TODO switch to ValueSource.needsScores once it exists | ||
docToScoreMap = new IntFloatHashMap(prefetchSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me wish for something like
DocList sortedDocList = docList
.subset(docList.offset(), prefetchSize)
.sortedByDocId();
and then wrap the sortedDocList.iterator() as a Scorable. This could remove the need for a custom mapping between docids and scores in ValueSourceAugmenter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sorting & mapping has to happen somewhere. It's inelegant. Is your point to add a convenience API/method? If we do this in multiple places, then that'd make sense, but not otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looked to me that the mapping between docIds and scores is already established in the DocList (so why do the mapping again?), and only the sorting piece is missing. That's why I was thinking of a way to handle just the sorting.
But I agree - if this is the only place, then I'm OK with the logic as long as it works.
What would go wrong if this functionality were used in such? An aside: I have a wish for our tests to be able to run in both modes easily. I've used that strategy at work. The test here is based on our oldest test infrastructure, and that which wouldn't easily support that. I'd like to see further use of the new
I saw that one-liner you added for Do you have concerns with the merge-ability of this PR as it stands for 9.9? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would go wrong if this functionality were used in such?
I think if the expression tried to use the score, the distributed query would fail - see this comment #3340 (comment).
But overall, I'm OK with merging this - since the goal here is to support Lucene Expressions, and not necessarily to support accessing scores in function queries in every possible context.
final MutableScorable scorable; // stored in fcontext (when not null) | ||
final IntFloatHashMap docToScoreMap; | ||
if (context.wantsScores()) { // TODO switch to ValueSource.needsScores once it exists | ||
docToScoreMap = new IntFloatHashMap(prefetchSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looked to me that the mapping between docIds and scores is already established in the DocList (so why do the mapping again?), and only the sorting piece is missing. That's why I was thinking of a way to handle just the sorting.
But I agree - if this is the only place, then I'm OK with the logic as long as it works.
DocList / DocIterator / DocIterationInfo != a Lucene Scorable (which is an abstract class, BTW; limits creative class hierarchy possibilities). Furthermore the former is an iterator but the latter returns the current state. These things aren't compatible. @HoustonPutman I don't want to push too last minute for this in 9.9 unless you are comfortable with this as the RM. I think this is a great feature that has waited over 11 years too long. |
@dsmiley I just started producing the 9.9.0 release, so let's leave this for 9.10. We don't have to wait months for 9.10 either, we can do it in a month or two and get this out after baking for a little bit. |
Actually I’m having issues with the smoketester erroring in the tests, so if you can get this in by Monday and are sure it wont break the tests, go for it. It seems like a new feature, so im not too worried about it breaking anything other than the tests. |
New ExpressionValueSourceParser that allows custom function queries / VSPs to be defined in a subset of JavaScript, pre-compiled, and that which can access the score and fields. It's powered by the Lucene Expressions module. ValueSourceAugmenter: score propagation --------- Co-authored-by: Chris Hostetter <[email protected]> Co-authored-by: David Smiley <[email protected]> Co-authored-by: Ryan Ernst <[email protected]> (cherry picked from commit c3b5f57)
New ExpressionValueSourceParser that allows custom function queries / VSPs to be defined in a subset of JavaScript, pre-compiled, and that which can access the score and fields. It's powered by the Lucene Expressions module. ValueSourceAugmenter: score propagation --------- Co-authored-by: Chris Hostetter <[email protected]> Co-authored-by: David Smiley <[email protected]> Co-authored-by: Ryan Ernst <[email protected]> (cherry picked from commit c3b5f57)
Yep I'm confident in it. It's all merged. |
https://issues.apache.org/jira/browse/SOLR-5707
This is me taking Hoss's VSP patch from https://issues.apache.org/jira/browse/SOLR-5707 and trying to look at it again.
./gradlew check -x test -Pvalidation.errorprone=true -Pvalidation.sourcePatterns.failOnError=false
passes :)./gradlew -p solr/core test --tests ExpressionValueSourceParserTest
pass?