SOLR-5707: Lucene Expressions in Solr #1244

Merged
merged 18 commits into from
Jul 13, 2025

@risdenk (Contributor) commented Dec 16, 2022

https://issues.apache.org/jira/browse/SOLR-5707

This is me taking Hoss's VSP (ValueSourceParser) patch from https://issues.apache.org/jira/browse/SOLR-5707 and trying to look at it again.

  • ./gradlew check -x test -Pvalidation.errorprone=true -Pvalidation.sourcePatterns.failOnError=false passes :)
  • nocommit markers are still in there
  • do we NEED to deal w/ score? can we handle score differently now that it's been years?
  • does ./gradlew -p solr/core test --tests ExpressionValueSourceParserTest pass?
  • do all tests pass?
  • how does this perform? This is my real goal here, compared to, say, other boost functions

@risdenk self-assigned this Dec 16, 2022

@risdenk (Contributor, Author) commented Oct 30, 2024

Not looking at this anymore

@risdenk closed this Oct 30, 2024

@dsmiley (Contributor) commented May 1, 2025

I commented in JIRA about my interest in furthering this. I'm sure there are some nice-to-haves (score function access, documentation), but simply working, tested code is enough to merge. I can brush the dust off and get this PR mergeable; what do you say, @risdenk? This functionality plugs in nicely, showcasing Solr's great abstractions without needing to hack on any Solr plumbing. I would later expand the usage to support access to text fields via a DocTransformer use case, but I think another PR can do that on top of the fine work already done here.

@risdenk (Contributor, Author) commented May 1, 2025

Go for it, @dsmiley. I ended up not needing it, but it did work when I was trying it.

# Conflicts:
#	solr/core/src/java/org/apache/solr/response/transform/ValueSourceAugmenter.java
@dsmiley (Contributor) commented May 6, 2025

Updating the PR to main was easy. Note that another PR, #3340, overlaps with this one; it makes it possible for a DocTransformer to access the score. The failing tests relate to sorting by an expression that makes use of the score. If we mark those tests with @Ignore and add a follow-up to improve that matter, then I think we have something mergeable. Progress, not perfection, right?

Addressing the score issue is complicated by the fact that Lucene's ValueSource API is legacy and has no concept of needsScores, whereas its replacement, DoubleValuesSource, does. Solr is still ValueSource-based. I could see doing a switch for Solr 10, but that's for another issue and a dev list discussion.
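To make the needsScores distinction concrete, here is a minimal, JDK-only sketch. `ScoreAwareValues`, its `needsScores()` method, and `NeedsScoresDemo.computeValues` are hypothetical stand-ins, not Lucene's actual DoubleValuesSource API. The point it illustrates is that a source which declares whether it reads the score lets the caller skip score computation entirely when it is not needed, which is the hook the legacy ValueSource API lacks.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical stand-in for a per-document value source (NOT Lucene's DoubleValuesSource). */
interface ScoreAwareValues {
  /** Declares whether value(...) reads the score; mirrors the needsScores idea. */
  boolean needsScores();

  /** Value for docId; score is NaN when needsScores() returned false. */
  double value(int docId, float score);
}

final class NeedsScoresDemo {
  /**
   * Evaluates the source for each doc. The score lookup (potentially expensive) is
   * invoked only when the source declares that it needs scores.
   */
  static double[] computeValues(
      ScoreAwareValues source, int[] docIds, IntToDoubleFunction scoreLookup) {
    boolean wantScores = source.needsScores();
    double[] values = new double[docIds.length];
    for (int i = 0; i < docIds.length; i++) {
      // Only pay for the score when the source asks for it.
      float score = wantScores ? (float) scoreLookup.applyAsDouble(docIds[i]) : Float.NaN;
      values[i] = source.value(docIds[i], score);
    }
    return values;
  }
}
```

With a ValueSource-style API that cannot answer `needsScores()`, a caller has to either always compute scores or never expose them, which is roughly the trade-off described in the comment above.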

@dsmiley (Contributor) commented May 13, 2025

The ability to reference scores in a custom ValueSource based on DoubleValuesSource won't work with sorting. I filed apache/lucene#14654 to trivially fix that issue in Lucene. But since Lucene 9.x is EOL, I don't know when, if ever, we might expect that fix in Solr 9.

@github-actions bot added the documentation label Jun 25, 2025

@dsmiley (Contributor) commented Jul 7, 2025

@hossman as the original author, maybe you'd like to review this?

@dsmiley marked this pull request as ready for review July 8, 2025 02:20

@dsmiley (Contributor) commented Jul 8, 2025

I cherry-picked the Lucene 9.12.2 upgrade (thanks @HoustonPutman), removed the Ignore annotation from the 2 tests, fixed a minor test bug that one Ignored test was hiding, and confirmed that this Lucene upgrade addressed the lingering concern :-)
I didn't push the Lucene upgrade commit, so as not to confuse this PR; thus those 2 tests will fail.

@houston not sure if you have concerns about this getting into 9.9.

@dsmiley (Contributor) commented Jul 9, 2025

@AndreyBozhko you might be a good reviewer here

@bruno-roustant (Contributor) left a comment

Very nice!


The `ExpressionValueSourceParser` allows you to implement a custom valueSource merely by adding a concise JavaScript expression to your `solrconfig.xml`.
The expression is precompiled, and offers competitive performance to those written in Java.
The syntax is a limited subset of JavaScript that is purely numerical oriented, and only certain built-in fuctions can be called.

Typo "fuctions"
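The documentation excerpt in this review describes expressions as a purely numeric, precompiled subset of JavaScript. As a rough illustration of what such an expression computes per document, here is a hand-written Java equivalent of the hypothetical expression `sqrt(popularity) + 2 * score`. The field name `popularity`, the variable name `score`, and the `ExpressionSketch` class are assumptions for illustration only; Lucene Expressions compiles expression text to JVM bytecode, whereas this sketch merely mimics the "build once, evaluate per document" shape with a lambda.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class ExpressionSketch {
  /**
   * Hand-written equivalent of the hypothetical expression
   *   sqrt(popularity) + 2 * score
   * Every input is a named double (a field value and the score) and the
   * output is a single double per document.
   */
  static final ToDoubleFunction<Map<String, Double>> SQRT_POPULARITY_PLUS_SCORE =
      vars -> Math.sqrt(vars.get("popularity")) + 2 * vars.get("score");

  /** Evaluates the same, already-built function against each document's bindings. */
  static double[] evaluateAll(
      ToDoubleFunction<Map<String, Double>> expr, List<Map<String, Double>> docs) {
    double[] out = new double[docs.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = expr.applyAsDouble(docs.get(i));
    }
    return out;
  }
}
```

For example, a document with `popularity = 16.0` and `score = 1.5` evaluates to 4.0 + 3.0 = 7.0.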

@AndreyBozhko (Contributor) left a comment

This is great, and the Lucene expressions definitely cover the use case I was working on in #3340.

I have a few tests in my PR, mostly around accessing the score in function queries in various contexts, and I'm happy to adapt them and contribute them here.

A couple of other things that I worked on in #3340 and that would be good to consolidate:

  • making sure functions can access scores in distributed queries,
  • exposing score to functions that act as post-filters.

```java
// excerpt from ValueSourceAugmenter (diff context)
final MutableScorable scorable; // stored in fcontext (when not null)
final IntFloatHashMap docToScoreMap;
if (context.wantsScores()) { // TODO switch to ValueSource.needsScores once it exists
  docToScoreMap = new IntFloatHashMap(prefetchSize);
  // ... (remainder elided)
```

This makes me wish for something like

```java
DocList sortedDocList = docList
    .subset(docList.offset(), prefetchSize)
    .sortedByDocId();
```

and then wrap the sortedDocList.iterator() as a Scorable. This could remove the need for a custom mapping between docids and scores in ValueSourceAugmenter.


The sorting & mapping has to happen somewhere. It's inelegant. Is your point to add a convenience API/method? If we do this in multiple places, then that'd make sense, but not otherwise.


It looked to me that the mapping between docIds and scores is already established in the DocList (so why do the mapping again?), and only the sorting piece is missing. That's why I was thinking of a way to handle just the sorting.

But I agree: if this is the only place, then I'm OK with the logic as long as it works.
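The "sorting & mapping" debated in this thread can be sketched with plain JDK types. This is a minimal illustration under stated assumptions, not the code in ValueSourceAugmenter: `DocScoreMapping.sortAndMap` is a hypothetical helper, and a `HashMap<Integer, Float>` stands in for HPPC's `IntFloatHashMap`. It takes the prefetched page in rank order, returns the docIds in ascending order (the order that forward-only per-segment value iteration generally needs), and keeps a docId-to-score map so the score of the current doc can still be found after reordering.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class DocScoreMapping {
  /** DocIds in ascending order plus a docId -> score lookup. */
  static final class Sorted {
    final int[] docIdsAscending;
    final Map<Integer, Float> scoreByDocId;

    Sorted(int[] docIdsAscending, Map<Integer, Float> scoreByDocId) {
      this.docIdsAscending = docIdsAscending;
      this.scoreByDocId = scoreByDocId;
    }
  }

  /**
   * Takes a ranked page as parallel arrays (docIds and their scores) and returns the
   * docIds sorted ascending, together with a map that recovers each doc's score.
   */
  static Sorted sortAndMap(int[] rankedDocIds, float[] rankedScores) {
    if (rankedDocIds.length != rankedScores.length) {
      throw new IllegalArgumentException("docIds and scores must have the same length");
    }
    Map<Integer, Float> scores = new HashMap<>();
    for (int i = 0; i < rankedDocIds.length; i++) {
      scores.put(rankedDocIds[i], rankedScores[i]); // remember score before reordering
    }
    int[] sorted = rankedDocIds.clone(); // leave the caller's ranked order untouched
    Arrays.sort(sorted);
    return new Sorted(sorted, scores);
  }
}
```

The map is the "custom mapping" the reviewer hoped to avoid; it exists only because the ascending-docId order discards the positional link between a doc and its score.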

@dsmiley (Contributor) commented Jul 10, 2025

> making sure functions can access scores in distributed queries,

What would go wrong if this functionality were used in such queries? An aside: I wish our tests could easily run in both modes; I've used that strategy at work. The test here is based on our oldest test infrastructure, which wouldn't easily support that. I'd like to see further use of the new SolrClientTestRule, such as with SolrCloud; there's a JIRA for that.

> exposing score to functions that act as post-filters.

I saw that one-liner you added for frange specifically (which I wish wasn't a PostFilter, but I digress). Is there a more general adaptation?

Do you have concerns with the merge-ability of this PR as it stands for 9.9?

@AndreyBozhko (Contributor) left a comment

> What would go wrong if this functionality were used in such queries?

I think if the expression tried to use the score, the distributed query would fail; see this comment #3340 (comment).

But overall, I'm OK with merging this, since the goal here is to support Lucene Expressions, not necessarily to support accessing scores in function queries in every possible context.


@dsmiley (Contributor) commented Jul 12, 2025

DocList / DocIterator / DocIterationInfo != a Lucene Scorable (which, BTW, is an abstract class; that limits creative class hierarchy possibilities). Furthermore, the former is an iterator, whereas the latter returns the current state. These things aren't compatible.
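One common way to bridge an iterator and a current-state API is a mutable holder that the driving loop updates before each evaluation, which is presumably the role of the `MutableScorable` in the diff excerpt earlier in this thread. The JDK-only sketch below is hypothetical: `CurrentScore` stands in for Lucene's abstract `Scorable`, and `MutableCurrentScore` and `IteratorToStateDemo` are illustrative names, not Solr or Lucene classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a "current state" API such as Lucene's abstract Scorable. */
abstract class CurrentScore {
  /** Score of the document currently being processed. */
  abstract float score();
}

/** A holder whose state the driving loop sets before each evaluation. */
final class MutableCurrentScore extends CurrentScore {
  private float current;

  void set(float score) {
    current = score;
  }

  @Override
  float score() {
    return current;
  }
}

final class IteratorToStateDemo {
  /** Code that reads the score from current state rather than pulling from an iterator. */
  interface ScoreConsumer {
    double valueFor(CurrentScore scorer);
  }

  /** Drives the iterator and exposes each element through the mutable holder. */
  static List<Double> evaluate(Iterator<Float> scoresInDocOrder, ScoreConsumer consumer) {
    MutableCurrentScore holder = new MutableCurrentScore();
    List<Double> values = new ArrayList<>();
    while (scoresInDocOrder.hasNext()) {
      holder.set(scoresInDocOrder.next()); // advance the holder's state to match the iterator
      values.add(consumer.valueFor(holder)); // consumer sees only the current state
    }
    return values;
  }
}
```

The adapter does not make the two abstractions compatible in general; it only works because a single loop owns both the iteration and the holder's state.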

@HoustonPutman I don't want to push this into 9.9 at the last minute unless you, as the RM, are comfortable with it. I think this is a great feature that has already waited over 11 years, which is too long.

@HoustonPutman (Contributor)

@dsmiley I just started producing the 9.9.0 release, so let's leave this for 9.10. We don't have to wait months for 9.10 either; we can do it in a month or two and get this out after it has baked for a little bit.

@HoustonPutman (Contributor)

Actually, I'm having issues with the smoketester erroring in the tests, so if you can get this in by Monday and are sure it won't break the tests, go for it. It seems like a new feature, so I'm not too worried about it breaking anything other than the tests.

@dsmiley merged commit c3b5f57 into apache:main Jul 13, 2025 (4 checks passed)
dsmiley pushed a commit that referenced this pull request Jul 14, 2025
New ExpressionValueSourceParser that allows custom function queries / VSPs to be defined in a
  subset of JavaScript, pre-compiled, and that which can access the score and fields. It's powered by
  the Lucene Expressions module.

ValueSourceAugmenter:  score propagation

---------

Co-authored-by: Chris Hostetter
Co-authored-by: David Smiley
Co-authored-by: Ryan Ernst

(cherry picked from commit c3b5f57)
dsmiley pushed another commit that referenced this pull request Jul 14, 2025 (same commit message, cherry picked from commit c3b5f57)
@dsmiley (Contributor) commented Jul 14, 2025

Yep, I'm confident in it. It's all merged.

Labels: cat:search, client:solrj, dependencies, documentation, tests, tool:build

5 participants