SOLR-5707: Lucene Expressions in Solr #1244

Merged (18 commits) on Jul 13, 2025

9 changes: 5 additions & 4 deletions gradle/documentation/pull-lucene-javadocs.gradle
@@ -41,11 +41,12 @@ configure(project(":solr:documentation")) {
// - For now this list is focused solely on the javadocs needed for ref-guide link validation.
// - If/when additional links are added from the ref-guide to additional lucene modules not listed here,
// they can be added.
// - If/when we need the lucene javadocs for "all" lucene depdencies in Solr (ie: to do link checking
// from all Solr javadocs?) then perhaps we can find a way to build this list programatically?
// - If these javadocs are (only every) consumed by the ref guide only, then these deps & associated tasks
// - If/when we need the lucene javadocs for "all" lucene dependencies in Solr (ie: to do link checking
// from all Solr javadocs?) then perhaps we can find a way to build this list programmatically?
// - If these javadocs are only consumed by the ref guide, then these deps & associated tasks
// should just be moved to the ref-guide build.gradle
javadocs variantOf(libs.apache.lucene.core) { classifier 'javadoc' }
javadocs variantOf(libs.apache.lucene.expressions) { classifier 'javadoc' }
javadocs variantOf(libs.apache.lucene.analysis.common) { classifier 'javadoc' }
javadocs variantOf(libs.apache.lucene.analysis.stempel) { classifier 'javadoc' }
javadocs variantOf(libs.apache.lucene.queryparser) { classifier 'javadoc' }
@@ -65,7 +66,7 @@ configure(project(":solr:documentation")) {
def resolved = configurations.javadocs.resolvedConfiguration
resolved.resolvedArtifacts.each { artifact ->
def id = artifact.moduleVersion.id
// This mimics the directory stucture used on lucene.apache.org for the javadocs of all modules.
// This mimics the directory structure used on lucene.apache.org for the javadocs of all modules.
//
// HACK: the lucene.apache.org javadocs are organized to match the module directory structure in the repo,
// not the "flat" artifact names -- so there is no one size fits all way to determine the directory name.
12 changes: 8 additions & 4 deletions solr/CHANGES.txt
@@ -223,10 +223,6 @@ Other Changes
================== 9.9.0 ==================
New Features
---------------------
* SOLR-17582: The CLUSTERSTATUS API will now stream each collection's status to the response,
fetching and computing it on the fly. To avoid a backwards compatibility concern, this won't work
for wt=javabin. (Matthew Biscocho, David Smiley)

* SOLR-17626: Add RawTFSimilarityFactory class. (Christine Poerschke)

* SOLR-17656: New 'skipLeaderRecovery' replica property allows PULL replicas with existing indexes to immediately become ACTIVE (hossman)
@@ -243,6 +239,10 @@ New Features

* SOLR-17749: Added linear function support for RankField via RankQParserPlugin. (Christine Poerschke)

* SOLR-5707: New ExpressionValueSourceParser that allows custom function queries / VSPs to be defined in a
subset of JavaScript, pre-compiled, and able to access the score and fields. It's powered by
the Lucene Expressions module. (hossman, David Smiley, Ryan Ernst, Kevin Risden)

Improvements
---------------------
* SOLR-15751: The v2 API now has parity with the v1 "COLSTATUS" and "segments" APIs, which can be used to fetch detailed information about
@@ -291,6 +291,10 @@ Improvements

Optimizations
---------------------
* SOLR-17582: The CLUSTERSTATUS API will now stream each collection's status to the response,
fetching and computing it on the fly. To avoid a backwards compatibility concern, this won't work
for wt=javabin. (Matthew Biscocho, David Smiley)

* SOLR-17578: Remove ZkController internal core supplier, for slightly faster reconnection after Zookeeper session loss. (Pierre Salagnac)

* SOLR-17669: Reduced memory usage in SolrJ getBeans() method when handling dynamic fields with wildcards. (Martin Anzinger)
solr/core/src/java/org/apache/solr/response/transform/ValueSourceAugmenter.java
@@ -22,9 +22,11 @@
import java.util.Map;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.internal.hppc.IntFloatHashMap;
import org.apache.lucene.internal.hppc.IntObjectHashMap;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.search.Scorable;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrException;
import org.apache.solr.response.ResultContext;
@@ -68,20 +70,44 @@ public void setContext(ResultContext context) {
fcontext = ValueSource.newContext(searcher);
this.valueSource.createWeight(fcontext, searcher);
final var docList = context.getDocList();
if (docList == null) {
final int prefetchSize = docList == null ? 0 : Math.min(docList.size(), maxPrefetchSize);
if (prefetchSize == 0) {
return;
}

final int prefetchSize = Math.min(docList.size(), maxPrefetchSize);
// Check if scores are wanted and initialize the Scorable if so
final MutableScorable scorable; // stored in fcontext (when not null)
final IntFloatHashMap docToScoreMap;
if (context.wantsScores()) { // TODO switch to ValueSource.needsScores once it exists
docToScoreMap = new IntFloatHashMap(prefetchSize);

Review comment (Contributor):

This makes me wish for something like

DocList sortedDocList = docList
    .subset(docList.offset(), prefetchSize)
    .sortedByDocId();

and then wrap the sortedDocList.iterator() as a Scorable. This could remove the need for a custom mapping between docids and scores in ValueSourceAugmenter.

Reply (Contributor):

The sorting & mapping has to happen somewhere. It's inelegant. Is your point to add a convenience API/method? If we do this in multiple places, then that'd make sense, but not otherwise.

Reply (Contributor):

It looked to me that the mapping between docIds and scores is already established in the DocList (so why do the mapping again?), and only the sorting piece is missing. That's why I was thinking of a way to handle just the sorting.

But I agree - if this is the only place, then I'm OK with the logic as long as it works.

scorable =
new MutableScorable() {
@Override
public float score() throws IOException {
return docToScoreMap.get(docBase + localDocId);
}
};
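// Function values that need scores conventionally read the Scorable from the "scorer" context entry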
fcontext.put("scorer", scorable);
} else {
scorable = null;
docToScoreMap = null;
}

// Get the IDs and scores
final int[] ids = new int[prefetchSize];
int i = 0;
var iter = docList.iterator();
while (iter.hasNext() && i < prefetchSize) {
ids[i++] = iter.nextDoc();
ids[i] = iter.nextDoc();
if (docToScoreMap != null) {
docToScoreMap.put(ids[i], iter.score());
}
i++;
}
Arrays.sort(ids);
cachedValuesById = new IntObjectHashMap<>(ids.length);

// Get the values in docId order. Store in cachedValuesById
cachedValuesById = new IntObjectHashMap<>(ids.length);
FunctionValues values = null;
int docBase = -1;
int nextDocBase = 0; // i.e. this segment's maxDoc
@@ -95,9 +121,16 @@ public void setContext(ResultContext context) {
}

int localId = docid - docBase;
var value = values.objectVal(localId);

if (scorable != null) {
scorable.docBase = docBase;
scorable.localDocId = localId;
}
var value = values.objectVal(localId); // note: might use the Scorable

cachedValuesById.put(docid, value != null ? value : NULL_SENTINEL);
}
fcontext.remove("scorer"); // remove ours; it was there only for prefetching
} catch (IOException e) {
throw new SolrException(
SolrException.ErrorCode.SERVER_ERROR, "exception for valuesource " + valueSource, e);
@@ -119,8 +152,13 @@ public void transform(SolrDocument doc, int docid, DocIterationInfo docIteration
try {
int idx = ReaderUtil.subIndex(docid, readerContexts);
LeafReaderContext rcontext = readerContexts.get(idx);
FunctionValues values = valueSource.getValues(fcontext, rcontext);
int localId = docid - rcontext.docBase;

if (context.wantsScores()) {
fcontext.put("scorer", new ScoreAndDoc(localId, docIterationInfo.score()));
}

FunctionValues values = valueSource.getValues(fcontext, rcontext);
setValue(doc, values.objectVal(localId));
} catch (IOException e) {
throw new SolrException(
@@ -131,6 +169,17 @@ public void transform(SolrDocument doc, int docid, DocIterationInfo docIteration
}
}

private abstract static class MutableScorable extends Scorable {

int docBase;
int localDocId;

@Override
public int docID() {
return localDocId;
}
}

/** Always returns true */
@Override
public boolean needsSolrIndexSearcher() {
@@ -142,4 +191,25 @@ protected void setValue(SolrDocument doc, Object val) {
doc.setField(name, val);
}
}

/** Fake scorer for a single document */
protected static class ScoreAndDoc extends Scorable {
final int docid;
final float score;

ScoreAndDoc(int docid, float score) {
this.docid = docid;
this.score = score;
}

@Override
public int docID() {
return docid;
}

@Override
public float score() throws IOException {
return score;
}
}
}
solr/core/src/java/org/apache/solr/search/ExpressionValueSourceParser.java (new file)
@@ -0,0 +1,155 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.solr.search;

import static org.apache.solr.common.SolrException.ErrorCode.SERVER_ERROR;

import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.lucene.expressions.Bindings;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;

/**
* A ValueSource parser configured with a pre-compiled expression that can then be evaluated at
* request time. It's powered by the Lucene Expressions module, which is a subset of JavaScript.
*/
public class ExpressionValueSourceParser extends ValueSourceParser {

public static final String SCORE_KEY = "score-name"; // TODO get rid of this? Why have it?
public static final String EXPRESSION_KEY = "expression";

private Expression expression;
private String scoreKey;
private int numPositionalArgs = 0; // Number of positional arguments in the expression

@Override
public void init(NamedList<?> args) {
initConfiguredExpression(args);
initScoreKey(args);
super.init(args);
}

/** Checks for optional scoreKey override */
private void initScoreKey(NamedList<?> args) {
scoreKey = Optional.ofNullable((String) args.remove(SCORE_KEY)).orElse(SolrReturnFields.SCORE);
}

/** Parses the pre-configured expression */
private void initConfiguredExpression(NamedList<?> args) {
String expressionStr =
Optional.ofNullable((String) args.remove(EXPRESSION_KEY))
.orElseThrow(
() ->
new SolrException(
SERVER_ERROR, EXPRESSION_KEY + " must be configured with an expression"));

// Find the highest positional argument in the expression
Pattern pattern = Pattern.compile("\\$(\\d+)");
Matcher matcher = pattern.matcher(expressionStr);
while (matcher.find()) {
int argNum = Integer.parseInt(matcher.group(1));
numPositionalArgs = Math.max(numPositionalArgs, argNum);
}
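// e.g. for an (illustrative) expression like "sqrt(popularity) * $2 + $1", numPositionalArgs becomes 2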

// TODO add way to register additional functions
try {
this.expression = JavascriptCompiler.compile(expressionStr);
} catch (ParseException e) {
throw new SolrException(
SERVER_ERROR, "Unable to parse javascript expression: " + expressionStr, e);
}
}

// TODO: support dynamic expressions: expr("foo * bar / 32") ??

@Override
public ValueSource parse(FunctionQParser fp) throws SyntaxError {
assert null != fp;

// Parse positional arguments if any
List<DoubleValuesSource> positionalArgs = new ArrayList<>();
for (int i = 0; i < numPositionalArgs; i++) {
ValueSource vs = fp.parseValueSource();
positionalArgs.add(vs.asDoubleValuesSource());
}

IndexSchema schema = fp.getReq().getSchema();
SolrBindings b = new SolrBindings(scoreKey, schema, positionalArgs);
return ValueSource.fromDoubleValuesSource(expression.getDoubleValuesSource(b));
}

/**
* A bindings class that uses schema fields to resolve variables.
*
* @lucene.internal
*/
public static class SolrBindings extends Bindings {
private final String scoreKey;
private final IndexSchema schema;
private final List<DoubleValuesSource> positionalArgs;

/**
* @param scoreKey The binding name that should be used to represent the score, may be null
* @param schema IndexSchema for field bindings
* @param positionalArgs List of positional arguments
*/
public SolrBindings(
String scoreKey, IndexSchema schema, List<DoubleValuesSource> positionalArgs) {
this.scoreKey = scoreKey;
this.schema = schema;
this.positionalArgs = positionalArgs != null ? positionalArgs : new ArrayList<>();
}

@Override
public DoubleValuesSource getDoubleValuesSource(String key) {
assert null != key;

if (Objects.equals(scoreKey, key)) {
return DoubleValuesSource.SCORES;
}

// Check for positional arguments like $1, $2, etc.
if (key.startsWith("$")) {
try {
int position = Integer.parseInt(key.substring(1));
return positionalArgs.get(position - 1); // Convert to 0-based index
} catch (RuntimeException e) {
throw new IllegalArgumentException("Not a valid positional argument: " + key, e);
}
}

SchemaField field = schema.getFieldOrNull(key);
if (null != field) {
return field.getType().getValueSource(field, null).asDoubleValuesSource();
}

throw new IllegalArgumentException("No binding or schema field for key: " + key);
}
}
}
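
For readers unfamiliar with the Lucene Expressions module that this parser builds on, below is a minimal
standalone sketch (not part of this PR) of the same compile-then-bind flow using plain Lucene APIs.
SimpleBindings stands in for the SolrBindings class above; the field name "popularity" and the constant
bound to "$1" are illustrative assumptions only.

import java.text.ParseException;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.search.DoubleValuesSource;

public class ExpressionSketch {
  public static void main(String[] args) throws ParseException {
    // Compile once, as ExpressionValueSourceParser.init() does for the configured expression
    Expression expr = JavascriptCompiler.compile("sqrt(popularity) + $1 * score");

    // Bind each variable name to a DoubleValuesSource; SolrBindings resolves names against the
    // schema and positional args, SimpleBindings is the plain-Lucene equivalent used here
    SimpleBindings bindings = new SimpleBindings();
    bindings.add("popularity", DoubleValuesSource.fromIntField("popularity")); // illustrative field
    bindings.add("score", DoubleValuesSource.SCORES);     // the scoreKey binding
    bindings.add("$1", DoubleValuesSource.constant(0.5)); // stand-in for a positional argument

    // The resulting source evaluates the expression per document at search time
    DoubleValuesSource perDoc = expr.getDoubleValuesSource(bindings);
    System.out.println(perDoc);
  }
}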