Clarifying TF-IDF / ClassicSimilarity scoring changes across Lucene versions (impact seen during Solr 5 → 8 upgrade) #15547

@parveensaini

While upgrading a production system from Solr 5.5.4 (Lucene 5.x) to Solr 8.9.x (Lucene 8.x), we noticed consistent ranking differences for the same queries and data when using TF-IDF–based scoring.

This came up during a Solr JIRA discussion (SOLR-17757), where Solr maintainers mentioned that the behavior is intentional and comes from Lucene rather than Solr.

I’m opening this issue mainly to confirm this at the Lucene level and to ask whether this change is already documented somewhere, or if it would make sense to call it out more explicitly for users doing major version upgrades.

I’m not suggesting reverting or changing the behavior. I just want to make sure expectations are clear, since ranking shifts during upgrades can be surprising in real deployments.

The differences are visible via explain output and were reproducible with the same dataset and queries. Details and code snippets are linked in the Solr issue:
https://issues.apache.org/jira/browse/SOLR-17757
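For illustration, here is a minimal sketch of one factor that may contribute to the differing explain output: the IDF term. It assumes, based on my reading of the `ClassicSimilarity` source, that Lucene 5.x computes `1 + ln(numDocs / (docFreq + 1))` and Lucene 8.x computes `1 + ln((docCount + 1) / (docFreq + 1))`. The class and method names are hypothetical and it does not call Lucene. Other factors, such as the removal of `coord` and `queryNorm` in Lucene 7, are not modeled here.

```java
public class IdfComparison {

    // Assumed Lucene 5.x ClassicSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    // As far as I can tell, numDocs here comes from the index-wide maxDoc.
    static float idfLucene5(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Assumed Lucene 8.x ClassicSimilarity idf: 1 + ln((docCount + 1) / (docFreq + 1)).
    // As far as I can tell, docCount here is the number of documents that have the field.
    static float idfLucene8(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical corpus size and term frequencies, not taken from the real dataset.
        long docs = 1000;
        long[] docFreqs = {1, 10, 999};
        for (long df : docFreqs) {
            System.out.printf("docFreq=%d  5.x idf=%.4f  8.x idf=%.4f%n",
                    df, idfLucene5(df, docs), idfLucene8(df, docs));
        }
    }
}
```

With the same document count passed to both, the two formulas differ only slightly. The gap can be larger in practice if the 5.x input counts deleted documents or documents without the field, which is one reason the same data can rank differently after reindexing.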

If helpful, I can also put together a small Lucene-level reproducer.

Thanks for the guidance.

— Parveen Saini
