
Conversation

@AuditAIH

Draft: Alternative Normalization Scheme for Reranking

This submission presents a draft of a normalization scheme better suited to reranking tasks.

Notably, this implementation replaces the original Min-Max normalization (which linearly rescales scores to the [0, 1] range) with sigmoid normalization: the sigmoid function 1 / (1 + e^(-x)) is applied to each score individually to map it into the (0, 1) range.

Important note: This change significantly affects the numerical values returned by APIs that currently produce results in the (0, 1) range. Therefore, this draft is not recommended for merging at this stage.
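
For reference, a minimal sketch of the two score post-processing routines, assuming a simple helper over the list of raw scores (function names are illustrative, not the plugin's actual API):

```python
import math


def min_max_normalize(scores: list[float]) -> list[float]:
    """Current behavior: linearly rescale the batch of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # guard against division by zero when all scores are equal
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def sigmoid_normalize(scores: list[float]) -> list[float]:
    """Proposed behavior: map each raw score independently via 1 / (1 + e^(-x))."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]
```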

Pull Request Checklist

Thank you for your contribution! Before submitting your PR, please make sure you have completed the following checks:

Compatibility Check

  • I have checked whether this change affects the backward compatibility of the plugin declared in README.md
  • I have checked whether this change affects the forward compatibility of the plugin declared in README.md
  • If this change introduces a breaking change, I have discussed it with the project maintainer and specified the release version in the README.md
  • I have described the compatibility impact and the corresponding version number in the PR description
  • I have checked whether the plugin version is updated in the README.md

Available Checks

  • Code has passed local tests
  • Relevant documentation has been updated (if necessary)

AuditAIH (Author) commented Sep 29, 2025

Analysis of the Two Normalization Methods on Different Input Ranges

Original Min-Max Normalization:

  • Formula: (x - min) / (max - min)
  • Data in (0,1) range: Values stay within [0, 1]; they remain unchanged only if the batch minimum and maximum are already 0 and 1, otherwise they are stretched to fill [0, 1].
  • Data in (-10,10) range: Will be mapped to (0,1); for example, -10→0, 0→0.5, 10→1.

New Sigmoid Normalization:

  • Formula: 1 / (1 + e^(-x))
  • Data in (0,1) range: Will be compressed to approximately the (0.5, 0.73) range.
  • Data in (-10,10) range: Will be mapped to approximately (0.000045, 0.999955); values beyond roughly ±4 saturate near 0 or 1, while mid-range values spread around 0.5 (see the check below).
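
As a quick sanity check of the ranges above, here is a minimal sketch (illustrative only; it uses evenly spaced sample points rather than real reranker scores):

```python
import math


def min_max(xs):
    """Linearly rescale a batch to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def sigmoid(xs):
    """Apply 1 / (1 + e^(-x)) to each value independently."""
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]


narrow = [i / 10 for i in range(11)]     # evenly spaced samples in [0, 1]
wide = [-10 + 2 * i for i in range(11)]  # evenly spaced samples in [-10, 10]

print(min(sigmoid(narrow)), max(sigmoid(narrow)))  # ~0.5 and ~0.731
print(min(sigmoid(wide)), max(sigmoid(wide)))      # ~0.000045 and ~0.999955
print(min(min_max(wide)), max(min_max(wide)))      # 0.0 and 1.0
```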

Difference Analysis:

Min-Max Normalization:

  • Preserves the relative distance relationships of the original data.
  • Performs a linear transformation, so the data distribution remains unchanged.
  • Is sensitive to outliers.

Sigmoid Normalization:

  • Uses a non-linear transformation that compresses the data range.
  • Maps all values to (0,1) but without a linear relationship.
  • Exhibits a saturation effect for large and small values.

For your scenario (where llama.cpp returns raw scores):
If the range of raw scores is large, sigmoid normalization may work better than Min-Max, because it can be applied to input of any range, whereas Min-Max requires knowing the exact minimum and maximum values of the batch.

If you want to preserve the relative distance relationships of the original data, Min-Max normalization is preferable; if the range of raw scores is uncertain or there are outliers, sigmoid normalization is more stable.
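
To illustrate the point about not needing the batch minimum and maximum, a small sketch with invented raw scores (not actual llama.cpp output): Min-Max gives the same document a different normalized score depending on which other documents are in the batch, whereas sigmoid scores each document independently.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Invented raw scores: the same document (raw score 2.0) appears in two different batches.
batch_a = [-6.0, 2.0, 7.5]
batch_b = [1.0, 2.0, 3.0]

for batch in (batch_a, batch_b):
    lo, hi = min(batch), max(batch)
    print("min-max:", [round((x - lo) / (hi - lo), 3) for x in batch])
    print("sigmoid:", [round(sigmoid(x), 3) for x in batch])

# The document with raw score 2.0 gets ~0.593 under Min-Max in batch_a but 0.5 in batch_b,
# while sigmoid gives it ~0.881 in both batches, independent of the other documents.
```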

Yeuoly (Collaborator) commented Oct 1, 2025

It's a breaking change, I believe, which may affect many plugins that depend on it. Before we merge it, I'd like to have a list that declares the score range; let me know your thoughts on it.

AuditAIH (Author) commented
@Yeuoly
My idea is that, in the future, reranker and embedding models will be provided uniformly by Dify. This would include a unified URL and unified configuration for whether a key is required and whether normalization is needed. The returned parameters would also integrate large models from various providers and, by default or through customization, return only data such as scores and texts.
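
To make the idea above more concrete, here is a purely hypothetical sketch of what a unified reranker configuration and response might look like; all names and fields are assumptions, not an existing Dify schema:

```python
from dataclasses import dataclass


@dataclass
class RerankerConfig:
    # Hypothetical unified settings, not an existing Dify schema.
    url: str                  # unified endpoint for the provider
    requires_api_key: bool    # whether an API key must be supplied
    normalize_scores: bool    # whether sigmoid normalization is applied


@dataclass
class RerankResult:
    # By default (or via customization) only the essentials are returned.
    text: str
    score: float


config = RerankerConfig(
    url="https://example.invalid/rerank",  # placeholder URL
    requires_api_key=True,
    normalize_scores=True,
)
```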
