
Conversation

@AuditAIH

Draft: Alternative Normalization Scheme for Reranking

This submission presents a draft of a normalization scheme better suited to reranking tasks.

Notably, this implementation replaces the original Min-Max normalization (which linearly rescales scores to the [0, 1] range) with sigmoid normalization: the sigmoid function 1 / (1 + e^(-x)) is applied to each score individually to map it into the (0, 1) range.

Important note: This change significantly affects the numerical values returned by APIs that currently produce results in the (0, 1) range. Therefore, this draft is not recommended for merging at this stage.
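
For reference, a minimal sketch of the two score post-processing routines, assuming a simple helper over the list of raw scores (function names are illustrative, not the plugin's actual API):

```python
import math


def min_max_normalize(scores: list[float]) -> list[float]:
    """Current behavior: linearly rescale the batch of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # guard against division by zero when all scores are equal
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def sigmoid_normalize(scores: list[float]) -> list[float]:
    """Proposed behavior: map each raw score independently via 1 / (1 + e^(-x))."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]
```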

Pull Request Checklist

Thank you for your contribution! Before submitting your PR, please make sure you have completed the following checks:

Compatibility Check

  • I have checked whether this change affects the backward compatibility of the plugin declared in README.md
  • I have checked whether this change affects the forward compatibility of the plugin declared in README.md
  • If this change introduces a breaking change, I have discussed it with the project maintainer and specified the release version in the README.md
  • I have described the compatibility impact and the corresponding version number in the PR description
  • I have checked whether the plugin version is updated in the README.md

Available Checks

  • Code has passed local tests
  • Relevant documentation has been updated (if necessary)

AuditAIH (Author) commented Sep 29, 2025

Analysis of the Two Normalization Methods on Different Input Ranges

Original Min-Max Normalization:

  • Formula: (x - min) / (max - min)
  • Data in (0,1) range: Values stay within [0, 1]; they remain unchanged only if the batch minimum and maximum are already 0 and 1, otherwise they are stretched to fill [0, 1].
  • Data in (-10,10) range: Will be mapped to (0,1); for example, -10→0, 0→0.5, 10→1.

New Sigmoid Normalization:

  • Formula: 1 / (1 + e^(-x))
  • Data in (0,1) range: Will be compressed to approximately the (0.5, 0.73) range.
  • Data in (-10,10) range: Will be mapped to approximately (0.000045, 0.999955); values beyond roughly ±4 saturate near 0 or 1, while mid-range values spread around 0.5 (see the check below).
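
As a quick sanity check of the ranges above, here is a minimal sketch (illustrative only; it uses evenly spaced sample points rather than real reranker scores):

```python
import math


def min_max(xs):
    """Linearly rescale a batch to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def sigmoid(xs):
    """Apply 1 / (1 + e^(-x)) to each value independently."""
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]


narrow = [i / 10 for i in range(11)]     # evenly spaced samples in [0, 1]
wide = [-10 + 2 * i for i in range(11)]  # evenly spaced samples in [-10, 10]

print(min(sigmoid(narrow)), max(sigmoid(narrow)))  # ~0.5 and ~0.731
print(min(sigmoid(wide)), max(sigmoid(wide)))      # ~0.000045 and ~0.999955
print(min(min_max(wide)), max(min_max(wide)))      # 0.0 and 1.0
```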

Difference Analysis:

Min-Max Normalization:

  • Preserves the relative distance relationships of the original data.
  • Performs a linear transformation, so the data distribution remains unchanged.
  • Is sensitive to outliers.

Sigmoid Normalization:

  • Uses a non-linear transformation that compresses the data range.
  • Maps all values to (0,1) but without a linear relationship.
  • Exhibits a saturation effect for large and small values.

For your scenario (where llama.cpp returns raw scores):
If the range of raw scores is large, sigmoid normalization may work better than Min-Max, because it can be applied to input of any range, whereas Min-Max requires knowing the exact minimum and maximum values of the batch.

If you want to preserve the relative distance relationships of the original data, Min-Max normalization is preferable; if the range of raw scores is uncertain or there are outliers, sigmoid normalization is more stable.
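
To illustrate the point about not needing the batch minimum and maximum, a small sketch with invented raw scores (not actual llama.cpp output): Min-Max gives the same document a different normalized score depending on which other documents are in the batch, whereas sigmoid scores each document independently.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Invented raw scores: the same document (raw score 2.0) appears in two different batches.
batch_a = [-6.0, 2.0, 7.5]
batch_b = [1.0, 2.0, 3.0]

for batch in (batch_a, batch_b):
    lo, hi = min(batch), max(batch)
    print("min-max:", [round((x - lo) / (hi - lo), 3) for x in batch])
    print("sigmoid:", [round(sigmoid(x), 3) for x in batch])

# The document with raw score 2.0 gets ~0.593 under Min-Max in batch_a but 0.5 in batch_b,
# while sigmoid gives it ~0.881 in both batches, independent of the other documents.
```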

Yeuoly (Collaborator) commented Oct 1, 2025

It's a breaking change, I believe, which may affect many plugins that depend on it. Before we merge it, I'd like to have a list that declares the score range; let me know your thoughts on it.

AuditAIH (Author) commented
@Yeuoly
My idea is that, in the future, reranker and embedding models will be provided uniformly by Dify. This would include a unified URL and unified configuration for whether a key is required and whether normalization is needed. The returned parameters would also integrate large models from various providers and, by default or through customization, return only data such as scores and texts.
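
To make the idea above more concrete, here is a purely hypothetical sketch of what a unified reranker configuration and response might look like; all names and fields are assumptions, not an existing Dify schema:

```python
from dataclasses import dataclass


@dataclass
class RerankerConfig:
    # Hypothetical unified settings, not an existing Dify schema.
    url: str                  # unified endpoint for the provider
    requires_api_key: bool    # whether an API key must be supplied
    normalize_scores: bool    # whether sigmoid normalization is applied


@dataclass
class RerankResult:
    # By default (or via customization) only the essentials are returned.
    text: str
    score: float


config = RerankerConfig(
    url="https://example.invalid/rerank",  # placeholder URL
    requires_api_key=True,
    normalize_scores=True,
)
```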
