Releases: instructkr/bb25

Release v0.3.0

06 Mar 22:52

bb25 Release Notes

🚀 Release Highlights

  • 7 New Rust Modules: Advanced fusion, calibration, learnable weights, attention-based weighting, metrics, and debugging, all with full Python bindings.
  • 10 Scoring Strategies: Support for BM25, Bayesian BM25, fitted, hybrid OR/AND, balanced log-odds fusion, gated (ReLU/Swish), learned weights, and attention.
  • Advanced Learning: Online and batch learning capabilities featuring momentum, Polyak averaging, gradient clipping, and learning rate decay.
  • Deep Explainability: A complete debugging pipeline to trace every stage of fusion computation and directly compare document rankings.

📦 New Modules

Probabilistic Signal Fusion (fusion.rs)

  • Log-space probability operators (prob_and, prob_or, prob_not).
  • cosine_to_probability mapping from [-1, 1] to (0, 1).
  • log_odds_conjunction with configurable alpha scaling.
  • balanced_log_odds_fusion for hybrid sparse-dense fusion via logit-space normalization and linear blending.
  • Gating enum options including NoGating, Relu, and Swish.
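The operators listed above can be illustrated with a minimal standalone Python sketch. It is not the bb25 implementation: the function bodies, the `EPS` clamp, the affine cosine mapping, and the "mean logit scaled by n^alpha" form of the conjunction are assumptions chosen to match the descriptions in these notes, and the signatures are hypothetical.

```python
import math

EPS = 1e-12  # hypothetical clamp that keeps logs and logits finite


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def prob_not(p):
    return 1.0 - p


def prob_and(probs):
    # Product of probabilities (independence assumed), computed as exp(sum(log p)).
    return math.exp(sum(math.log(_clamp(p)) for p in probs))


def prob_or(probs):
    # 1 - prod(1 - p), computed in log space via log1p.
    return 1.0 - math.exp(sum(math.log1p(-_clamp(p)) for p in probs))


def cosine_to_probability(cos_sim):
    # Assumed affine map from [-1, 1] to [0, 1], clamped into the open interval.
    return _clamp((1.0 + cos_sim) / 2.0)


def logit(p):
    p = _clamp(p)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate(x, kind="none"):
    # Optional gating applied to each logit before averaging.
    if kind == "relu":
        return max(0.0, x)
    if kind == "swish":
        return x * sigmoid(x)
    return x


def log_odds_conjunction(probs, alpha=0.5, gating="none"):
    # Assumed form: mean of (gated) logits, scaled by n ** alpha, mapped back to a probability.
    n = len(probs)
    mean_logit = sum(gate(logit(p), gating) for p in probs) / n
    return sigmoid(mean_logit * n ** alpha)
```

Under these assumptions, two agreeing signals above 0.5 reinforce each other (the fused value exceeds either input), while ReLU gating stops a weak signal from pulling the conjunction down.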

Bayesian Probability Calibration (probability.rs)

  • BayesianProbabilityTransform for sigmoid likelihood with composite priors.
  • Three distinct training modes: Balanced, PriorAware, and PriorFree.
  • Batch fit() and online update() methods via SGD.
  • wand_upper_bound() for safe pruning during WAND-style index traversal.
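As a rough illustration of a sigmoid likelihood combined with a prior and trained by SGD, the sketch below uses a hypothetical `SigmoidCalibrator` class. It is a stand-in, not `BayesianProbabilityTransform`: the parameter names `alpha` and `beta`, the log-loss objective, and the learning rate are assumptions, and the training modes listed above are not modeled.

```python
import math


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class SigmoidCalibrator:
    """Hypothetical stand-in for a score-to-probability calibrator."""

    def __init__(self, alpha=1.0, beta=0.0, lr=0.05):
        self.alpha = alpha  # slope of the sigmoid
        self.beta = beta    # score at which the likelihood is 0.5
        self.lr = lr

    def likelihood(self, score):
        return sigmoid(self.alpha * (score - self.beta))

    def posterior(self, score, prior=0.5):
        # Bayes update of the prior with the sigmoid likelihood.
        like = self.likelihood(score)
        num = like * prior
        return num / (num + (1.0 - like) * (1.0 - prior))

    def update(self, score, label):
        # One online SGD step on the log loss; d(loss)/dz = p - y for z = alpha * (score - beta).
        err = self.likelihood(score) - label
        grad_alpha = err * (score - self.beta)
        grad_beta = -err * self.alpha
        self.alpha -= self.lr * grad_alpha
        self.beta -= self.lr * grad_beta

    def fit(self, scores, labels, epochs=200):
        # Batch fitting expressed as repeated passes of the online step.
        for _ in range(epochs):
            for s, y in zip(scores, labels):
                self.update(s, y)
```

A usage example: after `fit([0.5, 1.0, 1.5, 4.0, 4.5, 5.0], [0, 0, 0, 1, 1, 1])`, high scores map to likelihoods above 0.5 and low scores to likelihoods below it, and `posterior()` with a prior below 0.5 shifts every estimate downward.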

Signal Reliability Weights (learnable_weights.rs)

  • LearnableLogOddsWeights offering softmax-parameterized signal weights.
  • Hebbian gradient updates with advanced optimizer features for batch and online modes.
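To make "softmax-parameterized signal weights" concrete, here is a minimal sketch under the assumption that the fused probability is the sigmoid of a weighted sum of per-signal logits. The function names are hypothetical, and the update rule and optimizer features mentioned above are not shown.

```python
import math


def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def logit(p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def weighted_log_odds_fusion(probs, raw_weights):
    # Unconstrained raw_weights become positive weights that sum to 1.
    weights = softmax(raw_weights)
    z = sum(w * logit(p) for w, p in zip(weights, probs))
    return sigmoid(z)
```

Parameterizing through softmax keeps the weights on the simplex without explicit constraints, so any gradient step on the raw parameters still yields a valid weighting.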

Query-Dependent Weighting (attention_weights.rs)

  • AttentionLogOddsWeights for learned attention over signals conditioned on query features.
  • Xavier-initialized weights with optional per-column min-max normalization.
  • Per-candidate fusion to assess query-dependent signal importance.
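The idea of query-conditioned weights can be sketched as follows, assuming the weights are a softmax over a linear projection of the query features. The helper names, the bias-free projection, and the use of Xavier uniform (rather than normal) initialization are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def logit(p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def xavier_uniform(n_signals, n_features, rng):
    # Xavier/Glorot uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (n_features + n_signals))
    return [[rng.uniform(-limit, limit) for _ in range(n_features)]
            for _ in range(n_signals)]


def min_max_normalize_columns(rows):
    # Rescale each feature column to [0, 1]; constant columns map to 0.
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def attention_weights(W, query_features):
    # One score per signal, then softmax so the weights sum to 1.
    scores = [sum(w * q for w, q in zip(row, query_features)) for row in W]
    return softmax(scores)


def attention_fusion(W, query_features, probs):
    weights = attention_weights(W, query_features)
    return sigmoid(sum(w * logit(p) for w, p in zip(weights, probs)))


rng = random.Random(0)
W = xavier_uniform(n_signals=2, n_features=3, rng=rng)
features = min_max_normalize_columns([[1.0, 10.0, 0.2], [3.0, 20.0, 0.2]])
fused = attention_fusion(W, features[1], probs=[0.7, 0.4])
```

Because the weights depend on `query_features`, the same pair of signal probabilities can fuse to different values for different queries.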

Calibration Quality Assessment (metrics.rs)

  • expected_calibration_error for weighted bin ECE.
  • brier_score for mean squared error between predictions and labels.
  • reliability_diagram for per-bin metrics.
  • calibration_report() for a single-call, formatted summary of all metrics.
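For reference, the two headline metrics have standard definitions that can be sketched in a few lines of Python. The signatures and the equal-width binning with `n_bins=10` are assumptions and may differ from the bb25 functions of the same name.

```python
def brier_score(probs, labels):
    # Mean squared error between predicted probabilities and 0/1 labels.
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    # Equal-width bins over [0, 1]; each bin contributes |accuracy - confidence|
    # weighted by the fraction of samples that fall into it.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(p for p, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - confidence)
    return ece
```

For example, four predictions of 0.8 with labels `[1, 1, 1, 0]` land in one bin with confidence 0.8 and accuracy 0.75, giving an ECE of 0.05 and a Brier score of 0.19.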

Fusion Tracing & Explainability (debug.rs)

  • FusionDebugger for a full computation trace of the fusion pipeline.
  • Granular tracing methods: trace_bm25(), trace_vector(), trace_fusion(), and trace_document().
  • compare() functionality to identify dominant signals and crossover stages across documents.
  • Multiple formatting outputs (verbose trace, summary, comparison).

🛠 Enhancements & Bug Fixes

Core Scoring Improvements

  • Added a base rate prior to BayesianBM25Scorer for a two-step Bayes update with a corpus-level prior.
  • Refactored BayesianBM25Scorer::score() to use fusion::prob_or for multi-term aggregation.
  • Fix: Corrected the log-odds conjunction formula to match the paper specification.
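The two-step update with a base rate can be sketched as follows, assuming each step is an ordinary Bayes update of a Bernoulli prior. The names `prior` and `base_rate` and both function signatures are hypothetical; the sketch only shows that applying the two updates in sequence is equivalent to adding the three logits.

```python
import math


def bayes_update(likelihood, prior):
    # Posterior probability of relevance given a likelihood and a prior.
    num = likelihood * prior
    return num / (num + (1.0 - likelihood) * (1.0 - prior))


def two_step_posterior(likelihood, prior, base_rate):
    # Step 1: update with the per-score prior. Step 2: update with the corpus-level base rate.
    return bayes_update(bayes_update(likelihood, prior), base_rate)


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def two_step_posterior_logit(likelihood, prior, base_rate):
    # Same result in log-odds form: the logits simply add.
    return sigmoid(logit(likelihood) + logit(prior) + logit(base_rate))
```

A base rate of 0.5 has a logit of zero and leaves the first-step posterior unchanged, while a small base rate (relevant documents are rare in the corpus) lowers every posterior.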

Benchmark Runner Upgrades

  • Expanded to 10 distinct scorers.
  • Added --embedding-model option for live sentence-transformers encoding.
  • Integrated calibration diagnostics (ECE, Brier score) per scorer.
  • Performance Fix: Resolved the O(Q*D*Q) bottleneck in hybrid scoring, bringing it down to O(Q*D).

Experiments & API

  • Added 3 new experiments validating base rate prior effects, log-odds conjunction properties, and fusion primitives (bringing the total to 13).
  • Full PyO3 bindings mapped for all new types and module-level functions.
  • Fix: Constructors with optional arguments (such as Corpus()) now correctly fall back to default parameters when those arguments are omitted.

Release v0.2.0

16 Feb 06:44

Release v0.1.2

08 Feb 20:02

What's Changed

  • Replace product conjunction with log-odds conjunction by @jaepil in #1

New Contributors

  • @jaepil made their first contribution in #1

Full Changelog: https://github.com/instructkr/bb25/commits/v0.1.2