Releases: instructkr/bb25
Release v0.3.0
bb25 Release Notes
🚀 Release Highlights
- 7 New Rust Modules: Advanced fusion, calibration, learnable weights, attention-based weighting, metrics, and debugging—all with full Python bindings.
- 10 Scoring Strategies: Support for BM25, Bayesian BM25, fitted, hybrid OR/AND, balanced log-odds fusion, gated (ReLU/Swish), learned weights, and attention.
- Advanced Learning: Online and batch learning capabilities featuring momentum, Polyak averaging, gradient clipping, and learning rate decay.
- Deep Explainability: A complete debugging pipeline to trace every stage of fusion computation and directly compare document rankings.
📦 New Modules
Probabilistic Signal Fusion (fusion.rs)
- Log-space probability operators (prob_and, prob_or, prob_not).
- cosine_to_probability mapping from [-1, 1] to (0, 1).
- log_odds_conjunction with configurable alpha scaling.
- balanced_log_odds_fusion for hybrid sparse-dense fusion via logit-space normalization and linear blending.
- Gating enum options including NoGating, Relu, and Swish.
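To make the primitives above concrete, here is a minimal Python sketch of probabilistic AND/OR/NOT accumulated in log space, a cosine-to-probability map, and the two gating functions. It does not call bb25. The function bodies, the EPS clamp, and the (1 + cos) / 2 mapping are assumptions made for illustration, and the library's own definitions may differ.

```python
import math

EPS = 1e-12  # assumed clamp that keeps the logs finite; not taken from bb25


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def prob_not(p: float) -> float:
    """Complement of a probability."""
    return 1.0 - p


def prob_and(probs: list[float]) -> float:
    """Product of independent probabilities, accumulated as a sum of logs."""
    return math.exp(sum(math.log(_clamp(p)) for p in probs))


def prob_or(probs: list[float]) -> float:
    """Noisy-OR, 1 - prod(1 - p), accumulated as a sum of log1p(-p)."""
    log_none = sum(math.log1p(-_clamp(p)) for p in probs)
    return -math.expm1(log_none)


def cosine_to_probability(cos_sim: float) -> float:
    """Map a cosine similarity in [-1, 1] into the open interval (0, 1)."""
    return _clamp((1.0 + cos_sim) / 2.0)


def relu(x: float) -> float:
    """ReLU gate: negative evidence is zeroed out."""
    return max(0.0, x)


def swish(x: float) -> float:
    """Swish gate, x * sigmoid(x): a smooth alternative to ReLU."""
    return x / (1.0 + math.exp(-x))
```

Working with sums of logs rather than raw products avoids underflow when many small probabilities are combined, which is the usual motivation for log-space operators.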
Bayesian Probability Calibration (probability.rs)
- BayesianProbabilityTransform for sigmoid likelihood with composite priors.
- Three distinct training modes: Balanced, PriorAware, and PriorFree.
- Batch fit() and online update() methods via SGD.
- wand_upper_bound() for safe pruning during WAND-style index traversal.
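The calibration idea can be sketched in a few lines of Python: a sigmoid maps a raw score to a probability, an online step adjusts its two parameters by SGD on the log loss, and a monotone map lets a score upper bound carry over to a probability upper bound. The class name SigmoidCalibrator, its parameters, and the learning rate are hypothetical. The sketch leaves out the composite priors and the three training modes, and it is not the bb25 implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class SigmoidCalibrator:
    """Hypothetical stand-in: p = sigmoid(alpha * (score - beta))."""

    def __init__(self, alpha: float = 1.0, beta: float = 0.0, lr: float = 0.05):
        self.alpha, self.beta, self.lr = alpha, beta, lr

    def predict(self, score: float) -> float:
        return sigmoid(self.alpha * (score - self.beta))

    def update(self, score: float, label: int) -> None:
        """One online SGD step on the log loss for a single (score, label) pair."""
        err = self.predict(score) - label  # dLoss/dz with z = alpha * (score - beta)
        grad_alpha = err * (score - self.beta)
        grad_beta = -err * self.alpha
        self.alpha -= self.lr * grad_alpha
        self.beta -= self.lr * grad_beta

    def fit(self, scores, labels, epochs: int = 200) -> None:
        """Batch mode in this sketch is just repeated passes of the online step."""
        for _ in range(epochs):
            for s, y in zip(scores, labels):
                self.update(s, y)

    def upper_bound(self, max_score: float) -> float:
        """With alpha > 0 the map is increasing, so a bound on the score
        gives a bound on the probability."""
        return self.predict(max_score)


cal = SigmoidCalibrator()
cal.fit([0.5, 1.0, 1.5, 4.0, 5.0, 6.0], [0, 0, 0, 1, 1, 1])
```

The upper-bound method only holds while alpha stays positive. A pruning strategy that relies on it would need to check or enforce that condition.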
Signal Reliability Weights (learnable_weights.rs)
- LearnableLogOddsWeights offering softmax-parameterized signal weights.
- Hebbian gradient updates with advanced optimizer features for batch and online modes.
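As a rough picture of softmax-parameterized signal weights, the Python sketch below keeps one free parameter per signal, turns the parameters into weights that sum to 1, and fuses per-signal log-odds as a weighted sum. The class SoftmaxSignalWeights is hypothetical. Its update is a plain log-loss gradient step, swapped in for the Hebbian rule named in the notes, with no momentum, Polyak averaging, clipping, or learning rate decay.

```python
import math


def softmax(logits):
    """Softmax with the usual max-shift for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class SoftmaxSignalWeights:
    """Hypothetical illustration, not the bb25 type."""

    def __init__(self, n_signals, lr=0.1):
        self.theta = [0.0] * n_signals  # unconstrained parameters
        self.lr = lr

    def weights(self):
        """Non-negative weights that sum to 1."""
        return softmax(self.theta)

    def fuse(self, log_odds):
        """Weighted sum of per-signal log-odds, mapped back to a probability."""
        z = sum(w * x for w, x in zip(self.weights(), log_odds))
        return sigmoid(z)

    def update(self, log_odds, label):
        """One online gradient step on the log loss."""
        w = self.weights()
        z = sum(wi * xi for wi, xi in zip(w, log_odds))
        err = sigmoid(z) - label
        # For softmax weights, dz/dtheta_j = w_j * (x_j - z).
        for j, (wj, xj) in enumerate(zip(w, log_odds)):
            self.theta[j] -= self.lr * err * wj * (xj - z)


model = SoftmaxSignalWeights(n_signals=2)
for _ in range(100):
    model.update([2.0, -2.0], 1)  # signal 0 agrees with the label
    model.update([-2.0, 2.0], 0)  # signal 1 keeps pointing the wrong way
```

In this toy run the weight of the first signal grows because it consistently agrees with the labels, which is the behaviour a reliability weight is meant to capture.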
Query-Dependent Weighting (attention_weights.rs)
- AttentionLogOddsWeights for learned attention over signals conditioned on query features.
- Xavier-initialized weights with optional per-column min-max normalization.
- Per-candidate fusion to assess query-dependent signal importance.
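The sketch below shows one way query-dependent weighting can work: a weight matrix projects query features to one logit per signal, a softmax turns the logits into weights, and each candidate's log-odds are fused with those weights. All function names, the matrix shape (signals by features), and the choice of Xavier uniform initialization are assumptions for illustration. This is not the bb25 API, and training is omitted.

```python
import math
import random


def xavier_uniform(rows, cols, rng):
    """Xavier/Glorot uniform init: U(-limit, limit), limit = sqrt(6 / (rows + cols))."""
    limit = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-limit, limit) for _ in range(cols)] for _ in range(rows)]


def min_max_normalize_columns(rows):
    """Rescale each feature column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [max(c) - min(c) for c in cols]
    return [
        [(v - l) / s if s > 0 else 0.0 for v, l, s in zip(row, lo, span)]
        for row in rows
    ]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(W, query_features):
    """One logit per signal (one row of W per signal), then softmax over signals."""
    logits = [sum(w * f for w, f in zip(row, query_features)) for row in W]
    return softmax(logits)


def fuse_candidate(W, query_features, log_odds):
    """Fuse one candidate's per-signal log-odds with query-dependent weights."""
    weights = attention_weights(W, query_features)
    z = sum(w * x for w, x in zip(weights, log_odds))
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
W = xavier_uniform(rows=3, cols=2, rng=rng)  # 3 signals, 2 query features
features = min_max_normalize_columns([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
probability = fuse_candidate(W, features[1], log_odds=[1.2, -0.4, 0.3])
```

Because the weights depend on the query features, the same candidate log-odds can fuse to different probabilities for different queries.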
Calibration Quality Assessment (metrics.rs)
- expected_calibration_error for weighted bin ECE.
- brier_score for mean squared error between predictions and labels.
- reliability_diagram for per-bin metrics.
- calibration_report() for a single-call, formatted summary of all metrics.
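For readers unfamiliar with these metrics, here is a small pure-Python sketch of the standard definitions. Two of the function names mirror the ones listed above, but the signatures, the equal-width binning, and the default of 10 bins are assumptions and may not match bb25.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def reliability_bins(probs, labels, n_bins=10):
    """Per-bin (mean confidence, observed positive rate, count) for non-empty bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    out = []
    for members in bins:
        if not members:
            continue
        n = len(members)
        mean_conf = sum(p for p, _ in members) / n
        pos_rate = sum(y for _, y in members) / n
        out.append((mean_conf, pos_rate, n))
    return out


def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted average of |mean confidence - positive rate| over bins."""
    total = len(probs)
    return sum(
        (n / total) * abs(conf - rate)
        for conf, rate, n in reliability_bins(probs, labels, n_bins)
    )


probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]
ece = expected_calibration_error(probs, labels)
brier = brier_score(probs, labels)
```

Lower is better for both metrics. A scorer can rank documents well and still score poorly here if its probabilities are systematically too high or too low.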
Fusion Tracing & Explainability (debug.rs)
- FusionDebugger for a full computation trace of the fusion pipeline.
- Granular tracing methods: trace_bm25(), trace_vector(), trace_fusion(), and trace_document().
- compare() functionality to identify dominant signals and crossover stages across documents.
- Multiple formatting outputs (verbose trace, summary, comparison).
🛠 Enhancements & Bug Fixes
Core Scoring Improvements
- Added a base rate prior to BayesianBM25Scorer for a two-step Bayes update with a corpus-level prior.
- Refactored BayesianBM25Scorer::score() to use fusion::prob_or for multi-term aggregation.
- Fix: Corrected the log-odds conjunction formula to match the paper specification.
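One possible reading of a two-step Bayes update with a base rate, sketched in Python: a per-term likelihood is first combined with a term-level prior, the result is then combined with a corpus-level base rate, and per-term posteriors are aggregated with a probabilistic OR. The function names, the symmetric likelihood model, and the order of the two steps are assumptions. The scorer in bb25 may define these steps differently.

```python
import math


def bayes_update(prior: float, likelihood: float) -> float:
    """Binary Bayes rule, assuming P(evidence | not relevant) = 1 - likelihood."""
    num = likelihood * prior
    return num / (num + (1.0 - likelihood) * (1.0 - prior))


def two_step_posterior(likelihood: float, term_prior: float, base_rate: float) -> float:
    """Step 1 applies the term-level prior, step 2 the corpus-level base rate."""
    step1 = bayes_update(term_prior, likelihood)
    return bayes_update(base_rate, step1)


def prob_or(probs) -> float:
    """Noisy-OR over per-term posteriors: 1 - prod(1 - p)."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


# Two query terms with a low corpus-level base rate of relevance (values are made up).
term_posteriors = [
    two_step_posterior(likelihood=0.8, term_prior=0.5, base_rate=0.1),
    two_step_posterior(likelihood=0.6, term_prior=0.5, base_rate=0.1),
]
doc_probability = prob_or(term_posteriors)
```

Under these assumptions each step simply adds the prior's log-odds to the running log-odds, so a base rate below 0.5 pulls every term posterior down by the same amount in logit space.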
Benchmark Runner Upgrades
- Expanded to 10 distinct scorers.
- Added --embedding-model option for live sentence-transformers encoding.
- Integrated calibration diagnostics (ECE, Brier score) per scorer.
- Performance Fix: Resolved the O(Q*D*Q) bottleneck in hybrid scoring, bringing it down to O(Q*D).
Experiments & API
- Added 3 new experiments validating base rate prior effects, log-odds conjunction properties, and fusion primitives (bringing the total to 13).
- Full PyO3 bindings mapped for all new types and module-level functions.
- Fix: Constructors with optional arguments (like Corpus()) now correctly fall back to default parameters.
Release v0.2.0
Full Changelog: v0.1.2...v0.2.0