Fix IndexError in HBOS with n_bins='auto' when test data exceeds training range #644

MohammadMdv · 2025-10-14T13:29:05Z

Summary

This PR fixes an IndexError that occurs in the HBOS algorithm when using n_bins='auto' and test data contains values outside the training data range.

Fixes: #643

Problem

When using HBOS with automatic bin selection (n_bins='auto'), the model crashes with an IndexError during prediction if test data contains values that exceed the training data range for any feature.

Error Traceback

IndexError: index 147 is out of bounds for axis 0 with size 147
  File "pyod/models/hbos.py", line 274, in _calculate_outlier_scores_auto
    outlier_scores[j, i] = out_score_i[bin_inds[j] - 1]

Minimal Reproduction

from pyod.models.hbos import HBOS
import numpy as np

# Training data: range [0, 10]
X_train = np.random.randn(100, 5) * 2 + 5
X_train = np.clip(X_train, 0, 10)

model = HBOS(n_bins='auto')
model.fit(X_train)

# Test data with value exceeding training range
X_test = np.array([[5, 5, 15, 5, 5]])  # Feature 2 = 15 > 10

predictions = model.predict(X_test)  # ❌ IndexError!

Root Cause

The _calculate_outlier_scores_auto function was recalculating the optimal number of bins on the test data (using get_optimal_n_bins(X[:, i])), while using histograms and bin edges computed from the training data.

When a test value exceeds the training range:

1. np.digitize returns an index equal to len(bin_edges[i]) (i.e., n_bins_train + 1).
2. The boundary check bin_inds[j] == optimal_n_bins + 1 does not match, because optimal_n_bins (recomputed from the test data) ≠ n_bins_train.
3. The code falls through to the default branch: outlier_scores[j, i] = out_score_i[bin_inds[j] - 1].
4. That branch reads out_score_i[n_bins_train], one past the last valid index (n_bins_train - 1), which raises the IndexError.
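The index arithmetic above can be checked with NumPy alone. The following is a minimal sketch with made-up bin edges and densities; passing right=True to np.digitize is an assumption about the call in hbos.py (for a value strictly above the last edge, the default right=False returns the same index).

```python
import numpy as np

# Hypothetical training-time state for one feature: 4 bins, hence 5 edges.
bin_edges = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
hist = np.array([0.1, 0.4, 0.4, 0.1])
out_score = np.log2(hist + 0.1)  # one score per training bin

# One value below, one inside, and one above the training range.
x_test = np.array([-3.0, 5.0, 15.0])
bin_inds = np.digitize(x_test, bin_edges, right=True)
print(bin_inds)  # [0 2 5]

n_bins_train = hist.shape[0]
overflow_index = bin_inds[-1]  # equals len(bin_edges) == n_bins_train + 1

# Without a matching boundary check, the fall-through branch does this:
caught = False
try:
    out_score[overflow_index - 1]
except IndexError as exc:
    caught = True
    print(exc)  # index 4 is out of bounds for axis 0 with size 4
```

The overflow index depends only on the training edges, so any bin count derived from the test data can miss it.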

Solution

Changed line 233 in _calculate_outlier_scores_auto to use the training histogram size:

# Before:
optimal_n_bins = get_optimal_n_bins(X[:, i])  # ❌ Recalculates on test data

# After:
optimal_n_bins = hist[i].shape[0]  # ✅ Uses training histogram size

This ensures consistency between the bin edges (from training) and the bin count used for boundary checks.
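A minimal sketch of why taking the bin count from the training histogram keeps the boundary check aligned with the edges. The helper score_feature, the sample edges and densities, and the choice to give every out-of-range value the lowest log-density are assumptions for illustration, not the code in hbos.py; any edge-tolerance handling the real function may have is omitted.

```python
import numpy as np


def score_feature(x, bin_edges, hist, alpha=0.1):
    """Hypothetical single-feature scorer (illustration only)."""
    out_score = np.log2(hist + alpha)
    # Bin count fixed at fit time, so it always agrees with bin_edges.
    n_bins = hist.shape[0]
    bin_inds = np.digitize(x, bin_edges, right=True)

    scores = np.empty(x.shape[0])
    for j, b in enumerate(bin_inds):
        if b == 0 or b == n_bins + 1:
            # Outside the training range on either side
            # (assumed handling: lowest log-density).
            scores[j] = out_score.min()
        else:
            scores[j] = out_score[b - 1]
    return scores


bin_edges = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
hist = np.array([0.1, 0.4, 0.4, 0.1])
scores = score_feature(np.array([-3.0, 5.0, 15.0]), bin_edges, hist)
print(scores)
```

Because n_bins comes from hist, the check b == n_bins + 1 matches exactly the index that np.digitize returns for values above the last training edge, whatever the test data looks like.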

Changes

Modified Files

pyod/models/hbos.py: Fixed _calculate_outlier_scores_auto function (1 line)
pyod/test/test_hbos_auto_bins_fix.py: Added comprehensive test suite (new file)

Diff

@@ -230,7 +230,8 @@ def _calculate_outlier_scores_auto(X, bin_edges, hist, alpha,
         # Add a regularizer for preventing overflow
         out_score_i = np.log2(hist[i] + alpha)
 
-        optimal_n_bins = get_optimal_n_bins(X[:, i])
+        # Use the number of bins determined during fit (training)
+        optimal_n_bins = hist[i].shape[0]
 
         for j in range(n_samples):

Testing

Added comprehensive test suite (test_hbos_auto_bins_fix.py) that verifies:

✅ Test data with values outside training range
✅ All test values above training range
✅ All test values below training range
✅ Mixed in-range and out-of-range values
✅ Consistency with static bins behavior

Test Results

✓ ALL TESTS PASSED!
The fix correctly handles out-of-range test values.

Impact

Benefits

✅ Fixes crash when test data exceeds training range
✅ Maintains correct outlier detection behavior
✅ Slight performance improvement (removes redundant get_optimal_n_bins call)
✅ Aligns behavior with static bin version (_calculate_outlier_scores)

Backward Compatibility

✅ No API changes
✅ No breaking changes to existing functionality
✅ Only fixes buggy edge case
✅ Test data within training range behaves identically

Checklist

Bug fix (non-breaking change which fixes an issue)
Code follows the project's style guidelines
Added comprehensive tests demonstrating the bug and validating the fix
All new and existing tests pass locally
Documentation comment added (inline comment)
Relates to issue #643 (IndexError in HBOS with n_bins='auto' when test data exceeds training range)

Additional Context

This is a critical fix for production use cases where test/production data naturally contains values outside the training distribution - a common scenario in anomaly detection where anomalies often have extreme values. The static bin version (n_bins=<int>) handles this correctly, but the auto version was crashing.

…ning range
- Fixed bug where test values outside training range caused IndexError
- Changed _calculate_outlier_scores_auto to use training histogram size instead of recalculating optimal_n_bins on test data
- Added comprehensive test suite to verify the fix
- Fixes issue yzhao062#643

yzhao062 · 2025-10-14T17:41:21Z

This is great -- can you resubmit to the development branch? thank you

coveralls · 2025-10-14T17:56:10Z

Pull Request Test Coverage Report for Build 18498171972

Details

85 of 120 (70.83%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.3%) to 95.093%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pyod/test/test_hbos_auto_bins_fix.py	85	120	70.83%

Totals
Change from base Build 15575914138:	-0.3%
Covered Lines:	10446
Relevant Lines:	10985

💛 - Coveralls

MohammadMdv · 2025-10-15T05:50:24Z

> This is great -- can you resubmit to the development branch? thank you

Of course
