The Monotonic Optimal Binning algorithm is a statistical approach that transforms continuous variables into optimal, monotonic categorical variables.

Monotonic-Optimal-Binning

MOBPY - Monotonic Optimal Binning for Python

Requires Python 3.9+. License: MIT. Published on PyPI.

A fast, deterministic Python library for creating monotonic optimal bins with respect to a target variable. MOBPY implements a stack-based Pool-Adjacent-Violators Algorithm (PAVA) followed by constrained adjacent merging, ensuring strict monotonicity and statistical robustness.

🎯 Key Features

  • ⚡ Fast & Deterministic: Stack-based PAVA with O(n) complexity, followed by O(k) adjacent merges
  • 📊 Monotonic Guarantee: Ensures strict monotonicity (increasing/decreasing) between bins and target
  • 🔧 Flexible Constraints: Min/max samples, min positives, min/max bins with automatic resolution
  • 📈 WoE & IV Calculation: Automatic Weight of Evidence and Information Value for binary targets
  • 🎨 Rich Visualizations: Comprehensive plotting functions for PAVA process and binning results
  • ♾️ Safe Edges: First bin starts at -∞, last bin ends at +∞ for complete coverage

📦 Installation

pip install MOBPY

For development installation:

git clone https://github.com/ChenTaHung/Monotonic-Optimal-Binning.git
cd Monotonic-Optimal-Binning
pip install -e .

🚀 Quick Start

import pandas as pd
import numpy as np
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_bin_statistics, plot_pava_comparison
import matplotlib.pyplot as plt

# Load the German credit data (adjust the path to your local copy)
df = pd.read_csv('/Users/chentahung/Desktop/git/mob-py/data/german_data_credit_cat.csv')
# Convert default to 0/1 (original is 1/2)
df['default'] = df['default'] - 1

# Configure constraints
constraints = BinningConstraints(
    min_bins=4,           # Minimum number of bins
    max_bins=6,           # Maximum number of bins
    min_samples=0.05,     # Each bin needs at least 5% of total samples
    min_positives=0.01    # Each bin needs at least 1% of total positive samples
)

# Create and fit the binner
binner = MonotonicBinner(
    df=df,
    x='Durationinmonth',
    y='default',
    constraints=constraints
)
binner.fit()

# Get binning results
bins = binner.bins_()        # Bin boundaries
summary = binner.summary_()  # Detailed statistics with WoE/IV
display(summary)  # Jupyter/IPython only; use print(summary) in a plain script

Output:

   bucket      count  count_pct    sum      mean       std  min  max        woe        iv
0  (-inf, 9)      94        9.4   10.0  0.106383  0.309980  0.0  1.0   1.241870  0.106307
1  [9, 16)       337       33.7   79.0  0.234421  0.424267  0.0  1.0   0.335632  0.035238
2  [16, 45)      499       49.9  171.0  0.342685  0.475084  0.0  1.0  -0.193553  0.019342
3  [45, +inf)     70        7.0   40.0  0.571429  0.498445  0.0  1.0  -1.127082  0.102180
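
As a rough cross-check of the woe and iv columns, the sketch below recomputes them from the count and sum columns. It assumes the common textbook definition, WoE = ln(share of non-events / share of events), which is not confirmed by this README. MOBPY's reported values are close but not identical, so the library presumably applies some adjustment (for example smoothing) that is not reproduced here.

```python
import math

# Per-bin sample counts and positives (defaults), copied from the summary above.
counts = [94, 337, 499, 70]
events = [10, 79, 171, 40]

non_events = [c - e for c, e in zip(counts, events)]
total_events = sum(events)
total_non_events = sum(non_events)

woe, iv = [], []
for good, bad in zip(non_events, events):
    good_share = good / total_non_events
    bad_share = bad / total_events
    w = math.log(good_share / bad_share)     # unsmoothed textbook WoE (assumption)
    woe.append(w)
    iv.append((good_share - bad_share) * w)  # this bin's contribution to IV

for w, v in zip(woe, iv):
    print(f"woe={w:+.4f}  iv={v:.4f}")
print(f"total IV = {sum(iv):.4f}")
```

With this sign convention, bins with a lower default rate get a positive WoE, which matches the direction of the table.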

📊 Visualization

MOBPY provides comprehensive visualization of binning results:

# Generate comprehensive binning analysis plot
fig = plot_bin_statistics(binner)
plt.show()

[Figure: Binning Analysis]

The plot_bin_statistics function creates a multi-panel visualization showing:

  • Top Left: Weight of Evidence (WoE) bars for each bin
  • Top Right: Event rate trend with sample distribution
  • Bottom Left: Sample distribution histogram
  • Bottom Right: Target distribution boxplots per bin

🔬 Understanding the Algorithm

MOBPY uses a two-stage approach:

Stage 1: PAVA (Pool-Adjacent-Violators Algorithm)

Creates initial monotonic blocks by pooling adjacent violators:

from MOBPY.plot import plot_pava_comparison

# Visualize PAVA process
fig = plot_pava_comparison(binner)
plt.show()

[Figure: PAVA Comparison]
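
To make the pooling step concrete, here is a minimal sketch of a stack-based PAVA for an increasing trend. It is an illustration only, not MOBPY's implementation: the function name `pava_increasing` is hypothetical, and the input is assumed to be target values already sorted by the feature.

```python
def pava_increasing(values):
    """Pool adjacent violators so that block means are strictly increasing.

    `values` are target values already sorted by the feature.
    Returns a list of (sum, count) blocks.
    """
    stack = []  # each entry is [sum, count]
    for v in values:
        stack.append([float(v), 1])
        # Merge while the previous block's mean is >= the newest block's mean.
        # Cross-multiplied form of s_prev/n_prev >= s_new/n_new avoids division.
        while (len(stack) > 1
               and stack[-2][0] * stack[-1][1] >= stack[-1][0] * stack[-2][1]):
            s, n = stack.pop()
            stack[-1][0] += s
            stack[-1][1] += n
    return [(s, n) for s, n in stack]


blocks = pava_increasing([0, 1, 0, 0, 1, 1, 0, 1])
means = [s / n for s, n in blocks]
print(blocks)  # [(0.0, 1), (1.0, 3), (2.0, 3), (1.0, 1)]
print(means)
```

Each value is pushed once and popped at most once, which is where the linear running time on sorted input comes from.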

Stage 2: Constrained Merging

Merges adjacent blocks to satisfy constraints while preserving monotonicity:

# Check initial PAVA blocks vs final bins
print(f"PAVA blocks: {len(binner.pava_blocks_())}")
print(f"Final bins: {len(binner.bins_())}")

> PAVA blocks: 10
> Final bins: 4
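
The merging stage can be sketched as a greedy loop over neighbouring blocks. The merge rule used below (absorb an undersized block into the neighbour with the closest mean, then merge the most similar adjacent pair while there are too many bins) is an assumption chosen for illustration; this README does not state which criterion MOBPY actually uses. `merge_adjacent` is a hypothetical helper, and the blocks are assumed to have increasing means.

```python
def merge_adjacent(blocks, max_bins, min_samples):
    """Greedily merge neighbouring (sum, count) blocks until constraints hold."""
    blocks = [list(b) for b in blocks]

    def mean(i):
        return blocks[i][0] / blocks[i][1]

    def merge_with_next(i):
        blocks[i][0] += blocks[i + 1][0]
        blocks[i][1] += blocks[i + 1][1]
        del blocks[i + 1]

    while len(blocks) > 1:
        last = len(blocks) - 1
        smallest = min(range(len(blocks)), key=lambda i: blocks[i][1])
        if blocks[smallest][1] < min_samples:
            # Undersized block: absorb it into the neighbour with the closest mean.
            i = smallest
            left_gap = mean(i) - mean(i - 1) if i > 0 else float("inf")
            right_gap = mean(i + 1) - mean(i) if i < last else float("inf")
            merge_with_next(i - 1 if left_gap <= right_gap else i)
        elif len(blocks) > max_bins:
            # Too many bins: merge the adjacent pair with the most similar means.
            merge_with_next(min(range(last), key=lambda i: mean(i + 1) - mean(i)))
        else:
            break
    return [tuple(b) for b in blocks]


# Made-up blocks with increasing means 0.05, 0.15, 0.30, 0.50, 0.70.
pava_blocks = [(1, 20), (6, 40), (3, 10), (30, 60), (14, 20)]
final = merge_adjacent(pava_blocks, max_bins=3, min_samples=15)
print(final)  # [(10, 70), (30, 60), (14, 20)]
```

Merging two neighbours replaces their means with a weighted average that lies between them, so an increasing sequence of block means stays increasing after every merge.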

🎛️ Advanced Configuration

Custom Constraints

# Fractional constraints (adaptive to data size)
constraints = BinningConstraints(
    max_bins=8,
    min_samples=0.05,     # 5% of total samples
    max_samples=0.30,     # 30% of total samples
    min_positives=0.01    # 1% of positive samples
)

# Absolute constraints (fixed values)
constraints = BinningConstraints(
    max_bins=5,
    min_samples=100,      # At least 100 samples per bin
    max_samples=500       # At most 500 samples per bin
)
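
One way to picture how fractional and absolute values could be resolved against the data is the sketch below. The rule (values strictly between 0 and 1 are fractions, anything else is an absolute count) and the rounding up are assumptions made for illustration; `resolve_threshold` is a hypothetical helper, not part of MOBPY.

```python
import math


def resolve_threshold(value, total):
    """Turn a fractional or absolute constraint into an absolute count.

    Assumption: values strictly between 0 and 1 are fractions of `total`;
    anything else is already an absolute count.
    """
    if 0 < value < 1:
        # The small epsilon guards against float noise before rounding up,
        # e.g. 0.07 * 100 evaluates to 7.000000000000001.
        return math.ceil(value * total - 1e-9)
    return int(value)


n_samples, n_positives = 1000, 300
print(resolve_threshold(0.05, n_samples))    # 50
print(resolve_threshold(0.30, n_samples))    # 300
print(resolve_threshold(0.01, n_positives))  # 3
print(resolve_threshold(100, n_samples))     # 100
```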

Handling Special Values

# Exclude special codes from binning
age_binner = MonotonicBinner(
    df=df,
    x='Age',
    y='default',
    constraints=constraints,
    exclude_values=[-999, -1, 0]  # Treat as separate bins
).fit()
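
Conceptually, excluding values means setting those rows aside before binning and reporting each code as its own bin. A minimal sketch of that split, using made-up data and a hypothetical helper `split_special` (not a MOBPY function):

```python
from collections import defaultdict


def split_special(xs, ys, exclude_values):
    """Separate rows with special codes from rows that go through binning."""
    special = set(exclude_values)
    regular = []                      # (x, y) pairs left for the binning stages
    special_bins = defaultdict(list)  # special code -> list of targets
    for x, y in zip(xs, ys):
        if x in special:
            special_bins[x].append(y)
        else:
            regular.append((x, y))
    return regular, dict(special_bins)


# Made-up example data.
ages = [-999, 23, 35, 0, 47, -999, 61, -1]
default = [1, 0, 0, 1, 0, 0, 1, 1]

regular, special_bins = split_special(ages, default, [-999, -1, 0])
print(regular)
for code, targets in sorted(special_bins.items()):
    print(code, len(targets), sum(targets) / len(targets))
```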

Transform New Data

new_data = pd.DataFrame({'age': [25, 45, 65]})

# Get bin assignments
bins = age_binner.transform(new_data['age'], assign='interval')
print(bins)
# Output:
# 0    (-inf, 26)
# 1      [35, 75)
# 2      [35, 75)
# Name: age, dtype: object

# Get WoE values for scoring
print(age_binner.transform(new_data['age'], assign='woe'))
# Output:
# 0   -0.526748
# 1    0.306015
# 2    0.306015
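
Under the hood, assigning a value to a bin amounts to a sorted lookup against the interior cut points. The sketch below reuses the cut points and WoE values from the Quick Start summary (the Durationinmonth bins); `assign_bin` is a hypothetical standalone function, and how MOBPY implements `transform` internally is not shown in this README.

```python
import bisect

# Interior cut points and WoE values taken from the Quick Start summary.
cuts = [9, 16, 45]
woe = [1.241870, 0.335632, -0.193553, -1.127082]


def label(i):
    """Build the interval label for bin i, with open-ended first and last bins."""
    left = "(-inf" if i == 0 else f"[{cuts[i - 1]}"
    right = "+inf)" if i == len(cuts) else f"{cuts[i]})"
    return f"{left}, {right}"


def assign_bin(x, assign="interval"):
    # bisect_right places a value equal to a cut point in the bin to its right,
    # which matches left-closed, right-open intervals.
    i = bisect.bisect_right(cuts, x)
    return label(i) if assign == "interval" else woe[i]


for months in [6, 9, 24, 72]:
    print(months, assign_bin(months), assign_bin(months, assign="woe"))
```

Because the first bin starts at -inf and the last ends at +inf, values outside the training range still land in a bin.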

📈 Use Cases

MOBPY is ideal for:

  • Credit Risk Modeling: Create monotonic risk score bins for regulatory compliance
  • Insurance Pricing: Develop age/risk factor bands with clear premium progression
  • Customer Segmentation: Build ordered customer value tiers
  • Feature Engineering: Generate interpretable binned features for ML models
  • Regulatory Reporting: Ensure transparent, monotonic relationships in models

📚 Documentation

🧪 Testing

# Run unit tests
pytest -vv -q -W ignore::UserWarning

📖 Reference

👥 Authors

  1. Ta-Hung (Denny) Chen

  2. Yu-Cheng (Darren) Tsai

  3. Peter Chen

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.