v7labs/diversity-metric


Chart Diversity Scorer (VOL Focus)

A standalone tool to compute diversity scores for collections of chart images using DINOv2 embeddings, with primary focus on the VOL (Volume) metric.

What is VOL?

VOL (Volume) is a diversity metric that measures how well a collection of images spans the embedding space. It's computed as the geometric mean of eigenvalues of the Gram matrix (similarity matrix) of L2-normalized embeddings.

Interpretation

  • Range: [0, 1]
  • Higher VOL = more diverse/spread out images
  • VOL = 1 indicates maximum diversity (orthogonal embeddings)
  • VOL → 0 indicates low diversity (similar/duplicate images)
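The [0, 1] range follows from the trace of the Gram matrix. The short derivation below uses S for the n × n Gram matrix of unit-norm embeddings e₁, …, eₙ and λ₁, …, λₙ for its eigenvalues. The symbols eᵢ are introduced here only for illustration.

```latex
\begin{aligned}
S_{ij} &= e_i^\top e_j, \quad \lVert e_i \rVert_2 = 1 \;\Rightarrow\; S_{ii} = 1, \\
\sum_{i=1}^{n} \lambda_i &= \operatorname{tr}(S) = n, \quad \lambda_i \ge 0 \ \text{($S$ is positive semidefinite)}, \\
\mathrm{VOL} &= \Big(\prod_{i=1}^{n} \lambda_i\Big)^{1/n} = \det(S)^{1/n}
\;\le\; \frac{1}{n}\sum_{i=1}^{n} \lambda_i = 1 \quad \text{(AM-GM inequality)}.
\end{aligned}
```

Equality holds only when every λᵢ = 1. In that case S is the identity and the embeddings are mutually orthogonal. VOL is 0 whenever S is singular.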

Why VOL?

VOL is particularly useful for:

  • Detecting duplicate or near-duplicate images
  • Measuring dataset diversity
  • Evaluating synthetic data generation quality
  • Comparing different chart collections
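The minimal NumPy sketch below shows why VOL flags duplicates. It is not the tool's code. It computes VOL for three toy unit vectors, then replaces one of them with a copy of another. The helper name `vol` and the 3-dimensional vectors are illustrative assumptions; real DINOv2 embeddings are 768-dimensional.

```python
import numpy as np


def vol(embeddings: np.ndarray) -> float:
    """Geometric mean of the Gram-matrix eigenvalues of L2-normalized rows."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    eig = np.clip(np.linalg.eigvalsh(e @ e.T), 0.0, None)  # S is symmetric PSD
    with np.errstate(divide="ignore"):
        return float(np.exp(np.mean(np.log(eig))))  # a zero eigenvalue gives VOL 0


distinct = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
# Third vector replaced by a copy of the first: a duplicate image.
with_duplicate = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [1.0, 0.0, 0.0]])

print(f"distinct:       {vol(distinct):.4f}")
print(f"with duplicate: {vol(with_duplicate):.4f}")
```

The duplicate makes one eigenvalue of S (numerically) zero, so the geometric mean collapses. A single exact duplicate therefore pulls VOL toward 0, however diverse the rest of the collection is.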

Features

  • 🚀 Uses state-of-the-art DINOv2 vision transformer
  • 📊 Focuses on VOL metric with supporting metrics
  • 🖼️ Supports multiple image formats (PNG, JPG, SVG, etc.)
  • ⚑ GPU-accelerated (with CPU fallback)
  • 💾 Optional saving of embeddings and scores
  • 📈 Eigenvalue statistics for deeper analysis

Installation

Requirements

  • Python 3.9 or higher
  • CUDA-capable GPU (optional)
  • uv - Fast Python package installer

Quick Setup

Option 1: Automated Setup (Recommended)

bash setup.sh

This script will:

  • Check Python version
  • Install uv if needed
  • Install all dependencies
  • Verify the installation

Option 2: Manual Setup

  1. Install uv (if not already installed):
# Via curl (Linux/macOS)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Via Homebrew (macOS)
brew install uv

# Via pip (any platform)
pip install uv
  2. Install dependencies:

With virtual environment (recommended):

uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .

Or system-wide:

uv pip install --system -e .

Why uv? It's 10-100x faster than pip and provides better dependency resolution.

Usage

Basic Usage

python compute_diversity.py /path/to/your/charts

Save Results to File

python compute_diversity.py /path/to/your/charts --save-scores

Save Both Scores and Embeddings

python compute_diversity.py /path/to/your/charts \
    --save-scores \
    --save-embeddings \
    --output-dir results

Use CPU Only (No GPU)

python compute_diversity.py /path/to/your/charts --cpu

Adjust Batch Size

# Larger batch size if more GPU memory is available
python compute_diversity.py /path/to/your/charts --batch-size 32

# Smaller batch size to reduce GPU memory usage
python compute_diversity.py /path/to/your/charts --batch-size 8

Command-Line Options

positional arguments:
  input_dir             Folder with chart images (supports PNG, JPG, SVG, etc.)

optional arguments:
  -h, --help            Show help message and exit
  --output-dir DIR      Output folder for results (default: diversity_out)
  --batch-size N        Batch size for processing (default: 16)
  --cpu                 Force CPU usage (disable GPU)
  --save-embeddings     Save embeddings to file
  --save-scores         Save scores to file
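For readers wrapping the tool, the documented options map onto an argparse parser along the lines of the one below. This is a hypothetical reconstruction from the option list above, not the actual source of compute_diversity.py. The `build_parser` name and the description string are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction: flags and defaults mirror the help text above.
    p = argparse.ArgumentParser(description="Compute diversity scores for chart images")
    p.add_argument("input_dir", help="Folder with chart images (PNG, JPG, SVG, etc.)")
    p.add_argument("--output-dir", metavar="DIR", default="diversity_out",
                   help="Output folder for results")
    p.add_argument("--batch-size", metavar="N", type=int, default=16,
                   help="Batch size for processing")
    p.add_argument("--cpu", action="store_true", help="Force CPU usage (disable GPU)")
    p.add_argument("--save-embeddings", action="store_true", help="Save embeddings to file")
    p.add_argument("--save-scores", action="store_true", help="Save scores to file")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(["charts", "--batch-size", "8", "--save-scores"])
    print(args)
```

argparse turns dashes in option names into underscores, so the parsed values appear as `args.output_dir`, `args.batch_size`, `args.save_scores`, and so on.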

Output

Console Output

The script prints comprehensive results including:

  • Primary Metric: VOL score
  • Eigenvalue Statistics: Mean, std, min, max
  • Supporting Metrics: MPD, RAD, Q10, ENT

Example output (the numbers are illustrative):

============================================================
🎯 DIVERSITY SCORES (VOL FOCUS)
============================================================
Number of images: 50
Embedding dimension: 768
------------------------------------------------------------

🔷 PRIMARY METRIC:
  VOL (Volume):                   0.234567

  Interpretation:
  • VOL represents the 'volume' of the convex hull in embedding space
  • Higher VOL = more diverse/spread out images
  • Range: [0, 1], where 1 = maximum diversity
  • Computed as geometric mean of eigenvalues
------------------------------------------------------------

📈 EIGENVALUE STATISTICS:
  Mean:    1.234567
  Std:     0.123456
  Min:     0.012345
  Max:     2.345678
------------------------------------------------------------

πŸ“ SUPPORTING METRICS:
  MPD (Mean Pairwise Distance):   0.456789
  RAD (Minimum Distance):         0.012345
  Q10 (10th Percentile):          0.123456
  ENT (Normalized Entropy):       0.789012
============================================================

File Output

When using --save-scores, a detailed text file is saved with:

  • All metrics and scores
  • Eigenvalue statistics
  • VOL interpretation guide

When using --save-embeddings, a NumPy array file (.npy) is saved containing the DINOv2 embeddings for all images.
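You can reload the saved embeddings to recompute VOL, or to compare collections, without running DINOv2 again. The minimal sketch below assumes the .npy file holds an (n, 768) float array with one row per image. The README does not document the file's name inside the output directory, so the demo writes a stand-in file instead. The `vol_from_file` helper is hypothetical.

```python
import os
import tempfile

import numpy as np


def vol_from_file(path: str) -> float:
    """Load an (n, d) embedding array saved with np.save and return its VOL."""
    e = np.load(path).astype(np.float64)
    e /= np.linalg.norm(e, axis=1, keepdims=True)  # re-normalize; harmless if already unit norm
    eig = np.clip(np.linalg.eigvalsh(e @ e.T), 0.0, None)
    with np.errstate(divide="ignore"):
        return float(np.exp(np.mean(np.log(eig))))


if __name__ == "__main__":
    # Stand-in file so the sketch runs on its own; point vol_from_file at the
    # .npy file written by --save-embeddings instead.
    rng = np.random.default_rng(0)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "embeddings.npy")
        np.save(path, rng.normal(size=(50, 768)).astype(np.float32))
        print(f"VOL: {vol_from_file(path):.6f}")
```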

Supported Image Formats

  • PNG (.png)
  • JPEG (.jpg, .jpeg)
  • WebP (.webp)
  • BMP (.bmp)
  • TIFF (.tif, .tiff)
  • SVG (.svg) - requires cairosvg
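The sketch below shows one way to collect files with the listed extensions from input_dir. Two behaviours are assumptions: suffixes match case-insensitively, and subfolders are not searched. The README does not document the tool's actual discovery rules. The `find_images` name is hypothetical.

```python
import tempfile
from pathlib import Path

# Extensions from the "Supported Image Formats" list.
SUPPORTED = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".tif", ".tiff", ".svg"}


def find_images(input_dir: str) -> list[Path]:
    """Return supported image files directly inside input_dir, sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED  # case-insensitive (assumption)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("bar.PNG", "line.svg", "notes.txt"):
            (Path(tmp) / name).touch()
        print([p.name for p in find_images(tmp)])  # ['bar.PNG', 'line.svg']
```

An empty result from a scan like this corresponds to the "No images found" case under Troubleshooting.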

Understanding the Metrics

Primary Metric

  • VOL (Volume): Geometric mean of eigenvalues of the Gram matrix. Measures the "volume" spanned by embeddings in high-dimensional space.

Supporting Metrics

  • MPD (Mean Pairwise Distance): Average distance between all pairs of images
  • RAD (Radius): Minimum pairwise distance (detects duplicates)
  • Q10 (10th Percentile): 10th percentile of pairwise distances (less sensitive than RAD to a single near-duplicate pair)
  • ENT (Normalized Entropy): Entropy of eigenvalue distribution (normalized by log(n))
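The README does not specify the distance function or the exact entropy formula, so the following is only a sketch under stated assumptions. Distances are taken as Euclidean distances between L2-normalized embeddings; cosine distance would be an equally plausible choice. ENT is taken as the Shannon entropy of the eigenvalues rescaled to sum to 1, divided by log(n). The `supporting_metrics` name is hypothetical, and the function needs at least two images.

```python
import numpy as np


def supporting_metrics(embeddings: np.ndarray) -> dict[str, float]:
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(e)
    gram = e @ e.T
    # Euclidean distance between unit vectors: ||a - b||^2 = 2 - 2 a.b
    sq = np.clip(2.0 - 2.0 * gram, 0.0, None)
    iu = np.triu_indices(n, k=1)          # each unordered pair once, no diagonal
    d = np.sqrt(sq[iu])
    # Eigenvalue distribution of the Gram matrix, rescaled to probabilities
    eig = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
    p = eig / eig.sum()
    p = p[p > 0]                          # treat 0 * log 0 as 0
    return {
        "MPD": float(d.mean()),           # mean pairwise distance
        "RAD": float(d.min()),            # closest pair (0 for exact duplicates)
        "Q10": float(np.percentile(d, 10)),
        "ENT": float(-(p * np.log(p)).sum() / np.log(n)),  # 1.0 if all eigenvalues equal
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(supporting_metrics(rng.normal(size=(50, 768))))
```

Under this definition, ENT cannot reach 1 when n exceeds the embedding dimension d. In that case at most d eigenvalues are nonzero, so the entropy is bounded by log(d) < log(n).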

Eigenvalue Statistics

  • Mean: Average eigenvalue (indicates overall embedding spread)
  • Std: Standard deviation (indicates variability in dimensions)
  • Min: Smallest eigenvalue (detects collapsed dimensions)
  • Max: Largest eigenvalue (detects dominant directions)
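The sketch below computes the four statistics. It assumes they are taken over the eigenvalues of the same Gram matrix S used for VOL; the README implies this but does not state it. The `eigen_stats` name is hypothetical.

```python
import numpy as np


def eigen_stats(embeddings: np.ndarray) -> dict[str, float]:
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    eig = np.linalg.eigvalsh(e @ e.T)  # returned in ascending order
    return {"mean": float(eig.mean()), "std": float(eig.std()),
            "min": float(eig[0]), "max": float(eig[-1])}


rng = np.random.default_rng(0)
stats = eigen_stats(rng.normal(size=(20, 64)))
print(stats)  # "mean" equals trace(S) / n = 1 up to rounding
```

Under this assumption, the mean carries no information. Every diagonal entry of S is 1, so the eigenvalues sum to n and average exactly 1 for any collection. Std, Min and Max are what distinguish collections. If the tool reports a mean other than 1, its statistics are presumably computed on a different matrix.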

Technical Details

DINOv2 Model

This tool uses DINOv2 ViT-Base-Patch14 (vit_base_patch14_dinov2), a self-supervised vision transformer trained on diverse image data. It produces 768-dimensional embeddings that capture semantic visual features.

Why DINOv2?

  • State-of-the-art for visual similarity
  • No task-specific fine-tuning needed
  • Robust to image variations
  • Excellent for chart/diagram understanding

Volume Computation

1. Compute the similarity matrix: S = E @ E^T, where E is the n × d matrix of L2-normalized embeddings (one row per image)
2. Compute the eigenvalues: λ₁, λ₂, ..., λₙ = eig(S)
3. VOL = (∏ᵢ λᵢ)^(1/n), the geometric mean of the eigenvalues
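The three steps above translate directly into NumPy. This is a minimal sketch, not the tool's implementation, and it makes a few choices of its own:

- It assumes an (n, d) embedding array.
- It uses `numpy.linalg.eigvalsh` because S is symmetric.
- It clips tiny negative eigenvalues caused by floating-point error.
- It averages logarithms instead of multiplying eigenvalues directly, so the product cannot underflow for large n.

The function name `vol_score` is hypothetical.

```python
import numpy as np


def vol_score(embeddings: np.ndarray) -> float:
    """VOL = geometric mean of the eigenvalues of S = E @ E.T."""
    e = np.asarray(embeddings, dtype=np.float64)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # L2-normalize each row
    s = e @ e.T                                        # step 1: n x n similarity matrix
    eig = np.linalg.eigvalsh(s)                        # step 2: real eigenvalues of symmetric S
    eig = np.clip(eig, 0.0, None)                      # S is PSD; drop round-off negatives
    with np.errstate(divide="ignore"):                 # log(0) = -inf is intended here
        return float(np.exp(np.mean(np.log(eig))))     # step 3: (prod eig)^(1/n)


rng = np.random.default_rng(42)
spread = rng.normal(size=(50, 768))                    # nearly orthogonal in 768-D
clustered = rng.normal(size=(1, 768)) + 0.1 * rng.normal(size=(50, 768))
print(f"spread:    {vol_score(spread):.4f}")
print(f"clustered: {vol_score(clustered):.4f}")
```

With more images than embedding dimensions (n > 768 for DINOv2 ViT-B/14), S has rank at most 768. At least n - 768 eigenvalues are then zero up to round-off, and this direct computation returns a VOL at or near 0. The README does not say how the tool handles collections of that size, for example through regularization or subsampling.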

Performance Tips

For Large Datasets

  1. Use GPU: Ensure PyTorch is installed with CUDA support
  2. Increase batch size: Use --batch-size 32 or higher if GPU memory allows
  3. Monitor memory: Reduce batch size if you encounter OOM errors

For Small Datasets

  1. Use CPU: For small collections (fewer than ~100 images), add the --cpu flag if GPU start-up overhead outweighs the speedup
  2. Smaller batch size: Use --batch-size 8 to reduce memory usage

Expected Processing Times

  • 100 images: ~30 seconds (GPU) / ~2 minutes (CPU)
  • 1000 images: ~5 minutes (GPU) / ~20 minutes (CPU)
  • 10000 images: ~45 minutes (GPU) / ~3 hours (CPU)

Times are approximate and depend on hardware.

Troubleshooting

"CUDA out of memory" Error

Reduce batch size:

python compute_diversity.py /path/to/charts --batch-size 4

"No images found" Error

Ensure your folder contains supported image formats and check file permissions.

SVG Rendering Issues

If SVG files fail to load:

# On macOS
brew install cairo

# On Ubuntu/Debian
sudo apt-get install libcairo2-dev

# Then reinstall cairosvg
uv pip install --upgrade cairosvg

This tool is provided as-is for research and evaluation purposes.

About

Calculate images diversity metric
