Feature Request
The current implementation of discover_network computes pairwise distances via a full distance matrix, which scales as O(N^2 T log T).
This becomes a major bottleneck for large datasets. I propose adding an optional kd_tree=True parameter to leverage KD-Tree–based neighbor searches for both kNN and geometric-kNN causation entropy algorithms.
Mathematical Motivation
- kNN and geometric-kNN causation entropy rely on identifying nearest neighbors in phase space or within a radius r.
- Using a brute-force distance matrix requires evaluating all pairwise distances even though only local neighborhoods are needed.
- A KD-Tree (or Ball Tree) data structure can reduce nearest-neighbor queries to approximately O(N log N) in low-to-moderate dimensions.
This optimization is mathematically equivalent: the same neighborhood sets are found, but with much faster query times.
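As a quick illustration of the equivalence, the sketch below compares a brute-force radius search against `scipy.spatial.KDTree.query_ball_point` on synthetic data; the dataset, radius, and variable names are illustrative only:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))  # N points in a 3-D phase space
r = 0.5

# Brute force: materialize the full N x N distance matrix just to read off
# local neighborhoods.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
brute = [set(np.flatnonzero(row <= r)) for row in D]

# KD-Tree: the same radius-r neighborhoods without ever forming D.
tree = KDTree(X)
kd = [set(ix) for ix in tree.query_ball_point(X, r)]

assert brute == kd  # identical neighborhood sets
```

Both paths include the query point itself (distance 0 ≤ r), so the sets agree exactly; only the query cost differs.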
Proposed API Change
discover_network(data, method="knn", k=5, kd_tree=True, ...)

- kd_tree=True: use scipy.spatial.KDTree (or sklearn.neighbors.KDTree) for neighbor search.
- kd_tree=False: fall back to the existing distance-matrix implementation.
Benefits
- Significant speedup for large N, especially when repeatedly computing neighborhoods during causation entropy estimation.
- Reduced memory footprint (no need to store the full distance matrix).
- Backward compatible: default behavior remains unchanged.
Next Steps
- Implement the KD-Tree option inside the neighbor-search routines for kNN and geometric-kNN.
- Add unit tests ensuring identical results between kd_tree=True and kd_tree=False.
- Benchmark runtime scaling on synthetic datasets.
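The testing and benchmarking steps above could start from something like this sketch; the function names, dataset sizes, and seeds are illustrative:

```python
import time
import numpy as np
from scipy.spatial import KDTree

def knn_brute(X, k):
    # Full N x N distance matrix, then sort each row; skip self at column 0.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.argsort(D, axis=1)[:, 1:k + 1]

def knn_tree(X, k):
    # KD-Tree query; k+1 because each point is its own nearest neighbor.
    _, idx = KDTree(X).query(X, k=k + 1)
    return idx[:, 1:]

def test_kd_tree_matches_brute_force():
    X = np.random.default_rng(42).standard_normal((300, 3))
    a, b = knn_brute(X, 5), knn_tree(X, 5)
    # Neighbor *sets* must agree; ordering of exact ties may differ.
    assert all(set(r1) == set(r2) for r1, r2 in zip(a, b))

# Rough scaling check on synthetic data.
for n in (1_000, 4_000):
    X = np.random.default_rng(0).standard_normal((n, 3))
    t0 = time.perf_counter()
    knn_tree(X, 5)
    print(f"N={n}: KD-Tree kNN in {time.perf_counter() - t0:.3f}s")
```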