Feature Request
The current implementation of discover_network computes pairwise distances via a full distance matrix, which scales as O(N^2 T log T).
This becomes a major bottleneck for large datasets. I propose adding an optional kd_tree=True parameter to leverage KD-Tree–based neighbor searches for both kNN and geometric-kNN causation entropy algorithms.
Mathematical Motivation
- kNN and geometric-kNN causation entropy rely on identifying nearest neighbors in phase space or within a radius r.
- Using a brute-force distance matrix requires evaluating all pairwise distances even though only local neighborhoods are needed.
- A KD-Tree (or Ball Tree) data structure can reduce nearest-neighbor queries to approximately O(N log N) in low-to-moderate dimensions.
This optimization is mathematically equivalent: the same neighborhood sets are found, but with much faster query times.
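As a quick illustration of the equivalence, the sketch below compares a brute-force radius search against `scipy.spatial.KDTree.query_ball_point` on synthetic data; the dataset, radius, and variable names are illustrative only:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))  # N points in a 3-D phase space
r = 0.5

# Brute force: materialize the full N x N distance matrix just to read off
# local neighborhoods.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
brute = [set(np.flatnonzero(row <= r)) for row in D]

# KD-Tree: the same radius-r neighborhoods without ever forming D.
tree = KDTree(X)
kd = [set(ix) for ix in tree.query_ball_point(X, r)]

assert brute == kd  # identical neighborhood sets
```

Both paths include the query point itself (distance 0 ≤ r), so the sets agree exactly; only the query cost differs.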
Proposed API Change
discover_network(data, method="knn", k=5, kd_tree=True, ...)

- kd_tree=True: use scipy.spatial.KDTree (or sklearn.neighbors.KDTree) for neighbor search.
- kd_tree=False: fall back to the existing distance-matrix implementation.
Benefits
- Significant speedup for large N, especially when repeatedly computing neighborhoods during causation entropy estimation.
- Reduced memory footprint (no need to store the full distance matrix).
- Backward compatible: default behavior remains unchanged.
Next Steps
- Implement the KD-Tree option inside the neighbor-search routines for kNN and geometric-kNN.
- Add unit tests ensuring identical results between kd_tree=True and kd_tree=False.
- Benchmark runtime scaling on synthetic datasets.
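The testing and benchmarking steps above could start from something like this sketch; the function names, dataset sizes, and seeds are illustrative:

```python
import time
import numpy as np
from scipy.spatial import KDTree

def knn_brute(X, k):
    # Full N x N distance matrix, then sort each row; skip self at column 0.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.argsort(D, axis=1)[:, 1:k + 1]

def knn_tree(X, k):
    # KD-Tree query; k+1 because each point is its own nearest neighbor.
    _, idx = KDTree(X).query(X, k=k + 1)
    return idx[:, 1:]

def test_kd_tree_matches_brute_force():
    X = np.random.default_rng(42).standard_normal((300, 3))
    a, b = knn_brute(X, 5), knn_tree(X, 5)
    # Neighbor *sets* must agree; ordering of exact ties may differ.
    assert all(set(r1) == set(r2) for r1, r2 in zip(a, b))

# Rough scaling check on synthetic data.
for n in (1_000, 4_000):
    X = np.random.default_rng(0).standard_normal((n, 3))
    t0 = time.perf_counter()
    knn_tree(X, 5)
    print(f"N={n}: KD-Tree kNN in {time.perf_counter() - t0:.3f}s")
```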