Suggest replacing scipy.stats.norm.cdf with scipy.special.ndtr for performance improvement #1468

https://github.com/pyjanitor-devs/pyjanitor/blob/c1a96e51547e5868f5fa3c0023261f9628dc95a3/janitor/math.py#L229
Current Code:
`return pd.Series(scipy.stats.norm.cdf(s), index=s.index)`
Suggested Replacement:
```
from scipy.special import ndtr
return pd.Series(ndtr(s), index=s.index)
```
The current implementation uses scipy.stats.norm.cdf(s) to compute the cumulative distribution function (CDF) of the standard normal distribution. This is correct and readable, but it adds overhead: norm.cdf goes through the generic rv_continuous machinery, which parses loc and scale, standardizes the input, and builds validity masks before it delegates to the distribution's internal CDF routine.

In contrast, scipy.special.ndtr(s) is a compiled NumPy ufunc that computes the standard normal CDF directly. It skips the argument parsing and masking, so it should be noticeably faster, particularly when called many times on small inputs inside performance-critical loops. On large arrays it also avoids the intermediate arrays that norm.cdf allocates.
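A rough way to check the speed claim locally is a small timing sketch like the one below. The helper name `seconds_per_call` and the input sizes are illustrative choices, not part of pyjanitor or SciPy, and the measured ratio will vary with the machine and the SciPy version.

```python
import timeit

import numpy as np
from scipy.special import ndtr
from scipy.stats import norm


def seconds_per_call(fn, data, number=50):
    """Return the best-of-3 average wall time of fn(data), in seconds."""
    runs = timeit.repeat(lambda: fn(data), number=number, repeat=3)
    return min(runs) / number


rng = np.random.default_rng(0)
timings = {}
for n in (10, 100_000):  # one small and one large input (arbitrary sizes)
    data = rng.standard_normal(n)
    t_stats = seconds_per_call(norm.cdf, data)
    t_ndtr = seconds_per_call(ndtr, data)
    timings[n] = (t_stats, t_ndtr)
    print(f"n={n}: norm.cdf {t_stats:.2e} s, ndtr {t_ndtr:.2e} s, "
          f"ratio {t_stats / t_ndtr:.1f}x")
```

Timing both a small and a large input separates the fixed per-call overhead from the per-element cost.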

norm.cdf(s) with the default loc=0 and scale=1 and ndtr(s) should produce the same values, because in current SciPy versions norm.cdf ultimately delegates to ndtr for the normal distribution. Replacing one with the other is therefore a safe and efficient optimization.
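The equivalence can be spot-checked before making the change. The sketch below compares the two expressions from this issue on a Series that includes NaN, infinities, and a non-default index. The function names `cdf_via_stats` and `cdf_via_ndtr` are hypothetical wrappers used only for this comparison, and the sample values are arbitrary.

```python
import numpy as np
import pandas as pd
from scipy.special import ndtr
from scipy.stats import norm


def cdf_via_stats(s: pd.Series) -> pd.Series:
    """Current approach: generic distribution API."""
    return pd.Series(norm.cdf(s), index=s.index)


def cdf_via_ndtr(s: pd.Series) -> pd.Series:
    """Suggested approach: call the ufunc directly."""
    return pd.Series(ndtr(s), index=s.index)


# Edge cases: both infinities, a missing value, and a string index.
sample = pd.Series(
    [-np.inf, -2.5, 0.0, 1.0, np.nan, np.inf],
    index=list("abcdef"),
)

old = cdf_via_stats(sample)
new = cdf_via_ndtr(sample)

# Values agree (NaN compared as equal) and the index is preserved.
values_match = np.allclose(old.to_numpy(), new.to_numpy(), equal_nan=True)
index_match = old.index.equals(new.index)
print(values_match, index_match)
```

A check like this could also serve as a regression test in the project's test suite if the replacement is adopted.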

