⚡️ Speed up function `correlation` by 26,306%
#23
Closed
+24
−17
📄 26,306% (263.06x) speedup for `correlation` in `src/numpy_pandas/dataframe_operations.py`
⏱️ Runtime: 1.96 seconds → 7.41 milliseconds (best of 90 runs)
📝 Explanation and details
Here is an optimized version of your program. The main bottlenecks are the call to `df.iloc[k][col]` in the innermost loop and the repeated NA checks. Instead, I build a single NumPy mask per column pair so that only rows with complete data in both columns are considered, then compute the statistics with vectorized NumPy operations. I also avoid repeated conversion and slicing. The implementation in this PR will be vastly faster on non-trivial DataFrames.
Key optimizations.
This version should be orders of magnitude faster on medium and large DataFrames while preserving the original semantics and function signature.
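The approach described above can be sketched roughly as follows. This is not the code from the PR diff, which is not reproduced here. It is a minimal illustration that assumes `correlation` takes a DataFrame and returns a dict of Pearson coefficients keyed by `(col1, col2)` tuples. The name `correlation_sketch`, the return shape, and the handling of empty or constant columns are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def correlation_sketch(df: pd.DataFrame) -> dict:
    """Pairwise Pearson correlation over numeric columns (illustrative only).

    Hypothetical stand-in for the optimized function; the real signature and
    return type in dataframe_operations.py may differ.
    """
    # Assumption: only numeric columns take part in the computation.
    numeric_cols = list(df.select_dtypes(include="number").columns)

    # Convert each column to a float array once, instead of calling
    # df.iloc[k][col] per row, and precompute its "value present" mask.
    values = {
        c: df[c].to_numpy(dtype=float, na_value=np.nan) for c in numeric_cols
    }
    present = {c: ~np.isnan(values[c]) for c in numeric_cols}

    result = {}
    for a in numeric_cols:
        for b in numeric_cols:
            # One mask per column pair: rows where both columns have data.
            mask = present[a] & present[b]
            x, y = values[a][mask], values[b][mask]
            if x.size == 0:
                result[(a, b)] = np.nan  # assumed behavior for no overlap
                continue
            # Vectorized covariance and standard deviations (ddof=0 in both,
            # so the normalization cancels in the ratio).
            cov = np.mean((x - x.mean()) * (y - y.mean()))
            denom = x.std() * y.std()
            result[(a, b)] = cov / denom if denom > 0 else np.nan
    return result
```

Because the mask is built per pair, the result should agree with pandas' own pairwise-complete `df.corr()` on numeric data, which gives a convenient cross-check when validating a rewrite like this one.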
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-correlation-mc5j6gnf` and push.