⚡️ Speed up function pivot_table by 4,131% #35
📄 4,131% (41.31x) speedup for `pivot_table` in `src/numpy_pandas/dataframe_operations.py`
⏱️ Runtime: 210 milliseconds → 4.96 milliseconds (best of 99 runs)
📝 Explanation and details
Here is an optimized rewrite of your code.
The main performance bottleneck is the use of `df.iloc[i]` in a per-row loop, which is extremely slow in pandas, especially for large DataFrames. Instead, we extract the relevant columns once as numpy arrays (or pandas Series), then iterate over them directly, without repeated DataFrame lookups or per-row allocations.
Also, the aggregations can be done efficiently using dictionaries and only looping over the minimal data necessary.
The aggregation helper functions are unchanged.
No change to function signature or output format.
Optimized version
Key changes & speedups:
Replaced `for i in range(len(df)): row = df.iloc[i] ...` with a direct zip of numpy arrays from the selected columns.
Used `setdefault` to reduce the number of dictionary lookups and lines.
This change will typically speed up the function by 10x-50x+ on large DataFrames.
No external dependencies are added. The code uses only pandas and numpy, both of which are already installed.
All the comments from your original code that described distinct sections remain applicable.
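The pattern described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names `pivot_slow` and `pivot_fast`, their signatures, the mean aggregation, and the nested-dict output format are all assumptions made for the example.

```python
import pandas as pd


def pivot_slow(df, index, columns, values):
    """Hypothetical baseline: df.iloc[i] builds a new Series for every row."""
    groups = {}
    for i in range(len(df)):
        row = df.iloc[i]
        groups.setdefault(row[index], {}).setdefault(row[columns], []).append(row[values])
    # Aggregate each (index, column) bucket with a mean.
    return {r: {c: sum(v) / len(v) for c, v in cols.items()} for r, cols in groups.items()}


def pivot_fast(df, index, columns, values):
    """Hypothetical rewrite: pull each column out once, then zip over the arrays."""
    idx = df[index].to_numpy()
    col = df[columns].to_numpy()
    val = df[values].to_numpy()
    groups = {}
    for r, c, v in zip(idx, col, val):
        # setdefault creates the nested dict / list on first sight of a key.
        groups.setdefault(r, {}).setdefault(c, []).append(v)
    return {r: {c: sum(v) / len(v) for c, v in cols.items()} for r, cols in groups.items()}


# Small made-up DataFrame to show that both versions agree.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S", "N"],
    "product": ["a", "b", "a", "a", "a"],
    "sales": [10.0, 20.0, 30.0, 50.0, 30.0],
})
result = pivot_fast(df, "region", "product", "sales")
```

Both functions still loop in Python; the saving comes from not constructing a pandas Series per row, so the gap between them grows with the number of rows.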
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-pivot_table-mc9s59u1` and push.