Fix WeightedMinHash input mutation#317
Conversation
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #317 +/- ##
=========================================
Coverage ? 66.97%
=========================================
Files ? 21
Lines ? 2616
Branches ? 0
=========================================
Hits ? 1752
Misses ? 864
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request ensures that the input array v in the minhash method of WeightedMinHashGenerator is copied when it is already of type float32, preventing unintended mutation of the original input. A corresponding test case has been added to verify this behavior. I have no feedback to provide.
Summary
Fixes a mutation bug in
WeightedMinHashGenerator.minhash()where passing anp.float32input array with zero values caused the caller-owned array to be modified in place.The method replaces zero entries with
NaNinternally before taking logs. For existingfloat32numpy arrays, the code reused the caller’s array directly, so zeros in the original input becameNaNafter callingminhash().Fix
float32numpy inputs before applying the internal zero-to-NaNtransformation.minhash()does not mutatefloat32input arrays.Tests
python -m pytest -o addopts= test/test_weighted_minhash.py
python -m pytest -o addopts= test/test_minhash.py test/test_hyperloglog.py test/test_lshforest.py