-
Notifications
You must be signed in to change notification settings - Fork 34
fix(simd): umasked AVX2 load #239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@copilot summarize the changes in this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical SIMD memory safety bug in AVX2 masked load operations that could cause segmentation faults when reading beyond allocated memory boundaries. The fix ensures masked loads respect buffer boundaries in ragged-epilogue scenarios.
Key Changes:
- Fixed AVX2 masked load implementation to prevent out-of-bounds reads
- Added comprehensive ASan-detected regression tests for distance computations
- Enhanced CI with dedicated AddressSanitizer build configuration
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/svs/lib/avx_detection.cpp | Added test for runtime AVX flag patching mechanism |
| tests/svs/core/distance.cpp | Added ASan regression tests for distance computation with ragged epilogues |
| tests/CMakeLists.txt | Updated Catch2 to v3.11.0 and improved test discovery with tag-based labels |
| .github/workflows/build-linux.yml | Added clang++-18 ASan build configuration with leak detection disabled |
| tests/svs/index/vamana/multi.cpp | Tagged test as long-running to exclude from ASan builds |
| tests/svs/index/vamana/index.cpp | Tagged test as long-running to exclude from ASan builds |
| tests/svs/index/inverted/memory_based.cpp | Tagged test as long-running to exclude from ASan builds |
| tests/svs/index/inverted/clustering.cpp | Tagged test as long-running to exclude from ASan builds |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This reverts commit f856a96.
Co-authored-by: Copilot <[email protected]>
ibhati
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new workflow is added in the existing workflows, please make sure the earlier flow and options did not change.
| } // namespace | ||
|
|
||
| CATCH_TEST_CASE("Random Clustering - End to End", "[inverted][random_clustering]") { | ||
| CATCH_TEST_CASE("Random Clustering - End to End", "[long][inverted][random_clustering]") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this [long]? Why is this required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, had a comment that I didn't post. I added a new tag [long] that marks long-running tests. They are skipped in the debug asan build.
ScalableVectorSearch/.github/workflows/build-linux.yml
Lines 63 to 64 in 63e58cd
| # skip longer-running tests | |
| ctest_args: "-LE long" |
ethanglaser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know how long the ASan run takes if completing successfully?
@ethanglaser about 10 minutes total, < 5 mins of which is testing. |
@ibhati the only change is that the existing steps are executing three more tests (with negligible runtime). |
The previous code always performed a full width load on the provided data. In ragged-epilogue scenarios, where we request a masked load, this resulted in SEGV errors in certain runs with address sanitizer.
Why wasn't this caught sooner?
The OS only triggers a segmentation fault if a read accesses an unmapped memory page. Since memory protection (typically) operates at a 4KB page granularity, reading past the end of a buffer is "safe" from the OS's perspective unless the overflow happens to cross exactly into an unmapped page.
Why is ASan catching it sporadically?
Since our underlying object storage is
std::vector, ASan detection requires two specific conditions to align:size()must equal itscapacity(). If there is spare capacity, the unsafe load simply reads valid (though uninitialized) memory owned by the vector.