-
Notifications
You must be signed in to change notification settings - Fork 143
Add filter for cagra::merge #1496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add an optional filter argument to cagra::merge, so that we can compact the index to remove deleted items when merging
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
| cagra::index_params output_index_params; | ||
|
|
||
| /// Strategy for merging. Defaults to `MergeStrategy::MERGE_STRATEGY_PHYSICAL`. | ||
| cuvs::neighbors::MergeStrategy merge_strategy = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just remove the merge params altogether? I don't know if we need them anymore. Maybe we do- that's why I'm asking. Might help to think forward to when we have a SMART merge option (maye you've thought of this already). But maybe we don't need a "merge strategy" for that? Maybe we just have a flag or parameter for "rebuild" vs "combine"?
| reinterpret_cast<uintptr_t>(_merge<int8_t>(res, *params, indices, num_indices)); | ||
| reinterpret_cast<uintptr_t>(_merge<int8_t>(res, *params, indices, num_indices, filter)); | ||
| } else if (dtype.code == kDLUInt && dtype.bits == 8) { | ||
| output_index->addr = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah- I was trying to figure out how you were going to support allowing a user to pass in the "index" to populate in the C layer while only have the option to return a newly created index in the C++ layer. Not opposed to this solution at all. Just good that we're not doing a copy!
| auto out_layout = raft::make_strided_layout(filtered_dataset.view().extents(), | ||
| std::array<int64_t, 2>{stride, 1}); | ||
|
|
||
| merged_index.update_dataset(handle, owning_t{std::move(filtered_dataset), out_layout}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really need to prioritize having the indexes all accept "Dataset" instead of mdspans. This is going to get out of control :-(
| template <typename T, typename IdxT> | ||
| struct index; | ||
| struct merge_params; | ||
| struct index_params; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These include files have gotten super lightweight now that they are only including prototypes and declarations. I'm not a huge fan of all of the CAGRA headers we're creating. Can we just consolidate this into the main header?
Also- for some reason, we have a header for cagra_optimize.hpp and I think this was a misunderstanding. Would you be able to copy that into the main cagra.hpp in this PR? It should be a straightforward copy/paste... just trying to keep the number of headers manageable. The whole benefit to no longer being header only is that the headers are super super lightweight.
| 0.006, | ||
| min_recall)); | ||
|
|
||
| /* TODO: eval_distances doesn't work, potentially because of id translation mismatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, This needs either fixed (by adding id translations) - or removed before this PR should be merged
Add an optional filter argument to cagra::merge, so that we can compact the index to remove deleted items when merging.
Also remove the 'LOGICAL_MERGE' strategy from the
cagra::mergeapi, since this is only supported with the composite merge api. (Thecagra::mergefunction ignore this value if set, and only does a logical merge).