Skip to content

Conversation

@benfred
Copy link
Member

@benfred benfred commented Nov 5, 2025

Add an optional filter argument to cagra::merge, so that we can compact the index to remove deleted items when merging.

Also remove the 'LOGICAL_MERGE' strategy from the cagra::merge api, since this is only supported with the composite merge api. (The cagra::merge function ignore this value if set, and only does a logical merge).

Add an optional filter argument to cagra::merge, so that we can compact
the index to remove deleted items when merging
@copy-pr-bot
Copy link

copy-pr-bot bot commented Nov 5, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cjnolet cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Nov 19, 2025
cagra::index_params output_index_params;

/// Strategy for merging. Defaults to `MergeStrategy::MERGE_STRATEGY_PHYSICAL`.
cuvs::neighbors::MergeStrategy merge_strategy =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just remove the merge params altogether? I don't know if we need them anymore. Maybe we do- that's why I'm asking. Might help to think forward to when we have a SMART merge option (maye you've thought of this already). But maybe we don't need a "merge strategy" for that? Maybe we just have a flag or parameter for "rebuild" vs "combine"?

reinterpret_cast<uintptr_t>(_merge<int8_t>(res, *params, indices, num_indices));
reinterpret_cast<uintptr_t>(_merge<int8_t>(res, *params, indices, num_indices, filter));
} else if (dtype.code == kDLUInt && dtype.bits == 8) {
output_index->addr =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah- I was trying to figure out how you were going to support allowing a user to pass in the "index" to populate in the C layer while only have the option to return a newly created index in the C++ layer. Not opposed to this solution at all. Just good that we're not doing a copy!

auto out_layout = raft::make_strided_layout(filtered_dataset.view().extents(),
std::array<int64_t, 2>{stride, 1});

merged_index.update_dataset(handle, owning_t{std::move(filtered_dataset), out_layout});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to prioritize having the indexes all accept "Dataset" instead of mdspans. This is going to get out of control :-(

template <typename T, typename IdxT>
struct index;
struct merge_params;
struct index_params;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These include files have gotten super lightweight now that they are only including prototypes and declarations. I'm not a huge fan of all of the CAGRA headers we're creating. Can we just consolidate this into the main header?

Also- for some reason, we have a header for cagra_optimize.hpp and I think this was a misunderstanding. Would you be able to copy that into the main cagra.hpp in this PR? It should be a straightforward copy/paste... just trying to keep the number of headers manageable. The whole benefit to no longer being header only is that the headers are super super lightweight.

0.006,
min_recall));

/* TODO: eval_distances doesn't work, potentially because of id translation mismatch
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, This needs either fixed (by adding id translations) - or removed before this PR should be merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

Development

Successfully merging this pull request may close these issues.

2 participants