[BUG] Segfault after using cuvsRMMPoolMemoryResourceEnable/cuvsRMMMemoryResourceReset #1454

@ldematte

Describe the bug

When using the C API functions cuvsRMMPoolMemoryResourceEnable and cuvsRMMMemoryResourceReset, cuvsBruteForceSearch crashes with a segmentation fault on a null pointer:

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000000

Here is the complete stack trace from the core dump.
Note that some JVM frames are in the way (the JVM's signal handler catches the segfault and aborts), but it should be clear nonetheless that the crash happens when cuvsBruteForceSearch tries to allocate memory (rmm::device_buffer::allocate_async) after cuvsRMMMemoryResourceReset was called.

Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7fbdeabff6c0 (LWP 1338041))]
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007fbdec30cf4f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007fbdec2bdfb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007fbdec2a8472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007fbdeaea4179 in os::abort(bool, void const*, void const*) [clone .cold] () from /usr/lib/jvm/jdk-24.0.1/lib/server/libjvm.so
#5  0x00007fbdebbc3718 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long) ()
   from /usr/lib/jvm/jdk-24.0.1/lib/server/libjvm.so
#6  0x00007fbdebbc3eab in VMError::report_and_die(Thread*, unsigned int, unsigned char*, void const*, void const*, char const*, ...) () from /usr/lib/jvm/jdk-24.0.1/lib/server/libjvm.so
#7  0x00007fbdebbc3ece in VMError::report_and_die(Thread*, unsigned int, unsigned char*, void const*, void const*) () from /usr/lib/jvm/jdk-24.0.1/lib/server/libjvm.so
#8  0x00007fbdeba26f10 in JVM_handle_linux_signal () from /usr/lib/jvm/jdk-24.0.1/lib/server/libjvm.so
#9  <signal handler called>
#10 0x00007fbdc0794f14 in void* cuda::mr::__4::_Resource_vtable_builder::_Alloc_async<rmm::mr::device_memory_resource>(void*, unsigned long, unsigned long, cuda::__4::stream_ref) ()
   from /home/ldematte/miniconda3/envs/cuvs-25-12/lib/libcuvs_c.so
#11 0x00007fbdc16f8692 in rmm::device_buffer::allocate_async(unsigned long) () from /home/ldematte/miniconda3/envs/cuvs-25-12/lib/librmm.so
#12 0x00007fbdc16f86fa in rmm::device_buffer::device_buffer(unsigned long, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >) () from /home/ldematte/miniconda3/envs/cuvs-25-12/lib/librmm.so
#13 0x00007fbd65d5d39f in void cuvs::neighbors::detail::tiled_brute_force_knn<float, long, float, raft::identity_op>(raft::resources const&, float const*, float const*, unsigned long, unsigned long, unsigned long, unsigned long, float*, long*, cuvsDistanceType, float, unsigned long, unsigned long, float const*, float const*, unsigned int const*, raft::identity_op, cuvs::neighbors::filtering::FilterType) [clone .constprop.0] ()                          
   from /home/ldematte/miniconda3/envs/cuvs-25-12/lib/libcuvs.so
#14 0x00007fbd65dbc9c6 in void cuvs::neighbors::detail::brute_force_search_filtered<float, long, unsigned int, float>(raft::resources const&, cuvs::neighbors::brute_force::index<float, float> const&, std::experimental::mdspan<float const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<float const>, (raft::memory_type)2> >, cuvs::neighbors::filtering::base_filter const*, std::experimental::mdspan<long, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<long>, (raft::memory_type)2> >, std::experimental::mdspan<float, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<float>, (raft::memory_type)2> >, std::optional<std::experimental::mdspan<float const, std::experimental::extents<long, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<float const>, (raft::memory_type)2> > >) () from /home/ldematte/miniconda3/envs/cuvs-25-12/lib/libcuvs.so                                                                  
#15 0x00007fbd65dbeb0f in void cuvs::neighbors::detail::search<float, long, float, std::experimental::layout_right>(raft::resources const&, cuvs::neighbors::brute_force::index<float, float> const&, std::experimental::mdspan<float const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<float const>, (raft::memory_type)2> >, std::experimental::mdspan<long, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<long>, (raft::memory_type)2> >, std::experimental::mdspan<float, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<float>, (raft::memory_type)2> >, cuvs::neighbors::filtering::base_filter const&) () from /home/ldematte/miniconda3/envs/cuvs-25-12/lib/libcuvs.so                                                                                        
#16 0x00007fbdc07b2361 in cuvsBruteForceSearch::{lambda()#1}::operator()() const () from /home/ldematte/miniconda3/envs/cuvs-25-12/lib/libcuvs_c.so
#17 0x00007fbdc07b339c in cuvsBruteForceSearch () from /home/ldematte/miniconda3/envs/cuvs-25-12/lib/libcuvs_c.so

Steps/Code to reproduce bug

Check out #1453 and run the Java tests (cd cuvs/java/cuvs-java && mvn clean verify).
I'll try to reproduce this in C code (cuvsRMMPoolMemoryResourceEnable + cuvsRMMMemoryResourceReset + cuvsBruteForceSearch) when/if I have time, but I might not be able to.
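
The call order described above (enable a pool, reset it, then allocate during a search) can be modelled without cuVS or a GPU. The sketch below is a toy model only: every name in it is hypothetical, it does not reflect how cuVS or RMM are implemented, and it makes no claim about the actual root cause. It only illustrates why an allocation routed through a process-wide resource pointer depends on that pointer still being valid after a reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a memory resource: one allocate hook plus a byte counter.
   All names here are hypothetical; none of this is cuVS or RMM code. */
typedef struct toy_resource {
    void *(*allocate)(struct toy_resource *self, size_t bytes);
    size_t bytes_served;
} toy_resource;

static void *toy_pool_allocate(toy_resource *self, size_t bytes) {
    self->bytes_served += bytes;
    return malloc(bytes);
}

/* Process-wide "current resource" that every allocation goes through. */
static toy_resource *toy_current = NULL;

/* Hypothetical analogue of enabling a pool: create it and install it. */
static void toy_pool_enable(void) {
    assert(toy_current == NULL); /* the model supports one pool at a time */
    toy_resource *pool = malloc(sizeof *pool);
    if (pool == NULL) return;
    pool->allocate = toy_pool_allocate;
    pool->bytes_served = 0;
    toy_current = pool;
}

/* Hypothetical analogue of a reset that destroys the pool and, in this
   model, leaves no resource installed afterwards (an assumption). */
static void toy_pool_reset(void) {
    free(toy_current);
    toy_current = NULL;
}

/* Analogue of a search that needs scratch memory. Without the NULL check,
   reading mr->allocate would dereference address 0. */
static void *toy_search_scratch(size_t bytes) {
    toy_resource *mr = toy_current;
    if (mr == NULL) return NULL;
    return mr->allocate(mr, bytes);
}

/* Returns 1 if allocation succeeds before the reset and is refused after. */
static int toy_reset_then_allocate(void) {
    toy_pool_enable();
    void *before = toy_search_scratch(64);
    int ok_before = (before != NULL);
    free(before);

    toy_pool_reset();
    void *after = toy_search_scratch(64);
    int refused_after = (after == NULL);
    free(after);

    return ok_before && refused_after;
}
```

In this guarded form the allocation after the reset fails cleanly instead of faulting. Removing the NULL check in `toy_search_scratch` turns the same call order into a read through a null pointer, which is the kind of access the siginfo line above reports.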

Environment details (please complete the following information):

  • Environment location: Bare-metal
  • Method of RAFT install: from source (main)

Additional context

If GitHub allows it, I can attach the Java hs_err file and/or the core dump.

Labels: bug (Something isn't working)

Status: Done
