Skip to content

Add unit test for fix of plugin EP library handle leak in PR 28396#28430

Open
adrianlizarraga wants to merge 15 commits into
mainfrom
adrianl/EpLoadRefCountLeak_UnitTest
Open

Add unit test for fix of plugin EP library handle leak in PR 28396#28430
adrianlizarraga wants to merge 15 commits into
mainfrom
adrianl/EpLoadRefCountLeak_UnitTest

Conversation

@adrianlizarraga
Copy link
Copy Markdown
Contributor

@adrianlizarraga adrianlizarraga commented May 9, 2026

Description

Add a unit test that verifies RegisterExecutionProviderLibrary / UnregisterExecutionProviderLibrary does not leak the library handle (regression test for #28396).

ProviderLibrary::Load() loads the EP library and probes for the GetProvider symbol. Most plugin EP libraries don't export it, so the probe fails. Before #28396, Load() returned the error without calling Unload(), leaking a refcount.

Test approach

The test copies the EP library to a temporary directory with a unique filename, ensuring it has never been loaded in the process. After register + unregister, it checks that the library is fully unloaded (refcount == 0).

adrianlizarraga and others added 2 commits May 8, 2026 17:13
Make the library handle leak test cross-platform (Windows and Linux)
by extracting IsLibraryLoaded/ReleaseLibraryOnce helpers that use
GetModuleHandleW/FreeLibrary on Windows and dlopen(RTLD_NOLOAD)/dlclose
on Linux.

Replace GTEST_SKIP with a drain loop that releases any leaked refcounts
from prior tests, ensuring the test always runs regardless of execution
order.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a regression unit test in the AutoEP test suite to ensure registering/unregistering a plugin EP library does not leak a dynamic library handle (specifically when probing for GetProvider fails).

Changes:

  • Add platform-specific helpers to detect whether a shared library is loaded.
  • Add a new gtest that registers/unregisters the example plugin EP library and asserts the library is fully unloaded afterward.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/test/autoep/test_registration.cc Outdated
Comment thread onnxruntime/test/autoep/test_registration.cc Outdated
adrianlizarraga and others added 5 commits May 11, 2026 10:38
Remove ReleaseLibraryOnce and the unbounded drain loop. Instead,
capture the library loaded state before register and assert it
returns to that baseline after unregister. This avoids the double
dlclose UB risk on Linux and the potential infinite loop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the baseline comparison with GTEST_SKIP when the library is
already loaded by a prior test. The boolean IsLibraryLoaded cannot
detect refcount leaks when the library is already mapped (it cannot
distinguish refcount N from N+1). The skip is honest about this
limitation and still catches the regression when the test runs first.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move RegisterUnregisterDoesNotLeakLibraryHandle to a dedicated test
binary (onnxruntime_autoep_handle_leak_test). This guarantees the
plugin EP library starts with refcount 0, eliminating the need for
GTEST_SKIP when other tests have already loaded the library.

The new binary reuses the shared test_main.cc for main()/ort_env
and links test_autoep_utils.cc for library path computation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread cmake/onnxruntime_unittests.cmake Outdated
The file(GLOB) picks up all *.cc in the autoep directory, so
test_handle_leak.cc was being compiled into both the main
onnxruntime_autoep_test binary and the isolated
onnxruntime_autoep_handle_leak_test binary. The ASSERT_FALSE at
test start would trip in the main binary since other tests load
the same library first.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

The separate test binary (onnxruntime_autoep_handle_leak_test) caused the
ARM64 Linux Debug CI to run out of memory during linking. Revert to using
GTEST_SKIP when the library is already loaded by another test in the same
binary. Add a TODO comment documenting the separate-binary approach for
when CI resources allow it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adrianlizarraga adrianlizarraga requested a review from Copilot May 11, 2026 21:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.

Comment thread onnxruntime/test/autoep/test_handle_leak.cc
Comment thread onnxruntime/test/autoep/test_handle_leak.cc
Comment thread onnxruntime/test/autoep/test_handle_leak.cc Outdated
Comment thread onnxruntime/test/autoep/test_handle_leak.cc Outdated
Comment thread onnxruntime/test/autoep/test_handle_leak.cc Outdated
Comment thread onnxruntime/test/autoep/test_handle_leak.cc
- Add explicit <memory> and <string> includes instead of relying on
  transitive includes.
- Guard RTLD_NOLOAD with #ifdef for portability on platforms that may
  not define it; returns true (triggering GTEST_SKIP) when unavailable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
adrianlizarraga and others added 2 commits May 11, 2026 16:20
Instead of returning a misleading 'true' when RTLD_NOLOAD is unavailable,
return std::nullopt to explicitly indicate the platform cannot determine
loaded-library state. The test skips with a clear message in that case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread onnxruntime/test/autoep/test_handle_leak.cc Outdated
adrianlizarraga and others added 3 commits May 13, 2026 11:12
…dence

Copy the EP library to a temporary directory with a unique filename before
testing register/unregister. This guarantees the library starts with refcount 0
regardless of test execution order, eliminating the GTEST_SKIP that could mask
the very regression this test is designed to catch.

- Use existing TemporaryDirectory (RAII) for cross-platform cleanup
- Use std::filesystem::copy_file to create uniquely-named library copy
- Remove the loaded-before GTEST_SKIP since the copy is always fresh
- Keep GTEST_SKIP only for platforms without RTLD_NOLOAD support

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@adrianlizarraga adrianlizarraga changed the title Add unit test for PR 28396 Add unit test for fix of plugin EP library handle leak in PR 28396 May 13, 2026
@adrianlizarraga adrianlizarraga marked this pull request as ready for review May 13, 2026 20:15
ASSERT_TRUE(IsLibraryLoaded(temp_library_path).value_or(false)) << "Library should be loaded after registration.";

// Unregister releases the EpLibraryPlugin's reference.
ort_env->UnregisterExecutionProviderLibrary(registration_name.c_str());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this is updating ort_env, should we ensure that it gets called when leaving the scope? e.g., with gsl::finally

return false;
#else
// RTLD_NOLOAD is not available on this platform; cannot probe without loading.
(void)library_path;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(void)library_path;
static_cast<void>(library_path);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants