Skip to content

Mitigate Android crypto OOM via GC memory pressure and Java GC nudges#126271

Open
steveisok wants to merge 3 commits intodotnet:mainfrom
steveisok:fix/android-crypto-gc-pressure
Open

Mitigate Android crypto OOM via GC memory pressure and Java GC nudges#126271
steveisok wants to merge 3 commits intodotnet:mainfrom
steveisok:fix/android-crypto-gc-pressure

Conversation

@steveisok
Copy link
Copy Markdown
Member

Add GC memory pressure tracking to Android crypto objects (DSA, EC, RSA, X509) so the .NET GC can see the true native cost of JNI-backed Java key and certificate objects. Extract the duplicated bookkeeping into a shared AndroidCryptoMemoryPressure struct.

In the native layer, periodically call System.gc() after crypto object destructions to give ART a chance to reclaim BoringSSL buffers whose JNI global refs have been released. Expose TriggerJavaGarbageCollection for managed callers and add an AndroidGarbageCollectAttribute that forces GC after each test on Android to prevent OOM in test runs.

Fixes #118603
Fixes #62547

Add GC memory pressure tracking to Android crypto objects (DSA, EC, RSA,
X509) so the .NET GC can see the true native cost of JNI-backed Java
key and certificate objects. Extract the duplicated bookkeeping into a
shared AndroidCryptoMemoryPressure struct.

In the native layer, periodically call System.gc() after crypto object
destructions to give ART a chance to reclaim BoringSSL buffers whose
JNI global refs have been released. Expose TriggerJavaGarbageCollection
for managed callers and add an AndroidGarbageCollectAttribute that
forces GC after each test on Android to prevent OOM in test runs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @bartonjs, @vcsjones, @dotnet/area-system-security
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to reduce Android OOMs in crypto-heavy scenarios (notably test runs) by making .NET more aware of JNI-backed native memory and by proactively nudging the Java/ART GC after releasing crypto-related JNI global references.

Changes:

  • Add managed-side GC memory pressure tracking for Android crypto keys/certificates (RSA/DSA/EC/X509) via a shared AndroidCryptoMemoryPressure helper.
  • Add native-side periodic System.gc() nudges after crypto object destruction / JNI global-ref releases, plus a direct AndroidCryptoNative_TriggerJavaGarbageCollection export for managed callers.
  • Re-enable System.Security.Cryptography.Tests in Android test runs and add an Android-only xUnit attribute that forces GC after each test.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/native/libs/System.Security.Cryptography.Native.Android/pal_rsa.c Trigger Java GC nudges after RSA destruction.
src/native/libs/System.Security.Cryptography.Native.Android/pal_eckey.c Trigger Java GC nudges after EC key destruction.
src/native/libs/System.Security.Cryptography.Native.Android/pal_lifetime.c Trigger Java GC nudges after deleting JNI global refs.
src/native/libs/System.Security.Cryptography.Native.Android/pal_jni.h Add cached java/lang/System method IDs and declare MaybeTriggerJavaGC.
src/native/libs/System.Security.Cryptography.Native.Android/pal_jni.c Implement periodic System.gc() nudges using an atomic destruction counter; cache System.gc method.
src/native/libs/System.Security.Cryptography.Native.Android/pal_misc.h Declare exported native entry point to trigger Java GC from managed.
src/native/libs/System.Security.Cryptography.Native.Android/pal_misc.c Implement exported AndroidCryptoNative_TriggerJavaGarbageCollection.
src/libraries/tests.proj Re-enable running System.Security.Cryptography.Tests for Android configurations and add back to smoke tests.
src/libraries/System.Security.Cryptography/tests/System.Security.Cryptography.Tests.csproj Include new Android interop and the xUnit GC-after-test attribute when UseAndroidCrypto is enabled.
src/libraries/System.Security.Cryptography/tests/AndroidGarbageCollectAttribute.cs Add assembly-level xUnit hook to force GC/finalization + Java GC after each test on Android.
src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/AndroidCertificatePal.cs Add native memory pressure tracking for Android certificate PAL instances.
src/libraries/System.Security.Cryptography/src/System.Security.Cryptography.csproj Include shared AndroidCryptoMemoryPressure in the AndroidCrypto build.
src/libraries/Common/src/System/Security/Cryptography/AndroidCryptoMemoryPressure.cs Introduce shared helper to Add/Remove GC memory pressure for Android crypto objects.
src/libraries/Common/src/System/Security/Cryptography/RSAAndroid.cs Add native memory pressure tracking for RSA keys (including regeneration/import paths).
src/libraries/Common/src/System/Security/Cryptography/DSAAndroid.cs Add native memory pressure tracking for DSA keys.
src/libraries/Common/src/System/Security/Cryptography/ECAndroid.cs Add native memory pressure tracking for EC keys.
src/libraries/Common/src/Interop/Android/System.Security.Cryptography.Native.Android/Interop.GarbageCollection.cs Add managed P/Invoke declaration for triggering Java GC.

steveisok and others added 2 commits March 29, 2026 07:08
GC.AddMemoryPressure/RemoveMemoryPressure is intended for types that
rely solely on finalizers, not for types implementing the dispose
pattern. Since the Android crypto and X509 types all implement
IDisposable, the correct fix is to ensure Dispose is called rather
than adding pressure hints.

Retains the native Java GC nudges (MaybeTriggerJavaGC) and the test
AndroidGarbageCollectAttribute, which address a separate concern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Deformatter() and Formatter() tests created DSA instances without
using statements, leaking native key resources.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@steveisok
Copy link
Copy Markdown
Member Author

I ran an RSS profile on the test app while it was running. Memory spikes up to ~1.1GB but consistently recovers back down to ~200MB.

// destructions, call System.gc() to give ART a chance to reclaim native
// memory (BoringSSL buffers) backing Java key/certificate objects whose
// JNI global refs have already been deleted.
#define JAVA_GC_TRIGGER_INTERVAL 5
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a very low number.

What would be the impact of these extra GCs on real world Android apps?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering the same thing.

I threw it up here as a starting point and we can increase the number or dial this back completely.

/// Forces a garbage collection after each test method on Android.
/// On Android, crypto objects wrap JNI global refs backed by Java/BoringSSL
/// native memory that the .NET GC cannot see. Forcing a collection after
/// each test ensures SafeHandles are finalized and JNI roots are removed
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the tests are disposing the crypto objects properly, this should not be finalizing any.

Comment on lines +28 to +29
GC.Collect();
GC.WaitForPendingFinalizers();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GC.Collect();
GC.WaitForPendingFinalizers();

This should not be needed if the tests are disposing the crypto objects properly.

// destructions, call System.gc() to give ART a chance to reclaim native
// memory (BoringSSL buffers) backing Java key/certificate objects whose
// JNI global refs have already been deleted.
#define JAVA_GC_TRIGGER_INTERVAL 5
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that the Android GC is going to be smart enough to trigger often enough to avoid running into out of memory issues.

Are we sure that this change is not just trying to mask actual bugs? (memory leaks, missing Dispose calls, or bad tunning of .NET GC for mobile)

Copy link
Copy Markdown
Member Author

@steveisok steveisok Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is a legit issue that is a visibility gap between the two GC's. Dispose releases the JNI refs, but the Java GC defers the collection and doesn't know it needs to as they pile up.

ECDsaKeyFileTests has 61 tests that spike the memory to over 1GB in about 3 seconds. The emulators are tight and gets flooded before it has any time to act.

The question I have is the ref counting and Java GC nudging optimizing for an extreme scenario that only happens in our tests? If so, perhaps dialing the tests back is the right answer. I figured I'd start with the runtime change first so that we can see it for real and decide.

Copy link
Copy Markdown
Member

@jkotas jkotas Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From 10,000ft, these tests should be doing the following on the Java side:

  • allocate Java crypto object
  • create JNI ref
  • call methods on Java crypto object
  • release JNI ref

I assume that create JNI ref + release JNI ref does not impact scheduling of the Java GC. Is that correct?

We are left with Java crypto object + call methods on Java crypto object. It is surprising that the Java GC would not be able to handle something as simple as that well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECDsaKeyFileTests has 61 tests that spike the memory to over 1GB in about 3 seconds.

This is 16MB per test on average. Do we have any tests that pass large buffers back and forth? If we are allocating a lot of very large buffers on the Java side, I can imagine that the Java GC may have troubles keeping up with that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a piling on problem. A bunch of small Java objects that allocate native BoringSSL memory that the Java GC is not smart enough to account for. It's either that or a bug in BoringSSL where it does a poor job of keeping the Java GC in the loop about its allocations.

Our tests allocate in rapid succession and boom, out of memory before the GC knows what hit it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could also be a consequence of our JNI integration. We might be operating at a layer lower than "normal" and not getting the benefit of looping in the Java GC.

NativeAllocationRegistry would be helpful, but that is API level 26 and above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

3 participants