Mitigate Android crypto OOM via GC memory pressure and Java GC nudges#126271
Mitigate Android crypto OOM via GC memory pressure and Java GC nudges#126271steveisok wants to merge 3 commits intodotnet:mainfrom
Conversation
Add GC memory pressure tracking to Android crypto objects (DSA, EC, RSA, X509) so the .NET GC can see the true native cost of JNI-backed Java key and certificate objects. Extract the duplicated bookkeeping into a shared AndroidCryptoMemoryPressure struct. In the native layer, periodically call System.gc() after crypto object destructions to give ART a chance to reclaim BoringSSL buffers whose JNI global refs have been released. Expose TriggerJavaGarbageCollection for managed callers and add an AndroidGarbageCollectAttribute that forces GC after each test on Android to prevent OOM in test runs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @bartonjs, @vcsjones, @dotnet/area-system-security |
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce Android OOMs in crypto-heavy scenarios (notably test runs) by making .NET more aware of JNI-backed native memory and by proactively nudging the Java/ART GC after releasing crypto-related JNI global references.
Changes:
- Add managed-side GC memory pressure tracking for Android crypto keys/certificates (RSA/DSA/EC/X509) via a shared
AndroidCryptoMemoryPressurehelper. - Add native-side periodic
System.gc()nudges after crypto object destruction / JNI global-ref releases, plus a directAndroidCryptoNative_TriggerJavaGarbageCollectionexport for managed callers. - Re-enable
System.Security.Cryptography.Testsin Android test runs and add an Android-only xUnit attribute that forces GC after each test.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/libs/System.Security.Cryptography.Native.Android/pal_rsa.c | Trigger Java GC nudges after RSA destruction. |
| src/native/libs/System.Security.Cryptography.Native.Android/pal_eckey.c | Trigger Java GC nudges after EC key destruction. |
| src/native/libs/System.Security.Cryptography.Native.Android/pal_lifetime.c | Trigger Java GC nudges after deleting JNI global refs. |
| src/native/libs/System.Security.Cryptography.Native.Android/pal_jni.h | Add cached java/lang/System method IDs and declare MaybeTriggerJavaGC. |
| src/native/libs/System.Security.Cryptography.Native.Android/pal_jni.c | Implement periodic System.gc() nudges using an atomic destruction counter; cache System.gc method. |
| src/native/libs/System.Security.Cryptography.Native.Android/pal_misc.h | Declare exported native entry point to trigger Java GC from managed. |
| src/native/libs/System.Security.Cryptography.Native.Android/pal_misc.c | Implement exported AndroidCryptoNative_TriggerJavaGarbageCollection. |
| src/libraries/tests.proj | Re-enable running System.Security.Cryptography.Tests for Android configurations and add back to smoke tests. |
| src/libraries/System.Security.Cryptography/tests/System.Security.Cryptography.Tests.csproj | Include new Android interop and the xUnit GC-after-test attribute when UseAndroidCrypto is enabled. |
| src/libraries/System.Security.Cryptography/tests/AndroidGarbageCollectAttribute.cs | Add assembly-level xUnit hook to force GC/finalization + Java GC after each test on Android. |
| src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/AndroidCertificatePal.cs | Add native memory pressure tracking for Android certificate PAL instances. |
| src/libraries/System.Security.Cryptography/src/System.Security.Cryptography.csproj | Include shared AndroidCryptoMemoryPressure in the AndroidCrypto build. |
| src/libraries/Common/src/System/Security/Cryptography/AndroidCryptoMemoryPressure.cs | Introduce shared helper to Add/Remove GC memory pressure for Android crypto objects. |
| src/libraries/Common/src/System/Security/Cryptography/RSAAndroid.cs | Add native memory pressure tracking for RSA keys (including regeneration/import paths). |
| src/libraries/Common/src/System/Security/Cryptography/DSAAndroid.cs | Add native memory pressure tracking for DSA keys. |
| src/libraries/Common/src/System/Security/Cryptography/ECAndroid.cs | Add native memory pressure tracking for EC keys. |
| src/libraries/Common/src/Interop/Android/System.Security.Cryptography.Native.Android/Interop.GarbageCollection.cs | Add managed P/Invoke declaration for triggering Java GC. |
...rity.Cryptography/src/System/Security/Cryptography/X509Certificates/AndroidCertificatePal.cs
Show resolved
Hide resolved
src/libraries/System.Security.Cryptography/tests/AndroidGarbageCollectAttribute.cs
Show resolved
Hide resolved
src/libraries/Common/src/System/Security/Cryptography/AndroidCryptoMemoryPressure.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Security/Cryptography/RSAAndroid.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Security/Cryptography/AndroidCryptoMemoryPressure.cs
Outdated
Show resolved
Hide resolved
GC.AddMemoryPressure/RemoveMemoryPressure is intended for types that rely solely on finalizers, not for types implementing the dispose pattern. Since the Android crypto and X509 types all implement IDisposable, the correct fix is to ensure Dispose is called rather than adding pressure hints. Retains the native Java GC nudges (MaybeTriggerJavaGC) and the test AndroidGarbageCollectAttribute, which address a separate concern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Deformatter() and Formatter() tests created DSA instances without using statements, leaking native key resources. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
I ran an RSS profile on the test app while it was running. Memory spikes up to ~1.1GB but consistently recovers back down to ~200MB. |
| // destructions, call System.gc() to give ART a chance to reclaim native | ||
| // memory (BoringSSL buffers) backing Java key/certificate objects whose | ||
| // JNI global refs have already been deleted. | ||
| #define JAVA_GC_TRIGGER_INTERVAL 5 |
There was a problem hiding this comment.
This feels like a very low number.
What would be the impact of these extra GCs on real world Android apps?
There was a problem hiding this comment.
I am wondering the same thing.
I threw it up here as a starting point and we can increase the number or dial this back completely.
| /// Forces a garbage collection after each test method on Android. | ||
| /// On Android, crypto objects wrap JNI global refs backed by Java/BoringSSL | ||
| /// native memory that the .NET GC cannot see. Forcing a collection after | ||
| /// each test ensures SafeHandles are finalized and JNI roots are removed |
There was a problem hiding this comment.
Assuming the tests are disposing the crypto objects properly, this should not be finalizing any.
| GC.Collect(); | ||
| GC.WaitForPendingFinalizers(); |
There was a problem hiding this comment.
| GC.Collect(); | |
| GC.WaitForPendingFinalizers(); |
This should not be needed if the tests are disposing the crypto objects properly.
| // destructions, call System.gc() to give ART a chance to reclaim native | ||
| // memory (BoringSSL buffers) backing Java key/certificate objects whose | ||
| // JNI global refs have already been deleted. | ||
| #define JAVA_GC_TRIGGER_INTERVAL 5 |
There was a problem hiding this comment.
I would expect that the Android GC is going to be smart enough to trigger often enough to avoid running into out of memory issues.
Are we sure that this change is not just trying to mask actual bugs? (memory leaks, missing Dispose calls, or bad tunning of .NET GC for mobile)
There was a problem hiding this comment.
No, this is a legit issue that is a visibility gap between the two GC's. Dispose releases the JNI refs, but the Java GC defers the collection and doesn't know it needs to as they pile up.
ECDsaKeyFileTests has 61 tests that spike the memory to over 1GB in about 3 seconds. The emulators are tight and gets flooded before it has any time to act.
The question I have is the ref counting and Java GC nudging optimizing for an extreme scenario that only happens in our tests? If so, perhaps dialing the tests back is the right answer. I figured I'd start with the runtime change first so that we can see it for real and decide.
There was a problem hiding this comment.
From 10,000ft, these tests should be doing the following on the Java side:
- allocate Java crypto object
- create JNI ref
- call methods on Java crypto object
- release JNI ref
I assume that create JNI ref + release JNI ref does not impact scheduling of the Java GC. Is that correct?
We are left with Java crypto object + call methods on Java crypto object. It is surprising that the Java GC would not be able to handle something as simple as that well.
There was a problem hiding this comment.
ECDsaKeyFileTests has 61 tests that spike the memory to over 1GB in about 3 seconds.
This is 16MB per test on average. Do we have any tests that pass large buffers back and forth? If we are allocating a lot of very large buffers on the Java side, I can imagine that the Java GC may have troubles keeping up with that.
There was a problem hiding this comment.
This looks like a piling on problem. A bunch of small Java objects that allocate native BoringSSL memory that the Java GC is not smart enough to account for. It's either that or a bug in BoringSSL where it does a poor job of keeping the Java GC in the loop about its allocations.
Our tests allocate in rapid succession and boom, out of memory before the GC knows what hit it.
There was a problem hiding this comment.
It could also be a consequence of our JNI integration. We might be operating at a layer lower than "normal" and not getting the benefit of looping in the Java GC.
NativeAllocationRegistry would be helpful, but that is API level 26 and above.
Add GC memory pressure tracking to Android crypto objects (DSA, EC, RSA, X509) so the .NET GC can see the true native cost of JNI-backed Java key and certificate objects. Extract the duplicated bookkeeping into a shared AndroidCryptoMemoryPressure struct.
In the native layer, periodically call System.gc() after crypto object destructions to give ART a chance to reclaim BoringSSL buffers whose JNI global refs have been released. Expose TriggerJavaGarbageCollection for managed callers and add an AndroidGarbageCollectAttribute that forces GC after each test on Android to prevent OOM in test runs.
Fixes #118603
Fixes #62547