Fix Android AEAD tag mismatch exceptions#128889
Conversation
Co-authored-by: simonrozsival <374616+simonrozsival@users.noreply.github.com>
|
Does AesGcm need the same fix? |
|
@copilot let's change the Android implementation of AesGcm and AesCcm (if CCM is supported on Android) to use the new AEAD update call. |
Co-authored-by: vcsjones <361677+vcsjones@users.noreply.github.com>
| [return: MarshalAs(UnmanagedType.Bool)] | ||
| private static partial bool EvpAeadCipherUpdate( | ||
| SafeEvpCipherCtxHandle ctx, | ||
| ref byte @out, |
There was a problem hiding this comment.
@copilot let's avoid using keywords as parameter names in C# so they don't need to be @-prefixed. Call them input or output, for example.
Let's not use the ref byte and ref MemoryMarshal.GetReference pattern any more. Just use ReadOnlySpan and Span - the LibraryImport marshaller understands how to marshal them. See @src/libraries/Common/src/Interop/OSX/System.Security.Cryptography.Native.Apple/Interop.X25519.cs as an example.
There was a problem hiding this comment.
Addressed in b28e258 by changing the new Android AEAD update interop signature to use Span<byte> / ReadOnlySpan<byte> directly and by removing the keyword-prefixed parameter names.
Co-authored-by: vcsjones <361677+vcsjones@users.noreply.github.com>
|
/azp run runtime-android |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Tagging subscribers to 'arch-android': @vitek-karas, @simonrozsival, @steveisok, @akoeplinger |
There was a problem hiding this comment.
Pull request overview
This PR adjusts the Android crypto (Javax.Crypto Cipher) interop and managed AEAD implementations so that authentication tag failures detected during Cipher.update(...) are surfaced as AuthenticationTagMismatchException (instead of a generic CryptographicException), aligning behavior with AEAD expectations.
Changes:
- Added a new native entry point (
AndroidCryptoNative_AeadCipherUpdate) that detectsAEADBadTagExceptionduringCipher.update. - Refactored native cipher update to check for JNI exceptions before copying output back to managed buffers, and reused the same tag-mismatch detection pattern for finalization.
- Updated Android managed decrypt paths (ChaCha20-Poly1305, AES-GCM, AES-CCM) to use the AEAD-aware update and throw
AuthenticationTagMismatchExceptionon detected tag mismatch.
Show a summary per file
| File | Description |
|---|---|
| src/native/libs/System.Security.Cryptography.Native.Android/pal_cipher.h | Declares the new AEAD-aware cipher update export. |
| src/native/libs/System.Security.Cryptography.Native.Android/pal_cipher.c | Implements AEAD-aware update and centralizes AEADBadTagException detection for update/final. |
| src/libraries/Common/src/Interop/Android/System.Security.Cryptography.Native.Android/Interop.Cipher.cs | Adds managed interop for the new AEAD-aware update entry point. |
| src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/ChaCha20Poly1305.Android.cs | Uses AEAD-aware update for tag processing and maps mismatches to AuthenticationTagMismatchException. |
| src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/AesGcm.Android.cs | Uses AEAD-aware update for tag processing and maps mismatches to AuthenticationTagMismatchException. |
| src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/AesCcm.Android.cs | Uses AEAD-aware update for tag processing and maps mismatches to AuthenticationTagMismatchException. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 1
|
/azp run runtime-android |
|
Azure Pipelines successfully started running 1 pipeline(s). |
…decrypt On some Android devices, when an AEAD decryption fails authentication, Conscrypt determines the thrown exception type by reading BoringSSL's thread-local error queue, which can still contain a stale error from a prior unrelated operation on the same thread (e.g. an RSA error). That stale error is read first (FIFO) and mapped to the wrong exception type (observed java.security.InvalidKeyException with an RSA MODULUS_TOO_LARGE message during a ChaCha20Poly1305 decrypt) instead of the expected javax.crypto.AEADBadTagException. Because detection only matched AEADBadTagException, authTagMismatch stayed false and a generic CryptographicException was thrown instead of AuthenticationTagMismatchException, intermittently failing ChaCha20Poly1305Tests.EncryptTamperAADDecrypt. For a successfully initialized AEAD cipher in decrypt mode, the key and parameters were already validated at Cipher.init, so the only legitimate failure at the tag-update/doFinal step is an authentication failure. Treat any java.security.GeneralSecurityException caught there (in decrypt mode) as an authentication tag mismatch. This also covers AES-GCM and AES-CCM, which share the same native AEAD paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a debug breadcrumb on the GeneralSecurityException fallback path so a genuine non-authentication failure, if ever reclassified as a tag mismatch, can still be diagnosed from logcat. This path does not run for the common AEADBadTagException case and is behavior-neutral for pass/fail. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Guard GetStringUTFChars with an ExceptionCheck after getMessage() and clear any pending JNI exception before returning, so the fallback diagnostic path never leaves a stray Java exception for the caller (which raises the managed AuthenticationTagMismatchException next). No memory or gref leaks: only local refs are created and all are released. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
I'd like to look at this a bit further on Thursday, I am not convinced this is the right change (it could be! I'm just not sure yet). |
|
/azp run runtime-android |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| if (!outl && !in) | ||
| // it means caller wants us to record "inl" but we don't need it. | ||
| return SUCCESS; | ||
|
|
||
| abort_if_invalid_pointer_argument(outl); | ||
| abort_if_invalid_pointer_argument(in); | ||
|
|
||
| int32_t result = FAIL; | ||
| INIT_LOCALS(loc, inDataBytes, outDataBytes); | ||
|
|
||
| JNIEnv* env = GetJNIEnv(); | ||
| jbyteArray inDataBytes = make_java_byte_array(env, inl); | ||
| (*env)->SetByteArrayRegion(env, inDataBytes, 0, inl, (jbyte*)in); | ||
| loc[inDataBytes] = make_java_byte_array(env, inl); | ||
| (*env)->SetByteArrayRegion(env, loc[inDataBytes], 0, inl, (jbyte*)in); | ||
| ON_EXCEPTION_PRINT_AND_GOTO(cleanup); |
The previous fix treated any GeneralSecurityException during an Android AEAD decrypt as an authentication tag mismatch. That depended on Conscrypt implementation details and risked false positives, so revert the three native commits (pal_cipher.c / pal_jni.c / pal_jni.h) back to recognizing only a genuine AEADBadTagException. Root cause: on AEAD decrypt failure Conscrypt derives the Java exception type by popping the oldest entry from BoringSSL's thread-local FIFO error queue. A stale error left by an unrelated prior operation on the same pooled thread (e.g. an RSA failure) is read first and mapped to the wrong type (InvalidKeyException), which is not a BadPaddingException and so is never converted to AEADBadTagException. Conscrypt clears the whole queue after reading, so the failed attempt leaves a clean queue. Fix: in the managed Android AEAD DecryptCore, when the first attempt throws a CryptographicException that is not AuthenticationTagMismatchException, retry the decrypt once on the same thread. The retry reads a clean error queue and observes the true result (a real tag mismatch surfaces as AEADBadTagException). A valid ciphertext authenticates regardless of the queue, so the retry only ever runs after a genuine failure and can never turn an authentication failure into unauthenticated plaintext. Validated on a physical 32-bit Android device (Samsung Galaxy A16, armeabi-v7a): 5x full System.Security.Cryptography test suite, 12373 tests, 0 failures, including all ChaCha20Poly1305 EncryptTamperAADDecrypt cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| // failed attempt clears that queue, so an immediate same-thread retry observes | ||
| // the true result. A valid ciphertext cannot reach this path, so the retry can | ||
| // never turn an authentication failure into unauthenticated plaintext. | ||
| DecryptCoreOnce(nonce, ciphertext, tag, plaintext, associatedData); |
There was a problem hiding this comment.
I am very suspicious about this fix. Here is what we know about the failure.
- There is one consistent test failure:
EncryptTamperAADDecrypt(dataLength: 0, additionalDataLength: 1). This is a bit of an edge-case test. All failures are for an empty ciphertext with AAD. - All failures are for ARM32, not ARM64 or XArch.
"Retry" logic here generally feels very wrong to me. I'd sooner mark this one test case (not the whole test!) as ActiveIssue to better understand it first.
I am going to try to open a throw-away pull request with some better diagnostics to see what is going on since I cannot reproduce this locally.
If we want to get CI back to green, I would sooner disable that one very-specific test case for Android as an ActiveIssue.
|
I'm trying to do some scouting / investigation here: #129000 |
|
Okay, so this does appear to be a bug in Conscrypt. When it imports an RSA key - and fails - it does not seem to be handling the error queue correctly. The actual exception that JNI is giving us for the bad authentication tag is ChaCha20Poly1305 throwing an error for an RSA modulus being too large is obviously nonsense. We can "fix" ChaCha20 but there is actually nothing wrong with it. The RSA tests just left stale data in the error queue. I would say we probably shouldn't try working around this in ChaCha20Poly1305 - because RSA actually poisoned the error queue, this is very likely to crop up somewhere else - AES-GCM AEAD tag handling, or any other place that cares about what exception was thrown. We might be clearing the error queue here, but this is going to depend on the order in which tests run, if new tests are added, etc. It feels extremely fragile to me. I put up #129033 as an alternative suggestion for a fix, for now. |
|
#129033 ran and all cryptography tests passed. I think we should close this in favor of the other pull request. I can reproduce this problem with AES-GCM - it isn't specific to ChaCha20/Poly1305. I don't think it is feasible for us to sprinkle error handling logic like this everywhere, we were just depending on unit test order. If we want to put a work around in somewhere when we should do it in RSA to stop it from contaminating the BoringSSL error queue. |
Android AEAD ciphers can surface authentication tag failures from the tag
Cipher.updatecall instead of finalization. That caused tampered decryptions to throwCryptographicExceptioninstead ofAuthenticationTagMismatchException.Native Android crypto interop
AEADBadTagException.Managed Android AEAD paths
AuthenticationTagMismatchException.Closes #128888