fix: drop JNI GlobalRef before detaching thread in memory pool errors#3832
Draft
andygrove wants to merge 1 commit intoapache:mainfrom
Draft
fix: drop JNI GlobalRef before detaching thread in memory pool errors#3832andygrove wants to merge 1 commit intoapache:mainfrom
andygrove wants to merge 1 commit intoapache:mainfrom
Conversation
When memory pool JNI calls throw Java exceptions on tokio worker threads, the GlobalRef inside JavaException can outlive the AttachGuard and be dropped on a detached thread, triggering warnings and expensive attach/detach cycles. Add CometError::drop_throwable() to convert JavaException errors into string-only Internal errors while the thread is still JVM-attached, and apply it in all memory pool JNI methods. Closes apache#2470
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #2470.
Rationale for this change
When running with reduced off-heap memory, JNI
GlobalRefobjects insideJavaExceptionerrors can be dropped on tokio worker threads that are not attached to the JVM. This triggers warnings from thejnicrate ("Dropping a GlobalRef in a detached thread") and causes expensive temporary attach/detach cycles.The root cause: memory pool methods (
acquire_memory/release_memory) call JNI via a temporaryAttachGuard. If the JNI call throws a Java exception, aCometError::JavaExceptionis created containing aGlobalRefto the throwable. When the method returns, theAttachGuarddrops and detaches the thread — but theGlobalRefinside the error outlives it. As the error propagates through DataFusion on the now-detached tokio thread and is eventually converted toDataFusionError::Execution(String), theGlobalRefis dropped on the detached thread, triggering the warning.What changes are included in this PR?
CometError::drop_throwable()which convertsJavaExceptionerrors (containing aGlobalRef) into string-onlyInternalerrors, ensuring theGlobalRefis dropped while the thread is still JVM-attached..map_err(CometError::drop_throwable)in all four memory pool JNI methods:acquire_from_spark,release_to_spark(unified pool),acquire,release(fair pool).The
GlobalRefis safe to drop early here because these errors always propagate throughDataFusionError::Execution(String)which stringifies the error anyway — the throwable reference is never used to re-throw the original Java exception from this path.How are these changes tested?
This is difficult to test in a unit test since it requires a full Spark executor environment with memory pressure on tokio worker threads. The fix is verified by code inspection:
map_errexecutes while theAttachGuard(env) is still in scope, so theGlobalRefis released on an attached thread. Clippy passes cleanly.