
fix: drop JNI GlobalRef before detaching thread in memory pool errors #3832

Draft
andygrove wants to merge 1 commit into apache:main from andygrove:fix-globalref-detached-thread-warning

@andygrove
Member

Which issue does this PR close?

Closes #2470.

Rationale for this change

When running with reduced off-heap memory, JNI GlobalRef objects inside JavaException errors can be dropped on tokio worker threads that are not attached to the JVM. This triggers warnings from the jni crate ("Dropping a GlobalRef in a detached thread") and causes expensive temporary attach/detach cycles, because the crate has to attach the thread briefly in order to delete the reference.

The root cause: the memory pool methods (acquire_memory/release_memory) call into the JVM through a temporary AttachGuard. If the JNI call throws a Java exception, a CometError::JavaException is created that holds a GlobalRef to the throwable. When the method returns, the AttachGuard is dropped and the thread is detached, but the GlobalRef inside the error outlives it. The error then propagates through DataFusion on the now-detached tokio thread and is eventually converted to DataFusionError::Execution(String). At that point the GlobalRef is dropped on the detached thread, which triggers the warning.
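The sequence above can be modeled without a JVM. In this minimal sketch, AttachGuard, GlobalRef, and the thread-local ATTACHED flag are hypothetical stand-ins for the jni crate types (not its API), and acquire_memory_buggy and to_execution_message are illustrative names. The guard's scope ends before the error is consumed, so the GlobalRef is dropped while the thread is "detached".

```rust
use std::cell::Cell;

// Hypothetical mocks, not the jni crate API: a thread-local flag models
// "this thread is attached to the JVM", and a counter records GlobalRef drops
// that happen while it is not.
thread_local! {
    static ATTACHED: Cell<bool> = Cell::new(false);
    static DETACHED_DROPS: Cell<usize> = Cell::new(0);
}

struct AttachGuard;

impl AttachGuard {
    fn attach() -> AttachGuard {
        ATTACHED.with(|a| a.set(true));
        AttachGuard
    }
}

impl Drop for AttachGuard {
    // Dropping the guard detaches the thread.
    fn drop(&mut self) {
        ATTACHED.with(|a| a.set(false));
    }
}

struct GlobalRef;

impl Drop for GlobalRef {
    // In the real jni crate, this is where the "detached thread" warning fires.
    fn drop(&mut self) {
        if !ATTACHED.with(|a| a.get()) {
            DETACHED_DROPS.with(|d| d.set(d.get() + 1));
        }
    }
}

enum CometError {
    JavaException { msg: String, throwable: GlobalRef },
}

// Hypothetical memory pool call before the fix: the JNI call "throws" and the
// error, including its GlobalRef, is returned out of the guard's scope.
fn acquire_memory_buggy() -> Result<i64, CometError> {
    let _env = AttachGuard::attach();
    Err(CometError::JavaException {
        msg: "java.lang.OutOfMemoryError".to_string(),
        throwable: GlobalRef,
    })
    // `_env` is dropped here, detaching the thread while the error escapes.
}

// Models the later conversion to DataFusionError::Execution(String) on the
// detached worker thread; the GlobalRef is dropped when `err` goes out of scope.
fn to_execution_message(err: CometError) -> String {
    match err {
        CometError::JavaException { msg, .. } => msg,
    }
}

fn detached_drops() -> usize {
    DETACHED_DROPS.with(|d| d.get())
}
```

Passing the returned error to to_execution_message increments the detached-drop counter once, which is exactly the situation the jni crate warns about.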

What changes are included in this PR?

  • Add CometError::drop_throwable() which converts JavaException errors (containing a GlobalRef) into string-only Internal errors, ensuring the GlobalRef is dropped while the thread is still JVM-attached.
  • Apply .map_err(CometError::drop_throwable) in all four memory pool JNI methods: acquire_from_spark, release_to_spark (unified pool), acquire, release (fair pool).
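The fix can be sketched with the same kind of mocks. Everything here is hypothetical and simplified: CometError has only the two variants named above, the message format produced by this drop_throwable is an assumption, and call_java and acquire_memory_fixed stand in for a JNI call and one of the four pool methods. The key point is that map_err is evaluated as part of the function's tail expression, before the local AttachGuard is dropped.

```rust
use std::cell::Cell;

// Hypothetical mocks: a thread-local flag stands in for "attached to the JVM".
thread_local! {
    static ATTACHED: Cell<bool> = Cell::new(false);
    static DETACHED_DROPS: Cell<usize> = Cell::new(0);
}

struct AttachGuard;

impl AttachGuard {
    fn attach() -> AttachGuard {
        ATTACHED.with(|a| a.set(true));
        AttachGuard
    }
}

impl Drop for AttachGuard {
    // Dropping the guard detaches the thread.
    fn drop(&mut self) {
        ATTACHED.with(|a| a.set(false));
    }
}

struct GlobalRef;

impl Drop for GlobalRef {
    // Counts releases that happen while the thread is detached.
    fn drop(&mut self) {
        if !ATTACHED.with(|a| a.get()) {
            DETACHED_DROPS.with(|d| d.set(d.get() + 1));
        }
    }
}

// Simplified CometError with only the two variants the PR mentions.
enum CometError {
    JavaException { class: String, msg: String, throwable: GlobalRef },
    Internal(String),
}

impl CometError {
    // Sketch of drop_throwable: keep the text, release the GlobalRef right away.
    fn drop_throwable(self) -> CometError {
        match self {
            CometError::JavaException { class, msg, throwable } => {
                drop(throwable); // still inside the caller's AttachGuard scope
                CometError::Internal(format!("{class}: {msg}"))
            }
            other => other,
        }
    }
}

// Hypothetical JNI call that always throws.
fn call_java(_env: &AttachGuard) -> Result<i64, CometError> {
    Err(CometError::JavaException {
        class: "java.lang.OutOfMemoryError".to_string(),
        msg: "Java heap space".to_string(),
        throwable: GlobalRef,
    })
}

// Shape of a fixed pool method: map_err runs in the tail expression, so it
// completes before `env` is dropped and the thread is detached.
fn acquire_memory_fixed(_size: i64) -> Result<i64, CometError> {
    let env = AttachGuard::attach();
    call_java(&env).map_err(CometError::drop_throwable)
}

fn detached_drops() -> usize {
    DETACHED_DROPS.with(|d| d.get())
}
```

Because drop_throwable takes the error by value, the only thing that leaves the function is a String-backed Internal error, so nothing JVM-owned remains to be dropped on the worker thread.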

The GlobalRef is safe to drop early here because these errors always propagate through DataFusionError::Execution(String), which stringifies the error anyway. The throwable reference is never used to re-throw the original Java exception from this path.
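To illustrate why the early drop loses nothing on this path, the sketch below models the conversion into DataFusionError::Execution(String) with a From impl. The DataFusionError enum, the From impl, and the "class: message" format are assumptions for illustration, not the actual Comet or DataFusion code. The point is that only text survives the conversion, so an error converts to the same value whether or not its throwable was dropped first.

```rust
// Hypothetical, simplified types: only the shapes matter here.
struct GlobalRef;

enum CometError {
    JavaException { class: String, msg: String, throwable: GlobalRef },
    Internal(String),
}

impl CometError {
    // Sketch of drop_throwable: keep the text; the throwable field is dropped
    // when the partially moved `self` goes out of scope at the end of the call.
    fn drop_throwable(self) -> CometError {
        match self {
            CometError::JavaException { class, msg, .. } => {
                CometError::Internal(format!("{class}: {msg}"))
            }
            other => other,
        }
    }
}

// Stand-in for DataFusion's error type, reduced to the one variant used here.
#[derive(Debug, PartialEq)]
enum DataFusionError {
    Execution(String),
}

// Assumed conversion: the error is stringified and the throwable is ignored.
impl From<CometError> for DataFusionError {
    fn from(err: CometError) -> Self {
        let text = match err {
            CometError::JavaException { class, msg, .. } => format!("{class}: {msg}"),
            CometError::Internal(text) => text,
        };
        DataFusionError::Execution(text)
    }
}

// Builds a sample error that still carries its (mock) throwable reference.
fn java_oom() -> CometError {
    CometError::JavaException {
        class: "java.lang.OutOfMemoryError".to_string(),
        msg: "Java heap space".to_string(),
        throwable: GlobalRef,
    }
}
```

If the real conversion re-threw the Java exception through the GlobalRef, this equivalence would not hold; the argument above relies on that never happening on the memory pool path.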

How are these changes tested?

This is difficult to cover with a unit test because it requires a full Spark executor environment with memory pressure on tokio worker threads. The fix is verified by code inspection: map_err executes while the AttachGuard (env) is still in scope, so the GlobalRef is released on an attached thread. Clippy passes cleanly.

When memory pool JNI calls throw Java exceptions on tokio worker threads,
the GlobalRef inside JavaException can outlive the AttachGuard and be
dropped on a detached thread, triggering warnings and expensive
attach/detach cycles.

Add CometError::drop_throwable() to convert JavaException errors into
string-only Internal errors while the thread is still JVM-attached, and
apply it in all memory pool JNI methods.

Closes apache#2470
Development

Successfully merging this pull request may close these issues.

Dropping a GlobalRef in a detached thread

1 participant