fix: wait for spawned tokio task before releasing native plan#3833
Draft
andygrove wants to merge 1 commit intoapache:mainfrom
Draft
fix: wait for spawned tokio task before releasing native plan#3833andygrove wants to merge 1 commit intoapache:mainfrom
andygrove wants to merge 1 commit intoapache:mainfrom
Conversation
When a tokio task is spawned for async execution (no JVM data sources), the stream and its MemoryReservations are owned by the tokio task. If releasePlan drops the ExecutionContext without waiting for the task to complete, the reservations are released asynchronously on the tokio thread — potentially after Spark has already called cleanUpAllAllocatedMemory(), causing "release called on X bytes but task only has 0 bytes" warnings. Store the JoinHandle from the spawned task and block on it in releasePlan after dropping the batch receiver (to signal the task to exit). This ensures all memory is released back to Spark before releasePlan returns to the JVM. Also fixes Source 2 of apache#2470 — GlobalRefs held by the stream are now dropped while the JVM thread is still the caller, avoiding detached thread warnings. Closes apache#2453
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #2453.
Rationale for this change
When a tokio task is spawned for async execution (the path taken when there are no JVM data sources), the execution stream and its
MemoryReservations are owned by the tokio task. IfreleasePlandrops theExecutionContextwithout waiting for the task to complete, the reservations are released asynchronously on the tokio thread — potentially after Spark has already calledcleanUpAllAllocatedMemory(), causing:The sequence that triggers this:
TaskCompletionListenercallsCometExecIterator.close()→releasePlan()via JNIreleasePlandropsExecutionContext, which dropsbatch_receiver— signaling the tokio task to stopreleasePlanreturns to JVM before the tokio task finishes cleanupcleanUpAllAllocatedMemory()— zeroes out the task's allocationMemoryReservations →release_to_spark()→ Spark sees "0 bytes"This also fixes Source 2 of #2470 —
GlobalRefs held by the stream are now dropped while the JVM thread is still the caller, avoiding the "Dropping a GlobalRef in a detached thread" warning.What changes are included in this PR?
task_handlefield toExecutionContextto store theJoinHandlefrom the spawned tokio taskreleasePlan, drop thebatch_receiverfirst (to signal the task to exit its loop), thenblock_onthe handle to wait for the tokio task to complete before dropping the contextThis guarantees all memory releases and
GlobalRefdrops happen beforereleasePlanreturns to the JVM.How are these changes tested?
This race condition requires a full Spark executor environment where the task completion sequence (
TaskCompletionListener→cleanUpAllAllocatedMemory) races with async tokio task cleanup. It is not reproducible in unit tests. The fix is verified by code inspection:block_on(handle)ensures the tokio task completes (dropping the stream and all reservations) beforereleasePlanreturns. Clippy passes cleanly.