Skip to content

feat(grpc): Add test-only tracking for orphaned tasks#2638

Open
arjan-bal wants to merge 6 commits into
grpc:masterfrom
arjan-bal:task-leak-check
Open

feat(grpc): Add test-only tracking for orphaned tasks#2638
arjan-bal wants to merge 6 commits into
grpc:masterfrom
arjan-bal:task-leak-check

Conversation

@arjan-bal
Copy link
Copy Markdown
Contributor

@arjan-bal arjan-bal commented May 12, 2026

This pull request introduces a TrackedRuntime utility to monitor spawned tasks during tests, ensuring they complete before the test exits. This utility is currently integrated into the DNS resolver and Tonic transport tests. Once the channel builder API supports passing a custom runtime during channel creation, coverage for channel internals can also be added.

The Tokio runtime dependency is gated behind a feature flag in the grpc crate. This ensures that only components explicitly guarded by the flag can use it; otherwise, the code will fail to compile when the flag is disabled, which will be caught by the udeps CI job.

tonic/grpc/Cargo.toml

Lines 21 to 32 in 49d28e6

# The following feature is used to ensure all modules use the runtime
# abstraction instead of using tokio directly.
# Using tower/buffer enables tokio's rt feature even though it's possible to
# create Buffers with a user provided executor.
_runtime-tokio = [
"tokio/rt",
"tokio/net",
"tokio/time",
"dep:socket2",
"dep:tower",
"dep:futures",
]

By ensuring all tasks are spawned using the Runtime abstraction, we make it possible to monitor them using the TrackedRuntime utility. Instrumenting tests only requires adding a couple of lines at the beginning and end of each test.

While we could potentially hide this boilerplate using a macro that wraps the test and passed an instrumented runtime as an argument, I opted against it to keep the implementation simple and easy to understand.

@sauravzg
Copy link
Copy Markdown
Contributor

sauravzg commented May 13, 2026

I haven't gone through the full PR, but here is my braindump which may not be coherent.

@arjan-bal arjan-bal self-assigned this May 14, 2026
Copy link
Copy Markdown
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would prefer task leak checks to be implicit and probably on by default for all unit tests

I agree. (Maybe we need a magic cargo <command> for running our tests or something?)

This current approach means that every test that wants to use this needs to manually create a runtime and pass it into their channel/etc. That part actually would be easy enough to fix -- instead of default_runtime() returning a GrpcRuntime instance, it could return a factory. Our test factory could register itself as the default in our tests, and then use a task local to reuse one shared GrpcRuntime instance for all servers & clients within that test task.

That would mostly resolve the concern of it being automatic, though it would mean all tests would need something to call the wait function or else they would error. Or: maybe it wouldn't error, if any orphaned task also had a reference to the runtime in it.

Actually, that same problem occurs with this code as-is, too -- if you forget to call the wait function, and leak a task with a clone of the runtime in it, you won't get an error.

Comment thread grpc/src/rt/tracker.rs
tracker.wait_timeout = Duration::from_millis(1);

tracked_rt.spawn(Box::pin(async {
tokio::time::sleep(DEFAULT_TEST_DURATION).await;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's less of a problem since I think rust by default runs all tests in parallel? But can we have like a tracker.wait_for_tasks_with_timeout(Duration::from_millis(100)) or something so this test doesn't take 10 wall-time seconds to run?

@arjan-bal arjan-bal assigned arjan-bal and unassigned dfawley, arjan-bal and sauravzg May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants