HDDS-14830. Handle interrupt gracefully in XceiverClientGrpc.sendCommandWithRetry by chihsuan · Pull Request #10178 · apache/ozone

chihsuan · 2026-05-02T12:18:08Z

What changes were proposed in this pull request?

XceiverClientGrpc.sendCommandWithRetry retries a request across the pipeline's datanodes and rethrows the last captured ioException after the loop. The catch (InterruptedException) branch logged the error and restored the interrupt flag, but neither assigned ioException nor exited the loop. With the flag restored, each subsequent Future.get() throws InterruptedException immediately, so the loop falls through every datanode and exits with ioException == null. The post-loop Objects.requireNonNull(ioException, ...) then throws NullPointerException, masking the real interrupt. This NPE is what the EC reconstruction worker pool sees during datanode shutdown.
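
To make the failure mode concrete, here is a minimal, self-contained sketch of the flawed control flow. It is not the Ozone code: FlawedRetrySketch, sendWithRetry, and the datanode names are hypothetical, and a never-completing CompletableFuture stands in for the reply from sendCommandAsync. With the interrupt flag pre-set, every Future.get() throws InterruptedException, the catch branch only restores the flag, and the final Objects.requireNonNull turns the still-null ioException into a NullPointerException.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class FlawedRetrySketch {

  // Hypothetical, simplified stand-in for sendCommandWithRetry: try each datanode in turn.
  public static String sendWithRetry(List<String> datanodes) throws IOException {
    IOException ioException = null;
    for (String dn : datanodes) {
      // Stand-in for sendCommandAsync(...): a reply that never arrives.
      Future<String> reply = new CompletableFuture<String>();
      try {
        return reply.get();
      } catch (ExecutionException e) {
        ioException = new IOException("Command failed on " + dn, e);
      } catch (InterruptedException e) {
        // The flawed branch: restore the flag, but neither record a failure nor leave the loop.
        // The next get() sees the restored flag and throws InterruptedException again.
        Thread.currentThread().interrupt();
      }
    }
    // Every attempt was interrupted, so ioException is still null and this throws
    // NullPointerException, hiding the interrupt from the caller.
    throw Objects.requireNonNull(ioException, "no exception captured");
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt(); // simulate an interrupt delivered before the call
    String outcome;
    try {
      outcome = "returned " + sendWithRetry(Arrays.asList("dn1", "dn2", "dn3"));
    } catch (NullPointerException | IOException e) {
      outcome = e.getClass().getSimpleName() + ": " + e.getMessage();
    } finally {
      Thread.interrupted(); // clear the flag so it does not leak out of the demo
    }
    System.out.println(outcome);
  }
}
```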

The fix throws InterruptedIOException (with the original InterruptedException as its cause) directly from the catch block. This matches the convention already used in XceiverClientGrpc.sendCommand (lines 396-411) and XceiverClientSpi.
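
A sketch of the corrected catch branch follows. It illustrates the pattern rather than reproducing the actual diff: FixedRetrySketch and sendWithRetry are hypothetical names, and a never-completing CompletableFuture again replaces the real gRPC reply. Because java.io.InterruptedIOException has no (String, Throwable) constructor, the cause is attached with initCause. The flag is restored before throwing, consistent with the test described in this PR, which checks that isInterrupted() stays true.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class FixedRetrySketch {

  // Hypothetical, simplified stand-in for sendCommandWithRetry with the fixed catch branch.
  public static String sendWithRetry(List<String> datanodes) throws IOException {
    IOException ioException = null;
    for (String dn : datanodes) {
      // Stand-in for sendCommandAsync(...): a reply that never arrives.
      Future<String> reply = new CompletableFuture<String>();
      try {
        return reply.get();
      } catch (ExecutionException e) {
        ioException = new IOException("Command failed on " + dn, e);
      } catch (InterruptedException e) {
        // Keep the interrupt visible to callers, then stop retrying immediately.
        Thread.currentThread().interrupt();
        // InterruptedIOException has no (String, Throwable) constructor, so attach the cause.
        InterruptedIOException iioe =
            new InterruptedIOException("Command interrupted while waiting for " + dn);
        iioe.initCause(e);
        throw iioe;
      }
    }
    // Reached only after every datanode failed with an ExecutionException (or if the list is empty).
    throw Objects.requireNonNull(ioException, "no exception captured");
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt(); // simulate an interrupt delivered before the call
    String outcome;
    boolean flagStillSet = false;
    try {
      outcome = "returned " + sendWithRetry(Arrays.asList("dn1", "dn2", "dn3"));
    } catch (IOException e) {
      flagStillSet = Thread.currentThread().isInterrupted();
      outcome = e.getClass().getSimpleName() + ", cause: " + e.getCause();
    } finally {
      Thread.interrupted(); // clear the flag so it does not leak out of the demo
    }
    System.out.println(outcome + ", interrupt flag still set: " + flagStillSet);
  }
}
```

Throwing from inside the catch block also means the remaining datanodes are never tried once the caller has asked the thread to stop.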

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14830

How was this patch tested?

A new unit test, TestXceiverClientGrpc#testInterruptedCommandThrowsInterruptedIOException, reproduces the original NPE deterministically (it pre-sets the test thread's interrupt flag and stubs sendCommandAsync to return a never-completing future) and asserts:

InterruptedIOException is thrown, instead of the NullPointerException thrown before this fix.
The cause is the original InterruptedException, so debugging information is preserved.
Thread.currentThread().isInterrupted() is still true, so callers can still detect the interrupt.

The flag is cleared in finally before client.close() runs, so channel shutdown is not affected by the test setup.
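
The structure of that test can be approximated without JUnit or Mockito. In this sketch, AsyncSender is a hypothetical seam playing the role of the stubbed sendCommandAsync, RetryingClient is a hypothetical stand-in for XceiverClientGrpc, and its close() only records whether it ran with the interrupt flag still set. runInterruptCheck mirrors the three assertions listed above and clears the flag in finally before close().

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class InterruptTestSketch {

  // Hypothetical seam standing in for sendCommandAsync, so the check can stub it.
  interface AsyncSender {
    Future<String> sendAsync(String datanode);
  }

  // Hypothetical minimal client; close() stands in for channel shutdown.
  static final class RetryingClient implements AutoCloseable {
    private final AsyncSender sender;
    boolean closedWhileInterrupted;

    RetryingClient(AsyncSender sender) {
      this.sender = sender;
    }

    String send(String... datanodes) throws IOException {
      for (String dn : datanodes) {
        try {
          return sender.sendAsync(dn).get();
        } catch (ExecutionException e) {
          // A real client would record this failure; here the loop just moves to the next datanode.
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          InterruptedIOException iioe = new InterruptedIOException("interrupted on " + dn);
          iioe.initCause(e);
          throw iioe;
        }
      }
      throw new IOException("all datanodes failed");
    }

    @Override
    public void close() {
      // Record whether shutdown would have run on a still-interrupted thread.
      closedWhileInterrupted = Thread.currentThread().isInterrupted();
    }
  }

  // Mirrors the three assertions described for the unit test.
  public static void runInterruptCheck() throws Exception {
    // Stub: every call returns a future that never completes.
    RetryingClient client = new RetryingClient(dn -> new CompletableFuture<String>());
    try {
      Thread.currentThread().interrupt(); // pre-set the flag, as the test does
      try {
        client.send("dn1", "dn2", "dn3");
        throw new AssertionError("expected InterruptedIOException");
      } catch (InterruptedIOException e) {
        if (!(e.getCause() instanceof InterruptedException)) {
          throw new AssertionError("cause not preserved");
        }
        if (!Thread.currentThread().isInterrupted()) {
          throw new AssertionError("interrupt flag not restored");
        }
      }
    } finally {
      Thread.interrupted(); // clear the flag before close()
      client.close();
    }
    if (client.closedWhileInterrupted) {
      throw new AssertionError("close() ran with the interrupt flag set");
    }
  }

  public static void main(String[] args) throws Exception {
    runInterruptCheck();
    System.out.println("interrupt handling checks passed");
  }
}
```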

The test fails on master without the production-code change in this PR, confirming that it exercises the actual bug path.

Local checks: mvn checkstyle:check -pl hadoop-hdds/client,hadoop-ozone/integration-test → 0 violations.
Full CI: run via the build-branch workflow on the fork: https://github.com/chihsuan/ozone/actions/runs/25251708235


Gargi-jais11

Thanks @chihsuan for working on this. LGTM!

HDDS-14830. Handle interrupt gracefully in XceiverClientGrpc.sendCommandWithRetry (commit 57eb41b)

chihsuan marked this pull request as ready for review May 2, 2026 13:45

ivandika3 requested a review from jojochuang May 2, 2026 14:05

Gargi-jais11 reviewed May 5, 2026

chihsuan wants to merge 1 commit into apache:master from chihsuan:HDDS-14830
