Skip to content

Ignore stale classic remoting reader ACKs#3173

Open
He-Pin wants to merge 1 commit into
apache:mainfrom
He-Pin:issue-3160-stale-reader-acks
Open

Ignore stale classic remoting reader ACKs#3173
He-Pin wants to merge 1 commit into
apache:mainfrom
He-Pin:issue-3160-stale-reader-acks

Conversation

@He-Pin

@He-Pin He-Pin commented Jun 24, 2026

Copy link
Copy Markdown
Member

Motivation

Classic remoting ACKs forwarded by EndpointReader were not tied to the reader UID, so a delayed ACK from an old reader could affect the resend buffer after a new UID was confirmed.

Modification

Tag decoded ACKs with the EndpointReader UID and have ReliableDeliverySupervisor only process ACKs from the currently confirmed UID.

Add a deterministic supervisor-level regression test covering stale and current reader ACKs. This keeps the regression focused on the actor/UID contract rather than relying on multi-node timing races.

Result

Stale reader ACKs are ignored while current-reader ACKs still acknowledge the active resend buffer.

Tests

  • sbt "remote / Test / testOnly org.apache.pekko.remote.ReliableDeliverySupervisorSpec"
  • sbt headerCreateAll
  • sbt +headerCheckAll
  • scalafmt --mode diff-ref=origin/main
  • scalafmt --list --mode diff-ref=origin/main
  • git diff --check
  • qodercli -p --output-format stream-json --attachment /tmp/pekko-3160-qoder-review.diff (No must-fix findings)
  • sbt sortImports (failed: scalafix/scala.meta NoSuchMethodError in stream-tests and multi-node-testkit)

References

Fixes #3160

@He-Pin He-Pin marked this pull request as draft June 24, 2026 16:41
Motivation:
Classic remoting ACKs forwarded by EndpointReader were not tied to the reader UID, so a delayed ACK from an old reader could affect the resend buffer after a new UID was confirmed.

Modification:
Tag decoded ACKs with the EndpointReader UID and have ReliableDeliverySupervisor only process ACKs from the currently confirmed UID. Add a supervisor-level regression test covering stale and current reader ACKs.

Result:
Stale reader ACKs are ignored while current-reader ACKs still acknowledge the active resend buffer.

Tests:
- sbt "remote / Test / testOnly org.apache.pekko.remote.ReliableDeliverySupervisorSpec"
- sbt headerCreateAll
- sbt +headerCheckAll
- scalafmt --mode diff-ref=origin/main
- scalafmt --list --mode diff-ref=origin/main
- git diff --check
- qodercli -p --output-format stream-json --attachment /tmp/pekko-3160-qoder-review.diff (No must-fix findings)
- sbt sortImports (failed: scalafix/scala.meta NoSuchMethodError in stream-tests and multi-node-testkit)

References:
Fixes apache#3160
@He-Pin He-Pin force-pushed the issue-3160-stale-reader-acks branch from 4a9b2ec to 2d6a120 Compare June 24, 2026 16:51
@He-Pin He-Pin marked this pull request as ready for review June 24, 2026 18:20
@He-Pin He-Pin added bug Something isn't working cluster labels Jun 24, 2026
@He-Pin He-Pin added this to the 2.0.0-M4 milestone Jun 24, 2026
@He-Pin He-Pin requested a review from Copilot June 24, 2026 19:09

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a classic remoting reliability gap where delayed ACKs decoded by an old EndpointReader instance (stale UID) could incorrectly affect the current resend buffer after a new UID is confirmed. It does so by tagging decoded ACKs with the reader UID and only processing ACKs that match the currently confirmed UID, with a focused regression test to prevent reintroductions.

Changes:

  • Introduce ReliableDeliverySupervisor.AckFromReader(readerUid, ack) and ignore untagged Ack messages in the supervisor.
  • Update EndpointReader (both normal and notReading modes) to forward AckFromReader(uid, ack) instead of a bare Ack.
  • Add a deterministic regression spec verifying stale-reader ACKs are ignored while current-reader ACKs still advance the resend buffer.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
remote/src/main/scala/org/apache/pekko/remote/Endpoint.scala Tags forwarded ACKs with EndpointReader UID and gates ACK processing in ReliableDeliverySupervisor to the confirmed UID.
remote/src/test/scala/org/apache/pekko/remote/ReliableDeliverySupervisorSpec.scala Adds regression coverage for ignoring stale-reader ACKs and accepting current-reader ACKs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cluster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Classic remoting should ignore ACKs from stale EndpointReader UIDs

2 participants