Ignore stale classic remoting ACKs (backport #3158 to 1.7.x) #3168

Open: He-Pin wants to merge 1 commit into 1.7.x from backport/stale-ack-fix-3126-to-1.7


He-Pin (Member) commented Jun 24, 2026
Motivation

Backport of #3158 to 1.7.x.

Classic remoting can receive a delayed system-message ACK from an old association after reconnect/reset. If that stale ACK has a cumulative sequence number newer than the sender's current resend buffer, the old code treated it as a protocol violation, wrapped it as HopelessAssociation, and gated/quarantined the remote system.

See also: akka/akka-core#24654 (same bug in Akka, closed without fix in 2019).

Modification

  • Treat Ack(cumulativeAck > maxSeq) as stale in AckedSendBuffer and return the current buffer unchanged.
  • Log and skip stale ACKs in ReliableDeliverySupervisor before they can be wrapped into HopelessAssociation.
  • Keep the existing ResendUnfulfillableException path for real unfulfillable NACKs.
  • Add regression coverage for empty/non-empty resend buffers, stale ACKs with NACKs, and valid ACK processing after a stale ACK.
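The buffer-level rule in the list above can be illustrated with a small model. This is only a sketch under simplifying assumptions: `SimpleAck`, `SimpleSendBuffer`, and `UnfulfillableResend` are hypothetical stand-ins that use plain `Long` sequence numbers, not the actual Pekko `AckedSendBuffer`, `Ack`, or `ResendUnfulfillableException` types, and the real change may differ in detail.

```scala
// Hypothetical, simplified model of a resend buffer (not the Pekko classes).
final case class SimpleAck(cumulativeAck: Long, nacks: Set[Long] = Set.empty)

// Stand-in for the "resend request cannot be fulfilled" failure.
final class UnfulfillableResend
    extends RuntimeException("Unable to fulfill resend request")

final case class SimpleSendBuffer(
    nonAcked: Vector[Long] = Vector.empty,
    nacked: Vector[Long] = Vector.empty,
    maxSeq: Long = -1L) {

  // Store a new outgoing sequence number and remember the highest one seen.
  def buffer(seq: Long): SimpleSendBuffer = {
    require(seq > maxSeq, s"Sequence numbers must increase: $seq <= $maxSeq")
    copy(nonAcked = nonAcked :+ seq, maxSeq = seq)
  }

  // An ACK is treated as stale when it acknowledges more than was ever sent.
  def isStale(ack: SimpleAck): Boolean = ack.cumulativeAck > maxSeq

  def acknowledge(ack: SimpleAck): SimpleSendBuffer =
    if (isStale(ack)) this // stale ACK: return the current buffer unchanged
    else {
      // Keep only the buffered messages that the receiver reports as missing.
      val newNacked = (nacked ++ nonAcked).filter(ack.nacks.contains)
      // A NACK for a message that is no longer buffered cannot be resent.
      if (newNacked.size < ack.nacks.size) throw new UnfulfillableResend
      copy(nonAcked = nonAcked.filter(_ > ack.cumulativeAck), nacked = newNacked)
    }
}
```

In this model a stale ACK leaves the buffer untouched even if it carries NACKs, a later valid ACK is processed normally, and a NACK for a message that has already been dropped still raises the failure.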

Result

Delayed ACKs from a previous association no longer trigger quarantine for the current association, while normal ACK/NACK processing remains unchanged.

Tests

  • sbt "remote / Test / testOnly org.apache.pekko.remote.AckedDeliverySpec" / passed (main branch)
  • Cherry-pick applied cleanly with no conflicts

References

Fixes #3126. Backport of #3158.

Stale system-message acknowledgements can arrive after classic remoting reconnects and the sender has reset its resend buffer for the current UID. Treating a cumulative ACK newer than the sender buffer as a protocol violation gates/quarantines an otherwise valid association.

Modification:

Make AckedSendBuffer ignore cumulative ACKs beyond the highest sequence number it has seen, log and skip those ACKs in ReliableDeliverySupervisor, and add regression coverage for empty/non-empty buffers, stale ACKs with NACKs, and valid ACK processing after a stale ACK.

Result:

Late ACKs from an old receive buffer no longer trigger HopelessAssociation/quarantine, while real unfulfillable NACKs still fail through the existing ResendUnfulfillableException path.
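The split described above, where stale ACKs are skipped while real failures still escalate, could be sketched at the supervisor level roughly as follows. This is an illustrative assumption, not the code in `ReliableDeliverySupervisor`: `StaleAckHandling`, `handleAck`, and the `AckOutcome` cases are hypothetical names, and `applyAck` stands in for whatever updates the resend buffer.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch of the supervisor-side decision (not the Pekko actor).
object StaleAckHandling {
  sealed trait AckOutcome
  case object StaleAckSkipped extends AckOutcome
  case object AckApplied extends AckOutcome
  // Models the point where a failure would be wrapped as a hopeless association.
  final case class AssociationHopeless(cause: Throwable) extends AckOutcome

  def handleAck(cumulativeAck: Long, maxSeq: Long, log: String => Unit)(
      applyAck: => Unit): AckOutcome =
    if (cumulativeAck > maxSeq) {
      // Stale ACK: log it and return before the buffer update is attempted.
      log(s"Ignoring stale ACK $cumulativeAck; highest sent sequence number is $maxSeq")
      StaleAckSkipped
    } else
      Try(applyAck) match {
        case Success(_) => AckApplied
        // Real failures (for example an unfulfillable NACK) still escalate.
        case Failure(e) => AssociationHopeless(e)
      }
}
```

Checking for staleness before calling `applyAck` keeps the escalation path reserved for genuine failures, which matches the intent stated in the result above.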

Tests:

- sbt "remote / Test / testOnly org.apache.pekko.remote.AckedDeliverySpec" / passed: 13 tests

- sbt checkCodeStyle / passed

- scalafmt --list --mode diff-ref=origin/main / passed: no files listed

- git diff --check / passed: no output

- subAgent review / FINAL PASS

- qodercli review (non-interactive, log: /tmp/tmo-qoder-review.log) / PASS

References:

Fixes #3126
He-Pin added the bug (Something isn't working) and t:stream (Pekko Streams) labels Jun 24, 2026
He-Pin added this to the 1.7.0 milestone Jun 24, 2026
pjfanning requested review from jrudolph and raboof June 24, 2026 09:13