Skip to content

Artery system message delivery should ignore ACKs from stale remote UIDs #3174

Description

@He-Pin

Motivation

While reviewing the stale ACK fix in #3173 / #3160, we found a similar stale ACK risk in the Artery system message delivery path.

Classic remoting and Artery use different implementations, so the Classic EndpointReader UID fix does not cover this path. Artery uses SystemMessageDelivery for acked system message delivery.

Current behavior

Artery ACK/NACK messages carry the remote unique address:

  • remote/src/main/scala/org/apache/pekko/remote/artery/SystemMessageDelivery.scala:53
  • remote/src/main/scala/org/apache/pekko/remote/artery/SystemMessageDelivery.scala:54

However, SystemMessageDelivery.notify currently accepts ACK/NACK replies by comparing only the address:

  • remote/src/main/scala/org/apache/pekko/remote/artery/SystemMessageDelivery.scala:157-160
case ack: Ack   => if (ack.from.address == remoteAddress) ackCallback.invoke(ack)
case nack: Nack => if (nack.from.address == remoteAddress) nackCallback.invoke(nack)

When a remote system restarts with a new UID, the association moves to a new incarnation:

  • remote/src/main/scala/org/apache/pekko/remote/artery/Association.scala:306-317

SystemMessageDelivery then notices the incarnation change on send/resend and clears delivery state, including resetting the sequence number:

  • remote/src/main/scala/org/apache/pekko/remote/artery/SystemMessageDelivery.scala:222-225
  • remote/src/main/scala/org/apache/pekko/remote/artery/SystemMessageDelivery.scala:242-246
  • remote/src/main/scala/org/apache/pekko/remote/artery/SystemMessageDelivery.scala:296-302

ACK processing then clears unacknowledged messages by sequence number only:

  • remote/src/main/scala/org/apache/pekko/remote/artery/SystemMessageDelivery.scala:189-197

This means a delayed ACK/NACK from an older UID for the same address can be accepted after a new incarnation has started, if the old UID is not quarantined and the old UID mapping is still known.

Why the old UID can still reach this path

Inbound decoding and filtering are based on originUid and the association registry:

  • remote/src/main/scala/org/apache/pekko/remote/artery/Codecs.scala:427-428
  • remote/src/main/scala/org/apache/pekko/remote/artery/Handshake.scala:328-332
  • remote/src/main/scala/org/apache/pekko/remote/artery/InboundQuarantineCheck.scala:45-71

The registry adds uid -> association mappings:

  • remote/src/main/scala/org/apache/pekko/remote/artery/Association.scala:1180-1197

A new UID handshake does not appear to remove the old UID mapping. The unused-quarantined cleanup checks the association current state, so when the same association currently has a healthy new UID, an old UID mapping may remain:

  • remote/src/main/scala/org/apache/pekko/remote/artery/Association.scala:1236-1256

Possible impact

If an old-UID ACK with a low sequence number is accepted after the new incarnation has reset SystemMessageDelivery sequence numbers, it can incorrectly acknowledge system messages from the new incarnation. That can cause new system messages to be removed from the unacknowledged buffer even though the ACK belongs to the previous remote UID.

This is similar in shape to the stale ACK issue fixed for Classic remoting, but it is in the Artery system message delivery implementation.

Proposed change

In SystemMessageDelivery.notify, validate the full UniqueAddress of ACK/NACK replies against the current association state instead of comparing only the address.

For example, accept the reply only when:

outboundContext.associationState.uniqueRemoteAddress().contains(reply.from)

This should be applied to both Ack and Nack.

Suggested tests

Add focused coverage in remote/src/test/scala/org/apache/pekko/remote/artery/SystemMessageDeliverySpec.scala:

  1. Establish an old UniqueAddress for the same remote address.
  2. Send at least one acked system message and then move the association to a new UniqueAddress / incarnation.
  3. Trigger SystemMessageDelivery state clearing and sequence number reset for the new incarnation.
  4. Inject Ack(1, oldUniqueAddress) and verify it does not acknowledge the new incarnation message.
  5. Inject Ack(1, newUniqueAddress) and verify the current message is acknowledged.
  6. Add the same stale-source check for Nack.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions