Skip to content

[Detail Bug] Append session NO_SIDE_EFFECTS retries can duplicate records due to incorrect frame_signal reset #45

@detail-app

Description

@detail-app

Detail Bug Report

https://app.detail.dev/org_89d327b3-b883-4365-b6a3-46b6701342a9/bugs/bug_d9956c71-cb46-45c6-832d-cdf3a351da0c

Introduced in #1 by @quettabit on Apr 7, 2026

Summary

  • Context: The append session retry logic uses frame_signal to track whether HTTP/2 DATA frames were written to the socket, enabling safe retry decisions with the NO_SIDE_EFFECTS policy.
  • Bug: In retry scenarios, frame_signal.reset() is called when all resend acknowledgments are processed, but fresh data may have already been sent, causing the signal to be incorrectly cleared.
  • Actual vs. expected: The signal should remain True if any data (resend or fresh) was sent, but it gets reset to False after resend acks even when fresh data was already transmitted.
  • Impact: With NO_SIDE_EFFECTS policy, retries are incorrectly allowed for errors like 500/503/504/408/TransportError, leading to duplicate records in the stream.

Code with Bug

# _run_attempt in src/s2_sdk/_s2s/_append_session.py

# Lines 194-197 - The buggy reset logic
if resend_remaining > 0:
    resend_remaining -= 1
    if resend_remaining == 0 and frame_signal is not None:
        frame_signal.reset()  # <-- BUG 🔴 clears the signal even if fresh DATA has already been sent

Explanation

  • _run_attempt uses frame_signal to decide if it is safe to retry with AppendRetryPolicy.NO_SIDE_EFFECTS. The retry logic treats frame_signal being unset (False) as evidence that no side-effecting DATA was written.
  • The ack loop decrements resend_remaining as resend acknowledgments arrive and calls frame_signal.reset() when resend_remaining reaches zero.
  • However, resend and fresh payloads can both be sent before all resend acks are processed (the send path calls on_write()/signal() after releasing the write lock, while acks are processed concurrently). This allows the sequence:
    1. resend sent → signal=True; 2) fresh sent → signal=True; 3) resend ack processed → reset()signal=False; 4) an error occurs before the fresh ack.
  • At that point there is still fresh data in flight, but frame_signal is False. In _retrier.py, this makes not_signalled=True, which bypasses the has_inflight guard and allows a retry even for errors that are not considered “no side effects” (e.g., 500/503/504/408/TransportError). This can cause duplicate appends.

Recommended Fix

Track whether fresh data has been added to inflight_inputs:

# In _run_attempt, before the message loop
initial_inflight_count = len(inflight_inputs)

# In the ack loop (lines 194-197)
if resend_remaining > 0:
    resend_remaining -= 1
    if resend_remaining == 0 and frame_signal is not None:
        # Only reset if no fresh data has been added
        if len(inflight_inputs) == initial_inflight_count:
            frame_signal.reset()

Safety: len(inflight_inputs) is atomic in Python (GIL). The check provides eventual consistency for the retry decision.

History

This bug was introduced in commit 3dc9795. The commit added the initial implementation of the S2 SDK including the retry mechanism for append sessions. The frame_signal.reset() logic was designed to clear the signal once all resend data was acknowledged, but it didn't account for the concurrent execution model where fresh data could already be sent via _body_gen while resend acknowledgments are being processed in the ack loop. The bug slipped in because the reset condition only checks resend_remaining == 0 without verifying that no fresh data has been added to inflight_inputs.

Metadata

Metadata

Assignees

Labels

detail-bugbug flagged by https://detail.dev/

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions