Fix/retry backoff and retry after 21 #22

Merged: MarketDataDev03 merged 6 commits into main from fix/retry-backoff-and-retry-after-21 on May 15, 2026

@MarketDataDev03
Collaborator

Retry/backoff updates per spec sections 9.3, 9.4, 9.5

Branch: fix/retry-backoff-and-retry-after-21 → main

Summary

Aligns the SDK's retry and status-check behavior with the SDK requirements spec sections 9.3 (backoff strategy), 9.4 (Retry-After header), and 9.5 (API status check). No breaking changes: the only API addition is one optional constructor parameter, and the behavioral changes are confined to the retry path.

Three feature commits + one cleanup commit:

| Commit | Scope |
| --- | --- |
| ee99d38 | 9.3: backoff schedule 1s/2s/4s, configurable max_retries |
| 93291f6 | 9.4: honor Retry-After header (delta-seconds and HTTP-date) |
| a017ba0 | 9.5: /status/ cache with 270s/300s thresholds + async refresh |
| 2a085f7 | Bug fixes from review + dead code removal |

9.3 — Backoff strategy

Spec: wait = initial * 2^retry, with retry 0-indexed and initial fixed at 1s; default 3 retries (4 total attempts), giving a 1s / 2s / 4s schedule; max_retries configurable via the constructor.

Before: RETRY_BACKOFF=0.5 with min=0.5/max=5 clamps. First attempt was outside the retry adapter, so tenacity only produced 2 effective waits instead of 3. Schedule was effectively ~0.5s / 1s.

After:

  • internal_settings.py: INITIAL_RETRY_DELAY = 1.0. MIN_RETRY_BACKOFF / MAX_RETRY_BACKOFF removed.
  • retry.py: custom _compute_wait(retry_state) callable returns initial_delay * 2 ** (retry_state.attempt_number - 1) with no clamps.
  • client.py: MarketDataClient(max_retries=3) constructor parameter, validated >= 0.
  • api_error.py: first attempt moved into the retry adapter so tenacity controls all 3 waits between the 4 attempts.
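
The schedule described above can be reproduced with a few lines of plain Python. This is only a sketch of the formula, not the code in retry.py: the function names `backoff_delay` and `backoff_schedule` are hypothetical, and the 1-based `attempt_number` mirrors what tenacity exposes on its retry state.

```python
def backoff_delay(attempt_number: int, initial_delay: float = 1.0) -> float:
    """Wait before the next attempt.

    attempt_number is 1-based (the attempt that just failed), so the
    0-indexed retry in the spec formula is attempt_number - 1.
    """
    if attempt_number < 1:
        raise ValueError("attempt_number must be >= 1")
    return initial_delay * 2 ** (attempt_number - 1)


def backoff_schedule(max_retries: int = 3, initial_delay: float = 1.0) -> list:
    """All waits for a call that fails every time (hypothetical helper)."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    return [backoff_delay(n, initial_delay) for n in range(1, max_retries + 1)]


print(backoff_schedule())   # [1.0, 2.0, 4.0]: 3 waits between 4 attempts
print(backoff_schedule(0))  # []: a single attempt, no waits
```

With no clamps, the wait keeps doubling for larger max_retries values (a fifth retry would wait 16s), which is worth keeping in mind when raising the default.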

9.4 — Retry-After header

Spec: if response includes Retry-After, respect the server-specified delay and override the calculated backoff.

Before: header was never read.

After:

  • retry.py: new public helper parse_retry_after(value: str | None) -> float | None accepts both delta-seconds ("120", "3.5") and HTTP-date ("Wed, 21 Oct 2026 07:28:00 GMT"). Returns None on missing/invalid/NaN/Inf values so callers fall back.
  • retry.py:_compute_wait: reads Retry-After from the last failed response (exc.response.headers) before falling back to the exponential formula.
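
A rough sketch of how such parsing and the override could look, using only the standard library. It is not the SDK's parse_retry_after: the names `parse_retry_after_sketch` and `choose_wait` and the `now` parameter are hypothetical, and two behaviors are assumptions the PR does not state (negative delta-seconds are treated as invalid, and an HTTP-date in the past yields 0.0).

```python
import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after_sketch(value: Optional[str],
                             now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds, or None so the caller falls back."""
    if value is None or not value.strip():
        return None
    text = value.strip()

    # Form 1: delta-seconds such as "120" or "3.5".
    try:
        seconds = float(text)
    except ValueError:
        seconds = None
    if seconds is not None:
        # Assumption: NaN, Inf, and negative values count as invalid.
        if math.isnan(seconds) or math.isinf(seconds) or seconds < 0:
            return None
        return seconds

    # Form 2: HTTP-date such as "Wed, 21 Oct 2026 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Assumption: a date already in the past means "retry now".
    return max(0.0, (when - now).total_seconds())


def choose_wait(retry_after_header: Optional[str], attempt_number: int,
                initial_delay: float = 1.0) -> float:
    """Server-specified delay wins; otherwise use the exponential formula."""
    server_delay = parse_retry_after_sketch(retry_after_header)
    if server_delay is not None:
        return server_delay
    return initial_delay * 2 ** (attempt_number - 1)
```

For example, `choose_wait("7", 1)` returns the server's 7.0 seconds rather than the computed 1.0, while `choose_wait(None, 3)` falls back to 4.0.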

9.5 — API status check

Spec: dual-threshold cache (refresh at 270s, validity 300s) with non-blocking async refresh. offline → fail immediately; online/unknown → continue with retry. No blocking refresh inside the retry loop.

Before: single 270s threshold, blocking refresh() call from inside api_error_handler before invoking the retry adapter.

After:

  • internal_settings.py: CACHE_VALIDITY_INTERVAL = 5 min added; REFRESH_API_STATUS_INTERVAL = 4m30s unchanged.
  • api_status.py:
    • _last_refresh_at field tracks our own local refresh time (not the API's updated field).
    • cache_age returns time since the last successful local refresh, timedelta.max when never refreshed.
    • is_cache_stale (cache_age >= 300s) and should_refresh (cache_age >= 270s) properties.
    • get_api_status implements the three branches: fresh → read cache, refresh-zone → read cache + async refresh, stale/empty → async refresh + return UNKNOWN.
    • _trigger_async_refresh spawns threading.Thread(daemon=True). Guarded by threading.Lock + _refresh_in_flight flag so only one refresh runs at a time.
    • _async_refresh clears the in-flight flag in finally and logs unexpected exceptions via client.logger.exception (so network failures in the background thread don't disappear silently).
    • Reads of cache fields (service/status/online) are now guarded by the lock to avoid partial reads while a refresh thread is updating.
  • api_error.py: the blocking refresh() call is gone. The before_sleep callback only does get_api_status (which internally decides whether to trigger an async refresh).
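
The three branches and the single-refresh guard can be illustrated with a small standalone class. This is a simplified sketch rather than the code in api_status.py: `StatusCacheSketch`, the injected `fetch` and `clock` callables, and `wait_for_refresh` are hypothetical names added so the example runs without a network, and the module-level logger stands in for client.logger.

```python
import logging
import threading
import time
from typing import Callable, Optional

REFRESH_AFTER = 270.0  # seconds: trigger a background refresh
VALID_FOR = 300.0      # seconds: beyond this the cache is stale

log = logging.getLogger(__name__)


class StatusCacheSketch:
    def __init__(self, fetch: Callable[[], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # hypothetical: performs the /status/ call
        self._clock = clock            # injectable so tests can control time
        self._lock = threading.Lock()
        self._status: Optional[str] = None
        self._last_refresh_at: Optional[float] = None
        self._refresh_in_flight = False
        self._thread: Optional[threading.Thread] = None

    def get_status(self) -> str:
        # Read cache fields under the lock to avoid partial reads.
        with self._lock:
            cached = self._status
            if self._last_refresh_at is None:
                age = float("inf")     # never refreshed
            else:
                age = self._clock() - self._last_refresh_at
        if cached is None or age >= VALID_FOR:
            self._trigger_async_refresh()   # stale/empty
            return "unknown"
        if age >= REFRESH_AFTER:
            self._trigger_async_refresh()   # refresh zone: still serve cache
        return cached                       # fresh or refresh zone

    def _trigger_async_refresh(self) -> None:
        with self._lock:
            if self._refresh_in_flight:
                return                      # only one refresh at a time
            self._refresh_in_flight = True
            self._thread = threading.Thread(target=self._async_refresh,
                                            daemon=True)
            self._thread.start()

    def _async_refresh(self) -> None:
        try:
            status = self._fetch()
            with self._lock:
                self._status = status
                self._last_refresh_at = self._clock()
        except Exception:
            log.exception("background status refresh failed")
        finally:
            with self._lock:
                self._refresh_in_flight = False

    def wait_for_refresh(self) -> None:
        """Test helper (hypothetical): block until the last refresh ends."""
        thread = self._thread
        if thread is not None:
            thread.join()
```

The caller never waits on the network: a stale or empty cache returns "unknown" immediately, which per the spec lets the retry continue, and the fresh value is available to the next call once the background thread finishes.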

Dead code removed in 2a085f7

The old last_updated property (which derived the timestamp from the API's updated field) and the unused updated/uptimePct30d/uptimePct90d instance attributes were removed. update() no longer reads those keys from the response payload; the API still sends them, but they are ignored.

Behavioral changes summary

| Area | Before | After |
| --- | --- | --- |
| Retry schedule | ~0.5s, 1s (2 waits) | 1s, 2s, 4s (3 waits, spec-compliant) |
| Total attempts | 4 (correct in count, but uneven schedule) | 4 |
| max_retries | constant MAX_RETRY_ATTEMPTS=3 | configurable per client; default 3 |
| Retry-After header | ignored | honored, overrides exponential |
| Status check | blocking refresh once after 1st failure | non-blocking async refresh, runs before every retry |
| Status cache freshness model | single 270s threshold from API's updated field | 270s/300s thresholds from our own refresh timestamp |
| Background refresh failures | n/a (was sync) | logged via client.logger.exception |

API impact

  • Additive only. New optional parameter MarketDataClient(max_retries: int = 3).
  • No public class, method, or exception was renamed or removed.
  • RETRY_BACKOFF, MIN_RETRY_BACKOFF, MAX_RETRY_BACKOFF removed from internal_settings.py — these were not part of the public surface (they live under internal_settings).

Files changed

 src/marketdata/api_error.py         |  43 +++--
 src/marketdata/api_status.py        |  99 ++++++++---
 src/marketdata/client.py            |  11 +-
 src/marketdata/internal_settings.py |   5 +-
 src/marketdata/retry.py             |  51 +++++-
 src/tests/conftest.py               |  10 ++
 src/tests/test_api_error.py         |  55 +++---
 src/tests/test_api_status.py        | 329 +++++++++++++++++++++---------------
 src/tests/test_client.py            | 151 +++++++++++++----
 src/tests/test_retry.py             | 137 +++++++++++++++
 uv.lock                             |   2 +-
 11 files changed, 639 insertions(+), 254 deletions(-)

Note: uv.lock change is a one-line bump of the package self-reference (1.1.0 → 1.2.0) — pre-existing drift between pyproject.toml (already at 1.2.0) and the lockfile. Picked up automatically by uv sync while developing this branch.

Versioning

This branch does not bump pyproject.toml. The version bump to 1.3.0 and CHANGELOG.md entry will go in a separate "Release v1.3.0" PR per the team's release workflow.

@codecov

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (421d17d) to head (30771b3).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #22   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           51        51           
  Lines         2212      2277   +65     
=========================================
+ Hits          2212      2277   +65     
Flag Coverage Δ
unittests 100.00% <100.00%> (ø)

@MarketDataDev03 MarketDataDev03 marked this pull request as ready for review May 14, 2026 14:10
@MarketDataDev03 MarketDataDev03 merged commit 66bfae6 into main May 15, 2026
7 checks passed
@MarketDataDev01 MarketDataDev01 linked an issue May 15, 2026 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Change retry times to match sdk-requirements + retry-after

2 participants