
feat(scraper): Smart retry – only retry transient errors, fail fast on permanent#345

Open
kenzaelk98 wants to merge 2 commits into arabold:main from kenzaelk98:feature/smart-retry-logic

Conversation

@kenzaelk98

Issue

The scraper retries every failed request (including permanent errors like 500) up to 7 times with exponential backoff. That wastes time on broken or misconfigured sites and clutters logs, since those errors will not succeed on retry.

Summary of the proposed enhancement

Add smart retry logic so only transient errors are retried; permanent errors (e.g. 4xx, 500) fail fast instead of using all retries.

Changes

  • Retry only on transient HTTP statuses: 408, 429, 502, 503, 504
  • Retry only on transient network errors: ETIMEDOUT, ECONNRESET, ECONNREFUSED
  • Do not retry on permanent errors (4xx except 408/429, 500, other 5xx)
  • Log once when skipping retry: Permanent error, not retrying: <status/code>
  • Existing max retries and exponential backoff are unchanged
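The rules above could be sketched roughly as follows. This is a hypothetical helper, not the actual project code: the split of errors into an HTTP status and a Node.js error code, and the `shouldRetry` name, are assumptions for illustration.

```typescript
// Sketch of the proposed retry predicate. The error shape (separate
// HTTP status vs. network error code) and all names here are assumed.
const TRANSIENT_STATUSES = new Set([408, 429, 502, 503, 504]);
const TRANSIENT_NETWORK_CODES = new Set([
  "ETIMEDOUT",
  "ECONNRESET",
  "ECONNREFUSED",
]);

function shouldRetry(status?: number, code?: string): boolean {
  if (status !== undefined) {
    // Allowlist: only transient statuses retry; every other 4xx/5xx
    // (including 500) is treated as permanent and fails fast.
    return TRANSIENT_STATUSES.has(status);
  }
  if (code !== undefined) {
    return TRANSIENT_NETWORK_CODES.has(code);
  }
  // No status and no recognizable code: permanent under this proposal.
  return false;
}

console.log(shouldRetry(503)); // true  (transient)
console.log(shouldRetry(500)); // false (fails fast)
console.log(shouldRetry(undefined, "ECONNRESET")); // true
```

The retry loop would consult this predicate before scheduling the next backoff delay, and emit the "Permanent error, not retrying" log line once when it returns false.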

Testing

Tested locally; behavior is as expected. Scraping one library completed about 10 minutes faster.


Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@kenzaelk98
Author

Hi @arabold, it seems Copilot encountered an error while reviewing 😕


Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@kenzaelk98
Author

> Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@arabold it did it again 🙂‍↔️ – hopefully the next one works 🥲

Owner

@arabold arabold left a comment


Sorry, it took me a while to do a full review of this. Not sure why Copilot refused it twice now 🤷 Anyway, thanks for your contribution, @kenzaelk98 !

I left some comments, and I would generally prefer keeping most of the existing functionality unchanged to avoid regressions; see my specific code comments. In addition, the PR introduces a behavioral change here: completely unknown errors (no HTTP status, no recognizable error code) stop being retried.

I think this is debatable, and I can see both sides. The current "retry unknown errors" behavior is the safer default for a scraper, but it's true that non-network errors would also be retried (an accidental null or undefined type error, for example). What issue did you observe specifically? Your PR suggests only not retrying 500 errors, but was that the main intention?

@kenzaelk98
Author

kenzaelk98 commented Feb 23, 2026

> What issue did you observe specifically? Your PR suggests only not retrying 500 errors, but was that the main intention?

Thank you for the review! ☺️
I get that 500 can be transient (overload, deploys) and that many libraries retry it. For a scraper that hits many different, often unreliable doc/API hosts, the cost of retrying permanent 500s felt higher than the benefit of retrying the occasional transient one. When 500s are permanent (e.g. broken endpoints), retrying them 7 times with exponential backoff adds roughly a minute per failed request; in one run that meant about 10 minutes lost on a single library before it finally failed.

As a middle ground, I’ve kept 500 retryable but capped it at 3 attempts total (1 initial + 2 retries). That way transient 500s still get a couple of retries, while permanent 500s fail in ~3–4 seconds instead of ~63. Other status codes (408, 429, 502, 503, 504, 525) and network errors still use the full maxRetries (default 7).

I’ve added 525 back as retryable (Cloudflare/cert rotation) and reverted to a blocklist for network errors so we don’t miss transient codes like EAI_AGAIN, EPIPE, EHOSTUNREACH. I kept the shouldRetry() refactor and the “permanent error, not retrying” log.
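A rough sketch of that middle ground, under stated assumptions: the attempt-count convention, the function signature, and the blocklist contents below are illustrative, not the actual diff.

```typescript
// Illustrative sketch of the revised rules; the real PR code differs.
// 408/429/500/502/503/504/525 are retryable by status, but 500 alone is
// capped at 3 attempts total (1 initial + 2 retries). With an assumed
// 1 s base and doubling backoff, 7 attempts wait 1+2+4+8+16+32 = 63 s
// in total, while the 3-attempt cap waits only 1+2 = 3 s before giving up.
const RETRYABLE_STATUSES = new Set([408, 429, 500, 502, 503, 504, 525]);

// Blocklist instead of allowlist for network errors, so transient codes
// like EAI_AGAIN, EPIPE, EHOSTUNREACH stay retryable by default.
// ENOTFOUND is only an example of a code one might treat as permanent.
const PERMANENT_NETWORK_CODES = new Set(["ENOTFOUND"]);

function shouldRetry(
  attemptsMade: number, // attempts already completed, starting at 1
  maxAttempts: number,  // full budget, e.g. 1 initial + maxRetries
  status?: number,
  code?: string,
): boolean {
  if (status !== undefined) {
    if (!RETRYABLE_STATUSES.has(status)) return false; // permanent, fail fast
    const cap = status === 500 ? Math.min(3, maxAttempts) : maxAttempts;
    return attemptsMade < cap;
  }
  if (code !== undefined && PERMANENT_NETWORK_CODES.has(code)) return false;
  // Unknown or transient network errors keep the full retry budget.
  return attemptsMade < maxAttempts;
}

console.log(shouldRetry(1, 8, 500)); // true  (first 500, retry)
console.log(shouldRetry(3, 8, 500)); // false (3 attempts used, give up)
console.log(shouldRetry(3, 8, 503)); // true  (503 keeps the full budget)
```

The key design choice is the asymmetry: HTTP statuses use an allowlist (a status is a definite signal), while network error codes use a blocklist (the long tail of transient codes makes an allowlist fragile).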

ivanzud pushed a commit to ivanzud/docs-mcp-server that referenced this pull request Mar 13, 2026
Cherry-pick PR arabold#345 from upstream (kenzaelk98).

Only retry on transient HTTP statuses (408, 429, 502, 503, 504) and
transient network errors (ETIMEDOUT, ECONNRESET, ECONNREFUSED). Permanent
errors (4xx except 408/429, 500) fail immediately instead of exhausting
all retries with exponential backoff. ~10min faster on large scrapes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arabold
Owner

arabold commented Mar 29, 2026

Sorry for the lack of feedback on this. I was going back and forth, as adding 100+ lines of code just to adjust the retry count doesn't seem justified. Here's an alternative proposal that incorporates your core idea and adds a true fail-fast implementation that will abort any crawl if a certain failure threshold is reached: #377

