fix(scraper): add fail-fast threshold for unhealthy targets #377
Open
Pull request overview
Adds an OpenSpec change describing a new “fail-fast” crawl policy based on child-page failure rate, alongside a reduced default HTTP retry budget, to avoid wasting time on broadly unhealthy scrape targets while preserving existing root-page fail-fast behavior.
Changes:
- Defines new spec requirements/scenarios for default HTTP retries, root-page abort semantics, child-page failure-rate threshold aborts, and refresh deletion exclusions.
- Documents the design decisions (single exposed threshold, internal minimum sample size, exclusion rules).
- Adds an implementation task checklist for config/schema/tests, fetcher retry behavior, and strategy/tool integration.
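The child-page threshold described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name, field names, the strict `>` comparison, and the minimum sample size of 10 are all assumptions. Only the idea of a single exposed threshold, an internal minimum sample size, and excluding refresh deletions comes from the PR.

```python
from dataclasses import dataclass


@dataclass
class FailureRateTracker:
    """Hypothetical tracker for child-page outcomes during a crawl."""

    # Stand-in for the exposed scraper.abortOnFailureRate setting.
    abort_on_failure_rate: float
    # Assumed internal minimum sample size; the real value is not given in the PR.
    min_sample_size: int = 10
    succeeded: int = 0
    failed: int = 0

    def record(self, status: int, is_refresh: bool = False) -> None:
        """Record one child-page result by HTTP status code."""
        # Assumption: a 404 seen during a refresh is a deletion event,
        # so it is excluded from the failure-rate accounting entirely.
        if status == 404 and is_refresh:
            return
        if 200 <= status < 300:
            self.succeeded += 1
        else:
            self.failed += 1

    def should_abort(self) -> bool:
        """Return True once enough pages were seen and too many failed."""
        total = self.succeeded + self.failed
        if total < self.min_sample_size:
            # Too few samples to judge the target's health.
            return False
        # Assumption: abort only when the rate strictly exceeds the threshold.
        return self.failed / total > self.abort_on_failure_rate


tracker = FailureRateTracker(abort_on_failure_rate=0.5, min_sample_size=4)
for status in (200, 500, 500):
    tracker.record(status)
print(tracker.should_abort())  # False: only 3 samples so far

tracker.record(503)
print(tracker.should_abort())  # True: 3 of 4 counted pages failed
```

The minimum sample size keeps one or two early failures from aborting an otherwise healthy crawl, which is presumably why the design keeps it internal rather than exposing a second setting.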
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openspec/changes/fail-fast-scrape-thresholds/tasks.md | Implementation checklist for config defaults, failure-rate accounting, retry behavior, and test verification. |
| openspec/changes/fail-fast-scrape-thresholds/specs/scrape-failure-policy/spec.md | Spec scenarios for retry behavior, root-page abort, failure-rate threshold evaluation, and refresh deletion exclusions. |
| openspec/changes/fail-fast-scrape-thresholds/specs/configuration/spec.md | Config-facing spec for scraper.abortOnFailureRate and the updated scraper.fetcher.maxRetries default. |
| openspec/changes/fail-fast-scrape-thresholds/proposal.md | High-level rationale/scope/impact for the change set. |
| openspec/changes/fail-fast-scrape-thresholds/design.md | Design rationale and trade-offs for the failure-rate threshold + minimum sample size approach. |
| openspec/changes/fail-fast-scrape-thresholds/.openspec.yaml | Metadata marking this as a spec-driven change with creation date. |
openspec/changes/fail-fast-scrape-thresholds/specs/configuration/spec.md
Outdated
openspec/changes/fail-fast-scrape-thresholds/specs/configuration/spec.md
Outdated
openspec/changes/fail-fast-scrape-thresholds/specs/scrape-failure-policy/spec.md
Outdated
117c536 to e8df0fa
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Summary
404 are handled as deletions, while normal non-refresh 404 pages remain terminal failures instead of deletion events
Validation