Wire ProxyRotator into Fetcher + add local CI without GitHub Actions#17
Conversation
The ProxyRotator was implemented but never used: the client only applied a single static proxy at build time, so configured rotation had no effect. - Add FetcherConfig.proxy_list plus rotating_proxies() builder for rotation - Add protocol_proxy() builder and apply the per-protocol proxies map at client build time (closes #8) - Build one reqwest client per rotation proxy and select round-robin per request attempt so a failing proxy is swapped on retry - Add next_index() to ProxyRotator for indexing the client pool Closes #16, closes #8 https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
…ub Actions) scripts/ci.sh mirrors the GitHub Actions workflow (fmt, clippy, build, test) so the same checks run locally or in any environment. .githooks/pre-push runs it before every push; enable once with: git config core.hooksPath .githooks https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
Format the entire repo with rustfmt so the fmt check (local CI gate and the GitHub Actions fmt job) passes. No behavioural changes. https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
📝 WalkthroughWalkthroughAdds rotating-proxy support (config + ProxyRotator index API + Fetcher multi-client pool + tests), a local pre-push hook and CI script, and widespread non-functional reformatting and module declaration reordering across the codebase. ChangesProxy Rotation & HTTP Client Refactoring
Module Organization & Code Formatting
Local CI Infrastructure
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/ci.sh`:
- Around line 11-12: The local precommit script runs cargo clippy with
--all-targets which diverges from GitHub Actions CI; update the clippy
invocation in scripts/ci.sh by removing the --all-targets flag so the command
becomes cargo clippy -- -W clippy::all -D warnings (or alternatively, if you
prefer stricter checks, add --all-targets to the CI job instead of changing the
script) — locate the echo/cargo clippy lines in scripts/ci.sh and replace the
command string and actual invocation to match the chosen approach.
In `@src/fetchers/client.rs`:
- Around line 70-84: The builder currently calls ClientBuilder::proxy(...) with
config.proxy (Proxy::all) before iterating config.proxies (a HashMap), causing
non-deterministic precedence; change the logic to register scheme-specific
proxies (use reqwest::Proxy::http and ::https and any other scheme-specific
entries) before adding any wildcard Proxy::all, and iterate config.proxies in a
deterministic order (e.g., collect keys and sort them or iterate the known
schemes in order "http","https" then any sorted remaining keys) so that
builder.proxy(...) calls are applied in the intended precedence; refer to the
builder variable, config.proxy, config.proxies, and
Proxy::all/Proxy::http/Proxy::https when implementing this reorder and
deterministic iteration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 077d5482-4d74-48a8-8007-c7234113ced7
📒 Files selected for processing (34)
.githooks/pre-pushscripts/ci.shsrc/core/attributes_handler.rssrc/core/mod.rssrc/core/storage.rssrc/fetchers/client.rssrc/fetchers/config.rssrc/fetchers/constants.rssrc/fetchers/mod.rssrc/fetchers/proxy.rssrc/lib.rssrc/main.rssrc/parser/mod.rssrc/parser/selector.rssrc/spiders/cache.rssrc/spiders/checkpoint.rssrc/spiders/engine.rssrc/spiders/mod.rssrc/spiders/request.rssrc/spiders/result.rssrc/spiders/robots.rssrc/spiders/scheduler.rssrc/spiders/session.rssrc/spiders/spider.rstests/core_attributes_handler.rstests/core_storage.rstests/fetchers_client.rstests/fetchers_config.rstests/integration_test.rstests/parser_selector.rstests/parser_selector_generation.rstests/spiders_request.rstests/spiders_result.rstests/spiders_scheduler.rs
…ministic - Collapse nested if into match guard in collect_text_recursive (clippy 1.95 collapsible_match, which failed the GitHub Actions Clippy job) - In build_client, apply scheme-specific proxies before any wildcard and iterate the proxies map in deterministic order so specific proxies are not shadowed by Proxy::all (addresses review feedback) - Align CI clippy job with scripts/ci.sh by adding --all-targets https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)
42-51:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestrict clippy job permissions to read-only.
The clippy job runs with default broad permissions but only needs to read the repository. Adding explicit minimal permissions follows the least-privilege security principle and reduces the attack surface.
🔒 Proposed fix to add read-only permissions
clippy: name: Clippy runs-on: ubuntu-latest + permissions: + contents: read steps: - uses: actions/checkout@v4As per static analysis, this addresses the excessive-permissions warning by explicitly scoping permissions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 42 - 51, The Clippy job ("name: Clippy", job id clippy) currently runs with default permissions; add an explicit minimal permissions block for the job (e.g., permissions: contents: read) so the workflow only has read access to the repository when running the "Run clippy" step (the cargo clippy command) to follow least-privilege principles.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 42-51: The Clippy job ("name: Clippy", job id clippy) currently
runs with default permissions; add an explicit minimal permissions block for the
job (e.g., permissions: contents: read) so the workflow only has read access to
the repository when running the "Run clippy" step (the cargo clippy command) to
follow least-privilege principles.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4b03599f-63ff-4f28-b4e3-d857f84b4ca7
📒 Files selected for processing (3)
.github/workflows/ci.ymlsrc/fetchers/client.rssrc/parser/selector.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/parser/selector.rs
- src/fetchers/client.rs
Summary
ProxyRotatorinto theFetcherso configured proxy rotation actually takes effect. Previously the rotator was dead code and the client only applied a single static proxy at build time.proxiesmap (http/https/all) at client-build time, which was declared but never read.scripts/ci.sh+.githooks/pre-push) that mirrors the GitHub Actions workflow so the same checks run without GitHub Actions.rustfmtso the fmt check passes.Details
FetcherConfig.proxy_list+rotating_proxies()builder supply a rotation list. When non-empty,Fetcher::newbuilds onereqwest::Clientper proxy and aProxyRotator;next_client()selects round-robin per request attempt, so a failing proxy is swapped on retry.protocol_proxy(scheme, url)builder +ProxyRotator::next_index()added.scripts/ci.shrunscargo fmt --check,cargo clippy -- -D warnings,cargo build,cargo test. Enable the hook once withgit config core.hooksPath .githooks.Test plan
cargo test— 184 tests passing (new tests covernext_index,protocol_proxy,rotating_proxies)cargo clippy --all-targets -- -D warningscleancargo fmt -- --checkcleanCloses #16
Closes #8
Generated by Claude Code
Summary by CodeRabbit
New Features
Chores