
Wire ProxyRotator into Fetcher + add local CI without GitHub Actions #17

Merged
Liohtml merged 4 commits into master from claude/rustscrapling-proxy-rotation-nu5ZJ on May 28, 2026

Liohtml commented on May 27, 2026

Summary

  • Wire ProxyRotator into the Fetcher so configured proxy rotation actually takes effect. Previously the rotator was dead code and the client only applied a single static proxy at build time.
  • Apply the per-protocol proxies map (http/https/all), which was declared but never read, at client-build time.
  • Add a local CI gate (scripts/ci.sh + .githooks/pre-push) that mirrors the GitHub Actions workflow so the same checks run without GitHub Actions.
  • Format the whole repo with rustfmt so the fmt check passes.
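
The two configuration inputs named above, a rotation list and a per-protocol proxies map, can be pictured with a small builder. This is a hypothetical, self-contained sketch: it reuses the PR's names (FetcherConfig, proxy_list, rotating_proxies(), protocol_proxy(scheme, url)), but the field types, the new()/build() methods, and the &str-based signatures are assumptions rather than the crate's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for FetcherConfig.
/// Only the proxy-related fields mentioned in the PR are modelled.
#[derive(Debug, Default, Clone)]
pub struct FetcherConfig {
    /// Single static proxy (assumed shape).
    pub proxy: Option<String>,
    /// Per-protocol overrides keyed by "http", "https" or "all".
    pub proxies: HashMap<String, String>,
    /// Rotation list; empty by default, meaning "no rotation".
    pub proxy_list: Vec<String>,
}

/// Hypothetical builder; method names follow the PR, signatures are assumed.
#[derive(Debug, Default)]
pub struct FetcherConfigBuilder {
    config: FetcherConfig,
}

impl FetcherConfigBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register a proxy for one scheme ("http", "https" or "all").
    pub fn protocol_proxy(mut self, scheme: &str, url: &str) -> Self {
        self.config
            .proxies
            .insert(scheme.to_string(), url.to_string());
        self
    }

    /// Replace the rotation list with the given proxies.
    pub fn rotating_proxies<I, S>(mut self, proxies: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.config.proxy_list = proxies.into_iter().map(Into::into).collect();
        self
    }

    pub fn build(self) -> FetcherConfig {
        self.config
    }
}
```

A call such as FetcherConfigBuilder::new().protocol_proxy("https", "http://secure:8443").rotating_proxies(["http://p1:8080", "http://p2:8080"]).build() would then yield a config with one https override and a two-entry rotation list.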

Details

  • FetcherConfig.proxy_list and the rotating_proxies() builder supply a rotation list. When the list is non-empty, Fetcher::new builds one reqwest::Client per proxy plus a ProxyRotator, and next_client() selects a client round-robin on each request attempt, so a failing proxy is swapped out on retry.
  • Added the protocol_proxy(scheme, url) builder and ProxyRotator::next_index().
  • scripts/ci.sh runs cargo fmt -- --check, cargo clippy --all-targets -- -W clippy::all -D warnings, cargo build, and cargo test. Enable the pre-push hook once with git config core.hooksPath .githooks.
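
The rotation flow described above (one client per proxy, round-robin selection on every attempt, so a failing proxy is swapped out on retry) can be sketched without network access. In this sketch Client is a hypothetical stand-in for reqwest::Client that only records its proxy, build_client is reduced to storing the override, and fetch_with_retries with its send closure is an illustrative helper that does not come from the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for reqwest::Client: remembers which proxy it uses.
#[derive(Debug, Clone, PartialEq)]
pub struct Client {
    pub proxy: Option<String>,
}

/// Hypothetical build_client: the PR says it accepts a proxy override.
pub fn build_client(proxy_override: Option<&str>) -> Client {
    Client { proxy: proxy_override.map(str::to_string) }
}

pub struct Fetcher {
    clients: Vec<Client>,
    /// Round-robin cursor; present only when a rotation list was configured.
    rotator: Option<AtomicUsize>,
}

impl Fetcher {
    pub fn new(proxy_list: &[String]) -> Self {
        if proxy_list.is_empty() {
            // No rotation: a single client built from the static config.
            Fetcher { clients: vec![build_client(None)], rotator: None }
        } else {
            // One client per rotation proxy.
            let clients = proxy_list
                .iter()
                .map(|p| build_client(Some(p.as_str())))
                .collect();
            Fetcher { clients, rotator: Some(AtomicUsize::new(0)) }
        }
    }

    /// Pick the client for the next request attempt.
    pub fn next_client(&self) -> &Client {
        match &self.rotator {
            Some(cursor) => {
                let i = cursor.fetch_add(1, Ordering::Relaxed) % self.clients.len();
                &self.clients[i]
            }
            None => &self.clients[0],
        }
    }

    /// Illustrative retry loop: every attempt calls next_client(), so a
    /// failed attempt is retried through a different proxy.
    pub fn fetch_with_retries<F>(&self, max_attempts: usize, mut send: F) -> Result<String, String>
    where
        F: FnMut(&Client) -> Result<String, String>,
    {
        let mut last_err = String::from("no attempts made");
        for _ in 0..max_attempts {
            match send(self.next_client()) {
                Ok(body) => return Ok(body),
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    }
}
```

Keeping one prebuilt client per proxy avoids rebuilding a client on every attempt, at the cost of holding one connection pool per proxy.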

Test plan

  • cargo test — 184 tests passing (new tests cover next_index, protocol_proxy, rotating_proxies)
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt -- --check clean
  • Verify rotation against live proxies (not exercised in CI)

Closes #16
Closes #8


Generated by Claude Code

Summary by CodeRabbit

  • New Features

    • Proxy rotation for HTTP requests with configurable rotating lists and per-protocol overrides.
  • Chores

    • Added a local CI runner and a pre-push hook to enforce checks before pushing.
    • Broad code formatting and module reorganization for improved maintainability.


claude added 3 commits May 27, 2026 06:25
The ProxyRotator was implemented but never used: the client only applied a
single static proxy at build time, so configured rotation had no effect.

- Add FetcherConfig.proxy_list plus rotating_proxies() builder for rotation
- Add protocol_proxy() builder and apply the per-protocol proxies map at
  client build time (closes #8)
- Build one reqwest client per rotation proxy and select round-robin per
  request attempt so a failing proxy is swapped on retry
- Add next_index() to ProxyRotator for indexing the client pool

Closes #16, closes #8

https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
…ub Actions)

scripts/ci.sh mirrors the GitHub Actions workflow (fmt, clippy, build, test)
so the same checks run locally or in any environment. .githooks/pre-push runs
it before every push; enable once with: git config core.hooksPath .githooks

https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
Format the entire repo with rustfmt so the fmt check (local CI gate and the
GitHub Actions fmt job) passes. No behavioural changes.

https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn

coderabbitai Bot commented May 27, 2026

📝 Walkthrough

Adds rotating-proxy support (config + ProxyRotator index API + Fetcher multi-client pool + tests), a local pre-push hook and CI script, and widespread non-functional reformatting and module declaration reordering across the codebase.

Changes

Proxy Rotation & HTTP Client Refactoring

  • Proxy Configuration API (src/fetchers/config.rs): Adds proxy_list: Vec<String> to FetcherConfig, initialized by Default; FetcherConfigBuilder gains the protocol_proxy(...) and rotating_proxies(...) builder methods.
  • ProxyRotator Index Extraction (src/fetchers/proxy.rs): Adds next_index(), which advances the atomic cursor and reduces it modulo the number of proxies; next() delegates to next_index() to return the chosen proxy.
  • Fetcher Multi-Client Architecture (src/fetchers/client.rs): Fetcher now stores clients: Vec<reqwest::Client> and rotator: Option<ProxyRotator>. When proxy_list is provided, Fetcher::new builds one client per rotating-proxy entry; build_client accepts a proxy override and applies per-scheme/wildcard precedence. Each request attempt uses next_client().
  • Proxy Configuration & Rotation Tests (tests/fetchers_config.rs): Adds tests covering ProxyRotator::next_index() round-robin behavior and the protocol_proxy() and rotating_proxies() builders, and verifies that the default proxy_list is empty; reformats related assertions.
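
A minimal sketch of the next_index()/next() split described for src/fetchers/proxy.rs, assuming the rotator holds a Vec<String> and an AtomicUsize cursor. The constructor, the Option return types, and the empty-list behavior are assumptions, not the crate's confirmed API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin rotator; the real ProxyRotator may differ.
pub struct ProxyRotator {
    proxies: Vec<String>,
    cursor: AtomicUsize,
}

impl ProxyRotator {
    pub fn new(proxies: Vec<String>) -> Self {
        Self { proxies, cursor: AtomicUsize::new(0) }
    }

    /// Advance the shared cursor and map it into 0..len.
    /// Returns None for an empty list (assumed behavior).
    pub fn next_index(&self) -> Option<usize> {
        if self.proxies.is_empty() {
            return None;
        }
        // fetch_add wraps around on overflow instead of panicking, and
        // Relaxed suffices because only the counter itself is shared.
        let n = self.cursor.fetch_add(1, Ordering::Relaxed);
        Some(n % self.proxies.len())
    }

    /// next() delegates to next_index() and returns the chosen proxy.
    pub fn next(&self) -> Option<&str> {
        self.next_index().map(|i| self.proxies[i].as_str())
    }
}
```

Exposing the index separately is what lets the Fetcher reuse the same cursor to pick a prebuilt client from its pool instead of a proxy string.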

Module Organization & Code Formatting

  • Module Declaration Reordering (src/core/mod.rs, src/fetchers/mod.rs, src/lib.rs, src/parser/mod.rs, src/spiders/mod.rs): Reordered pub mod declarations and crate re-exports to group modules; exported symbols unchanged.
  • Core & Support Formatting (src/core/attributes_handler.rs, src/core/storage.rs, src/spiders/checkpoint.rs, src/spiders/cache.rs, ...): Expanded many compact single-line functions and struct initializers into multi-line blocks and simplified some error-map expressions; behavior preserved.
  • Spiders Formatting (src/spiders/request.rs, src/spiders/result.rs, src/spiders/robots.rs, src/spiders/scheduler.rs, src/spiders/session.rs, src/spiders/spider.rs): Made parsing, default impls, and helper calls multi-line and more explicit; no logic changes.
  • Parser & Constants Formatting (src/fetchers/constants.rs, src/parser/selector.rs, src/main.rs): Reformatted BLOCKED_RESOURCE_TYPES, the Selector::attrib() mapping, and CLI match arms into multi-line form; semantics unchanged.
  • Test Assertion & Import Formatting (tests/*): Standardized multi-line formatting for many assertions, builder chains, and imports across test suites; semantics preserved; new proxy tests added.

Local CI Infrastructure

  • Local CI Hook & Script (.githooks/pre-push, scripts/ci.sh): Adds a Git pre-push hook that runs scripts/ci.sh. With set -euo pipefail, scripts/ci.sh runs cargo fmt -- --check, cargo clippy --all-targets -- -W clippy::all -D warnings, cargo build --verbose, and cargo test --verbose.

🎯 Estimated code review effort: 3 (Moderate) | ⏱️ ~25 minutes

🐰 A proxy pool now stands tall,
Round-robin picks one for the call,
With builders so neat,
The config's complete—
No more dead code haunting the hall!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 64.71%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly summarizes the two main changes: wiring ProxyRotator into Fetcher and adding local CI without GitHub Actions.
  • Linked Issues check: ✅ Passed. All requirements from #16 and #8 are met: ProxyRotator is wired into Fetcher with per-proxy clients and round-robin rotation, the proxy_list configuration and rotating_proxies() builder are added, protocol_proxy() is implemented for per-scheme proxies, and local CI is added.
  • Out of Scope Changes check: ✅ Passed. Code changes are scoped to proxy rotation wiring, FetcherConfig proxy methods, and local CI setup. The remaining changes are code formatting and reorganization of imports/module declarations.

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/ci.sh`:
- Around lines 11-12: The local CI script (run by the pre-push hook) invokes
cargo clippy with --all-targets, which diverges from the GitHub Actions CI job.
Update the clippy invocation in scripts/ci.sh by removing the --all-targets flag
so the command becomes cargo clippy -- -W clippy::all -D warnings (or, if you
prefer stricter checks, add --all-targets to the CI job instead of changing the
script). Locate the echo/cargo clippy lines in scripts/ci.sh and update both the
echoed command string and the actual invocation to match the chosen approach.

In `@src/fetchers/client.rs`:
- Around lines 70-84: The builder currently calls ClientBuilder::proxy(...) with
config.proxy (Proxy::all) before iterating config.proxies (a HashMap), which
makes precedence non-deterministic. Register scheme-specific proxies
(reqwest::Proxy::http, ::https, and any other scheme-specific entries) before
adding any wildcard Proxy::all, and iterate config.proxies in a deterministic
order (e.g., collect and sort the keys, or visit the known schemes "http" and
"https" first and then the remaining keys in sorted order) so that the
builder.proxy(...) calls are applied in the intended precedence. Refer to the
builder variable, config.proxy, config.proxies, and
Proxy::all/Proxy::http/Proxy::https when implementing this reordering and
deterministic iteration.
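
To make the ordering concern concrete, here is a dependency-free model of the registration plan. It assumes the HTTP client uses the first registered proxy whose scheme matches a request URL, which is why registering Proxy::all first would shadow scheme-specific entries. Target, Registration, registration_order, and resolve are hypothetical names; ignoring keys other than "http", "https" and "all", and placing the map's "all" entry ahead of the static config.proxy, are also assumptions.

```rust
use std::collections::HashMap;

/// Which request URLs a registration intercepts, mirroring the
/// Proxy::http, Proxy::https and Proxy::all constructors.
#[derive(Debug, Clone, PartialEq)]
pub enum Target {
    Http,
    Https,
    All,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Registration {
    pub target: Target,
    pub url: String,
}

/// Order in which proxies would be handed to the client builder. The map is
/// read through a fixed key list rather than iterated, so the result never
/// depends on HashMap iteration order.
pub fn registration_order(
    static_proxy: Option<&str>,
    proxies: &HashMap<String, String>,
) -> Vec<Registration> {
    let mut plan = Vec::new();
    // Scheme-specific entries first, so no wildcard can shadow them.
    for (key, target) in [("http", Target::Http), ("https", Target::Https)] {
        if let Some(url) = proxies.get(key) {
            plan.push(Registration { target, url: url.clone() });
        }
    }
    // Wildcards last: the map's "all" entry, then the static proxy.
    if let Some(url) = proxies.get("all") {
        plan.push(Registration { target: Target::All, url: url.clone() });
    }
    if let Some(url) = static_proxy {
        plan.push(Registration { target: Target::All, url: url.to_string() });
    }
    plan
}

/// First-match-wins lookup, modelling the assumed client behavior.
pub fn resolve<'a>(plan: &'a [Registration], url_scheme: &str) -> Option<&'a str> {
    plan.iter()
        .find(|r| match r.target {
            Target::Http => url_scheme == "http",
            Target::Https => url_scheme == "https",
            Target::All => true,
        })
        .map(|r| r.url.as_str())
}
```

With an https override and a static wildcard, https URLs resolve to the override and plain http URLs fall through to the wildcard, regardless of how the map happens to iterate.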
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 077d5482-4d74-48a8-8007-c7234113ced7

📥 Commits

Reviewing files that changed from the base of the PR and between c2ac209 and 9607ffa.

📒 Files selected for processing (34)
  • .githooks/pre-push
  • scripts/ci.sh
  • src/core/attributes_handler.rs
  • src/core/mod.rs
  • src/core/storage.rs
  • src/fetchers/client.rs
  • src/fetchers/config.rs
  • src/fetchers/constants.rs
  • src/fetchers/mod.rs
  • src/fetchers/proxy.rs
  • src/lib.rs
  • src/main.rs
  • src/parser/mod.rs
  • src/parser/selector.rs
  • src/spiders/cache.rs
  • src/spiders/checkpoint.rs
  • src/spiders/engine.rs
  • src/spiders/mod.rs
  • src/spiders/request.rs
  • src/spiders/result.rs
  • src/spiders/robots.rs
  • src/spiders/scheduler.rs
  • src/spiders/session.rs
  • src/spiders/spider.rs
  • tests/core_attributes_handler.rs
  • tests/core_storage.rs
  • tests/fetchers_client.rs
  • tests/fetchers_config.rs
  • tests/integration_test.rs
  • tests/parser_selector.rs
  • tests/parser_selector_generation.rs
  • tests/spiders_request.rs
  • tests/spiders_result.rs
  • tests/spiders_scheduler.rs

…ministic

- Collapse nested if into match guard in collect_text_recursive (clippy 1.95
  collapsible_match, which failed the GitHub Actions Clippy job)
- In build_client, apply scheme-specific proxies before any wildcard and
  iterate the proxies map in deterministic order so specific proxies are not
  shadowed by Proxy::all (addresses review feedback)
- Align CI clippy job with scripts/ci.sh by adding --all-targets

https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)

42-51: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict clippy job permissions to read-only.

The clippy job runs with default broad permissions but only needs to read the repository. Adding explicit minimal permissions follows the least-privilege security principle and reduces the attack surface.

🔒 Proposed fix to add read-only permissions
 clippy:
   name: Clippy
   runs-on: ubuntu-latest
+  permissions:
+    contents: read
   steps:
     - uses: actions/checkout@v4

As per static analysis, this addresses the excessive-permissions warning by explicitly scoping permissions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml around lines 42 - 51, the Clippy job ("name:
Clippy", job id clippy) currently runs with default permissions; add an explicit
minimal permissions block for the job (e.g., permissions: contents: read) so the
workflow only has read access to the repository when running the "Run clippy"
step (the cargo clippy command) to follow least-privilege principles.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 42-51: The Clippy job ("name: Clippy", job id clippy) currently
runs with default permissions; add an explicit minimal permissions block for the
job (e.g., permissions: contents: read) so the workflow only has read access to
the repository when running the "Run clippy" step (the cargo clippy command) to
follow least-privilege principles.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4b03599f-63ff-4f28-b4e3-d857f84b4ca7

📥 Commits

Reviewing files that changed from the base of the PR and between 9607ffa and 7457483.

📒 Files selected for processing (3)
  • .github/workflows/ci.yml
  • src/fetchers/client.rs
  • src/parser/selector.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/parser/selector.rs
  • src/fetchers/client.rs

Liohtml merged commit 805f55b into master on May 28, 2026
6 checks passed