feat: add pluggable fetcher system for URL-specific handling #9

Merged
chaliy merged 5 commits into main from claude/add-fetcher-system-TAVbE
Jan 17, 2026

@chaliy commented Jan 17, 2026

What

Introduces a pluggable fetcher architecture that enables specialized content fetching based on URL patterns. The system is designed to scale to hundreds of fetchers.

Why

Different URL types require different handling strategies. For example, GitHub repository URLs should return structured metadata + README content, not raw HTML. This architecture enables building specialized fetchers for various content sources (GitHub, npm, documentation sites, etc.) while maintaining a clean API.

How

  • Fetcher trait: Defines name(), matches(url), and fetch(request, options) methods
  • FetcherRegistry: Dispatches URLs to the first matching fetcher in priority order
  • DefaultFetcher: Handles all HTTP/HTTPS URLs with HTML conversion (existing behavior)
  • GitHubRepoFetcher: Handles github.com/{owner}/{repo} URLs, returns repo metadata + README
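
The dispatch described in the list above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the crate's actual code: the real `fetch` takes a request and options, whereas this version takes only a URL string. `FetchResponse`, `NoMatchingFetcher`, `with_defaults`, the `name()` return values, and the placeholder bodies are all assumptions made for the example.

```rust
/// Hypothetical, simplified response type; the real crate's type is not shown in this PR.
#[derive(Debug, PartialEq)]
pub struct FetchResponse {
    pub fetcher: String,
    pub body: String,
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum FetchError {
    NoMatchingFetcher(String),
}

/// Simplified trait: the PR's `fetch(request, options)` is reduced to a URL string here.
pub trait Fetcher {
    fn name(&self) -> &str;
    fn matches(&self, url: &str) -> bool;
    fn fetch(&self, url: &str) -> Result<FetchResponse, FetchError>;
}

/// Catch-all for any HTTP/HTTPS URL.
pub struct DefaultFetcher;

impl Fetcher for DefaultFetcher {
    fn name(&self) -> &str {
        "default" // assumed name
    }
    fn matches(&self, url: &str) -> bool {
        url.starts_with("http://") || url.starts_with("https://")
    }
    fn fetch(&self, url: &str) -> Result<FetchResponse, FetchError> {
        // Placeholder body; a real implementation would download and convert the page.
        Ok(FetchResponse {
            fetcher: self.name().to_string(),
            body: format!("(converted HTML of {url})"),
        })
    }
}

/// Matches only URLs of the exact form https://github.com/{owner}/{repo}.
pub struct GitHubRepoFetcher;

impl GitHubRepoFetcher {
    fn parse(url: &str) -> Option<(&str, &str)> {
        let path = url.strip_prefix("https://github.com/")?;
        let mut parts = path.trim_end_matches('/').split('/');
        let owner = parts.next().filter(|s| !s.is_empty())?;
        let repo = parts.next().filter(|s| !s.is_empty())?;
        // Deeper paths (issues, blobs, ...) are left to other fetchers.
        if parts.next().is_some() {
            return None;
        }
        Some((owner, repo))
    }
}

impl Fetcher for GitHubRepoFetcher {
    fn name(&self) -> &str {
        "github_repo" // assumed name
    }
    fn matches(&self, url: &str) -> bool {
        Self::parse(url).is_some()
    }
    fn fetch(&self, url: &str) -> Result<FetchResponse, FetchError> {
        let (owner, repo) = Self::parse(url)
            .ok_or_else(|| FetchError::NoMatchingFetcher(url.to_string()))?;
        // Placeholder body; a real implementation would call the GitHub API.
        Ok(FetchResponse {
            fetcher: self.name().to_string(),
            body: format!("# {owner}/{repo}\n\n(metadata and README would go here)"),
        })
    }
}

/// Holds fetchers in priority order; the first one whose `matches` returns true wins.
pub struct FetcherRegistry {
    fetchers: Vec<Box<dyn Fetcher>>,
}

impl FetcherRegistry {
    pub fn new() -> Self {
        Self { fetchers: Vec::new() }
    }

    pub fn register(&mut self, fetcher: Box<dyn Fetcher>) {
        self.fetchers.push(fetcher);
    }

    /// Specialized fetchers are registered first, the catch-all last.
    pub fn with_defaults() -> Self {
        let mut registry = Self::new();
        registry.register(Box::new(GitHubRepoFetcher));
        registry.register(Box::new(DefaultFetcher));
        registry
    }

    pub fn fetch(&self, url: &str) -> Result<FetchResponse, FetchError> {
        self.fetchers
            .iter()
            .find(|f| f.matches(url))
            .ok_or_else(|| FetchError::NoMatchingFetcher(url.to_string()))?
            .fetch(url)
    }
}
```

In this sketch, registration order is the priority order, so the catch-all `DefaultFetcher` has to be registered last; otherwise it would shadow every specialized fetcher.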

Changes

  • Add crates/fetchkit/src/fetchers/ module with trait, registry, and built-in fetchers
  • Refactor client.rs to delegate to FetcherRegistry
  • Add FetchError::FetcherError variant for fetcher-specific errors
  • Add specs/fetchers.md specification
  • Add examples/fetch_urls.rs for testing different URL types
  • Add integration tests for fetcher system
  • Enable json feature for reqwest (GitHub API)
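
As an illustration of the `FetchError::FetcherError` variant listed above, here is a hedged sketch of how a fetcher-specific error might be represented and surfaced. The payload shape (a single message string), the `InvalidUrl` variant, the `Display` wording, and the `readme_missing` helper are assumptions; the PR does not show the real definition.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub enum FetchError {
    /// Hypothetical pre-existing variant, included only for contrast.
    InvalidUrl(String),
    /// Fetcher-specific failure; carrying a plain message string is an assumption.
    FetcherError(String),
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::InvalidUrl(url) => write!(f, "invalid URL: {url}"),
            FetchError::FetcherError(msg) => write!(f, "fetcher error: {msg}"),
        }
    }
}

impl Error for FetchError {}

/// Hypothetical helper showing how a GitHub fetcher might report a missing README.
pub fn readme_missing(owner: &str, repo: &str) -> FetchError {
    FetchError::FetcherError(format!("github_repo: no README found for {owner}/{repo}"))
}
```

A dedicated variant lets callers tell failures raised inside a specialized fetcher apart from generic transport or validation errors with an ordinary `match`.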

Risk

  • Low
  • Changes are additive; existing fetch() API unchanged
  • All 73 tests pass

Checklist

  • Unit tests pass
  • Integration tests added
  • Example-based tests added
  • Documentation updated (specs/fetchers.md)
  • Specs are up to date

Introduce fetcher architecture enabling specialized content fetching
based on URL patterns. The system is designed to scale to hundreds
of fetchers.

Changes:
- Add Fetcher trait with name(), matches(), and fetch() methods
- Add FetcherRegistry for dispatching URLs to appropriate fetchers
- Implement DefaultFetcher (moved existing HTTP fetch logic)
- Implement GitHubRepoFetcher for github.com/{owner}/{repo} URLs
- Add specs/fetchers.md with system specification
- Refactor client.rs to delegate to FetcherRegistry
- Add FetcherError variant for fetcher-specific errors

The GitHub repo fetcher returns repository metadata and README content
in a markdown format optimized for LLM consumption.

- Add examples/fetch_urls.rs with test cases for various URLs
- Test cases include HTML, JSON, GitHub repos, raw files
- Enable markdown/text conversion by default in fetch()
- Add examples job to CI (continue-on-error for network deps)
- Update specs/fetchers.md with complete API documentation
- Add section on how to create new fetchers
- Add integration tests for FetcherRegistry
- Test URL validation, allow/block lists, conversion options

@chaliy merged commit d61cebc into main on Jan 17, 2026
17 checks passed
@chaliy deleted the claude/add-fetcher-system-TAVbE branch on January 17, 2026 06:15