feat: add pluggable fetcher system for URL-specific handling #9
Merged
Introduce fetcher architecture enabling specialized content fetching
based on URL patterns. The system is designed to scale to hundreds
of fetchers.
Changes:
- Add Fetcher trait with name(), matches(), and fetch() methods
- Add FetcherRegistry for dispatching URLs to appropriate fetchers
- Implement DefaultFetcher (moved existing HTTP fetch logic)
- Implement GitHubRepoFetcher for github.com/{owner}/{repo} URLs
- Add specs/fetchers.md with system specification
- Refactor client.rs to delegate to FetcherRegistry
- Add FetcherError variant for fetcher-specific errors
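The trait-plus-registry shape listed above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code from this PR: the method signatures, the string return type, the stubbed `fetch` bodies, and the first-match-wins dispatch order are all assumptions made for the example.

```rust
/// Error type for this sketch; the real FetchError has more variants.
#[derive(Debug, PartialEq)]
enum FetchError {
    /// Stands in for the fetcher-specific variant named in the change list.
    FetcherError(String),
}

/// Hypothetical synchronous form of the trait; real signatures may differ.
trait Fetcher {
    fn name(&self) -> &'static str;
    fn matches(&self, url: &str) -> bool;
    fn fetch(&self, url: &str) -> Result<String, FetchError>;
}

/// Catch-all fetcher; a real one would perform the HTTP request.
struct DefaultFetcher;

impl Fetcher for DefaultFetcher {
    fn name(&self) -> &'static str {
        "default"
    }
    fn matches(&self, _url: &str) -> bool {
        true
    }
    fn fetch(&self, url: &str) -> Result<String, FetchError> {
        // Stubbed: tag the URL with the fetcher name instead of fetching.
        Ok(format!("[{}] {url}", self.name()))
    }
}

/// Specialized fetcher; the prefix check is a stand-in for real matching.
struct GitHubRepoFetcher;

impl Fetcher for GitHubRepoFetcher {
    fn name(&self) -> &'static str {
        "github_repo"
    }
    fn matches(&self, url: &str) -> bool {
        url.starts_with("https://github.com/")
    }
    fn fetch(&self, url: &str) -> Result<String, FetchError> {
        Ok(format!("[{}] {url}", self.name()))
    }
}

/// Holds fetchers in registration order and dispatches to the first match.
struct FetcherRegistry {
    fetchers: Vec<Box<dyn Fetcher>>,
}

impl FetcherRegistry {
    fn new() -> Self {
        Self { fetchers: Vec::new() }
    }

    fn register(&mut self, fetcher: Box<dyn Fetcher>) {
        self.fetchers.push(fetcher);
    }

    /// First match wins, so specialized fetchers go before the catch-all.
    fn fetch(&self, url: &str) -> Result<String, FetchError> {
        self.fetchers
            .iter()
            .find(|f| f.matches(url))
            .ok_or_else(|| FetchError::FetcherError(format!("no fetcher for {url}")))?
            .fetch(url)
    }
}

fn main() {
    let mut registry = FetcherRegistry::new();
    registry.register(Box::new(GitHubRepoFetcher));
    registry.register(Box::new(DefaultFetcher));

    for url in ["https://github.com/owner/repo", "https://example.com/"] {
        println!("{:?}", registry.fetch(url));
    }
}
```

With a linear scan like this, registration order is the priority order. A registry meant to hold hundreds of fetchers might index by host instead, but nothing in this description says which approach is used.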
The GitHub repo fetcher returns repository metadata and README content
in a markdown format optimized for LLM consumption.
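As a rough idea of what "metadata plus README as markdown" could look like, here is a small sketch. The `RepoSummary` struct, its fields, and the output layout are hypothetical. The actual fields and format are defined in the PR's code and specs/fetchers.md, not here.

```rust
/// Hypothetical subset of repository metadata, for illustration only.
struct RepoSummary {
    owner: String,
    repo: String,
    description: Option<String>,
    stars: u64,
    readme: Option<String>,
}

/// Render the summary as a compact markdown document:
/// a title, optional description, a metadata list, then the README.
fn render_markdown(s: &RepoSummary) -> String {
    let mut out = format!("# {}/{}\n\n", s.owner, s.repo);
    if let Some(description) = &s.description {
        out.push_str(description);
        out.push_str("\n\n");
    }
    out.push_str(&format!("- Stars: {}\n", s.stars));
    if let Some(readme) = &s.readme {
        out.push_str("\n## README\n\n");
        // Normalize trailing whitespace so the output ends with one newline.
        out.push_str(readme.trim_end());
        out.push('\n');
    }
    out
}

fn main() {
    let summary = RepoSummary {
        owner: "octo".to_string(),
        repo: "demo".to_string(),
        description: Some("A demo.".to_string()),
        stars: 42,
        readme: Some("Hello\n\n".to_string()),
    };
    print!("{}", render_markdown(&summary));
}
```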
- Add examples/fetch_urls.rs with test cases for various URLs
- Test cases include HTML, JSON, GitHub repos, raw files
- Enable markdown/text conversion by default in fetch()
- Add examples job to CI (continue-on-error for network deps)
- Update specs/fetchers.md with complete API documentation
- Add section on how to create new fetchers
- Add integration tests for FetcherRegistry
- Test URL validation, allow/block lists, conversion options
What
Introduces a pluggable fetcher architecture that enables specialized content fetching based on URL patterns. The system is designed to scale to hundreds of fetchers.
Why
Different URL types require different handling strategies. For example, GitHub repository URLs should return structured metadata + README content, not raw HTML. This architecture enables building specialized fetchers for various content sources (GitHub, npm, documentation sites, etc.) while maintaining a clean API.
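To make the URL-pattern idea concrete, the following sketch shows one way a `github.com/{owner}/{repo}` URL could be told apart from other URLs. The function name `parse_github_repo` is hypothetical and the matching rules are assumptions; the PR's actual `matches()` logic may be stricter or looser (for example around `www.` hosts or reserved paths such as settings pages).

```rust
/// Return (owner, repo) when the URL has exactly two path segments
/// under github.com, otherwise None. Hypothetical helper for illustration.
fn parse_github_repo(url: &str) -> Option<(&str, &str)> {
    let rest = url
        .strip_prefix("https://github.com/")
        .or_else(|| url.strip_prefix("http://github.com/"))?;

    // Ignore any query string or fragment.
    let path = rest.split(|c: char| c == '?' || c == '#').next().unwrap_or("");

    // Skip empty segments so a trailing slash still matches.
    let mut segments = path.split('/').filter(|s| !s.is_empty());
    let owner = segments.next()?;
    let repo = segments.next()?;

    // Deeper paths (issues, blobs, ...) are not repository roots.
    if segments.next().is_some() {
        return None;
    }
    Some((owner, repo))
}

fn main() {
    for url in [
        "https://github.com/rust-lang/rust",
        "https://github.com/rust-lang/rust/issues/1",
        "https://example.com/a/b",
    ] {
        println!("{url} -> {:?}", parse_github_repo(url));
    }
}
```

URLs that fail this check would fall through to the default fetcher, which is what keeps the specialized handling opt-in per pattern.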
How
- Fetcher trait with name(), matches(url), and fetch(request, options) methods
- GitHubRepoFetcher for github.com/{owner}/{repo} URLs, returns repo metadata + README
Changes
- crates/fetchkit/src/fetchers/ module with trait, registry, and built-in fetchers
- Refactor client.rs to delegate to FetcherRegistry
- FetchError::FetcherError variant for fetcher-specific errors
- specs/fetchers.md specification
- examples/fetch_urls.rs for testing different URL types
- json feature for reqwest (GitHub API)
Risk
- fetch() API unchanged
Checklist