Implement a new backend for downloading files from Hugging Face Hub repositories using the hf:// URL scheme.

Features:
- Support for models, datasets, and spaces repositories
- URL parsing with revision/branch support (e.g., hf://owner/repo@v1.0)
- Authentication via HF_TOKEN environment variable
- Git LFS file support for large model files
- Repository listing for recursive downloads

Signed-off-by: pmady <pmady@users.noreply.github.com>
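For a single file, the backend ultimately has to turn an hf:// location into an HTTPS request against the Hub. The sketch below shows one way to do that, assuming the Hub's public `resolve` endpoint (`https://huggingface.co/[datasets/|spaces/]owner/repo/resolve/<revision>/<path>`). `RepoKind` and `resolve_url` are hypothetical names for illustration, not necessarily how the PR's `build_hf_url` is written. LFS-tracked files are served through the same endpoint, usually via a redirect to a CDN, so the HTTP client is assumed to follow redirects.

```rust
/// Repository kinds hosted on the Hub (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RepoKind {
    Model,
    Dataset,
    Space,
}

/// Builds the HTTPS `resolve` URL for one file in a Hub repository.
/// Models live at the site root; datasets and spaces sit under a prefix.
pub fn resolve_url(kind: RepoKind, owner: &str, repo: &str, revision: &str, path: &str) -> String {
    let prefix = match kind {
        RepoKind::Model => "",
        RepoKind::Dataset => "datasets/",
        RepoKind::Space => "spaces/",
    };
    // Percent-encode '/' so a ref like "refs/pr/1" stays one path segment.
    let revision = revision.replace('/', "%2F");
    // Avoid a double slash when the caller passes a leading '/'.
    let path = path.trim_start_matches('/');
    format!("https://huggingface.co/{prefix}{owner}/{repo}/resolve/{revision}/{path}")
}
```

File paths are passed through without percent-encoding here; a real implementation would likely need to encode characters such as spaces.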
Register the hf:// scheme backend in load_builtin_backends() and update tests to include the new backend in expected backends list. Signed-off-by: pmady <pmady@users.noreply.github.com>
Add serde and serde_json workspace dependencies required for parsing Hugging Face API responses. Signed-off-by: pmady <pmady@users.noreply.github.com>
Add --hf-token argument for Hugging Face authentication and include usage examples in the CLI help documentation.

Examples added:
- Download single file: dfget hf://owner/repo/path -O /tmp/file
- Download repository: dfget hf://owner/repo -O /tmp/repo/ -r
- With authentication: dfget hf://... --hf-token=<token>

Signed-off-by: pmady <pmady@users.noreply.github.com>
Apply cargo fmt to fix formatting in huggingface.rs and lib.rs. Signed-off-by: pmady <pmady@users.noreply.github.com>
Hi maintainers, could you please add the
@pmady Thanks for contributing this PR. Can you add documentation to these pages:
- https://d7y.io/docs/next/operations/integrations/hugging-face/
- https://d7y.io/docs/next/reference/commands/client/dfget/#download-with-different-protocols

d7y.io repo: https://github.com/dragonflyoss/d7y.io
Pull request overview
Adds a new hf:// backend so dfdaemon/dfget can download from Hugging Face Hub repositories (models/datasets/spaces), including repo listing for recursive downloads, and introduces a CLI flag intended for HF authentication.
Changes:
- Add `huggingface` backend implementation and register scheme `hf` in `BackendFactory`.
- Extend `dfget` CLI help/examples and add `--hf-token` argument.
- Add `serde`/`serde_json` deps for HF API response parsing.
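Recursive downloads need a file listing. A minimal sketch, assuming the Hub's tree API (`/api/{models|datasets|spaces}/<owner>/<repo>/tree/<revision>?recursive=true`, whose JSON entries carry a `type` of `file` or `directory` plus a `path`) and assuming the JSON has already been deserialized, which is the step the serde/serde_json dependencies cover. Each listed file is mapped back to an hf:// URL rather than an https:// one, so the follow-up downloads stay on the hf backend. `tree_api_url`, `TreeEntry`, and `to_hf_urls` are hypothetical names.

```rust
/// Hub API URL listing every entry of a repository at `revision`.
/// `api_segment` is "models", "datasets" or "spaces" (assumed mapping).
pub fn tree_api_url(api_segment: &str, repo_id: &str, revision: &str) -> String {
    // Keep refs such as "refs/pr/1" as a single path segment.
    let revision = revision.replace('/', "%2F");
    format!("https://huggingface.co/api/{api_segment}/{repo_id}/tree/{revision}?recursive=true")
}

/// One deserialized listing entry; `kind` mirrors the JSON "type" field.
pub struct TreeEntry<'a> {
    pub kind: &'a str,
    pub path: &'a str,
}

/// Maps listed files back to hf:// URLs so each download stays on the hf backend.
/// `type_prefix` is "" for models, "datasets/" or "spaces/" otherwise.
pub fn to_hf_urls<'a>(
    type_prefix: &str,
    repo_id: &str,
    revision: Option<&str>,
    entries: &[TreeEntry<'a>],
) -> Vec<String> {
    // Carry an explicit revision through as "@rev"; omit it otherwise.
    let rev = revision.map(|r| format!("@{r}")).unwrap_or_default();
    entries
        .iter()
        .filter(|e| e.kind == "file") // directories are implied by file paths
        .map(|e| format!("hf://{type_prefix}{repo_id}{rev}/{}", e.path))
        .collect()
}
```

Pagination of very large listings is not handled in this sketch.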
Reviewed changes
Copilot reviewed 3 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `dragonfly-client/src/bin/dfget/main.rs` | Adds HF usage examples and `--hf-token` CLI option. |
| `dragonfly-client-backend/src/lib.rs` | Registers the new `hf` backend and updates backend factory tests. |
| `dragonfly-client-backend/src/huggingface.rs` | Implements HF backend: URL parsing, stat/list/get/exists, plus unit tests. |
| `dragonfly-client-backend/Cargo.toml` | Adds serde dependencies needed by the new backend. |
| `Cargo.lock` | Locks new transitive deps from serde additions. |
- Update copyright year to 2026
- Remove environment variable fallback for HF_TOKEN, keep only CLI option
- Implement ParsedHfUrl with TryFrom<Url> and TryFrom<&str> traits
- Make ParsedHfUrl and RepoType public structs
- Update tests to use TryFrom pattern

Signed-off-by: pmady <pmady@users.noreply.github.com>
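A minimal sketch of the `TryFrom<&str>` pattern for a `ParsedHfUrl`/`RepoType` pair, assuming the URL shape `hf://[models/|datasets/|spaces/]owner/repo[@revision][/path]`. The field names, the `String` error type, and the `main` default revision are assumptions of this sketch, not the PR's definitions. It always requires both owner and repo after the optional type prefix.

```rust
use std::convert::TryFrom;

/// Kind of Hub repository named by an hf:// URL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RepoType {
    Model,
    Dataset,
    Space,
}

/// Components of `hf://[models/|datasets/|spaces/]owner/repo[@revision][/path]`.
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedHfUrl {
    pub repo_type: RepoType,
    pub owner: String,
    pub repo: String,
    pub revision: String,
    pub path: Option<String>,
}

impl<'a> TryFrom<&'a str> for ParsedHfUrl {
    type Error = String;

    fn try_from(s: &'a str) -> Result<Self, Self::Error> {
        let rest = s
            .strip_prefix("hf://")
            .ok_or_else(|| format!("unsupported scheme in {s}"))?;
        let segments: Vec<&str> = rest.split('/').filter(|seg| !seg.is_empty()).collect();

        // An optional type prefix; without one the repository is a model.
        let (repo_type, tail) = match segments.split_first() {
            Some((&"datasets", tail)) => (RepoType::Dataset, tail),
            Some((&"spaces", tail)) => (RepoType::Space, tail),
            Some((&"models", tail)) => (RepoType::Model, tail),
            _ => (RepoType::Model, &segments[..]),
        };

        // Owner and repo are always required after the optional prefix.
        let (owner, repo_and_rev, path) = match tail {
            [owner, repo_and_rev, path @ ..] => (*owner, *repo_and_rev, path),
            _ => return Err(format!("expected hf://[type/]owner/repo, got {s}")),
        };

        // "repo@v1.0" selects a revision; the default is assumed to be "main".
        let (repo, revision) = match repo_and_rev.split_once('@') {
            Some((r, rev)) if !r.is_empty() && !rev.is_empty() => (r, rev),
            Some(_) => return Err(format!("empty repo or revision in {s}")),
            None => (repo_and_rev, "main"),
        };

        Ok(ParsedHfUrl {
            repo_type,
            owner: owner.to_string(),
            repo: repo.to_string(),
            revision: revision.to_string(),
            path: if path.is_empty() { None } else { Some(path.join("/")) },
        })
    }
}
```

Revisions containing `/` (for example `refs/pr/1`) are not supported by this simple split, since the `/` would be read as a path separator.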
The HF backend was instantiated with HuggingFace::new() at startup, making the --hf-token CLI flag ineffective since the token was stored on the struct but never received from dfget.

Changes:
- dfget: inject --hf-token as Authorization header into request_header so it flows through gRPC to dfdaemon and into the backend
- HF backend: remove stored token field, read auth from request http_header instead via build_headers() method
- Remove new_with_token() constructor since it is no longer needed

Signed-off-by: pmady <pmady@users.noreply.github.com>
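On the dfget side, the fix amounts to turning `--hf-token` into an ordinary request header before the request is sent over gRPC. A minimal sketch, using a plain `HashMap<String, String>` in place of dfget's real `request_header` type and a hypothetical `inject_hf_token` helper. The `Bearer` prefix follows the Hub's usual token scheme; leaving an explicitly set Authorization header untouched is an assumption of this sketch, not necessarily the PR's behavior.

```rust
use std::collections::HashMap;

/// Adds `Authorization: Bearer <token>` for a non-empty `--hf-token` value.
/// An Authorization header the user already set is kept (sketch assumption).
pub fn inject_hf_token(request_header: &mut HashMap<String, String>, hf_token: Option<&str>) {
    let token = match hf_token.map(str::trim) {
        Some(t) if !t.is_empty() => t,
        _ => return,
    };
    // HTTP header names are case-insensitive, so check every spelling.
    let already_set = request_header
        .keys()
        .any(|k| k.eq_ignore_ascii_case("authorization"));
    if !already_set {
        request_header.insert("Authorization".to_string(), format!("Bearer {token}"));
    }
}
```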
- Fix URL parsing: remove redundant early-return branch, always require
owner/repo (two segments) after optional type prefix
- Fix list_files to return hf:// URLs instead of https:// so downstream
downloads continue using the HF backend (preserving auth and semantics)
- Use versioned DEFAULT_USER_AGENT matching the HTTP backend pattern
(concat!("dragonfly", "/", env!("CARGO_PKG_VERSION"))) and allow
user-supplied User-Agent to override it
- Fix dataset test to use proper owner/repo URL format
- Add comprehensive test coverage: dataset, space, explicit model type,
invalid scheme, missing repo, build_hf_url, build_headers behavior
Signed-off-by: pmady <pmady@users.noreply.github.com>
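With the token stored on the request, the backend side can be a pure function of the per-request headers. A minimal sketch of a `build_headers`-style helper, again over a `HashMap<String, String>` rather than the real header type: user-supplied headers (including Authorization) pass through, and a versioned `dragonfly/<version>` User-Agent is added only when the user did not supply one. Header names are compared case-insensitively, as HTTP requires. `option_env!` stands in for the PR's `env!` so the sketch also compiles outside Cargo; the `"dev"` fallback is an assumption.

```rust
use std::collections::HashMap;

/// Versioned default User-Agent, "dragonfly/<crate version>" under Cargo.
fn default_user_agent() -> String {
    format!("dragonfly/{}", option_env!("CARGO_PKG_VERSION").unwrap_or("dev"))
}

/// Builds the headers for a Hub request from the per-request header map.
/// A user-supplied User-Agent overrides the default one.
pub fn build_headers(request_header: &HashMap<String, String>) -> HashMap<String, String> {
    let mut headers = request_header.clone();
    let has_user_agent = headers.keys().any(|k| k.eq_ignore_ascii_case("user-agent"));
    if !has_user_agent {
        headers.insert("User-Agent".to_string(), default_user_agent());
    }
    headers
}
```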
@gaius-qi I've created a documentation PR at dragonflyoss/d7y.io#386 that adds:
@pmady Thanks, I'll finish the review this week.
What does this PR do?
This PR adds support for downloading files from Hugging Face Hub repositories using the `hf://` URL scheme, addressing issue dragonflyoss/dragonfly#4419.
Features
Usage Examples
Changes
Related Issues
Closes dragonflyoss/dragonfly#4419
Checklist