[Feat] Support Z-Image-Omni#22
Open
Jayce-Ping wants to merge 12 commits into
Open
Conversation
Jayce-Ping
added a commit
to Jayce-Ping/Flow-Factory-Private
that referenced
this pull request
May 17, 2026
Move the two `huggingface_hub` imports added in the previous commit out of function bodies and up to the module's import block: - `from huggingface_hub import snapshot_download` in `src/flow_factory/utils/checkpoint.py` - `from huggingface_hub.errors import RepositoryNotFoundError, HfHubHTTPError` in `src/flow_factory/models/abc.py` `huggingface_hub` is already a hard dependency (`pyproject.toml:39`), so there is no import-cost concern; lazy-loading only hid the dependency surface from readers and `isort`. Codify the rule so this pattern is caught in review going forward: - Extend constraint X-GenGroup#22 ("Import Style") in `.agents/knowledge/constraints.md` with an explicit "Top-level imports only" bullet, listing the three sanctioned exceptions (optional deps via `try/except ImportError`, backend-gated imports under runtime feature checks like DeepSpeed/FSDP, and unresolvable circular imports). - Add a matching checklist item to the Code Style section of `.agents/skills/ff-review/SKILL.md`. Pre-existing inline FSDP/DeepSpeed imports in `models/abc.py` (lines ~997-1152) are grandfathered under the new rule's exception (b). Co-authored-by: Cursor <cursoragent@cursor.com>
Jayce-Ping
added a commit
to Jayce-Ping/Flow-Factory-Private
that referenced
this pull request
May 17, 2026
…olver
Unifies the two parallel Hugging Face Hub checkpoint-loading implementations
that landed independently on each branch.
Adopted approach (from feat/resume-from-huggingface):
- Pure helpers in utils/checkpoint.py:
parse_hf_checkpoint_path(path) -> (repo_id, subfolder, revision)
download_hf_checkpoint(repo_id, ...) -> local dir (allow_patterns for subdir)
- New orchestrator resolve_checkpoint_path(path, accelerator=None) combines:
* input validation (TypeError/ValueError on bad input),
* `hf://`-prefix override + local-wins-first ordering,
* subfolder support via parse_hf_checkpoint_path,
* per-node multi-node gating (is_local_main_process + wait_for_everyone),
* narrow re-raise of (RepositoryNotFoundError, HfHubHTTPError) as
FileNotFoundError with full context.
Removed (from flow-opd-ours 0ad2636):
- resolve_lora_dir + _HF_REPO_RE (regex parser without subfolder support).
- The redundant resolve_lora_dir() call at the top of BaseAdapter._load_lora;
BaseAdapter.load_checkpoint() now pre-resolves before dispatching, and
load_lora_as_named_parameters() resolves before calling _load_lora.
Rewired call sites:
- BaseAdapter.load_checkpoint -> resolve_checkpoint_path(path, accelerator=self.accelerator)
- utils/lora_loader.py::load_lora_as_named_parameters -> same
- Deleted BaseAdapter._resolve_checkpoint_path method (its logic moved into
the standalone resolve_checkpoint_path function so lora_loader can share it
without holding an adapter reference).
Net: single source of truth for HF resolution, broader scope (LoRA + full +
training-state, with subfolder support), OPD teacher_paths preserved, and
constraint X-GenGroup#22 (top-level imports) introduced by the merge applies cleanly.
Co-authored-by: Cursor <cursoragent@cursor.com>
5 tasks
Jayce-Ping
added a commit
to Jayce-Ping/Flow-Factory-Private
that referenced
this pull request
May 17, 2026
Three fixes from the Copilot review on the upstream PR: 1. Path-traversal validation (utils/checkpoint.py): parse_hf_checkpoint_path now rejects '.', '..', and backslash segments with an informative ValueError that preserves the original spec. Validates at the parser front door rather than at the download site so error messages keep the user's exact input. Without this, a spec like 'owner/repo/..' would escape the snapshot directory via os.path.join. 2. Un-gate _resolve_checkpoint_path (models/abc.py): Remove the `if is_local_main_process:` gate and the post-barrier double-call pattern. All ranks now call snapshot_download directly; huggingface_hub's per-blob WeakFileLock serializes concurrent calls within each filesystem domain (cross-node on POSIX-locking shared FS, per-node on non-shared FS), so we still get exactly one download per filesystem domain. This eliminates the distributed-deadlock hazard where a download failure on the gated rank would raise before reaching wait_for_everyone(), leaving siblings blocked at the barrier until NCCL watchdog timeout. The trailing wait_for_everyone() is kept to maintain lockstep entry into the downstream loaders. Residual asymmetric-failure risk (one rank's network blip while others succeed) is documented in the docstring. 3. Skill-checklist alignment (.agents/skills/ff-review/SKILL.md): Replace the duplicated import-exception list with a reference to constraint X-GenGroup#22, where the full set of three sanctioned exceptions (optional deps, backend-gated runtime feature checks, circular imports) lives. Prevents future drift between the two documents. Verified with 8 happy-path + 5 original-error + 6 path-traversal parser test cases (all pass). Co-authored-by: Cursor <cursoragent@cursor.com>
Jayce-Ping
added a commit
that referenced
this pull request
May 17, 2026
… paths (#160) * [adapter,hparams] feat: support resuming from Hugging Face checkpoint paths Extend `BaseAdapter.load_checkpoint` to transparently resolve a Hugging Face repo spec (either `hf://owner/repo[/subdir][@rev]` or bare `owner/repo[/...]`) to a local cache directory via `huggingface_hub.snapshot_download`, reusing the existing `lora` / `full` / `state` loading branches unchanged. Logic priority at `_resolve_checkpoint_path`: 1. `hf://` prefix -> force HF (overrides any colliding local dir) 2. Local path exists -> return as-is 3. Otherwise parse as `owner/repo[/subfolder][@revision]` and download Multi-node-safe: download gated on `is_local_main_process` (one per node), not `is_main_process` (one global). On non-shared filesystems each node populates its own HF cache once; on shared filesystems huggingface_hub's per-blob `WeakFileLock` dedupes the concurrent `snapshot_download` calls so only one node transfers bytes. Fail-fast: narrow `except (RepositoryNotFoundError, HfHubHTTPError)` re-raised as `FileNotFoundError` with full context (path, repo_id, subfolder, revision, HF_TOKEN hint) for actionable error messages. Also updates the `resume_path` help text and normalizes the inline comment on all 59 example YAMLs to document the new HF support. No new config fields; `resume_path: Optional[str]` keeps the same shape. Note: pre-existing black/isort lint debt in the touched src files is unrelated to this change. Co-authored-by: Cursor <cursoragent@cursor.com> * [adapter,docs] refactor: hoist HF imports to module top; codify rule Move the two `huggingface_hub` imports added in the previous commit out of function bodies and up to the module's import block: - `from huggingface_hub import snapshot_download` in `src/flow_factory/utils/checkpoint.py` - `from huggingface_hub.errors import RepositoryNotFoundError, HfHubHTTPError` in `src/flow_factory/models/abc.py` `huggingface_hub` is already a hard dependency (`pyproject.toml:39`), so there is no import-cost concern; lazy-loading only hid the dependency surface from readers and `isort`. Codify the rule so this pattern is caught in review going forward: - Extend constraint #22 ("Import Style") in `.agents/knowledge/constraints.md` with an explicit "Top-level imports only" bullet, listing the three sanctioned exceptions (optional deps via `try/except ImportError`, backend-gated imports under runtime feature checks like DeepSpeed/FSDP, and unresolvable circular imports). - Add a matching checklist item to the Code Style section of `.agents/skills/ff-review/SKILL.md`. Pre-existing inline FSDP/DeepSpeed imports in `models/abc.py` (lines ~997-1152) are grandfathered under the new rule's exception (b). Co-authored-by: Cursor <cursoragent@cursor.com> * [adapter,hparams] fix: address Copilot review on PR #160 Three fixes from the Copilot review on the upstream PR: 1. Path-traversal validation (utils/checkpoint.py): parse_hf_checkpoint_path now rejects '.', '..', and backslash segments with an informative ValueError that preserves the original spec. Validates at the parser front door rather than at the download site so error messages keep the user's exact input. Without this, a spec like 'owner/repo/..' would escape the snapshot directory via os.path.join. 2. Un-gate _resolve_checkpoint_path (models/abc.py): Remove the `if is_local_main_process:` gate and the post-barrier double-call pattern. All ranks now call snapshot_download directly; huggingface_hub's per-blob WeakFileLock serializes concurrent calls within each filesystem domain (cross-node on POSIX-locking shared FS, per-node on non-shared FS), so we still get exactly one download per filesystem domain. This eliminates the distributed-deadlock hazard where a download failure on the gated rank would raise before reaching wait_for_everyone(), leaving siblings blocked at the barrier until NCCL watchdog timeout. The trailing wait_for_everyone() is kept to maintain lockstep entry into the downstream loaders. Residual asymmetric-failure risk (one rank's network blip while others succeed) is documented in the docstring. 3. Skill-checklist alignment (.agents/skills/ff-review/SKILL.md): Replace the duplicated import-exception list with a reference to constraint #22, where the full set of three sanctioned exceptions (optional deps, backend-gated runtime feature checks, circular imports) lives. Prevents future drift between the two documents. Verified with 8 happy-path + 5 original-error + 6 path-traversal parser test cases (all pass). Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>
Jayce-Ping
added a commit
to Jayce-Ping/Flow-Factory-Private
that referenced
this pull request
May 17, 2026
Adopt main's `_resolve_checkpoint_path` adapter method for HF Hub checkpoint resolution (drops the duplicate top-level `resolve_checkpoint_path` helper from this branch). Picks up main's path-traversal security check in `parse_hf_checkpoint_path`. Updates `ff-review` SKILL.md wording for constraint X-GenGroup#22 to main's more complete phrasing. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.