Fix: Allow loading models from directories for local checkpoints for MUSK and Hibou-L #160
JiwaniZakir
left a comment
The local_ckpts.json file commits two hardcoded local Windows paths (D:/trident_cache/...) for hibou_l and musk — these should be reset to empty strings before merging, as every user's cache location will differ and this will break for anyone cloning the repo on a non-Windows system or with a different cache directory.
The calls to ensure_valid_weights_path in _get_weights_path (lines 138 and 141) are commented out rather than removed or updated to handle directory paths. If the intent is that directories are now valid inputs, the method should be updated to accept both files and directories (e.g., using os.path.exists) rather than silently disabling the validation, which removes a useful guard for typos or misconfigured paths.
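One way to keep the guard while allowing directories is sketched below. This is only an illustration: the real signature and error handling of ensure_valid_weights_path in this repo are not shown in the PR, so the allow_dir parameter and the empty-directory check are assumptions made for the example.

```python
import os


def ensure_valid_weights_path(weights_path: str, allow_dir: bool = True) -> str:
    """Hypothetical variant of the validator that accepts files or directories.

    The `allow_dir` flag and the exact exceptions raised are assumptions for
    this sketch, not the repository's actual API.
    """
    if not weights_path:
        raise ValueError("Weights path is empty; set it in local_ckpts.json.")

    # A single checkpoint file is always acceptable.
    if os.path.isfile(weights_path):
        return weights_path

    # A directory is acceptable only when the caller opts in and it is not empty.
    if allow_dir and os.path.isdir(weights_path):
        if not os.listdir(weights_path):
            raise FileNotFoundError(f"Weights directory is empty: {weights_path}")
        return weights_path

    # Anything else (typo, missing path, directory when not allowed) still fails.
    raise FileNotFoundError(f"Weights path not found or not allowed: {weights_path}")
```

With something like this, _get_weights_path could keep calling the validator for every encoder, and a typo in a configured path would still fail early instead of surfacing later inside the model loader.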
For the MUSK case in _build, passing local_dir=weights_path to load_model_and_may_interpolate assumes that function handles a bare directory path correctly — it's worth confirming that the directory contains the expected checkpoint file structure that hf_hub-style loading expects, since the directory name (models--xiangjx--musk) suggests it's a raw HuggingFace cache layout, which may differ from what timm's loader anticipates.
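To make the layout concern concrete: a raw Hugging Face cache entry such as models--xiangjx--musk keeps the actual files under snapshots/<commit hash>/, with refs/main holding the commit hash. A minimal sketch of resolving such a directory to the snapshot folder follows. The helper name resolve_snapshot_dir is hypothetical, and whether load_model_and_may_interpolate wants the cache root or the snapshot folder is something I have not verified.

```python
import os


def resolve_snapshot_dir(path: str, revision: str = "main") -> str:
    """Hypothetical helper: map a raw HF cache entry to its snapshot folder.

    If `path` does not look like a cache entry (no `snapshots/` subfolder),
    it is returned unchanged on the assumption that it already holds the files.
    """
    snapshots = os.path.join(path, "snapshots")
    if not os.path.isdir(snapshots):
        return path

    # refs/<revision> stores the commit hash that names the snapshot folder.
    ref_file = os.path.join(path, "refs", revision)
    if os.path.isfile(ref_file):
        with open(ref_file, encoding="utf-8") as fh:
            commit = fh.read().strip()
        candidate = os.path.join(snapshots, commit)
        if os.path.isdir(candidate):
            return candidate

    # Fall back to the only snapshot present, if there is exactly one.
    entries = sorted(
        name for name in os.listdir(snapshots)
        if os.path.isdir(os.path.join(snapshots, name))
    )
    if len(entries) == 1:
        return os.path.join(snapshots, entries[0])

    raise FileNotFoundError(f"Could not pick a snapshot under {snapshots}")
```

If the loader turns out to handle the cache root on its own, this step is unnecessary; the point is only that the two layouts differ and the PR should state which one is expected in local_ckpts.json.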
Hi there,
This PR addresses an issue that prevents loading certain models like MUSK and Hibou-Large from local checkpoints.
The Problem:
The current implementation of BasePatchEncoder._get_weights_path includes a validation step (ensure_valid_weights_path) that strictly requires the weights path to be a single file. However, the specific load functions for MUSK and Hibou-Large require a path to a directory, not a single checkpoint file. This strict file check makes it impossible to load these models from a local path.
The Solution:
To resolve this, I have commented out the call to ensure_valid_weights_path within _get_weights_path.
This is a minimal change that makes the loading mechanism more flexible. It allows the function to return a path to a directory, which is the expected behavior for loading these types of models. The existing logic for handling single-file checkpoints remains unaffected.
This enhancement will make it much easier for users to work with a broader range of local models, especially in offline environments.
Thanks for your consideration!