Migrate model/metadata loading from wandb to HuggingFace #54
Open
avantikalal wants to merge 6 commits into
- Add HF_MODEL_REPO, HF_DATA_REPO, HF_METADATA_FILENAME constants
- Rewrite hub/__init__.py to use hf_hub_download (fixes broken grelu.resources import)
- Add hub/wandb.py for internal/private model access via wandb.Api()
- Update hub/download.py to use hf_hub_download with local_dir
- Remove login_wandb() from conftest.py (HF is public, no auth needed)
- Update test_hub.py with hf/wandb markers

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
tox>=4.24 uses dataclass(slots=True) which requires Python 3.10+. pipx run was downloading latest tox regardless of the pip-installed version. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Check if model path exists before loading the model.
Pull request overview
This PR migrates Decima’s public model/metadata loading from the previously broken wandb/gReLU artifact mechanism to the HuggingFace Hub, while keeping wandb-based loading available for private/internal usage via a new decima.hub.wandb module.
Changes:
- Switched `decima.hub` and `decima.hub.download` to use `huggingface_hub.hf_hub_download` for model weights and metadata.
- Added HF Hub constants (`HF_MODEL_REPO`, `HF_DATA_REPO`, `HF_METADATA_FILENAME`) and the `huggingface_hub` dependency.
- Updated tests and CI configuration to introduce `hf`/`wandb` markers (but currently leaves HF tests enabled by default).
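
As a rough illustration of the loading path these changes describe, the sketch below fetches the metadata file through `huggingface_hub.hf_hub_download`. The repo id `Genentech/decima-data` and the filename `metadata.h5ad` come from the PR description. The helper name `fetch_metadata`, the `repo_type` default, and the injectable `download` argument are assumptions made for illustration, not the PR's actual code.

```python
from pathlib import Path
from typing import Callable, Optional

# Values quoted in the PR description; the constant names mirror those
# added to src/decima/constants.py.
HF_DATA_REPO = "Genentech/decima-data"
HF_METADATA_FILENAME = "metadata.h5ad"


def fetch_metadata(
    local_dir: Optional[str] = None,
    repo_type: Optional[str] = None,  # assumption: use "dataset" if the repo is a dataset repo
    download: Optional[Callable[..., str]] = None,  # hypothetical hook for offline testing
) -> Path:
    """Download the metadata file and return its local path."""
    if download is None:
        # Imported lazily so the sketch can be exercised without network access.
        from huggingface_hub import hf_hub_download

        download = hf_hub_download
    local_path = download(
        repo_id=HF_DATA_REPO,
        filename=HF_METADATA_FILENAME,
        repo_type=repo_type,
        local_dir=local_dir,
    )
    return Path(local_path)
```

Calling `fetch_metadata(local_dir="./decima-cache")` would place the file under that directory. Per the test plan, the real file is about 3.3 GB, so this is not something to run casually.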
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/decima/constants.py` | Adds HF Hub repo/filename constants used by the new loading path. |
| `src/decima/hub/__init__.py` | Replaces wandb artifact loading with HF Hub downloads for model + metadata. |
| `src/decima/hub/download.py` | Updates download/cache utilities to fetch weights/metadata from HF Hub. |
| `src/decima/hub/wandb.py` | Introduces wandb-only loader utilities using `wandb.Api()` for private/internal usage. |
| `tests/test_hub.py` | Moves hub tests to HF and marks download tests with `@pytest.mark.hf`. |
| `tests/conftest.py` | Removes unconditional wandb login from test startup. |
| `setup.cfg` | Adds `huggingface_hub`, adjusts gReLU pin, and defines pytest markers/default marker filter. |
| `docs/tutorials/5-gene-expression-prediction.ipynb` | Updates tutorial outputs/content consistent with HF-based loading. |
| `.github/workflows/run-tests.yml` | Tweaks CI tox install/run invocation (but currently still runs HF tests by default). |
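
For the `src/decima/hub/wandb.py` entry, a loader built directly on `wandb.Api()` might look roughly like the following. `wandb.Api().artifact(...)` and `Artifact.download(...)` are public wandb calls. The helper name `download_private_artifact`, the artifact name format shown in the comment, and the injectable `api` argument are hypothetical, since the module's contents are not shown on this page.

```python
from pathlib import Path
from typing import Any, Optional


def download_private_artifact(
    artifact_name: str,  # placeholder format: "entity/project/name:alias"
    root: Optional[str] = None,
    api: Any = None,  # hypothetical hook so the sketch can run without credentials
) -> Path:
    """Download a wandb artifact and return the directory it was written to."""
    if api is None:
        # Lazy import: only needed for private/internal access, and it
        # requires wandb credentials to be configured.
        import wandb

        api = wandb.Api()
    artifact = api.artifact(artifact_name)
    return Path(artifact.download(root=root))
```

Keeping the wandb import inside the function means the public HF path would not need wandb installed or authenticated, which matches the intent of separating the two loaders.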
Comments suppressed due to low confidence (2)
tests/test_hub.py:28
- These tests are marked `hf` but not `long_running`, and they download full model weights/metadata. If `hf` is ever included in default runs (or a user forgets to exclude it), this will be extremely slow and bandwidth-heavy. Consider additionally marking them `long_running` (so they are skipped unless explicitly enabled) or otherwise gating them behind an opt-in flag/env var.

@pytest.mark.hf
def test_load_decima_model():
    model_0 = load_decima_model()
    assert model_0 is not None
    assert isinstance(model_0, LightningModel)
    model_2 = load_decima_model(model=2)
    assert model_2 is not None

@pytest.mark.hf
def test_load_decima_metadata():
    metadata = load_decima_metadata()
    assert isinstance(metadata, anndata.AnnData)
    assert metadata.shape == (8856, 18457)
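
One way to implement the opt-in gate suggested in this comment is a `conftest.py` hook that skips `hf`-marked tests unless an environment variable is set. This is a minimal sketch, not what the PR does: the variable name `DECIMA_RUN_HF_TESTS` is hypothetical, and the follow-up commit instead excludes the `hf` marker by default.

```python
import os
from typing import Mapping, Optional

HF_OPT_IN_VAR = "DECIMA_RUN_HF_TESTS"  # hypothetical variable name


def hf_tests_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the opt-in variable is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(HF_OPT_IN_VAR, "").strip().lower() in {"1", "true", "yes"}


def pytest_collection_modifyitems(config, items):
    """conftest.py hook: skip tests marked `hf` unless the opt-in variable is set."""
    if hf_tests_enabled():
        return
    import pytest  # imported here so the helper above stays importable on its own

    skip_hf = pytest.mark.skip(reason=f"set {HF_OPT_IN_VAR}=1 to run HF download tests")
    for item in items:
        if item.get_closest_marker("hf") is not None:
            item.add_marker(skip_hf)
```

With this in place, a plain `pytest` run would skip the download tests, and `DECIMA_RUN_HF_TESTS=1 pytest -m hf` would run them.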
docs/tutorials/5-gene-expression-prediction.ipynb:2223
- This tutorial cell shows a failing command (`../tests/data/seqs.fasta` not found). Either update the path to a file that exists in the repo (or add the referenced file), or replace the cell with an embedded example so readers can follow the tutorial without hitting a file-not-found error.

"outputs": [
  {
    "name": "stdout",
    "output_type": "stream",
    "text": [
      "cat: ../tests/data/seqs.fasta: No such file or directory\n"
    ]
  }
],
"source": [
  "! cat ../tests/data/seqs.fasta | cut -c 1-200"
]
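
The "embedded example" option could be sketched as a notebook cell that writes a small FASTA file itself and previews it, so it does not depend on any path in the repo. The record below is a made-up placeholder, not a sequence from the Decima repository, and `preview` is a hypothetical helper that mimics `cut -c 1-200`.

```python
import tempfile
from pathlib import Path

# Placeholder record; not a sequence from the Decima repository.
EXAMPLE_FASTA = ">example_seq\n" + "ACGT" * 100 + "\n"


def preview(path: Path, width: int = 200) -> str:
    """Return each line of `path` truncated to `width` characters."""
    lines = path.read_text().splitlines()
    return "\n".join(line[:width] for line in lines)


with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "seqs.fasta"
    fasta_path.write_text(EXAMPLE_FASTA)
    shown = preview(fasta_path)
    print(shown)
```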
- Exclude hf pytest marker by default to avoid CI network downloads
- Wrap hf_hub_download return values in Path() for consistent return types
- Raise ValueError in load_decima_metadata for unrecognized model names
- Remove unused metadata parameter from download_decima_metadata

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
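
The `ValueError` and `Path()` items in this commit can be illustrated with a small sketch. The registry `KNOWN_METADATA_FILES`, its single entry, and both helper names are hypothetical; the model names Decima actually accepts are not listed on this page.

```python
from pathlib import Path
from typing import Union

# Hypothetical registry: the real model names are not shown in this PR page.
KNOWN_METADATA_FILES = {"example-model": "metadata.h5ad"}


def resolve_metadata_filename(model: str) -> str:
    """Map a model name to its metadata filename, failing loudly on unknown names."""
    try:
        return KNOWN_METADATA_FILES[model]
    except KeyError:
        known = ", ".join(sorted(KNOWN_METADATA_FILES))
        raise ValueError(
            f"Unrecognized model name {model!r}; expected one of: {known}"
        ) from None


def as_path(downloaded: Union[str, Path]) -> Path:
    """Normalize a download result (hf_hub_download returns a str) to a Path."""
    return Path(downloaded)
```

Raising early with the list of accepted names avoids a silent fallback to the wrong file, and returning `Path` everywhere gives callers one type to handle.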
Summary
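- Replace `grelu.resources.get_artifact()` / wandb-based loading with `hf_hub_download` from HuggingFace Hub
- Models: `Genentech/decima-model`, metadata: `Genentech/decima-data`
- wandb loading for internal/private models remains available via the `decima.hub.wandb` submodule (uses `wandb.Api()` directly, no gReLU dependency)
- […] (`>=1.0.10`)

Changes

| File | Change |
|---|---|
| `src/decima/constants.py` | Add `HF_MODEL_REPO`, `HF_DATA_REPO`, `HF_METADATA_FILENAME` |
| `src/decima/hub/__init__.py` | Rewrite to load from HuggingFace Hub |
| `src/decima/hub/wandb.py` | New module for internal/private model access, uses `wandb.Api()` directly |
| `src/decima/hub/download.py` | Use `hf_hub_download` with `local_dir` |
| `setup.cfg` | Add `huggingface_hub` dep, add `wandb`/`hf` pytest markers |
| `tests/test_hub.py` | Mark download tests with `@pytest.mark.hf` |
| `tests/conftest.py` | Remove `login_wandb()` call (HF is public) |

Internal user migration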
Test plan
- `test_hf_constants`: constants correct
- `pytest -m "not wandb and not hf and not long_running"`
- `test_load_result` passes (downloads `metadata.h5ad` from HF, ~3.3 GB)
- `@pytest.mark.hf` tests (full model download): require HF network access
- `@pytest.mark.wandb` tests: require wandb credentials

🤖 Generated with Claude Code