Migrate model/metadata loading from wandb to HuggingFace#54

Open
avantikalal wants to merge 6 commits into main from huggingface

@avantikalal
Collaborator

Summary

  • Replaces broken grelu.resources.get_artifact() / wandb-based loading with hf_hub_download from HuggingFace Hub
  • Model weights: Genentech/decima-model, metadata: Genentech/decima-data
  • Preserves full wandb support for internal/private models in new decima.hub.wandb submodule (uses wandb.Api() directly, no gReLU dependency)
  • Drops upper bound on gReLU pin (>=1.0.10)
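As a rough illustration of the download path described in the bullets above, here is a minimal sketch. The helper name `fetch_from_hub` and its `download` parameter are hypothetical and not taken from the Decima source. The filename `metadata.h5ad` comes from the test plan, and whether `Genentech/decima-data` needs `repo_type="dataset"` is not stated in this PR, so `repo_type` is left as an optional argument.

```python
from pathlib import Path
from typing import Callable, Optional

# Repository ids quoted in the PR summary.
HF_MODEL_REPO = "Genentech/decima-model"
HF_DATA_REPO = "Genentech/decima-data"


def fetch_from_hub(
    repo_id: str,
    filename: str,
    local_dir: Optional[str] = None,
    repo_type: Optional[str] = None,
    download: Optional[Callable[..., str]] = None,
) -> Path:
    """Download one file from a Hub repo and return its local path.

    `download` is a hypothetical injection point so the helper can be
    exercised offline; by default it is huggingface_hub.hf_hub_download.
    """
    if download is None:
        # Imported lazily so the module loads without huggingface_hub installed.
        from huggingface_hub import hf_hub_download

        download = hf_hub_download

    local_path = download(
        repo_id=repo_id,
        filename=filename,
        repo_type=repo_type,  # None means a model repo; an assumption for the data repo
        local_dir=local_dir,
    )
    # hf_hub_download returns a str; wrap it so callers always get a Path.
    return Path(local_path)
```

A call such as `fetch_from_hub(HF_DATA_REPO, "metadata.h5ad")` would then fetch the metadata file into the default Hub cache, assuming that filename matches `HF_METADATA_FILENAME`.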

Changes

| File | Change |
| --- | --- |
| src/decima/constants.py | Add HF_MODEL_REPO, HF_DATA_REPO, HF_METADATA_FILENAME |
| src/decima/hub/__init__.py | Rewrite: HuggingFace primary backend, local paths preserved |
| src/decima/hub/wandb.py | New: all wandb functions via wandb.Api() directly |
| src/decima/hub/download.py | Update: hf_hub_download with local_dir |
| setup.cfg | Add huggingface_hub dep, add wandb/hf pytest markers |
| tests/test_hub.py | Update to HF; mark model/metadata tests @pytest.mark.hf |
| tests/conftest.py | Remove login_wandb() call (HF is public) |

Internal user migration

# Before
from decima.hub import load_decima_model

# After (for private/internal wandb models)
from decima.hub.wandb import load_decima_model
model = load_decima_model("private_model", host="https://internal.wandb.company.com")
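For context on what "uses wandb.Api() directly" can look like, the following is a minimal sketch and not the code in `decima/hub/wandb.py`. The function name `download_wandb_artifact`, the `api` injection parameter, and the artifact name format in the usage note are assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Optional


def download_wandb_artifact(
    name: str,
    host: Optional[str] = None,
    api: Optional[Any] = None,
) -> Path:
    """Download a wandb artifact and return the local directory.

    `api` is a hypothetical injection point for offline testing; by
    default a wandb.Api client is created, pointed at `host` if given.
    """
    if api is None:
        # Imported lazily so wandb is only needed for private/internal use.
        import wandb

        overrides = {"base_url": host} if host else None
        api = wandb.Api(overrides=overrides)

    artifact = api.artifact(name)
    # Artifact.download() returns the directory the files were written to.
    return Path(artifact.download())
```

Usage would be along the lines of `download_wandb_artifact("entity/project/private_model:latest", host="https://internal.wandb.company.com")`, where the entity and project segments are placeholders.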

Test plan

  • test_hf_constants — constants correct
  • 39 tests pass with pytest -m "not wandb and not hf and not long_running"
  • test_load_result passes (downloads metadata.h5ad from HF, ~3.3 GB)
  • @pytest.mark.hf tests (full model download) — require HF network access
  • @pytest.mark.wandb tests — require wandb credentials

🤖 Generated with Claude Code

avantikalal and others added 3 commits May 7, 2026 11:38
…t markers

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Add HF_MODEL_REPO, HF_DATA_REPO, HF_METADATA_FILENAME constants
- Rewrite hub/__init__.py to use hf_hub_download (fixes broken grelu.resources import)
- Add hub/wandb.py for internal/private model access via wandb.Api()
- Update hub/download.py to use hf_hub_download with local_dir
- Remove login_wandb() from conftest.py (HF is public, no auth needed)
- Update test_hub.py with hf/wandb markers

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
tox>=4.24 uses dataclass(slots=True) which requires Python 3.10+.
pipx run was downloading latest tox regardless of the pip-installed version.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@avantikalal avantikalal requested a review from MuhammedHasan May 7, 2026 21:05
Comment thread src/decima/hub/wandb.py (5 threads)
Comment thread src/decima/hub/__init__.py Outdated
Check if model path exists before loading the model.

Copilot AI left a comment

Pull request overview

This PR migrates Decima’s public model/metadata loading from the previously broken wandb/gReLU artifact mechanism to the HuggingFace Hub, while keeping wandb-based loading available for private/internal usage via a new decima.hub.wandb module.

Changes:

  • Switched decima.hub and decima.hub.download to use huggingface_hub.hf_hub_download for model weights and metadata.
  • Added HF Hub constants (HF_MODEL_REPO, HF_DATA_REPO, HF_METADATA_FILENAME) and the huggingface_hub dependency.
  • Updated tests and CI configuration to introduce hf/wandb markers (but currently leaves HF tests enabled by default).

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/decima/constants.py | Adds HF Hub repo/filename constants used by the new loading path. |
| src/decima/hub/__init__.py | Replaces wandb artifact loading with HF Hub downloads for model + metadata. |
| src/decima/hub/download.py | Updates download/cache utilities to fetch weights/metadata from HF Hub. |
| src/decima/hub/wandb.py | Introduces wandb-only loader utilities using wandb.Api() for private/internal usage. |
| tests/test_hub.py | Moves hub tests to HF and marks download tests with @pytest.mark.hf. |
| tests/conftest.py | Removes unconditional wandb login from test startup. |
| setup.cfg | Adds huggingface_hub, adjusts gReLU pin, and defines pytest markers/default marker filter. |
| docs/tutorials/5-gene-expression-prediction.ipynb | Updates tutorial outputs/content consistent with HF-based loading. |
| .github/workflows/run-tests.yml | Tweaks CI tox install/run invocation (but currently still runs HF tests by default). |

Comments suppressed due to low confidence (2)

tests/test_hub.py:28

  • These tests are marked hf but not long_running, and they download full model weights/metadata. If hf is ever included in default runs (or a user forgets to exclude it), this will be extremely slow and bandwidth-heavy. Consider additionally marking them long_running (so they are skipped unless explicitly enabled) or otherwise gating them behind an opt-in flag/env var.
@pytest.mark.hf
def test_load_decima_model():
    model_0 = load_decima_model()
    assert model_0 is not None
    assert isinstance(model_0, LightningModel)

    model_2 = load_decima_model(model=2)
    assert model_2 is not None


@pytest.mark.hf
def test_load_decima_metadata():
    metadata = load_decima_metadata()
    assert isinstance(metadata, anndata.AnnData)
    assert metadata.shape == (8856, 18457)
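One way to implement the opt-in gating suggested in this comment is a collection hook in `tests/conftest.py`. This is a sketch under assumptions: the environment variable name `DECIMA_RUN_HF_TESTS` is hypothetical, and the PR itself ended up excluding the `hf` marker by default through the pytest configuration instead.

```python
import os
from typing import Mapping, Optional

# Hypothetical opt-in variable; not defined anywhere in the Decima repo.
OPT_IN_ENV = "DECIMA_RUN_HF_TESTS"


def hf_tests_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the opt-in variable is set to a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get(OPT_IN_ENV, "").strip().lower() in {"1", "true", "yes"}


def pytest_collection_modifyitems(config, items):
    """Skip every test marked `hf` unless the opt-in variable is set."""
    if hf_tests_enabled():
        return

    # Imported here so the helper above stays importable without pytest.
    import pytest

    skip_hf = pytest.mark.skip(
        reason=f"set {OPT_IN_ENV}=1 to run HuggingFace download tests"
    )
    for item in items:
        # Marker names applied to a test appear in item.keywords.
        if "hf" in item.keywords:
            item.add_marker(skip_hf)
```

With this in place, a plain `pytest` run reports the `hf` tests as skipped, and `DECIMA_RUN_HF_TESTS=1 pytest -m hf` runs them.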

docs/tutorials/5-gene-expression-prediction.ipynb:2223

  • This tutorial cell shows a failing command (../tests/data/seqs.fasta not found). Either update the path to a file that exists in the repo (or add the referenced file), or replace the cell with an embedded example so readers can follow the tutorial without hitting a file-not-found error.
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cat: ../tests/data/seqs.fasta: No such file or directory\n"
     ]
    }
   ],
   "source": [
    "! cat ../tests/data/seqs.fasta | cut -c 1-200"
   ]

Comment thread setup.cfg Outdated
Comment thread src/decima/hub/download.py
Comment thread src/decima/hub/download.py Outdated
Comment thread src/decima/hub/__init__.py
Comment thread .github/workflows/run-tests.yml
- Exclude hf pytest marker by default to avoid CI network downloads
- Wrap hf_hub_download return values in Path() for consistent return types
- Raise ValueError in load_decima_metadata for unrecognized model names
- Remove unused metadata parameter from download_decima_metadata

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
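The validation described in these commits (check that a local path exists, reject unrecognized model names with a ValueError) could look roughly like the sketch below. The function `resolve_model_source` and the `KNOWN_MODELS` tuple are hypothetical; the actual set of recognized names is defined in the Decima source and is not shown in this PR.

```python
from pathlib import Path
from typing import Tuple, Union

# Hypothetical replicate indices, for illustration only.
KNOWN_MODELS = (0, 1, 2, 3)


def resolve_model_source(model: Union[int, str, Path]) -> Tuple[str, Union[int, Path]]:
    """Classify `model` as a local checkpoint or a hub-hosted replicate.

    Returns ("local", path) for an existing file and ("hub", index) for a
    recognized index; anything else raises ValueError instead of failing
    later inside the loader.
    """
    if isinstance(model, (str, Path)):
        path = Path(model)
        if path.exists():
            return ("local", path)
        raise ValueError(f"Model path does not exist: {path}")

    if model in KNOWN_MODELS:
        return ("hub", model)

    raise ValueError(
        f"Unrecognized model {model!r}; expected one of {KNOWN_MODELS} or a local path"
    )
```

Failing early with a ValueError keeps the error message tied to the user's input rather than to a download or deserialization step further down.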
3 participants