Stream HF→OLMo state conversion to lower load_hf_model peak memory #661

Open
finbarrtimbers wants to merge 1 commit into main from streaming-hf-conversion

@finbarrtimbers
Collaborator

Summary

  • load_hf_model previously instantiated the full HF model twice (once on rank 0 to warm the cache, then again on every rank) just to extract its state_dict. For a 32B bf16 model that's ~64GB resident per rank during conversion. This PR drops the model materialization entirely: it reads AutoConfig, then streams tensors directly from the on-disk safetensors files (sharded or single-file) via safe_open.
  • Conversion is now streaming end-to-end. StateConverter.iter_convert(...) yields (dest_key, tensor) pairs and frees each mapping's source/intermediate tensors before moving on; convert(...) is a thin dict(self.iter_convert(...)) wrapper. A new iter_convert_state_from_hf(...) plumbs the same pattern through the HF-side converter (with the gemma3 +1.0 norm transform applied per-key inline). load_hf_model consumes it directly so each tensor is redistributed into its target DTensor and the source HF tensor is freed before the next read.
  • Peak conversion memory drops from roughly the full model (HF state dict + model object + converted state dict) to roughly one mapping's source tensors plus the chunks being yielded: on the order of hundreds of MB at peak instead of tens of GB for large models.
  • Pin huggingface-hub<1.0 in pyproject.toml to keep transformers happy (4.57.x requires <1.0); without the pin, uv run pytest resolved to huggingface-hub 1.12, and any "from transformers import ..." statement failed at import time.

Test plan

  • uv run pytest src/test/nn/hf/convert_test.py src/test/nn/conversion/ — 34/34 pass, including a new test_iter_convert_state_from_hf_matches_convert_state_from_hf covering embeddings, lm_head, attention QKV/O, MLP, layernorms, and q/k norms.
  • make style-check / make lint-check clean.
  • Smoke-test loading an HF checkpoint end-to-end via load_hf_model on a real model (recommend reviewer or follow-up CI run, since local env can't pull large checkpoints).

🤖 Generated with Claude Code

…-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f595db9c8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +102 to +104
assert file_exists(f"{prefix}/model.safetensors.index.json") or file_exists(
    f"{prefix}/model.safetensors"
)

P1: Keep pytorch_model.bin support in HF checkpoint loading

load_hf_model now enforces safetensors-only inputs, so any HF checkpoint that only contains pytorch_model.bin (common in older/internal repos) will fail immediately instead of loading. This is a behavioral regression from the previous implementation, which could load .bin checkpoints through AutoModelForCausalLM.from_pretrained, and it will break existing conversion workflows unless users manually re-export models to safetensors first.

Useful? React with 👍 / 👎.
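
One way to address the concern above, sketched under assumptions: instead of asserting that a safetensors file exists, the loader could detect which weight format is present and route pytorch_model.bin checkpoints to the older non-streaming path. The helper name `resolve_checkpoint_format` and its return labels are hypothetical, and this version checks local paths only, whereas the PR's file_exists may also handle remote locations.

```python
import tempfile
from pathlib import Path


def resolve_checkpoint_format(prefix: str) -> str:
    """Return which weight format exists under `prefix` (hypothetical helper)."""
    root = Path(prefix)
    if (root / "model.safetensors.index.json").is_file():
        return "safetensors-sharded"
    if (root / "model.safetensors").is_file():
        return "safetensors-single"
    if (root / "pytorch_model.bin").is_file():
        # Legacy format: a caller would fall back to the non-streaming loader here.
        return "pytorch-bin"
    raise FileNotFoundError(f"No supported weight file found under {prefix!r}")


def demo() -> list:
    """Show that safetensors is preferred once it appears next to a .bin file."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "pytorch_model.bin").touch()
        results.append(resolve_checkpoint_format(tmp))
        (Path(tmp) / "model.safetensors").touch()
        results.append(resolve_checkpoint_format(tmp))
    return results
```

Checking safetensors first keeps the low-memory streaming path as the default, so the fallback only costs extra memory for checkpoints that have no safetensors export.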
