Stream HF→OLMo state conversion to lower load_hf_model peak memory by finbarrtimbers · Pull Request #661 · allenai/OLMo-core

finbarrtimbers · 2026-04-27T15:40:12Z

Summary

load_hf_model previously instantiated the full HF model twice (once on rank 0 to warm the cache, then again on every rank) just to extract its state_dict. For a 32B bf16 model that's ~64GB resident per rank during conversion. This PR drops the model materialization entirely: it reads AutoConfig, then streams tensors directly from the on-disk safetensors files (sharded or single-file) via safe_open.
Conversion is now streaming end-to-end. StateConverter.iter_convert(...) yields (dest_key, tensor) pairs and frees each mapping's source/intermediate tensors before moving on; convert(...) is a thin dict(self.iter_convert(...)) wrapper. A new iter_convert_state_from_hf(...) plumbs the same pattern through the HF-side converter (with the gemma3 +1.0 norm transform applied per-key inline). load_hf_model consumes it directly so each tensor is redistributed into its target DTensor and the source HF tensor is freed before the next read.
Peak conversion memory drops from roughly the full model (HF state dict + model object + converted state dict) to roughly one mapping's source tensors plus the chunks being yielded: on the order of hundreds of MB at peak instead of tens of GB for large models.
Pins huggingface-hub<1.0 in pyproject.toml because transformers 4.57.x requires it; without the pin, uv run pytest resolved huggingface-hub 1.12 and from transformers import ... failed at import time.
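
The streaming pattern described in the summary can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: shard_files, iter_hf_tensors, iter_renamed, and convert_all are hypothetical names, the conversion is reduced to a one-to-one key rename (the real StateConverter handles mappings with several source keys), and no DTensor redistribution is shown. The only third-party calls used are safetensors' safe_open, keys, and get_tensor, imported lazily so the rest of the sketch runs without safetensors installed.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


def shard_files(checkpoint_dir: str) -> List[str]:
    """Return the safetensors files of a local checkpoint (sharded or single-file)."""
    root = Path(checkpoint_dir)
    index_path = root / "model.safetensors.index.json"
    if index_path.exists():
        # A sharded HF checkpoint maps each tensor name to the shard holding it.
        weight_map = json.loads(index_path.read_text())["weight_map"]
        return [str(root / name) for name in sorted(set(weight_map.values()))]
    return [str(root / "model.safetensors")]


def iter_hf_tensors(checkpoint_dir: str) -> Iterator[Tuple[str, Any]]:
    """Yield (hf_key, tensor) pairs one at a time, without building the HF model."""
    from safetensors import safe_open  # lazy import; needs safetensors and torch

    for path in shard_files(checkpoint_dir):
        with safe_open(path, framework="pt", device="cpu") as f:
            for key in f.keys():
                # Only this tensor is read from disk at this point.
                yield key, f.get_tensor(key)


def iter_renamed(
    source: Iterable[Tuple[str, Any]], rename: Callable[[str], str]
) -> Iterator[Tuple[str, Any]]:
    """Simplified stand-in for a streaming converter: one source key per dest key."""
    for key, tensor in source:
        # The caller can consume and drop each tensor before the next one is read.
        yield rename(key), tensor


def convert_all(
    source: Iterable[Tuple[str, Any]], rename: Callable[[str], str]
) -> Dict[str, Any]:
    """Dict-building wrapper, analogous to convert(...) wrapping iter_convert(...)."""
    return dict(iter_renamed(source, rename))
```

A consumer that wants the memory benefit would loop over iter_renamed(iter_hf_tensors(path), rename) and copy each tensor into its destination inside the loop; calling convert_all instead materializes everything and gives up the saving.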

Test plan

uv run pytest src/test/nn/hf/convert_test.py src/test/nn/conversion/ — 34/34 pass, including a new test_iter_convert_state_from_hf_matches_convert_state_from_hf covering embeddings, lm_head, attention QKV/O, MLP, layernorms, and q/k norms.
make style-check / make lint-check clean.
Smoke-test loading an HF checkpoint end-to-end via load_hf_model on a real model (recommended for a reviewer or a follow-up CI run, since the local environment can't pull large checkpoints).

🤖 Generated with Claude Code

…-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f595db9c8


chatgpt-codex-connector · 2026-04-27T15:44:22Z

+    assert file_exists(f"{prefix}/model.safetensors.index.json") or file_exists(
+        f"{prefix}/model.safetensors"
+    )


Keep pytorch_model.bin support in HF checkpoint loading

load_hf_model now enforces safetensors-only inputs, so any HF checkpoint that only contains pytorch_model.bin (common in older/internal repos) will fail immediately instead of loading. This is a behavioral regression from the previous implementation, which could load .bin checkpoints through AutoModelForCausalLM.from_pretrained, and it will break existing conversion workflows unless users manually re-export models to safetensors first.
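
One way to act on this suggestion is to detect the checkpoint format up front and fall back to the older loading path only for .bin checkpoints. The sketch below is an assumption-laden illustration, not code from this PR: detect_checkpoint_format is a hypothetical helper, it only handles local directories (the PR's file_exists helper may also accept remote paths), and the returned labels are made up for the example. The four filenames are the standard Hugging Face weight file names.

```python
import os
from typing import Optional


def detect_checkpoint_format(checkpoint_dir: str) -> Optional[str]:
    """Return a label for the weight format found in a local HF checkpoint directory.

    Safetensors files are preferred over .bin files when both are present.
    Returns None when no known weight file exists.
    """
    candidates = (
        ("model.safetensors.index.json", "safetensors-sharded"),
        ("model.safetensors", "safetensors"),
        ("pytorch_model.bin.index.json", "bin-sharded"),
        ("pytorch_model.bin", "bin"),
    )
    for filename, label in candidates:
        if os.path.isfile(os.path.join(checkpoint_dir, filename)):
            return label
    return None
```

A loader could stream when the label starts with "safetensors", use the previous from_pretrained path for "bin" labels, and raise a clear error when the result is None.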


Stream HF→OLMo state conversion to lower load_hf_model peak memory Co…

7f595db


chatgpt-codex-connector Bot reviewed Apr 27, 2026

finbarrtimbers wants to merge 1 commit into main from streaming-hf-conversion
