Proposal
Add a TransformerBridge adapter for NemotronHForCausalLM (NVIDIA Nemotron-H), a hybrid that mixes Mamba-2 and attention layers.
Motivation
Nemotron-H keeps only a small fraction of attention layers (around 8%) and replaces the rest with Mamba-2. That makes the few attention layers a clean target for interpretability: researchers can ask what those layers do that the state-space layers cannot. The model line sees ongoing NVIDIA releases and wide adoption, and supporting it complements the existing Mamba and Mamba2 support.
Support for Mamba layers is currently limited, so this work would also open new avenues for further work on those layers.
Gap scan (2026-06-18): 53 models, ~4.99M downloads, the highest-ranked hybrid state-space gap.
Pitch
Build on the existing Mamba2 components for the state-space layers and standard attention hooks for the interleaved attention layers. A tiny test checkpoint (trl-internal-testing/tiny-NemotronHForCausalLM-nano) keeps CI cheap.
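As a rough illustration of how the adapter might decide which component family handles each layer, the sketch below maps a per-layer pattern string to layer kinds and picks out the attention layers. The single-character codes, the function names, and the example pattern are all assumptions made for illustration; they should be checked against the actual Nemotron-H config before use.

```python
from collections import Counter

# Assumed single-character layer codes; check them against the real
# Nemotron-H config before relying on this mapping.
LAYER_KINDS = {"M": "mamba2", "*": "attention", "-": "mlp"}


def layer_kinds(pattern: str) -> list[str]:
    """Map each character of a hybrid layer pattern to a layer kind."""
    unknown = set(pattern) - set(LAYER_KINDS)
    if unknown:
        raise ValueError(f"unrecognised layer codes: {sorted(unknown)}")
    return [LAYER_KINDS[ch] for ch in pattern]


def attention_fraction(pattern: str) -> float:
    """Fraction of layers in the pattern that are attention layers."""
    counts = Counter(layer_kinds(pattern))
    return counts["attention"] / len(pattern) if pattern else 0.0


# Hypothetical 12-layer pattern, not taken from any released checkpoint.
example = "M-M-M-M*-M-M"
kinds = layer_kinds(example)

# Indices of the layers that would receive standard attention hooks;
# the "mamba2" entries would reuse the existing Mamba2 components.
attention_indices = [i for i, kind in enumerate(kinds) if kind == "attention"]
```

Keeping the mapping in one table means an unexpected layer code fails loudly instead of silently receiving the wrong hooks.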
- Claude Code users can scaffold with /add-model-support nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16.
- Register at the four sites listed in contributing.md.
- Verify smallest-first: trl-internal-testing/tiny-NemotronHForCausalLM-nano, then nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16.
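The smallest-first order in the last step can be sketched as a loop that stops at the first failing checkpoint, so a failure on the tiny model never triggers a download of the larger one. The helper name and the stand-in check are hypothetical; a real check would load each model through the adapter and compare its outputs against the reference implementation.

```python
from typing import Callable, Iterable

# Checkpoints named in this issue, listed smallest first.
CHECKPOINTS = [
    "trl-internal-testing/tiny-NemotronHForCausalLM-nano",
    "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16",
]


def verify_smallest_first(
    checkpoints: Iterable[str], check: Callable[[str], bool]
) -> list[str]:
    """Run `check` on each checkpoint in order and stop at the first failure.

    Returns the checkpoints that passed before any failure occurred.
    """
    passed = []
    for name in checkpoints:
        if not check(name):
            break
        passed.append(name)
    return passed


# Stand-in check for illustration only: it accepts the tiny checkpoint
# and rejects everything else, without touching the network.
def fake_check(name: str) -> bool:
    return "tiny" in name


result = verify_smallest_first(CHECKPOINTS, fake_check)
```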
Additional context
hf_scraper architecture-gaps pass (2026-06-18).
Checklist