diff --git a/docs/training.md b/docs/training.md
index bf9bf90..fe07f57 100644
--- a/docs/training.md
+++ b/docs/training.md
@@ -79,7 +79,7 @@ uvx hf@latest download Wan-AI/Wan2.2-TI2V-5B Wan2.2_VAE.pth \
Reasoner Alignment SFT with LLaVA-OneVision (vfm-vlm)
-Alignment SFT for the Reasoner variant on the [lmms-lab/LLaVA-OneVision-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data) dataset (streamed from HF Hub). Skips Step 2: the backbone is `Qwen/Qwen3-VL-8B-Instruct` (set by the parent experiment's `vlm_policy=qwen3_vl_8b_instruct` default) and is fetched from the HF Hub by the model downloader at startup — no DCP conversion needed and no env-var plumbing required.
+Alignment SFT for the Reasoner variant on the [lmms-lab/LLaVA-OneVision-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data) dataset (streamed from HF Hub). Skips Step 2: by default the backbone `Qwen/Qwen3-VL-8B-Instruct` is fetched from the HF Hub by the model downloader at startup, so no DCP conversion or env vars are required. To start from a merged Cosmos3 reasoner snapshot instead (Cosmos3-Nano LM merged onto the Qwen3-VL visual tower), build it with `convert_model_to_vlm_safetensors` (see [Step 2](#step-2--prepare-checkpoint)) and point `VLM_SAFETENSORS_PATH` at it, the same mechanism the VideoPhy-2 recipe below uses.
Launch shell: `examples/launch_sft_llava_ov.sh`
@@ -91,6 +91,11 @@ Launch shell: `examples/launch_sft_llava_ov.sh`
# (optional) HF_TOKEN raises HF Hub rate limits for the streamed dataset
# revision lookup — useful if you're running 8-rank fan-out from a single IP:
# export HF_TOKEN=hf_...
+#
+# (optional) VLM_SAFETENSORS_PATH starts training from a local pre-converted
+# Qwen3-VL safetensors snapshot (e.g. Cosmos3-Nano LM merged with the Qwen3-VL
+# visual tower) instead of the public HF backbone:
+# export VLM_SAFETENSORS_PATH=$PWD/examples/checkpoints/Cosmos3-Nano-VLM
```
@@ -127,7 +132,7 @@ python -m cosmos_framework.scripts.convert_model_to_dcp \
`$BASE_CHECKPOINT_NAME` (e.g. `Cosmos3-Nano`, `Cosmos3-Super`) is a registered name in the checkpoint catalog; the converter downloads the matching repo from the Hugging Face Hub and writes the DCP into `examples/checkpoints/$BASE_CHECKPOINT_NAME`.
-**Reasoner Alignment SFT with LLaVA-OneVision (vfm-vlm):** Skip this step — the Reasoner alignment SFT loads `Qwen/Qwen3-VL-8B-Instruct` from the HF Hub at startup (no DCP conversion, no env vars).
+**Reasoner Alignment SFT with LLaVA-OneVision (vfm-vlm):** Skip this step — the Reasoner alignment SFT loads `Qwen/Qwen3-VL-8B-Instruct` from the HF Hub at startup (no DCP conversion required). To start from a merged Cosmos3 reasoner snapshot instead, build one with `convert_model_to_vlm_safetensors` (see the VideoPhy-2 note below) and pass it via `VLM_SAFETENSORS_PATH`.
**Reasoner Alignment SFT with VideoPhy-2 (Cosmos3-Nano):** Use `cosmos_framework.scripts.convert_model_to_vlm_safetensors` instead.
@@ -154,12 +159,12 @@ bash examples/launch_sft_vision_nano.sh
Each launcher's default paths come from the `DATASET_PATH` + `BASE_CHECKPOINT_PATH` defaults declared at the top of its `.sh` (each uses `: "${VAR:=…}"` so any value you `export` in the shell before launching wins over the default):
-| Launch shell | Post-Training Task | Default $DATASET_PATH (under examples/data/) | Default $BASE_CHECKPOINT_PATH (under examples/checkpoints/) |
-| ------------------------------ | ------------------ | ---------------------------------------------------------- | ----------------------------------------------------------- |
-| `launch_sft_vision_nano.sh` | Generator SFT | `BridgeData2-Subset-Synthetic-Captions/sft_dataset_bridge` | `Cosmos3-Nano` |
-| `launch_sft_vision_super.sh` | Generator SFT | `BridgeData2-Subset-Synthetic-Captions/sft_dataset_bridge` | `Cosmos3-Super` |
-| `launch_sft_llava_ov.sh` | Reasoner SFT | (none; dataset streams from HF Hub) | (none; backbone fetched at startup) |
-| `launch_sft_videophy2_nano.sh` | Reasoner SFT | (none; set `VIDEOPHYSICS_ROOT` env) | (none; set `VLM_SAFETENSORS_PATH` env) |
+| Launch shell | Post-Training Task | Default $DATASET_PATH (under examples/data/) | Default $BASE_CHECKPOINT_PATH (under examples/checkpoints/) |
+| ------------------------------ | ------------------ | ---------------------------------------------------------- | ------------------------------------------------------------------ |
+| `launch_sft_vision_nano.sh` | Generator SFT | `BridgeData2-Subset-Synthetic-Captions/sft_dataset_bridge` | `Cosmos3-Nano` |
+| `launch_sft_vision_super.sh` | Generator SFT | `BridgeData2-Subset-Synthetic-Captions/sft_dataset_bridge` | `Cosmos3-Super` |
+| `launch_sft_llava_ov.sh` | Reasoner SFT | (none; dataset streams from HF Hub) | (none; backbone fetched at startup, or set `VLM_SAFETENSORS_PATH`) |
+| `launch_sft_videophy2_nano.sh` | Reasoner SFT | (none; set `VIDEOPHYSICS_ROOT` env) | (none; set `VLM_SAFETENSORS_PATH` env) |
`WAN_VAE_PATH` defaults to `examples/checkpoints/wan22_vae/Wan2.2_VAE.pth` for every non-reasoner recipe.
diff --git a/examples/launch_sft_llava_ov.sh b/examples/launch_sft_llava_ov.sh
index 7027a58..cc56d42 100755
--- a/examples/launch_sft_llava_ov.sh
+++ b/examples/launch_sft_llava_ov.sh
@@ -10,16 +10,32 @@
# [job].task = "vlm" — picks cosmos_framework/configs/base/vlm/config.py as the base config.
#
# The dataset streams from the HuggingFace Hub, so DATASET_PATH /
-# WAN_VAE_PATH / BASE_CHECKPOINT_PATH are NOT required; only HF_TOKEN may
-# be needed for gated tokenizer downloads. Two model knobs that the
-# SFTExperimentConfig dataclass does not model live in TAIL_OVERRIDES:
+# WAN_VAE_PATH / BASE_CHECKPOINT_PATH are NOT required.
#
-# model.config.policy.backbone.model_name=
-# data_setting.max_tokens=
+# Optional env:
+# HF_TOKEN raises HF Hub rate limits for the streamed dataset.
+# VLM_SAFETENSORS_PATH local directory of pre-converted Qwen3-VL safetensors
+# (e.g. a Cosmos3-Nano LM merged with the Qwen3-VL visual tower via
+# `cosmos_framework.scripts.convert_model_to_vlm_safetensors`).
+# When set, plumbed to backbone.safetensors_path via a
+# tail override. When unset, the framework falls back
+# to the public Qwen/Qwen3-VL-8B-Instruct HF snapshot.
#
# Usage (8-GPU allocation, inside the training container, from the repo root):
# bash examples/launch_sft_llava_ov.sh
TOML_FILE="examples/toml/sft_config/llava_ov.toml"
+TAIL_OVERRIDES=(
+ ${EXTRA_TAIL_OVERRIDES:-}
+)
+
+# When VLM_SAFETENSORS_PATH is set, plumb it to backbone.safetensors_path so the
+# framework loads weights from the local snapshot (e.g. a Cosmos3-Nano LM merged
+# with the Qwen3-VL visual tower via `cosmos_framework.scripts.convert_model_to_vlm_safetensors`)
+# while keeping the public HF model_name for tokenizer/architecture discovery.
+if [[ -n "${VLM_SAFETENSORS_PATH:-}" ]]; then
+ TAIL_OVERRIDES+=("model.config.policy.backbone.safetensors_path=$VLM_SAFETENSORS_PATH")
+fi
+
source "$(dirname "${BASH_SOURCE[0]}")/_sft_launcher_common.sh"