diff --git a/skills/Megatron-Bridge/mbridge-recipe-recommender/BENCHMARK.md b/skills/Megatron-Bridge/mbridge-recipe-recommender/BENCHMARK.md
deleted file mode 100644
index 4d4cb85d..00000000
--- a/skills/Megatron-Bridge/mbridge-recipe-recommender/BENCHMARK.md
+++ /dev/null
@@ -1,65 +0,0 @@
-# Evaluation Report
-
-Evaluation of the `mbridge-recipe-recommender` skill before publication through NVSkills-Eval.
-
-This benchmark summarizes 3-Tier Evaluation from NVSkills-Eval results for the skill. The goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use.
-
-## Evaluation Summary
-
-- Skill: `mbridge-recipe-recommender`
-- Evaluation date: 2026-05-28
-- NVSkills-Eval profile: `external`
-- Overall verdict: FAIL
-- Tier 3 live agent evaluation: not available in this report
-
-## Agents Used
-
-- Tier 3 agent details were not available in this report.
-
-## Metrics Used
-
-Reported benchmark dimensions:
-
-- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
-- Correctness: checks whether the agent follows the expected workflow and produces the correct final output.
-- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
-- Effectiveness: checks whether the agent performs measurably better with the skill than without it.
-- Efficiency: checks whether the agent uses fewer tokens and avoids redundant work.
-
-Underlying evaluation signals used in this run:
-
-- No Tier 3 evaluation signal details were available in this report.
-
-## Test Tasks
-
-Tier 3 evaluation task details were not available in this report.
-
-## Results
-
-Tier 3 dimension rollup was not available in this report.
-
-## Tier 1: Static Validation Summary
-
-Tier 1 validation reported findings. NVSkills-Eval ran 9 checks and found 14 total findings.
-
-Top findings:
-
-- MEDIUM QUALITY/quality_correctness: SKILL_SPEC recommended field missing: 'metadata.author' (`skills/mbridge-recipe-recommender/SKILL.md`)
-- MEDIUM QUALITY/quality_correctness: SKILL_SPEC recommended field missing: 'metadata.tags' (`skills/mbridge-recipe-recommender/SKILL.md`)
-- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`skills/mbridge-recipe-recommender/SKILL.md`)
-- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Examples' (`skills/mbridge-recipe-recommender/SKILL.md`)
-- MEDIUM SCHEMA/author_missing: Author not specified in metadata (`skills/mbridge-recipe-recommender/SKILL.md`)
-
-## Tier 2: Deduplication Summary
-
-Tier 2 validation reported findings. NVSkills-Eval ran 2 checks and found 1 total findings.
-
-Top findings:
-
-- HIGH DUPLICATE/duplicate: Duplicate content found within SKILL.md:
- "### Performance recipes (throughput benchmarks)" in SKILL.md (lines 39-55)
- vs "## Performance Recipe Index" in SKILL.md (lines 207-216) (`SKILL.md:39`)
-
-## Publication Recommendation
-
-The skill should be reviewed before NVSkills-Eval publication. Skill owners should address the findings above and rerun NVSkills-Eval to refresh this benchmark.
diff --git a/skills/Megatron-Bridge/mbridge-recipe-recommender/SKILL.md b/skills/Megatron-Bridge/mbridge-recipe-recommender/SKILL.md
deleted file mode 100644
index 3a4f1cb0..00000000
--- a/skills/Megatron-Bridge/mbridge-recipe-recommender/SKILL.md
+++ /dev/null
@@ -1,417 +0,0 @@
----
-name: mbridge-recipe-recommender
-license: Apache-2.0
-description: Recommend and customize Megatron Bridge recipes for a user's model, GPU count, and training goal. Indexes library recipes (pretrain/SFT/PEFT) and performance recipes.
-when_to_use: User wants a starting recipe or training config; 'which recipe', 'recommend recipe', 'how to train Llama', 'starting config for X GPUs', 'what recipe for SFT'.
----
-
-# Auto Recipe — Recipe Index & Recommendation
-
-This skill indexes every shipped recipe and helps users pick the right starting
-config, adjust parallelism, and avoid common pitfalls.
-
-## How to Use This Skill
-
-1. Ask the user for: **model name/size**, **GPU count & type**, **training goal**
-   (pretrain / SFT / PEFT), and **sequence length** (if non-default).
-2. Look up the best-match recipe in the index below.
-3. Recommend the recipe function name + entry-point command.
-4. Provide adjustment advice (parallelism resizing, batch tuning, pitfalls).
-
----
-
-## Entry Points
-
-### Library recipes (functional training)
-
-```bash
-# Pretrain with mock data
-uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
-  --recipe \
-  --dataset llm-pretrain-mock
-
-# SFT with SQuAD
-uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
-  --recipe \
-  --dataset llm-finetune
-
-# Override any field via CLI
-uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
-  --recipe llama3_8b_pretrain_config \
-  --dataset llm-pretrain-mock \
-  'model.tensor_model_parallel_size=2' \
-  'training.global_batch_size=64'
-```
-
-### Performance recipes (throughput benchmarks)
-
-```bash
-python scripts/performance/run_script.py \
-  --recipe \
-  --gpu_type h100 \
-  --num_gpus 64 \
-  --data mock
-```
-
-> **Perf recipes are NOT fully validated for correctness.** Most conversations
-> and testing were on mock data. They are designed for **upper-bound throughput
-> measurement**, not production training. Always validate loss curves and
-> convergence independently.
-
----
-
-## Recipe Unification (Coming Soon — PR #2803)
-
-PR [#2803](https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/2803) is
-unifying performance recipes into the same **Python function** format used by
-library recipes. Key changes:
-
-- Perf recipes move from `scripts/performance/configs/` → `src/megatron/bridge/recipes//_perf.py`
-- Each perf recipe becomes a **self-contained Python function** (e.g. `llama3_8b_h100_bf16_pretrain_config()`)
-- The old `WorkloadBaseConfig` → `set_workload_base_configs` → `get_perf_optimized_recipe` pipeline is removed
-- Shared helpers: `_benchmark_common()` (50 iters, timing, TE RNG), `_perf_precision()` (bf16 / fp8_cs / fp8_mx / nvfp4)
-
-**Why Python, not YAML?** Previous YAML-based approaches had problems:
-recipe logic was split across multiple indirection layers, configs were not
-self-contained, and the two-level pipeline made maintenance and debugging
-difficult. Python functions are explicit, greppable, and composable.
-
-After #2803 lands, both library and perf recipes will be invocable through the
-same `run_recipe.py` entry point.
-
----
-
-## Library Recipe Index
-
-All recipes live under `src/megatron/bridge/recipes/`. Each function returns a
-`ConfigContainer` with model, training, optimizer, and data settings.
-
-### Llama
-
-| Recipe | Mode | TP | PP | CP | SP | GPUs (min) | Seq Len |
-|--------|------|----|----|----|----|------------|---------|
-| `llama2_7b_pretrain_config` | Pretrain | 2 | 1 | — | — | 2 | 4K |
-| `llama3_8b_pretrain_config` | Pretrain | 2 | 1 | — | ✓ | 2 | 8K |
-| `llama3_8b_16k_pretrain_config` | Pretrain | 2 | 1 | 2 | ✓ | 4 | 16K |
-| `llama3_8b_64k_pretrain_config` | Pretrain | 2 | 1 | 4 | ✓ | 8 | 64K |
-| `llama3_8b_128k_pretrain_config` | Pretrain | 2 | 1 | 8 | ✓ | 16 | 128K |
-| `llama3_70b_pretrain_config` | Pretrain | 8 | 4 | — | ✓ | 32 | 8K |
-| `llama3_70b_16k_pretrain_config` | Pretrain | 8 | 4 | 2 | ✓ | 64 | 16K |
-| `llama3_70b_64k_pretrain_config` | Pretrain | 8 | 4 | 4 | ✓ | 128 | 64K |
-| `llama31_405b_pretrain_config` | Pretrain | 8 | 16 | — | ✓ | 128 | 8K |
-| `llama3_8b_sft_config` | SFT | 2 | 1 | — | ✓ | 2 | 8K |
-| `llama3_70b_sft_config` | SFT | 4 | 4 | — | ✓ | 16 | 8K |
-| `llama31_405b_sft_config` | SFT | 8 | 8 | — | ✓ | 64 | 8K |
-| `llama3_8b_peft_config` | PEFT | 1 | 1 | — | — | 1 | 8K |
-| `llama3_70b_peft_config` | PEFT | 2 | 4 | — | ✓ | 8 | 8K |
-| `llama31_405b_peft_config` | PEFT | 4 | 8 | — | ✓ | 32 | 8K |
-
-### Qwen2 / Qwen2.5
-
-| Recipe | Mode | TP | PP | Sizes |
-|--------|------|----|----|-------|
-| `qwen2_*_{pretrain,sft,peft}_config` | All | 1–8 | 1–4 | 500M, 1.5B, 7B, 14B, 32B, 72B |
-| `qwen25_*_{pretrain,sft,peft}_config` | All | 1–8 | 1–4 | 500M, 1.5B, 3B, 7B, 14B, 32B, 72B |
-
-### Qwen3 (Dense)
-
-| Recipe | Mode | TP | PP | CP | Sizes |
-|--------|------|----|----|-----|-------|
-| `qwen3_*_pretrain_config` | Pretrain | 1–8 | 1–2 | — | 600M–32B |
-| `qwen3_*_sft_config` | SFT | 1–8 | 1–2 | — | 600M–32B |
-| `qwen3_600m_sft_128k_config` | SFT | 1 | 1 | 8 | 600M (128K seq) |
-| `qwen3_*_peft_config` | PEFT | 1 | 1 | — | 600M–32B |
-
-### Qwen3 MoE
-
-| Recipe | Mode | TP | PP | EP | CP | GPUs |
-|--------|------|----|----|----|----|------|
-| `qwen3_30b_a3b_pretrain_config` | Pretrain | 1 | 1 | 8 | — | 8 |
-| `qwen3_30b_a3b_sft_config` | SFT | 1 | 1 | 8 | — | 8 |
-| `qwen3_30b_a3b_peft_config` | PEFT | 1 | 1 | 1 | — | 1 |
-| `qwen3_235b_a22b_pretrain_config` | Pretrain | 4 | 16 | 8 | 2 | 512+ |
-| `qwen3_235b_a22b_sft_config` | SFT | 4 | 8 | 8 | — | 256 |
-| `qwen3_235b_a22b_peft_config` | PEFT | 1 | 4 | 4 | — | 16 |
-
-### Qwen3-Next
-
-| Recipe | Mode | TP | PP | EP |
-|--------|------|----|----|-----|
-| `qwen3_next_80b_a3b_pretrain_config` | Pretrain | 1 | 4 | 8 |
-| `qwen3_next_80b_a3b_sft_config` | SFT | 1 | 2 | 8 |
-| `qwen3_next_80b_a3b_peft_config` | PEFT | 1 | 1 | 4 |
-
-### DeepSeek
-
-| Recipe | Mode | TP | PP | EP | GPUs |
-|--------|------|----|----|-----|------|
-| `deepseek_v2_lite_pretrain_config` | Pretrain | 1 | 1 | 8 | 8 |
-| `deepseek_v2_pretrain_config` | Pretrain | 1 | 4 | 32 | 128 |
-| `deepseek_v3_pretrain_config` | Pretrain | 2 | 16 | 64 | 2048 |
-| `deepseek_v3_pretrain_config_32nodes` | Pretrain | 2 | 8 | 32 | 256 |
-
-### GLM-4.5
-
-| Recipe | Mode | TP | PP | EP | GPUs |
-|--------|------|----|----|-----|------|
-| `glm45_355b_pretrain_config` | Pretrain | 2 | 8 | 16 | 256 |
-| `glm45_air_106b_pretrain_config` | Pretrain | 1 | 4 | 8 | 32 |
-| `glm45_355b_sft_config` | SFT | 2 | 8 | 16 | 256 |
-| `glm45_air_106b_sft_config` | SFT | 1 | 4 | 8 | 32 |
-| `glm45_355b_peft_config` | PEFT | 2 | 4 | 4 | 32 |
-| `glm45_air_106b_peft_config` | PEFT | 1 | 2 | 4 | 8 |
-
-### Gemma
-
-| Recipe | Mode | TP | PP | Sizes |
-|--------|------|----|----|-------|
-| `gemma2_*_{pretrain,sft,peft}_config` | All | 2–8 | 1–2 | 2B, 9B, 27B |
-| `gemma3_1b_{pretrain,sft,peft}_config` | All | 1 | 1 | 1B (32K seq) |
-
-### NemotronH / Nemotron
-
-| Recipe | Mode | TP | PP | EP | Notes |
-|--------|------|----|----|-----|-------|
-| `nemotronh_{4b,8b,47b,56b}_*_config` | P/S/PEFT | 1–8 | 1–4 | — | Dense SSM-hybrid |
-| `nemotron_3_nano_*_config` | P/S/PEFT | varies | 1 | 8 | MoE + Mamba |
-| `nemotron_3_super_*_config` | P/S/PEFT | 4 | 1 | 8 | MoE + Mamba, ~40% CUDA graph gain |
-| `nemotron_nano_{9b,12b}_v2_*_config` | P/S/PEFT | varies | 1 | — | Dense |
-
-### Other Models
-
-| Recipe | Mode | Notes |
-|--------|------|-------|
-| `moonlight_16b_{pretrain,sft,peft}_config` | All | MoE EP=8 |
-| `olmoe_7b_{pretrain,sft,peft}_config` | All | MoE EP=8 |
-| `ministral3_{3b,8b,14b}_{sft,peft}_config` | SFT/PEFT | Dense |
-| `gpt_oss_20b_*_config` | All | MoE + FP8/MXFP8 variants |
-| `gpt_oss_120b_*_config` | All | MoE |
-| `vanilla_gpt_pretrain_config` | Pretrain | MLM/Bridge parity baseline |
-| `gpt3_175b_pretrain_config` | Pretrain | TP=4, PP=8, VP=6 |
-| `kimi_k2_pretrain_config` | Pretrain | 1T MoE, TP=2 PP=16 EP=32 |
-
-### VLM Recipes
-
-| Recipe | Mode | TP | PP | EP | GPUs |
-|--------|------|----|----|-----|------|
-| `gemma3_vl_{4b,12b,27b}_{sft,peft}_config` | SFT/PEFT | 1–8 | 1–2 | — | 1–16 |
-| `qwen25_vl_{3b,7b,32b,72b}_{sft,peft}_config` | SFT/PEFT | 1–8 | 1–4 | — | 1–32 |
-| `qwen3_vl_{8b,30b_a3b,235b_a22b}_{sft,peft}_config` | SFT/PEFT | 1–4 | 1–8 | 1–32 | 1–512 |
-| `qwen35_vl_*_{sft,peft}_config` | SFT/PEFT | varies | varies | varies | varies |
-| `glm_45v_{sft,peft}_config` | SFT/PEFT | 1 | 8 | 4–16 | 64–512 |
-| `nemotron_nano_v2_vl_12b_{sft,peft}_config` | SFT/PEFT | 2–4 | 1 | — | 8 |
-
-### Diffusion Recipes
-
-| Recipe | Mode | TP | CP |
-|--------|------|----|----|
-| `wan_1_3B_{pretrain,sft}_config` | P/SFT | 1 | 8 |
-| `wan_14B_{pretrain,sft}_config` | P/SFT | 2 | 4 |
-| `flux_12b_{pretrain,sft}_config` | P/SFT | 2 | 1 |
-
----
-
-## Performance Recipe Index
-
-All perf recipes live under `scripts/performance/`. They are invoked via
-`run_script.py` and use `WorkloadBaseConfig` presets per GPU type.
-
-> **Important:** Perf recipes are designed for **upper-bound throughput
-> benchmarks**, not production training. They run **50 iterations** on **mock
-> data** by default. Throughput numbers are aspirational targets, not validated
-> convergence configs.
-
-### Llama 3 / 3.1
-
-| Model | GPUs | GPU Types | Key Features |
-|-------|------|-----------|--------------|
-| Llama 3 8B | 8 | H100, B200, B300, GB200, GB300, R100 | CUDA graphs (local), FSDP on GB variants |
-| Llama 3 70B | 64 | H100, B200, B300, GB200, GB300 | TP comm overlap (userbuffers), FSDP, CUDA graphs |
-| Llama 3.1 405B | 128–1024 | H100, B200, B300, GB200, GB300 | TP+CP comm overlap (userbuffers), FSDP, heavy PP/VP |
-
-SFT/LoRA variants also exist (e.g. 8B SFT with packed sequences, 70B SFT on 32 GPUs).
-
-### DeepSeek V3
-
-| Model | GPUs | GPU Types | Key Features |
-|-------|------|-----------|--------------|
-| DeepSeek V3 (671B MoE) | 256–1024 | H100, B200, B300, GB200, GB300 | HybridEP dispatcher, MLA recompute, CUDA graphs (TE scoped) |
-
-### Qwen3 MoE
-
-| Model | GPUs | GPU Types | Key Features |
-|-------|------|-----------|--------------|
-| Qwen3 30B-A3B | 8–16 | H100, B200, B300, GB200, GB300 | MoE alltoall/flex dispatcher |
-| Qwen3 235B-A22B | 64–256 | H100, B200, B300, GB200, GB300 | TP comm overlap, CUDA graphs, MoE a2a overlap |
-| Qwen3-Next 80B-A3B | 64–128 | H100, B200, B300, GB200, GB300 | EP 64–128 |
-
-### Qwen3-VL
-
-| Model | GPUs | GPU Types | Key Features |
-|-------|------|-----------|--------------|
-| Qwen3-VL 30B-A3B | 8–16 | H100, B200, B300, GB200, GB300 | VLM + MoE |
-| Qwen3-VL 235B-A22B | 64–256 | H100, B200, B300, GB200, GB300 | VLM + MoE, TP comm overlap |
-
-### Kimi K2
-
-| Model | GPUs | GPU Types | Key Features |
-|-------|------|-----------|--------------|
-| Kimi K2 (1T MoE) | 256–1024 | H100, B200, B300, GB200, GB300 | Muon/Adam optimizer, HybridEP, pipeline layout helpers |
-
-### NemotronH
-
-| Model | GPUs | GPU Types | Key Features |
-|-------|------|-----------|--------------|
-| Nemotron 3 Nano (30B MoE+Mamba) | 8–16 | H100, B200, B300, GB200, GB300 | TE CUDA graphs (attn+mamba+moe), HybridEP |
-| Nemotron 3 Super | 64 | H100, B200, B300, GB200, GB300 | TE CUDA graphs, EP=64 |
-| NemotronH 56B | 64 | H100, B200, B300 | TP=2–8, TE graphs (mamba+attn) |
-
-### GPT-OSS
-
-| Model | GPUs | GPU Types | Key Features |
-|-------|------|-----------|--------------|
-| GPT-OSS 120B | 64 | H100, B200, GB200 | EP=64, HybridEP on GB200 |
-
----
-
-## Recommendation Decision Tree
-
-```text
-User wants to train a model
-│
-├─ Know the model name?
-│   ├─ Yes → Look up in Library Recipe Index above
-│   │   ├─ Has a recipe for their size + mode? → Use it directly
-│   │   └─ No exact match? → Use closest size, adjust parallelism
-│   └─ No → Ask for model name, size, and HF model ID
-│
-├─ What's the training goal?
-│   ├─ Pretrain → Use *_pretrain_config
-│   ├─ SFT (full fine-tune) → Use *_sft_config
-│   └─ PEFT (LoRA/DoRA) → Use *_peft_config (lowest GPU requirement)
-│
-├─ How many GPUs?
-│   ├─ 1 GPU → Only PEFT recipes work (TP=1, PP=1)
-│   ├─ 8 GPUs (1 node) → Most 8B–16B models, small MoE (EP=8)
-│   ├─ 16–64 GPUs → 70B dense, medium MoE
-│   └─ 128+ GPUs → 405B+, large MoE (DeepSeek V3, Kimi K2)
-│
-├─ Want throughput benchmarks?
-│   ├─ Yes → Use perf recipes (scripts/performance/)
-│   │   └─ ⚠️ These run on mock data for upper-bound perf only
-│   └─ No → Use library recipes (scripts/training/run_recipe.py)
-│
-└─ Long context?
-    ├─ > 8K → Need CP (context parallelism), check *_16k / *_64k / *_128k variants
-    └─ ≤ 8K → Default recipes work
-```
-
----
-
-## Adjustment Advice (When Recommending)
-
-### Parallelism Resizing Rules
-
-When the user's GPU count differs from the recipe default:
-
-1. **TP must divide `num_key_value_heads`** (GQA constraint). E.g. if
-   `num_key_value_heads=8`, valid TP = {1, 2, 4, 8}.
-2. **TP should stay within a single node** (NVLink). TP > 8 requires
-   inter-node NVLink (e.g., GB200 NVL72).
-3. **PP adds pipeline bubbles.** Minimize PP; only increase when TP alone can't
-   fit the model. Use VP (virtual pipeline) to mitigate bubble overhead.
-4. **EP doesn't reduce dense-layer memory.** Only expert parameters shard with
-   EP. Shared attention/embeddings are replicated. For "OOM with MoE", increase
-   EP first, not TP.
-5. **SP should be True whenever TP > 1.** It eliminates redundant activation
-   copies and is essentially free.
-6. **CP requires all-to-all or ring attention.** Check `cp_comm_type`. For
-   GQA models, `a2a+p2p` hierarchical CP allows CP > num_kv_heads.
-7. **world_size = DP × TP × PP × CP × EP.** DP is implicit. Make sure the
-   product of explicit parallelisms divides your total GPU count.
-
-### Batch Size Tuning
-
-- Start with the recipe's `micro_batch_size`. If OOM, reduce to 1.
-- `global_batch_size` determines learning dynamics. Scale with DP:
-  `GBS = micro_batch_size × DP × gradient_accumulation_steps`.
-- For MoE, `micro_batch_size=1` is typical at scale.
-
-### Common Pitfalls to Warn About
-
-| Pitfall | Symptom | Fix |
-|---------|---------|-----|
-| TP > num_kv_heads | Crash: "TP must divide num_query_groups" | Reduce TP to a divisor of num_kv_heads |
-| PP without VP | Poor throughput (large bubble) | Set `virtual_pipeline_model_parallel_size` |
-| EP too low for large MoE | OOM on expert params | Increase EP; each expert lives on EP/num_experts ranks |
-| CUDA graphs + packed sequences | Assert: "CUDA graph accepts only Tensor inputs" | Disable packing or use `local` full-iteration graphs |
-| CUDA graphs + full recompute | Assert: "full recompute only with full iteration CUDA graph" | Disable recompute or switch to `local` impl |
-| `use_te_rng_tracker` not set | Assert on provider init when CUDA graphs enabled | Set `cfg.model.use_te_rng_tracker = True` and `cfg.rng.te_rng_tracker = True` |
-| FSDP + TP > 1 on H100 | Possible comm bottleneck | Prefer FSDP with TP=1 or TP=2 on H100; FSDP shines on GB/B-series |
-| Long context without CP | OOM on activations | Add CP=2/4/8; use `*_16k`, `*_64k`, or `*_128k` recipe variants |
-| MoE `overlap_grad_reduce` on H100 | May hurt perf (False in many H100 presets) | Set `overlap_grad_reduce=False` for MoE on H100 |
-| VLM SFT missing image data | Runs but produces garbage | Provide actual multimodal dataset or use mock VLM data |
-| Qwen35-VL MoE FSDP | Tested on Blackwell only | May not work on H100; validate first |
-
-### Recipe Override Examples
-
-```bash
-# Scale Llama3 8B from 2 GPUs to 8 GPUs (increase DP)
-uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
-  --recipe llama3_8b_pretrain_config \
-  --dataset llm-pretrain-mock
-
-# Reduce parallelism for Qwen3-MoE 30B to fit on 4 GPUs
-uv run python -m torch.distributed.run --nproc_per_node=4 scripts/training/run_recipe.py \
-  --recipe qwen3_30b_a3b_sft_config \
-  --dataset llm-finetune \
-  'model.expert_model_parallel_size=4'
-
-# Add long context to an existing recipe
-uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
-  --recipe llama3_8b_pretrain_config \
-  --dataset llm-pretrain-mock \
-  'model.seq_length=32768' \
-  'model.context_parallel_size=4'
-
-# Enable CUDA graphs on any recipe
-uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
-  --recipe qwen3_30b_a3b_pretrain_config \
-  --dataset llm-pretrain-mock \
-  'model.cuda_graph_impl=transformer_engine' \
-  'model.cuda_graph_scope=[attn,moe_router,moe_preprocess]' \
-  'model.use_te_rng_tracker=True' \
-  'rng.te_rng_tracker=True'
-```
-
----
-
-## Quick Reference: Which Recipe for My Situation?
-
-| I want to... | Start with | GPUs needed |
-|---|---|---|
-| Try Bridge for the first time | `llama3_8b_sft_config` + mock data | 2 |
-| Fine-tune a 7-8B model | `llama3_8b_sft_config` or `qwen3_8b_sft_config` | 2–8 |
-| LoRA on 1 GPU | `llama3_8b_peft_config` or `qwen3_8b_peft_config` | 1 |
-| Pretrain a dense 70B | `llama3_70b_pretrain_config` | 32–64 |
-| Train a small MoE | `qwen3_30b_a3b_pretrain_config` | 8 |
-| Train a large MoE (235B+) | `qwen3_235b_a22b_pretrain_config` | 256–512 |
-| Benchmark throughput | Perf recipes via `run_script.py` | Varies |
-| Long-context training | `llama3_8b_128k_pretrain_config` or add CP override | 16+ |
-| VLM fine-tuning | `qwen3_vl_8b_sft_config` or `gemma3_vl_*_sft_config` | 4–8 |
-| Diffusion training | `wan_1_3B_pretrain_config` or `flux_12b_pretrain_config` | 8 |
-
----
-
-## Code Anchors
-
-| What | Path |
-|------|------|
-| Library recipes root | `src/megatron/bridge/recipes/` |
-| Recipe `__init__.py` (all exports) | `src/megatron/bridge/recipes/__init__.py` |
-| Common recipe helpers | `src/megatron/bridge/recipes/common.py` |
-| Training entry point | `scripts/training/run_recipe.py` |
-| Perf recipes root | `scripts/performance/` |
-| Perf entry point | `scripts/performance/run_script.py` |
-| Perf workload configs | `scripts/performance/configs//` |
-| Perf overrides (benchmark defaults) | `scripts/performance/utils/overrides.py` |
diff --git a/skills/Megatron-Bridge/mbridge-recipe-recommender/evals/evals.json b/skills/Megatron-Bridge/mbridge-recipe-recommender/evals/evals.json
deleted file mode 100644
index fe51488c..00000000
--- a/skills/Megatron-Bridge/mbridge-recipe-recommender/evals/evals.json
+++ /dev/null
@@ -1 +0,0 @@
-[]
diff --git a/skills/Megatron-Bridge/mbridge-recipe-recommender/skill-card.md b/skills/Megatron-Bridge/mbridge-recipe-recommender/skill-card.md
deleted file mode 100644
index 0b0053bd..00000000
--- a/skills/Megatron-Bridge/mbridge-recipe-recommender/skill-card.md
+++ /dev/null
@@ -1,52 +0,0 @@
-## Description:
-Recommend and customize Megatron Bridge recipes for a user's model, GPU count, and training goal. Indexes library recipes (pretrain/SFT/PEFT) and performance recipes.
- -This skill is ready for commercial/non-commercial use.
- -## Owner -NVIDIA
- -### License/Terms of Use:
-Apache 2.0
-## Use Case:
-Developers and engineers training large language models with Megatron Bridge who need a starting recipe or training configuration matched to their model, GPU count, and training goal.
- -### Deployment Geography for Use:
-Global
- -## Known Risks and Mitigations:
-Risk: Review before execution as proposals could introduce incorrect or misleading guidance into skills.
-Mitigation: Review and scan skill before deployment.
- -## Reference(s):
-- [Megatron Bridge Documentation](https://docs.nvidia.com/nemo/megatron-bridge/latest/)
-- [Megatron Bridge GitHub Repository](https://github.com/NVIDIA-NeMo/Megatron-Bridge)
- - -## Skill Output:
-**Output Type(s):** [Configuration instructions, Shell commands]
-**Output Format:** [Markdown with inline bash code blocks]
-**Output Parameters:** [1D]
-**Other Properties Related to Output:** [None]
- -## Evaluation Tasks:
-NVSkills-Eval 3-Tier evaluation with external profile; Tier 1 static validation (9 checks, 14 findings), Tier 2 deduplication (2 checks, 1 finding). Tier 3 live agent evaluation not available.
- -## Evaluation Metrics Used:
-Reported benchmark dimensions:
-- Security: Checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
-- Correctness: Checks whether the agent follows the expected workflow and produces the correct final output.
-- Discoverability: Checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
-- Effectiveness: Checks whether the agent performs measurably better with the skill than without it.
-- Efficiency: Checks whether the agent uses fewer tokens and avoids redundant work.
- - - -## Skill Version(s):
-7306f258 (source: git SHA, committed 2026-05-28)
- -## Ethical Considerations:
-NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal team to ensure this skill meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
- -(For Release on NVIDIA Platforms Only)
-Please report quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
diff --git a/skills/Megatron-Bridge/mbridge-recipe-recommender/skill.oms.sig b/skills/Megatron-Bridge/mbridge-recipe-recommender/skill.oms.sig deleted file mode 100644 index 6d4feb86..00000000 --- a/skills/Megatron-Bridge/mbridge-recipe-recommender/skill.oms.sig +++ /dev/null @@ -1 +0,0 @@ -{"mediaType":"application/vnd.dev.sigstore.bundle.v0.3+json","verificationMaterial":{"x509CertificateChain":{"certificates":[{"rawBytes":"MIICgzCCAgmgAwIBAgIUKIyS7SxNteQIiWzK1dWj85E6520wCgYIKoZIzj0EAwMwVTELMAkGA1UEBhMCVVMxGzAZBgNVBAoMEk5WSURJQSBDb3Jwb3JhdGlvbjEpMCcGA1UEAwwgTlZJRElBIEFnZW50IENhcGFiaWxpdGllcyBJQ0EgMDEwHhcNMjYwNDAxMDAwMDAwWhcNMjgwNDIyMTUzMzA5WjBUMQswCQYDVQQGEwJVUzEbMBkGA1UECgwSTlZJRElBIENvcnBvcmF0aW9uMSgwJgYDVQQDDB9OVklESUEgQWdlbnQgU2tpbGxzIFNpZ25pbmcgMDAxMHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEYoRM9bQl/dGlwSRNi6bTpIJUXH8Nv9GciP6LSflJYYMLCc296kpyuTSsk5ddbAWiDcFX3C/ydX3jwc+qCLYP6uHy9XphyLjOQ27Yb2J6rBLVtRBS1mgGco/Gr7fL6ODco4GaMIGXMB0GA1UdDgQWBBRQ/5ZW3nJ6lmo9SVk7I15o7UGmpTAfBgNVHSMEGDAWgBRPGpILxMBBleJSsBGjrMKsby1CgjAMBgNVHRMBAf8EAjAAMA4GA1UdDwEB/wQEAwIHgDA3BggrBgEFBQcBAQQrMCkwJwYIKwYBBQUHMAGGG2h0dHA6Ly9vY3NwLm5kaXMubnZpZGlhLmNvbTAKBggqhkjOPQQDAwNoADBlAjAUygu/GiOCIXrgGr4SmLgeEVDcEitfFUv7ALbvLVGVyMysB3mxmO/uInZfXzWcJZsCMQDxuoxj4ZmO30jhkPIcCxGFCOvnUsnfU3TfGcouYm4M6iRpbKvtVnHPiy4bi6pcKf0="},{"rawBytes":"MIICiDCCAg6gAwIBAgIUZsIuSv9NkpJCNqtYEfCouVv5BzowCgYIKoZIzj0EAwMwUTELMAkGA1UEBhMCVVMxGzAZBgNVBAoMEk5WSURJQSBDb3Jwb3JhdGlvbjElMCMGA1UEAwwcTlZJRElBIEFnZW50IENhcGFiaWxpdGllcyBDQTAgFw0yNjA0MDEwMDAwMDBaGA85OTk5MTIzMTIzNTk1OVowVTELMAkGA1UEBhMCVVMxGzAZBgNVBAoMEk5WSURJQSBDb3Jwb3JhdGlvbjEpMCcGA1UEAwwgTlZJRElBIEFnZW50IENhcGFiaWxpdGllcyBJQ0EgMDEwdjAQBgcqhkjOPQIBBgUrgQQAIgNiAASI72cR3ctKGg4VWnB3bNja6g1Z2PnOmFEopkPof+QeIcPk9rT+g9MjJnq51EQXL93a7C2GJ9J985G4o2V85VD7wJ1RaXhluHW2rf3y8bQGeAYaKMr5s/hUgn+M3/9WlWejgaAwgZ0wHQYDVR0OBBYEFE8akgvEwEGV4lKwEaOswqxvLUKCMB8GA1UdIwQYMBaAFItnoAjjfuCEUvzyvWyI2vOGvwPjMBIGA1UdEwEB/wQIMAYBAf8CAQAwDgYDVR0PAQH/BAQDAgEGMDcGCCsGAQUFBwEBBCswKTAnBggrBgEFBQcwAYYbaHR0cDovL29jc3
AubmRpcy5udmlkaWEuY29tMAoGCCqGSM49BAMDA2gAMGUCMQCeIMMfAbyzPDacw2MxG+Yt1cikrJX/DVxiGfXuHmkkXn6VgSzE79+lkqDErpVO2gYCMCNEColOyvUvkzZGUEI1hQ3PfMgi3FIo9tHoBKMw4/wGBLFpu/0ubtmbBXM6/UMOEw=="},{"rawBytes":"MIICRTCCAcygAwIBAgIUeJdY3rV86EdvFmG7L8LJBsyQFYkwCgYIKoZIzj0EAwMwUTELMAkGA1UEBhMCVVMxGzAZBgNVBAoMEk5WSURJQSBDb3Jwb3JhdGlvbjElMCMGA1UEAwwcTlZJRElBIEFnZW50IENhcGFiaWxpdGllcyBDQTAgFw0yNjA0MDEwMDAwMDBaGA85OTk5MTIzMTIzNTk1OVowUTELMAkGA1UEBhMCVVMxGzAZBgNVBAoMEk5WSURJQSBDb3Jwb3JhdGlvbjElMCMGA1UEAwwcTlZJRElBIEFnZW50IENhcGFiaWxpdGllcyBDQTB2MBAGByqGSM49AgEGBSuBBAAiA2IABAYpiXCDjJ9NT2eSDhyHJVSw1Tbze18cGG2F/578oWvHxg23eQAhNRYdq88i1iOshZSO6C29doKui5Xpmo/7Ctw9Sx4PP2RzOmIuOLCuTdNtKcTRwi4GEsd5BAFvWj42M6NjMGEwHQYDVR0OBBYEFItnoAjjfuCEUvzyvWyI2vOGvwPjMB8GA1UdIwQYMBaAFItnoAjjfuCEUvzyvWyI2vOGvwPjMA8GA1UdEwEB/wQFMAMBAf8wDgYDVR0PAQH/BAQDAgEGMAoGCCqGSM49BAMDA2cAMGQCMCwtAjWLaNwgGWNCgdyNoTyvNhqWRECRJV2r3+7w8g0PL6NHLOsbkgE09BH95h8XlgIwTaQmbbUh2ChAJ5TA1wRiVDnCcvbzHlZl2jM2FcwQQZlk19LOAbyGMRixbu2Ww/rj"}]},"tlogEntries":[]},"dsseEnvelope":{"payload":"ewogICJfdHlwZSI6ICJodHRwczovL2luLXRvdG8uaW8vU3RhdGVtZW50L3YxIiwKICAic3ViamVjdCI6IFsKICAgIHsKICAgICAgIm5hbWUiOiAibWJyaWRnZS1yZWNpcGUtcmVjb21tZW5kZXIiLAogICAgICAiZGlnZXN0IjogewogICAgICAgICJzaGEyNTYiOiAiNWM3ZDgxMjA3OWNhMjIyN2IwMGFkMmM5OGE5MGJjODI0OTdkYTUxNzZiZWUzZTU0ZjcwNWExYTlmZTM0MDFiMSIKICAgICAgfQogICAgfQogIF0sCiAgInByZWRpY2F0ZVR5cGUiOiAiaHR0cHM6Ly9tb2RlbF9zaWduaW5nL3NpZ25hdHVyZS92MS4wIiwKICAicHJlZGljYXRlIjogewogICAgInJlc291cmNlcyI6IFsKICAgICAgewogICAgICAgICJhbGdvcml0aG0iOiAic2hhMjU2IiwKICAgICAgICAiZGlnZXN0IjogIjdhY2JkMGE4YTgzOWVjOWQ2MzljMmQ0ODA4ODBjYWY1ZjA2ODFmYWIzYTkxY2QzYTVkZDRiZTVhN2RjNDNlYzEiLAogICAgICAgICJuYW1lIjogIkJFTkNITUFSSy5tZCIKICAgICAgfSwKICAgICAgewogICAgICAgICJhbGdvcml0aG0iOiAic2hhMjU2IiwKICAgICAgICAiZGlnZXN0IjogIjMyZjhjYTI5NzE2OTg0YzJmNjg0NDJmNWE2YmQ0ZjZmNWU2NjIxNWQ3ZDg2MWU4ZDIwM2ZkOTU4NjZjYzE0YjciLAogICAgICAgICJuYW1lIjogIlNLSUxMLm1kIgogICAgICB9LAogICAgICB7CiAgICAgICAgImFsZ29yaXRobSI6ICJzaGEyNTYiLAogICAgICAgICJkaWdlc3QiOiAiMzc1MTdlNWYzZGM2Njgx
OWY2MWY1YTdiYjhhY2UxOTIxMjgyNDE1ZjEwNTUxZDJkZWZhNWMzZWIwOTg1YjU3MCIsCiAgICAgICAgIm5hbWUiOiAiZXZhbHMvZXZhbHMuanNvbiIKICAgICAgfSwKICAgICAgewogICAgICAgICJhbGdvcml0aG0iOiAic2hhMjU2IiwKICAgICAgICAiZGlnZXN0IjogIjA2NGI0NDYwZGM1MzUxZmJjOTNhZjJjMTMxNjA0NjhkNWI3NGRiOWZhODQ1ZDRhMDkwZTZmMTQ4YjAxODQ5OGEiLAogICAgICAgICJuYW1lIjogInNraWxsLWNhcmQubWQiCiAgICAgIH0KICAgIF0sCiAgICAic2VyaWFsaXphdGlvbiI6IHsKICAgICAgImhhc2hfdHlwZSI6ICJzaGEyNTYiLAogICAgICAiYWxsb3dfc3ltbGlua3MiOiBmYWxzZSwKICAgICAgImlnbm9yZV9wYXRocyI6IFsKICAgICAgICAiLmdpdGF0dHJpYnV0ZXMiLAogICAgICAgICIuZ2l0aWdub3JlIiwKICAgICAgICAiLmdpdCIsCiAgICAgICAgIi5naXRodWIiCiAgICAgIF0sCiAgICAgICJtZXRob2QiOiAiZmlsZXMiCiAgICB9CiAgfQp9","payloadType":"application/vnd.in-toto+json","signatures":[{"sig":"MGUCMHjpQT/AoEkyjwnuT43rs6/Jt9ROjHajO1uaC7dOvn8S1Be76aNinj82OD3mT758AAIxAMu/N5gYdrlc44vXpje/d7aNC2957Jw9rjEUZCc7kvnQU7DgZPRJJCC9qOddZ4F5oA==","keyid":""}]}} \ No newline at end of file