[ci] refactor: tier functional tests into L0/L1/L2 by yaoyu-33 · Pull Request #2715 · NVIDIA-NeMo/Megatron-Bridge

yaoyu-33 · 2026-03-10T05:37:27Z

Summary

Splits all 41 functional test launcher scripts from the flat L2_Launch_* prefix into three explicit tiers with matching CI jobs:

Tier	Scripts	CI Trigger
L0 (7)	training, converter, data, deepseek, qwen, recipes_llama_1b, utils	Every PR + main push + schedule
L1 (17)	gemma, glm, gpt_oss, ministral3, mistral, nemotron, nemotronh, olmoe, qwen_vl, qwen35_vl, ptq, quant_export, recipes_{gpt_oss,llama_3b,ministral3,nemotronh,qwen}	Main push + schedule (not PRs)
L2 (17)	ckpts×4, gemma_vl, llama_nemotron, nemotron_vl, quant variants×3, qat, recipes_{gemma_vl,cuda_graphs,distill,qwen_vl}	Schedule / workflow_dispatch only

Changes:

Rename scripts: L2_Launch_* → L0_/L1_/L2_Launch_*
Replace single cicd-functional-tests matrix job in cicd-main.yml with three jobs (cicd-functional-tests-l0/l1/l2) gated by event/ref conditions
Fix run_ci_tests.sh local mode: previously hardcoded 4 pytest groups (missed data, utils, quantization, ckpts); now globs tier scripts via --tier L0|L1|L2 flag
Update CONTRIBUTING.md with tier table and guidance

Test plan

CI on this PR should only run L0 (7 jobs) in cicd-functional-tests-l0
L1 and L2 jobs should be skipped on this PR
Verify bash scripts/run_ci_tests.sh --skip-lint --skip-unit runs 7 L0 scripts
Verify bash scripts/run_ci_tests.sh --tier L2 --skip-lint --skip-unit runs all 41 scripts

🤖 Generated with Claude Code

Summary by CodeRabbit

Chores
- Restructured CI/CD pipeline with tiered testing approach (L0, L1, L2) for improved test organization and execution efficiency.
Documentation
- Updated contributor guidelines to reflect new tiered test model and launcher script naming conventions.

…ts.sh Rename all functional test launcher scripts from the flat L2_Launch_* prefix to explicit L0/L1/L2 tier prefixes: L0 (7 scripts) — PR gate: core smoke tests that must pass on every PR training, utils, converter, data, models_gpt_oss, models_llama_nemotron, recipes_llama_1b L1 (16 scripts) — daily / merge-to-main: broader model & recipe coverage models_{deepseek,gemma,glm,ministral3,mistral,nemotron,nemotronh,olmoe,qwen}, post_training_quantization, quantization_export, recipes_{gpt_oss,llama_3b,ministral3,nemotronh,qwen} L2 (18 scripts) — nightly / weekly: VL models, ckpt conversion, heavy quant ckpts_* (6), models_{gemma_vl,nemotron_vl,qwen_vl,qwen35_vl}, models_{nemotronh,qwen,qwen_vl}_quantization, quantization_aware_training, recipes_{gemma_vl,llama_cuda_graphs,llama_distill,qwen_vl} Update run_ci_tests.sh: - Add --tier L0|L1|L2 flag (default: L0); tiers are cumulative - Both local and docker modes now glob L{n}_Launch_*.sh by tier, eliminating the previous local-mode hardcoded pytest groups that missed data, utils, quantization, and ckpts test groups Signed-off-by: Yu Yao <yaoyu.094@gmail.com>

…-main.yml Replace the single cicd-functional-tests matrix job with three tiered jobs: cicd-functional-tests-l0 — 7 scripts, runs on every PR / push / schedule cicd-functional-tests-l1 — 16 scripts, runs on main push + schedule (not PRs) cicd-functional-tests-l2 — 17 scripts, runs on schedule / workflow_dispatch only L1 is gated by `!startsWith(github.ref, 'refs/heads/pull-request/')`. L2 is gated by `github.event_name == 'schedule' || 'workflow_dispatch'`. Also fixes: L1_Launch_quantization_export was commented-out previously; now included in the L1 tier. L2_Launch_recipes_qwen_vl was missing from the old matrix; added to L2. Signed-off-by: Yu Yao <yaoyu.094@gmail.com>

Replace the old L2_Launch_* guidance with a table explaining all three tiers and their CI triggers, and update the example to use the correct tier prefix and job name. Signed-off-by: Yu Yao <yaoyu.094@gmail.com>

L0 (added): models_deepseek, models_qwen L0 (removed): models_gpt_oss, models_llama_nemotron L1 (added): models_gpt_oss, models_qwen_vl, models_qwen35_vl L1 (removed): models_deepseek, models_qwen L2 (added): models_llama_nemotron L2 (removed): models_qwen_vl, models_qwen35_vl Signed-off-by: Yu Yao <yaoyu.094@gmail.com>

copy-pr-bot · 2026-03-10T05:37:31Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyu-33 · 2026-03-10T05:37:33Z

/ok to test 179e15c

…scripts Break combined model launchers into version-specific scripts for better CI parallelism and tier granularity: qwen: L0_Launch_models_qwen.sh (all) → L0_Launch_models_qwen3.sh (qwen3, qwen3_moe, qwen3_next) [L0] L1_Launch_models_qwen2.sh (qwen2) [L1] gemma: L1_Launch_models_gemma.sh (all) → L1_Launch_models_gemma1.sh (gemma1) L1_Launch_models_gemma2.sh (gemma2) L1_Launch_models_gemma3.sh (gemma3) qwen_vl: L1_Launch_models_qwen_vl.sh (all, overlapped qwen35) → L1_Launch_models_qwen25_vl.sh (qwen25_vl) L1_Launch_models_qwen3_vl.sh (qwen3_vl conversion + generation) L1_Launch_models_qwen35_vl.sh already existed — no change Also fixes a duplication: the old qwen_vl launcher ran the entire qwen_vl/ dir, causing qwen35_vl tests to execute twice in CI. Signed-off-by: Yu Yao <yaoyu.094@gmail.com>

coderabbitai · 2026-03-10T05:48:20Z

📝 Walkthrough

Walkthrough

The changes introduce a tiered CI/CD testing framework (L0, L1, L2) that splits a monolithic functional test job into three separate jobs with distinct trigger conditions and test matrices. The workflow now includes MCore submodule handling, while supporting documentation and test execution scripts are updated to implement and orchestrate the tiered testing model.

Changes

Cohort / File(s)	Summary
CI Workflow Restructuring `.github/workflows/cicd-main.yml`	Replaced single `cicd-functional-tests` job with three tiered jobs (`cicd-functional-tests-l0`, `cicd-functional-tests-l1`, `cicd-functional-tests-l2`). Added MCore submodule update steps, updated job dependencies, and redefined test matrices with tiered script naming (L0_Launch_, L1_Launch_, L2_Launch_*). Configured distinct trigger conditions per tier (L0: all PRs/main/schedule; L1: main/schedule; L2: schedule/workflow_dispatch).
Documentation Update `CONTRIBUTING.md`	Updated contribution guidelines to document the tiered launcher model, replacing L2-only instructions with a table defining tier purposes and triggers, and updated guidance to select appropriate tier when adding launcher scripts.
Test Execution Script `scripts/run_ci_tests.sh`	Introduced tiered test execution with TIER variable validation and `tierPatterns()` helper function. Replaced hard-coded test invocations with dynamic loop-based discovery of L{n}Launch*.sh scripts, propagating tier awareness through logging and docker execution paths.

Sequence Diagram

sequenceDiagram
    actor Dev as Developer
    participant GHA as GitHub Actions
    participant L0 as L0 Tests
    participant L1 as L1 Tests
    participant L2 as L2 Tests
    participant MCore as MCore Submodule
    participant Report as Test Report

    Dev->>GHA: Trigger (PR/Push/Schedule)
    
    alt PR or Push to main
        GHA->>MCore: Update submodule (if needed)
        MCore->>GHA: Commit validated
        GHA->>L0: Run cicd-functional-tests-l0
        L0->>L0: Execute L0_Launch_*.sh scripts
        L0->>Report: Report results
    end
    
    alt Push to main or Schedule
        GHA->>L1: Run cicd-functional-tests-l1
        L1->>L1: Execute L1_Launch_*.sh scripts
        L1->>Report: Report results
    end
    
    alt Schedule or workflow_dispatch
        GHA->>L2: Run cicd-functional-tests-l2
        L2->>L2: Execute L2_Launch_*.sh scripts
        L2->>Report: Report results
    end
    
    Report->>GHA: Aggregate Nemo_CICD_Test (depends on L0/L1/L2)
    GHA->>Dev: Final status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes	⚠️ Warning	Major CI/CD refactoring lacks documented test results; PR contains only stated intentions rather than evidence of validation.	Document actual test results confirming tiered job structure works correctly, including L0/L1/L2 execution validation and resolution of identified unresolved issues.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and concisely describes the main change: refactoring functional tests into three tiers (L0/L1/L2) in the CI pipeline, which is the primary objective of the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch yuya/tier-functional-tests-L0-L1-L2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cicd-main.yml:
- Around line 458-465: The current job condition for cicd-functional-tests-l1
allows other branch pushes and workflow_dispatch; update the job's if expression
(the block currently using success() && !cancelled() && !startsWith(github.ref,
'refs/heads/pull-request/') && (github.event.inputs.test_suite ...)) to
explicitly gate runs to main branch pushes and scheduled runs only by adding a
check such as requiring (github.ref == 'refs/heads/main' || github.event_name ==
'schedule') alongside the existing conditions; ensure the combined logic still
respects force_run_all and input test_suite checks so only main pushes or
schedule events (and forced runs) can trigger cicd-functional-tests-l1.
- Around line 607-609: The Nemo_CICD_Test summary job is treating intentionally
skipped functional-test tiers (cicd-functional-tests-l0,
cicd-functional-tests-l1, cicd-functional-tests-l2) as failures; update the
result-aggregation query used by Nemo_CICD_Test to only count completed jobs
with conclusion != "success" AND conclusion != "skipped" (or explicitly filter
out conclusion == "skipped") so that skipped tiers do not increment the failure
count. Locate the query/step in the Nemo_CICD_Test job that inspects job
conclusions and add the additional filter (exclude "skipped") to the conditions
that mark a job as a failure.
- Around line 417-421: The workflow step named "main" currently passes the
timeout input with the heterogeneous-matrix expression `${{ matrix.timeout || 30
}}` which yields a string; update the timeout input to use fromJSON for proper
typing by replacing that expression with `${{ fromJSON(matrix.timeout || '30')
}}` where the step uses the action `./.github/actions/test-template` and the
`timeout` input is set, and apply the same change to the other occurrences
listed (the other `timeout` matrix usages).
- Around line 371-380: The downstream functional jobs depend on cicd-unit-tests
and currently use success() which returns false if cicd-unit-tests was skipped,
preventing functional-only runs; update the job condition to allow a skipped
dependency by replacing success() with an explicit check like
(needs.cicd-unit-tests.result == 'success' || needs.cicd-unit-tests.result ==
'skipped') (or equivalent using needs.<job>.result) in the jobs that reference
success() (e.g., the job with needs: [pre-flight, cicd-unit-tests] and its if:
block), and apply the same change to the other similar conditions noted (the
other occurrences that check success()) so functional-only dispatches run when
cicd-unit-tests is skipped.

In `@scripts/run_ci_tests.sh`:
- Around line 95-98: The --tier case currently assigns TIER="${2:-}" then does
shift 2 which will error if no argument was supplied; modify the --tier branch
to first check that "$2" is non-empty (and optionally not another flag) and
print the usage/error message if missing, then only set TIER="$2" and perform
shift 2; update the branch handling for the --tier case (reference the --tier
case label, the TIER variable, and the shift 2) to validate the value before
shifting.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3cd59a68-25ac-49e8-be9e-0a107da53885

📥 Commits

Reviewing files that changed from the base of the PR and between a44fff2 and 179e15c.

📒 Files selected for processing (27)

.github/workflows/cicd-main.yml
CONTRIBUTING.md
scripts/run_ci_tests.sh
tests/functional_tests/L0_Launch_converter.sh
tests/functional_tests/L0_Launch_data.sh
tests/functional_tests/L0_Launch_models_deepseek.sh
tests/functional_tests/L0_Launch_models_qwen.sh
tests/functional_tests/L0_Launch_recipes_llama_1b.sh
tests/functional_tests/L0_Launch_training.sh
tests/functional_tests/L0_Launch_utils.sh
tests/functional_tests/L1_Launch_models_gemma.sh
tests/functional_tests/L1_Launch_models_glm.sh
tests/functional_tests/L1_Launch_models_gpt_oss.sh
tests/functional_tests/L1_Launch_models_ministral3.sh
tests/functional_tests/L1_Launch_models_mistral.sh
tests/functional_tests/L1_Launch_models_nemotron.sh
tests/functional_tests/L1_Launch_models_nemotronh.sh
tests/functional_tests/L1_Launch_models_olmoe.sh
tests/functional_tests/L1_Launch_models_qwen35_vl.sh
tests/functional_tests/L1_Launch_models_qwen_vl.sh
tests/functional_tests/L1_Launch_post_training_quantization.sh
tests/functional_tests/L1_Launch_quantization_export.sh
tests/functional_tests/L1_Launch_recipes_gpt_oss.sh
tests/functional_tests/L1_Launch_recipes_llama_3b.sh
tests/functional_tests/L1_Launch_recipes_ministral3.sh
tests/functional_tests/L1_Launch_recipes_nemotronh.sh
tests/functional_tests/L1_Launch_recipes_qwen.sh

coderabbitai · 2026-03-10T05:48:23Z

+    needs: [pre-flight, cicd-unit-tests]
+    runs-on: ${{ needs.pre-flight.outputs.runner_prefix }}-gpu-x2
+    if: |
+      (
+        success()
+        || needs.pre-flight.outputs.is_ci_workload == 'true'
+        || needs.pre-flight.outputs.force_run_all == 'true'
+      )
+      && !cancelled()
+      && (github.event.inputs.test_suite == '' || github.event.inputs.test_suite == 'all' || github.event.inputs.test_suite == 'functional-only')


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '365,385p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1039

🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '450,470p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1168

🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '536,556p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1247

🏁 Script executed:

grep -n "cicd-unit-tests:" .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 93

🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '150,200p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2563

🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '295,340p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2077

🏁 Script executed:

web_search GitHub Actions success() function skipped dependency behavior

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 268

🌐 Web query:

GitHub Actions success() function skipped job dependency

💡 Result:

In GitHub Actions, a job that has needs: effectively gets an implicit default condition of success() unless you override it with another status-check function. GitHub’s docs note that “a default status check of success() is applied unless you include one of these functions.” [1]

That matters because when an upstream needs job is skipped, downstream jobs are typically skipped too (“If a job fails or is skipped, all jobs that need it are skipped unless the jobs use a conditional expression that causes the job to continue.”) [2] Even though skipped jobs show as “Success” in the UI, they still have a needs.<job_id>.result of skipped (possible values: success, failure, cancelled, skipped). [3][4]

Fix: explicitly allow skipped

Use always() (or another status-check function) and then check needs.*.result:

jobs: deploy: needs: [build] if: ${{ always() && (needs.build.result == 'success' || needs.build.result == 'skipped') }} runs-on: ubuntu-latest steps: - run: echo "deploy"

Or for many dependencies:

if: ${{ always() && !contains(needs.*.result, 'failure') && !contains(needs.*.result, 'cancelled') }}

This overrides the default success() gating and lets the job run when dependencies were skipped. [1][3]

All three functional job stages will be skipped when test_suite == 'functional-only' is dispatched.

When functional-only is selected, the cicd-unit-tests job is skipped (it only runs for unit-only or all). Since L0, L1, and L2 jobs depend on cicd-unit-tests and use success() in their conditions, GitHub Actions will skip these downstream jobs as well, preventing the functional test dispatch from running anything.

Fix: explicitly allow skipped dependencies

- if: | - ( - success() + if: | + always() + && ( + needs.cicd-unit-tests.result == 'success' + || needs.cicd-unit-tests.result == 'skipped' || needs.pre-flight.outputs.is_ci_workload == 'true' || needs.pre-flight.outputs.force_run_all == 'true' ) && !cancelled() && (github.event.inputs.test_suite == '' || github.event.inputs.test_suite == 'all' || github.event.inputs.test_suite == 'functional-only')

Also applies to: 460-469, 546-555

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/cicd-main.yml around lines 371 - 380, The downstream functional jobs depend on cicd-unit-tests and currently use success() which returns false if cicd-unit-tests was skipped, preventing functional-only runs; update the job condition to allow a skipped dependency by replacing success() with an explicit check like (needs.cicd-unit-tests.result == 'success' || needs.cicd-unit-tests.result == 'skipped') (or equivalent using needs.<job>.result) in the jobs that reference success() (e.g., the job with needs: [pre-flight, cicd-unit-tests] and its if: block), and apply the same change to the other similar conditions noted (the other occurrences that check success()) so functional-only dispatches run when cicd-unit-tests is skipped.

coderabbitai · 2026-03-10T05:48:23Z

+      - name: main
+        uses: ./.github/actions/test-template
+        with:
+          script: ${{ matrix.script }}
+          timeout: ${{ matrix.timeout || 30 }}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the workflow file fd cicd-main.yml --type f

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 53

🏁 Script executed:

# Read the workflow file with line numbers to inspect the matrix definitions and timeout usage wc -l .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 106

🏁 Script executed:

# Read lines around the mentioned locations (417-421, 502-506, 588-592) # Start with a broader view to understand matrix structure sed -n '350,610p' .github/workflows/cicd-main.yml | cat -n | head -100

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 4640

🏁 Script executed:

# Read the exact lines mentioned in the review sed -n '417,421p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 270

🏁 Script executed:

# Read lines around 502-506 sed -n '502,506p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 292

🏁 Script executed:

# Read lines around 588-592 sed -n '588,592p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 292

🏁 Script executed:

# Get broader context around each location to understand matrix definitions sed -n '380,430p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2525

🏁 Script executed:

# Check if actionlint is used in the workflow grep -n "actionlint" .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 53

🏁 Script executed:

# Identify which jobs are at lines 417, 502, 588 sed -n '350,430p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3848

🏁 Script executed:

# Read the matrix definitions more broadly sed -n '350,625p' .github/workflows/cicd-main.yml | grep -A 30 "strategy:" | head -80

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3073

🏁 Script executed:

# Check what matrix entries exist for L1 and L2 jobs sed -n '84,250p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 7379

🏁 Script executed:

# Check if there's a lint workflow or actionlint configuration find .github -name "*.yml" -o -name "*.yaml" | xargs grep -l "actionlint"

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 55

🌐 Web query:

GitHub Actions heterogeneous matrix timeout expression null coalescing operator

💡 Result:

GitHub Actions expressions do not have a null-coalescing operator (??). If you need a “default when missing” behavior (common with a heterogeneous matrix where only some entries define a key like timeout), you typically use || (OR) as the fallback operator—but note it treats all falsy values (null, '', 0, false) as “missing”. [1][2]

Pattern for heterogeneous matrix timeout-minutes

Define timeout only on matrix entries that need it, and default otherwise:

strategy: matrix: include: - name: fast timeout: 10 - name: slow timeout: 45 - name: defaulted # no timeout key timeout-minutes: ${{ fromJSON(matrix.timeout || '30') }}

matrix.timeout || '30' picks the matrix value if present/truthy, else '30'. (No ?? exists to fallback only on null/undefined.) [1][2]

fromJSON(...) converts the string to a number, which is commonly used when a numeric field is driven by expressions. [3]

Notes

This works because jobs.<job_id>.timeout-minutes supports the matrix context. [4]

If you need 0 to be a valid timeout (rare), || won’t work as intended because 0 is falsy. [2]

Sources: GitHub Community discussion on lack of ?? [1]; GitHub Actions expressions/operators & falsy rules [2]; fromJSON() usage in expressions [3]; contexts allowed for jobs.<job_id>.timeout-minutes (includes matrix) [4].

🏁 Script executed:

# Check the complete L0, L1, and L2 matrix definitions more carefully sed -n '357,410p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2395

🏁 Script executed:

# Check the L1 matrix for any timeout entries sed -n '425,550p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 6250

🏁 Script executed:

# Check the L2 matrix for any timeout entries sed -n '540,625p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3918

Use fromJSON() to ensure type safety in the heterogeneous matrix timeout expression.

The pattern ${{ matrix.timeout || 30 }} is correct for heterogeneous matrices, but the result is a string. To ensure proper type conversion when only some matrix entries define timeout, wrap the expression: ${{ fromJSON(matrix.timeout || '30') }}. This prevents any potential type mismatch when the action receives the timeout value.

Affected locations

L0: line 72

L1: line 86

L2: line 57

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/cicd-main.yml around lines 417 - 421, The workflow step named "main" currently passes the timeout input with the heterogeneous-matrix expression `${{ matrix.timeout || 30 }}` which yields a string; update the timeout input to use fromJSON for proper typing by replacing that expression with `${{ fromJSON(matrix.timeout || '30') }}` where the step uses the action `./.github/actions/test-template` and the `timeout` input is set, and apply the same change to the other occurrences listed (the other `timeout` matrix usages).

coderabbitai · 2026-03-10T05:48:23Z

+    if: |
+      (
+        success()
+        || needs.pre-flight.outputs.force_run_all == 'true'
+      )
+      && !cancelled()
+      && !startsWith(github.ref, 'refs/heads/pull-request/')
+      && (github.event.inputs.test_suite == '' || github.event.inputs.test_suite == 'all' || github.event.inputs.test_suite == 'functional-only')


⚠️ Potential issue | 🟠 Major

Restrict L1 to main pushes and schedules explicitly.

!startsWith(github.ref, 'refs/heads/pull-request/') still allows deploy-release/* pushes and workflow_dispatch, so cicd-functional-tests-l1 runs outside the documented trigger set.

Suggested gate

if: | ( success() || needs.pre-flight.outputs.force_run_all == 'true' ) && !cancelled() - && !startsWith(github.ref, 'refs/heads/pull-request/') + && ( + github.event_name == 'schedule' + || (github.event_name == 'push' && github.ref == 'refs/heads/main') + ) && (github.event.inputs.test_suite == '' || github.event.inputs.test_suite == 'all' || github.event.inputs.test_suite == 'functional-only')

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/cicd-main.yml around lines 458 - 465, The current job condition for cicd-functional-tests-l1 allows other branch pushes and workflow_dispatch; update the job's if expression (the block currently using success() && !cancelled() && !startsWith(github.ref, 'refs/heads/pull-request/') && (github.event.inputs.test_suite ...)) to explicitly gate runs to main branch pushes and scheduled runs only by adding a check such as requiring (github.ref == 'refs/heads/main' || github.event_name == 'schedule') alongside the existing conditions; ensure the combined logic still respects force_run_all and input test_suite checks so only main pushes or schedule events (and forced runs) can trigger cicd-functional-tests-l1.

coderabbitai · 2026-03-10T05:48:23Z

+      - cicd-functional-tests-l0
+      - cicd-functional-tests-l1
+      - cicd-functional-tests-l2


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's read the workflow file to understand the context cat -n .github/workflows/cicd-main.yml | head -650 | tail -100

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 4775

🏁 Script executed:

# Search for the l2 job definition rg -n "cicd-functional-tests-l2:" .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 102

🏁 Script executed:

# Read the l2 job definition and its conditions sed -n '522,560p' .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1828

🏁 Script executed:

# Verify the skip condition for l2 job grep -A 8 "cicd-functional-tests-l2:" .github/workflows/cicd-main.yml | grep -A 8 "if:"

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 53

The final status job treats intentionally skipped tiers as failures.

Nemo_CICD_Test depends on cicd-functional-tests-l0/l1/l2, but its result query counts every completed non-success job as a failure. On a normal push event, cicd-functional-tests-l2 is skipped by design (triggered only on schedule or workflow_dispatch events), causing the summary job to fail even when the workflow is functioning correctly.

Suggested filter for real failures only

- FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || echo 0 + FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success" and .conclusion != "skipped" and .conclusion != "neutral")] | length') || echo 0

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/cicd-main.yml around lines 607 - 609, The Nemo_CICD_Test summary job is treating intentionally skipped functional-test tiers (cicd-functional-tests-l0, cicd-functional-tests-l1, cicd-functional-tests-l2) as failures; update the result-aggregation query used by Nemo_CICD_Test to only count completed jobs with conclusion != "success" AND conclusion != "skipped" (or explicitly filter out conclusion == "skipped") so that skipped tiers do not increment the failure count. Locate the query/step in the Nemo_CICD_Test job that inspects job conclusions and add the additional filter (exclude "skipped") to the conditions that mark a job as a failure.

coderabbitai · 2026-03-10T05:48:23Z

+    --tier)
+      TIER="${2:-}"
+      shift 2
+      ;;


⚠️ Potential issue | 🟡 Minor

Validate --tier before shift 2.

bash scripts/run_ci_tests.sh --tier currently exits with a raw shift error instead of a usage message. Please guard the new flag the same way you would any required-value option.

Suggested fix

--tier) - TIER="${2:-}" + if [[ $# -lt 2 || -z "${2:-}" ]]; then + echo "Missing value for --tier" >&2 + usage + exit 2 + fi + TIER="${2}" shift 2 ;;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

--tier)

TIER="${2:-}"

shift 2

;;

--tier)

if [[ $# -lt 2 || -z "${2:-}" ]]; then

echo "Missing value for --tier" >&2

usage

exit 2

fi

TIER="${2}"

shift 2

;;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/run_ci_tests.sh` around lines 95 - 98, The --tier case currently assigns TIER="${2:-}" then does shift 2 which will error if no argument was supplied; modify the --tier branch to first check that "$2" is non-empty (and optionally not another flag) and print the usage/error message if missing, then only set TIER="$2" and perform shift 2; update the branch handling for the --tier case (reference the --tier case label, the TIER variable, and the shift 2) to validate the value before shifting.

Three fixes from CodeRabbit review: 1. L1 trigger condition: replace !startsWith(pull-request/) with an explicit allowlist (github.ref == 'refs/heads/main' || schedule || workflow_dispatch) so L1 does not accidentally run on arbitrary feature branches. 2. Nemo_CICD_Test rollup: exclude jobs with conclusion == "skipped" from the failure count and listing query. L1/L2 are intentionally skipped on PRs and should not fail the overall gate. 3. Timeout type: use fromJSON(matrix.timeout || '30') instead of matrix.timeout || 30 so the value is passed as a number, not a string, in all three tier job blocks. Signed-off-by: Yu Yao <yaoyu.094@gmail.com>

yaoyu-33 · 2026-03-10T05:54:59Z

/ok to test 9572c11

Signed-off-by: oliver könig <okoenig@nvidia.com>

…tests-L0-L1-L2

ko3n1g · 2026-03-17T09:16:36Z

/ok to test b064a25

Replace blunt SKIPPING_IS_ALLOWED flag with per-tier EXPECT_L0/L1/L2 booleans derived from the L0/L1/L2 if-conditions. Fail when any job in an expected tier is unexpectedly skipped, instead of silently passing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ko3n1g · 2026-03-17T09:35:42Z

/ok to test 6078f65

Signed-off-by: oliver könig <okoenig@nvidia.com>

Remove `-x` from bash shell options to suppress command tracing noise. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ko3n1g · 2026-03-17T10:35:55Z

/ok to test 404eab4

yaoyu-33 added 4 commits March 9, 2026 21:53

[doc] update CONTRIBUTING.md for L0/L1/L2 functional test tiers

f45fe04

Replace the old L2_Launch_* guidance with a table explaining all three tiers and their CI triggers, and update the example to use the correct tier prefix and job name. Signed-off-by: Yu Yao <yaoyu.094@gmail.com>

yaoyu-33 requested a review from a team as a code owner March 10, 2026 05:37

copy-pr-bot bot had a problem deploying to test March 10, 2026 05:38 Error

coderabbitai bot reviewed Mar 10, 2026

View reviewed changes

copy-pr-bot bot temporarily deployed to test March 10, 2026 05:55 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci March 10, 2026 05:57 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci March 10, 2026 06:05 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci March 10, 2026 06:10 Inactive

add label based testing

e3d949f

Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot bot temporarily deployed to nemo-ci March 16, 2026 21:16 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci March 16, 2026 23:30 Failure

ko3n1g added 2 commits March 17, 2026 09:13

disable L1_Launch_quantization_export

fa1aadc

Signed-off-by: oliver könig <okoenig@nvidia.com>

Merge remote-tracking branch 'origin/main' into yuya/tier-functional-…

b064a25

…tests-L0-L1-L2

copy-pr-bot bot temporarily deployed to test March 17, 2026 09:17 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci March 17, 2026 09:19 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci March 17, 2026 09:27 Error

copy-pr-bot bot temporarily deployed to test March 17, 2026 09:37 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci March 17, 2026 09:37 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci March 17, 2026 09:45 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci March 17, 2026 10:00 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci March 17, 2026 10:00 Failure

copy-pr-bot bot temporarily deployed to nemo-ci March 17, 2026 10:00 Inactive

ko3n1g and others added 2 commits March 17, 2026 10:31

Mark flaky test_pretrain_with_mup test

73cffb4

Signed-off-by: oliver könig <okoenig@nvidia.com>

[ci] fix: reduce debug output in Nemo_CICD_Test

404eab4

Remove `-x` from bash shell options to suppress command tracing noise. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ko3n1g approved these changes Mar 17, 2026

View reviewed changes

-    --tier)
-      TIER="${2:-}"
-      shift 2
-      ;;
+    --tier)
+      if [[ $# -lt 2 || -z "${2:-}" ]]; then
+        echo "Missing value for --tier" >&2
+        usage
+        exit 2
+      fi
+      TIER="${2}"
+      shift 2
+      ;;

Conversation

yaoyu-33 commented Mar 10, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Mar 10, 2026

Uh oh!

yaoyu-33 commented Mar 10, 2026

Uh oh!

coderabbitai bot commented Mar 10, 2026

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 10, 2026

Choose a reason for hiding this comment

Fix: explicitly allow skipped

Uh oh!

coderabbitai bot Mar 10, 2026

Choose a reason for hiding this comment

Pattern for heterogeneous matrix timeout-minutes

Notes

Uh oh!

coderabbitai bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

yaoyu-33 commented Mar 10, 2026

Uh oh!

ko3n1g commented Mar 17, 2026

Uh oh!

ko3n1g commented Mar 17, 2026

Uh oh!

ko3n1g commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yaoyu-33 commented Mar 10, 2026 •

edited by coderabbitai bot

Loading

Fix: explicitly allow `skipped`

Pattern for heterogeneous matrix `timeout-minutes`