Skip to content

[ci] refactor: tier functional tests into L0/L1/L2#2715

Merged
ko3n1g merged 13 commits intomainfrom
yuya/tier-functional-tests-L0-L1-L2
Mar 17, 2026
Merged

[ci] refactor: tier functional tests into L0/L1/L2#2715
ko3n1g merged 13 commits intomainfrom
yuya/tier-functional-tests-L0-L1-L2

Conversation

@yaoyu-33
Copy link
Copy Markdown
Contributor

@yaoyu-33 yaoyu-33 commented Mar 10, 2026

Summary

Splits all 41 functional test launcher scripts from the flat L2_Launch_* prefix into three explicit tiers with matching CI jobs:

Tier Scripts CI Trigger
L0 (7) training, converter, data, deepseek, qwen, recipes_llama_1b, utils Every PR + main push + schedule
L1 (17) gemma, glm, gpt_oss, ministral3, mistral, nemotron, nemotronh, olmoe, qwen_vl, qwen35_vl, ptq, quant_export, recipes_{gpt_oss,llama_3b,ministral3,nemotronh,qwen} Main push + schedule (not PRs)
L2 (17) ckpts×4, gemma_vl, llama_nemotron, nemotron_vl, quant variants×3, qat, recipes_{gemma_vl,cuda_graphs,distill,qwen_vl} Schedule / workflow_dispatch only

Changes:

  • Rename scripts: L2_Launch_*L0_/L1_/L2_Launch_*
  • Replace single cicd-functional-tests matrix job in cicd-main.yml with three jobs (cicd-functional-tests-l0/l1/l2) gated by event/ref conditions
  • Fix run_ci_tests.sh local mode: previously hardcoded 4 pytest groups (missed data, utils, quantization, ckpts); now globs tier scripts via --tier L0|L1|L2 flag
  • Update CONTRIBUTING.md with tier table and guidance

Test plan

  • CI on this PR should only run L0 (7 jobs) in cicd-functional-tests-l0
  • L1 and L2 jobs should be skipped on this PR
  • Verify bash scripts/run_ci_tests.sh --skip-lint --skip-unit runs 7 L0 scripts
  • Verify bash scripts/run_ci_tests.sh --tier L2 --skip-lint --skip-unit runs all 41 scripts

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores

    • Restructured CI/CD pipeline with tiered testing approach (L0, L1, L2) for improved test organization and execution efficiency.
  • Documentation

    • Updated contributor guidelines to reflect new tiered test model and launcher script naming conventions.

yaoyu-33 added 4 commits March 9, 2026 21:53
…ts.sh

Rename all functional test launcher scripts from the flat L2_Launch_*
prefix to explicit L0/L1/L2 tier prefixes:

  L0 (7 scripts) — PR gate: core smoke tests that must pass on every PR
    training, utils, converter, data, models_gpt_oss,
    models_llama_nemotron, recipes_llama_1b

  L1 (16 scripts) — daily / merge-to-main: broader model & recipe coverage
    models_{deepseek,gemma,glm,ministral3,mistral,nemotron,nemotronh,olmoe,qwen},
    post_training_quantization, quantization_export,
    recipes_{gpt_oss,llama_3b,ministral3,nemotronh,qwen}

  L2 (18 scripts) — nightly / weekly: VL models, ckpt conversion, heavy quant
    ckpts_* (6), models_{gemma_vl,nemotron_vl,qwen_vl,qwen35_vl},
    models_{nemotronh,qwen,qwen_vl}_quantization,
    quantization_aware_training,
    recipes_{gemma_vl,llama_cuda_graphs,llama_distill,qwen_vl}

Update run_ci_tests.sh:
- Add --tier L0|L1|L2 flag (default: L0); tiers are cumulative
- Both local and docker modes now glob L{n}_Launch_*.sh by tier,
  eliminating the previous local-mode hardcoded pytest groups that
  missed data, utils, quantization, and ckpts test groups

Signed-off-by: Yu Yao <yaoyu.094@gmail.com>
…-main.yml

Replace the single cicd-functional-tests matrix job with three tiered jobs:

  cicd-functional-tests-l0 — 7 scripts, runs on every PR / push / schedule
  cicd-functional-tests-l1 — 16 scripts, runs on main push + schedule (not PRs)
  cicd-functional-tests-l2 — 17 scripts, runs on schedule / workflow_dispatch only

L1 is gated by `!startsWith(github.ref, 'refs/heads/pull-request/')`.
L2 is gated by `github.event_name == 'schedule' || 'workflow_dispatch'`.

Also fixes: L1_Launch_quantization_export was commented-out previously;
now included in the L1 tier. L2_Launch_recipes_qwen_vl was missing from
the old matrix; added to L2.

Signed-off-by: Yu Yao <yaoyu.094@gmail.com>
Replace the old L2_Launch_* guidance with a table explaining all three
tiers and their CI triggers, and update the example to use the correct
tier prefix and job name.

Signed-off-by: Yu Yao <yaoyu.094@gmail.com>
  L0 (added): models_deepseek, models_qwen
  L0 (removed): models_gpt_oss, models_llama_nemotron
  L1 (added): models_gpt_oss, models_qwen_vl, models_qwen35_vl
  L1 (removed): models_deepseek, models_qwen
  L2 (added): models_llama_nemotron
  L2 (removed): models_qwen_vl, models_qwen35_vl

Signed-off-by: Yu Yao <yaoyu.094@gmail.com>
@yaoyu-33 yaoyu-33 requested a review from a team as a code owner March 10, 2026 05:37
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33
Copy link
Copy Markdown
Contributor Author

/ok to test 179e15c

…scripts

Break combined model launchers into version-specific scripts for
better CI parallelism and tier granularity:

  qwen: L0_Launch_models_qwen.sh (all) →
    L0_Launch_models_qwen3.sh  (qwen3, qwen3_moe, qwen3_next)  [L0]
    L1_Launch_models_qwen2.sh  (qwen2)                          [L1]

  gemma: L1_Launch_models_gemma.sh (all) →
    L1_Launch_models_gemma1.sh (gemma1)
    L1_Launch_models_gemma2.sh (gemma2)
    L1_Launch_models_gemma3.sh (gemma3)

  qwen_vl: L1_Launch_models_qwen_vl.sh (all, overlapped qwen35) →
    L1_Launch_models_qwen25_vl.sh (qwen25_vl)
    L1_Launch_models_qwen3_vl.sh  (qwen3_vl conversion + generation)
    L1_Launch_models_qwen35_vl.sh already existed — no change

Also fixes a duplication: the old qwen_vl launcher ran the entire
qwen_vl/ dir, causing qwen35_vl tests to execute twice in CI.

Signed-off-by: Yu Yao <yaoyu.094@gmail.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

The changes introduce a tiered CI/CD testing framework (L0, L1, L2) that splits a monolithic functional test job into three separate jobs with distinct trigger conditions and test matrices. The workflow now includes MCore submodule handling, while supporting documentation and test execution scripts are updated to implement and orchestrate the tiered testing model.

Changes

Cohort / File(s) Summary
CI Workflow Restructuring
.github/workflows/cicd-main.yml
Replaced single cicd-functional-tests job with three tiered jobs (cicd-functional-tests-l0, cicd-functional-tests-l1, cicd-functional-tests-l2). Added MCore submodule update steps, updated job dependencies, and redefined test matrices with tiered script naming (L0_Launch_, L1_Launch_, L2_Launch_*). Configured distinct trigger conditions per tier (L0: all PRs/main/schedule; L1: main/schedule; L2: schedule/workflow_dispatch).
Documentation Update
CONTRIBUTING.md
Updated contribution guidelines to document the tiered launcher model, replacing L2-only instructions with a table defining tier purposes and triggers, and updated guidance to select appropriate tier when adding launcher scripts.
Test Execution Script
scripts/run_ci_tests.sh
Introduced tiered test execution with TIER variable validation and tierPatterns() helper function. Replaced hard-coded test invocations with dynamic loop-based discovery of L{n}Launch*.sh scripts, propagating tier awareness through logging and docker execution paths.

Sequence Diagram

sequenceDiagram
    actor Dev as Developer
    participant GHA as GitHub Actions
    participant L0 as L0 Tests
    participant L1 as L1 Tests
    participant L2 as L2 Tests
    participant MCore as MCore Submodule
    participant Report as Test Report

    Dev->>GHA: Trigger (PR/Push/Schedule)
    
    alt PR or Push to main
        GHA->>MCore: Update submodule (if needed)
        MCore->>GHA: Commit validated
        GHA->>L0: Run cicd-functional-tests-l0
        L0->>L0: Execute L0_Launch_*.sh scripts
        L0->>Report: Report results
    end
    
    alt Push to main or Schedule
        GHA->>L1: Run cicd-functional-tests-l1
        L1->>L1: Execute L1_Launch_*.sh scripts
        L1->>Report: Report results
    end
    
    alt Schedule or workflow_dispatch
        GHA->>L2: Run cicd-functional-tests-l2
        L2->>L2: Execute L2_Launch_*.sh scripts
        L2->>Report: Report results
    end
    
    Report->>GHA: Aggregate Nemo_CICD_Test (depends on L0/L1/L2)
    GHA->>Dev: Final status
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning Major CI/CD refactoring lacks documented test results; PR contains only stated intentions rather than evidence of validation. Document actual test results confirming tiered job structure works correctly, including L0/L1/L2 execution validation and resolution of identified unresolved issues.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely describes the main change: refactoring functional tests into three tiers (L0/L1/L2) in the CI pipeline, which is the primary objective of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch yuya/tier-functional-tests-L0-L1-L2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cicd-main.yml:
- Around line 458-465: The current job condition for cicd-functional-tests-l1
allows other branch pushes and workflow_dispatch; update the job's if expression
(the block currently using success() && !cancelled() && !startsWith(github.ref,
'refs/heads/pull-request/') && (github.event.inputs.test_suite ...)) to
explicitly gate runs to main branch pushes and scheduled runs only by adding a
check such as requiring (github.ref == 'refs/heads/main' || github.event_name ==
'schedule') alongside the existing conditions; ensure the combined logic still
respects force_run_all and input test_suite checks so only main pushes or
schedule events (and forced runs) can trigger cicd-functional-tests-l1.
- Around line 607-609: The Nemo_CICD_Test summary job is treating intentionally
skipped functional-test tiers (cicd-functional-tests-l0,
cicd-functional-tests-l1, cicd-functional-tests-l2) as failures; update the
result-aggregation query used by Nemo_CICD_Test to only count completed jobs
with conclusion != "success" AND conclusion != "skipped" (or explicitly filter
out conclusion == "skipped") so that skipped tiers do not increment the failure
count. Locate the query/step in the Nemo_CICD_Test job that inspects job
conclusions and add the additional filter (exclude "skipped") to the conditions
that mark a job as a failure.
- Around line 417-421: The workflow step named "main" currently passes the
timeout input with the heterogeneous-matrix expression `${{ matrix.timeout || 30
}}` which yields a string; update the timeout input to use fromJSON for proper
typing by replacing that expression with `${{ fromJSON(matrix.timeout || '30')
}}` where the step uses the action `./.github/actions/test-template` and the
`timeout` input is set, and apply the same change to the other occurrences
listed (the other `timeout` matrix usages).
- Around line 371-380: The downstream functional jobs depend on cicd-unit-tests
and currently use success() which returns false if cicd-unit-tests was skipped,
preventing functional-only runs; update the job condition to allow a skipped
dependency by replacing success() with an explicit check like
(needs.cicd-unit-tests.result == 'success' || needs.cicd-unit-tests.result ==
'skipped') (or equivalent using needs.<job>.result) in the jobs that reference
success() (e.g., the job with needs: [pre-flight, cicd-unit-tests] and its if:
block), and apply the same change to the other similar conditions noted (the
other occurrences that check success()) so functional-only dispatches run when
cicd-unit-tests is skipped.

In `@scripts/run_ci_tests.sh`:
- Around line 95-98: The --tier case currently assigns TIER="${2:-}" then does
shift 2 which will error if no argument was supplied; modify the --tier branch
to first check that "$2" is non-empty (and optionally not another flag) and
print the usage/error message if missing, then only set TIER="$2" and perform
shift 2; update the branch handling for the --tier case (reference the --tier
case label, the TIER variable, and the shift 2) to validate the value before
shifting.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3cd59a68-25ac-49e8-be9e-0a107da53885

📥 Commits

Reviewing files that changed from the base of the PR and between a44fff2 and 179e15c.

📒 Files selected for processing (27)
  • .github/workflows/cicd-main.yml
  • CONTRIBUTING.md
  • scripts/run_ci_tests.sh
  • tests/functional_tests/L0_Launch_converter.sh
  • tests/functional_tests/L0_Launch_data.sh
  • tests/functional_tests/L0_Launch_models_deepseek.sh
  • tests/functional_tests/L0_Launch_models_qwen.sh
  • tests/functional_tests/L0_Launch_recipes_llama_1b.sh
  • tests/functional_tests/L0_Launch_training.sh
  • tests/functional_tests/L0_Launch_utils.sh
  • tests/functional_tests/L1_Launch_models_gemma.sh
  • tests/functional_tests/L1_Launch_models_glm.sh
  • tests/functional_tests/L1_Launch_models_gpt_oss.sh
  • tests/functional_tests/L1_Launch_models_ministral3.sh
  • tests/functional_tests/L1_Launch_models_mistral.sh
  • tests/functional_tests/L1_Launch_models_nemotron.sh
  • tests/functional_tests/L1_Launch_models_nemotronh.sh
  • tests/functional_tests/L1_Launch_models_olmoe.sh
  • tests/functional_tests/L1_Launch_models_qwen35_vl.sh
  • tests/functional_tests/L1_Launch_models_qwen_vl.sh
  • tests/functional_tests/L1_Launch_post_training_quantization.sh
  • tests/functional_tests/L1_Launch_quantization_export.sh
  • tests/functional_tests/L1_Launch_recipes_gpt_oss.sh
  • tests/functional_tests/L1_Launch_recipes_llama_3b.sh
  • tests/functional_tests/L1_Launch_recipes_ministral3.sh
  • tests/functional_tests/L1_Launch_recipes_nemotronh.sh
  • tests/functional_tests/L1_Launch_recipes_qwen.sh

Comment on lines +371 to +380
needs: [pre-flight, cicd-unit-tests]
runs-on: ${{ needs.pre-flight.outputs.runner_prefix }}-gpu-x2
if: |
(
success()
|| needs.pre-flight.outputs.is_ci_workload == 'true'
|| needs.pre-flight.outputs.force_run_all == 'true'
)
&& !cancelled()
&& (github.event.inputs.test_suite == '' || github.event.inputs.test_suite == 'all' || github.event.inputs.test_suite == 'functional-only')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '365,385p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1039


🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '450,470p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1168


🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '536,556p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1247


🏁 Script executed:

grep -n "cicd-unit-tests:" .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 93


🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '150,200p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2563


🏁 Script executed:

cat -n .github/workflows/cicd-main.yml | sed -n '295,340p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2077


🏁 Script executed:

web_search
GitHub Actions success() function skipped dependency behavior

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 268


🌐 Web query:

GitHub Actions success() function skipped job dependency

💡 Result:

In GitHub Actions, a job that has needs: effectively gets an implicit default condition of success() unless you override it with another status-check function. GitHub’s docs note that “a default status check of success() is applied unless you include one of these functions.” [1]

That matters because when an upstream needs job is skipped, downstream jobs are typically skipped too (“If a job fails or is skipped, all jobs that need it are skipped unless the jobs use a conditional expression that causes the job to continue.”) [2] Even though skipped jobs show as “Success” in the UI, they still have a needs.<job_id>.result of skipped (possible values: success, failure, cancelled, skipped). [3][4]

Fix: explicitly allow skipped

Use always() (or another status-check function) and then check needs.*.result:

jobs:
  deploy:
    needs: [build]
    if: ${{ always() && (needs.build.result == 'success' || needs.build.result == 'skipped') }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploy"

Or for many dependencies:

if: ${{ always() && !contains(needs.*.result, 'failure') && !contains(needs.*.result, 'cancelled') }}

This overrides the default success() gating and lets the job run when dependencies were skipped. [1][3]


All three functional job stages will be skipped when test_suite == 'functional-only' is dispatched.

When functional-only is selected, the cicd-unit-tests job is skipped (it only runs for unit-only or all). Since L0, L1, and L2 jobs depend on cicd-unit-tests and use success() in their conditions, GitHub Actions will skip these downstream jobs as well, preventing the functional test dispatch from running anything.

Fix: explicitly allow skipped dependencies
-    if: |
-      (
-        success()
+    if: |
+      always()
+      && (
+        needs.cicd-unit-tests.result == 'success'
+        || needs.cicd-unit-tests.result == 'skipped'
         || needs.pre-flight.outputs.is_ci_workload == 'true'
         || needs.pre-flight.outputs.force_run_all == 'true'
       )
       && !cancelled()
       && (github.event.inputs.test_suite == '' || github.event.inputs.test_suite == 'all' || github.event.inputs.test_suite == 'functional-only')

Also applies to: 460-469, 546-555

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cicd-main.yml around lines 371 - 380, The downstream
functional jobs depend on cicd-unit-tests and currently use success() which
returns false if cicd-unit-tests was skipped, preventing functional-only runs;
update the job condition to allow a skipped dependency by replacing success()
with an explicit check like (needs.cicd-unit-tests.result == 'success' ||
needs.cicd-unit-tests.result == 'skipped') (or equivalent using
needs.<job>.result) in the jobs that reference success() (e.g., the job with
needs: [pre-flight, cicd-unit-tests] and its if: block), and apply the same
change to the other similar conditions noted (the other occurrences that check
success()) so functional-only dispatches run when cicd-unit-tests is skipped.

Comment thread .github/workflows/cicd-main.yml Outdated
Comment on lines +417 to +421
- name: main
uses: ./.github/actions/test-template
with:
script: ${{ matrix.script }}
timeout: ${{ matrix.timeout || 30 }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the workflow file
fd cicd-main.yml --type f

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 53


🏁 Script executed:

# Read the workflow file with line numbers to inspect the matrix definitions and timeout usage
wc -l .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 106


🏁 Script executed:

# Read lines around the mentioned locations (417-421, 502-506, 588-592)
# Start with a broader view to understand matrix structure
sed -n '350,610p' .github/workflows/cicd-main.yml | cat -n | head -100

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 4640


🏁 Script executed:

# Read the exact lines mentioned in the review
sed -n '417,421p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 270


🏁 Script executed:

# Read lines around 502-506
sed -n '502,506p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 292


🏁 Script executed:

# Read lines around 588-592
sed -n '588,592p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 292


🏁 Script executed:

# Get broader context around each location to understand matrix definitions
sed -n '380,430p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2525


🏁 Script executed:

# Check if actionlint is used in the workflow
grep -n "actionlint" .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 53


🏁 Script executed:

# Identify which jobs are at lines 417, 502, 588
sed -n '350,430p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3848


🏁 Script executed:

# Read the matrix definitions more broadly
sed -n '350,625p' .github/workflows/cicd-main.yml | grep -A 30 "strategy:" | head -80

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3073


🏁 Script executed:

# Check what matrix entries exist for L1 and L2 jobs
sed -n '84,250p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 7379


🏁 Script executed:

# Check if there's a lint workflow or actionlint configuration
find .github -name "*.yml" -o -name "*.yaml" | xargs grep -l "actionlint"

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 55


🌐 Web query:

GitHub Actions heterogeneous matrix timeout expression null coalescing operator

💡 Result:

GitHub Actions expressions do not have a null-coalescing operator (??). If you need a “default when missing” behavior (common with a heterogeneous matrix where only some entries define a key like timeout), you typically use || (OR) as the fallback operator—but note it treats all falsy values (null, '', 0, false) as “missing”. [1][2]

Pattern for heterogeneous matrix timeout-minutes

Define timeout only on matrix entries that need it, and default otherwise:

strategy:
  matrix:
    include:
      - name: fast
        timeout: 10
      - name: slow
        timeout: 45
      - name: defaulted   # no timeout key

timeout-minutes: ${{ fromJSON(matrix.timeout || '30') }}
  • matrix.timeout || '30' picks the matrix value if present/truthy, else '30'. (No ?? exists to fallback only on null/undefined.) [1][2]
  • fromJSON(...) converts the string to a number, which is commonly used when a numeric field is driven by expressions. [3]

Notes

  • This works because jobs.<job_id>.timeout-minutes supports the matrix context. [4]
  • If you need 0 to be a valid timeout (rare), || won’t work as intended because 0 is falsy. [2]

Sources: GitHub Community discussion on lack of ?? [1]; GitHub Actions expressions/operators & falsy rules [2]; fromJSON() usage in expressions [3]; contexts allowed for jobs.<job_id>.timeout-minutes (includes matrix) [4].


🏁 Script executed:

# Check the complete L0, L1, and L2 matrix definitions more carefully
sed -n '357,410p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2395


🏁 Script executed:

# Check the L1 matrix for any timeout entries
sed -n '425,550p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 6250


🏁 Script executed:

# Check the L2 matrix for any timeout entries  
sed -n '540,625p' .github/workflows/cicd-main.yml | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3918


Use fromJSON() to ensure type safety in the heterogeneous matrix timeout expression.

The pattern ${{ matrix.timeout || 30 }} is correct for heterogeneous matrices, but the result is a string. To ensure proper type conversion when only some matrix entries define timeout, wrap the expression: ${{ fromJSON(matrix.timeout || '30') }}. This prevents any potential type mismatch when the action receives the timeout value.

Affected locations
  • L0: line 72
  • L1: line 86
  • L2: line 57
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cicd-main.yml around lines 417 - 421, The workflow step
named "main" currently passes the timeout input with the heterogeneous-matrix
expression `${{ matrix.timeout || 30 }}` which yields a string; update the
timeout input to use fromJSON for proper typing by replacing that expression
with `${{ fromJSON(matrix.timeout || '30') }}` where the step uses the action
`./.github/actions/test-template` and the `timeout` input is set, and apply the
same change to the other occurrences listed (the other `timeout` matrix usages).

Comment on lines +458 to +465
if: |
(
success()
|| needs.pre-flight.outputs.force_run_all == 'true'
)
&& !cancelled()
&& !startsWith(github.ref, 'refs/heads/pull-request/')
&& (github.event.inputs.test_suite == '' || github.event.inputs.test_suite == 'all' || github.event.inputs.test_suite == 'functional-only')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Restrict L1 to main pushes and schedules explicitly.

!startsWith(github.ref, 'refs/heads/pull-request/') still allows deploy-release/* pushes and workflow_dispatch, so cicd-functional-tests-l1 runs outside the documented trigger set.

Suggested gate
     if: |
       (
         success()
         || needs.pre-flight.outputs.force_run_all == 'true'
       )
       && !cancelled()
-      && !startsWith(github.ref, 'refs/heads/pull-request/')
+      && (
+        github.event_name == 'schedule'
+        || (github.event_name == 'push' && github.ref == 'refs/heads/main')
+      )
       && (github.event.inputs.test_suite == '' || github.event.inputs.test_suite == 'all' || github.event.inputs.test_suite == 'functional-only')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cicd-main.yml around lines 458 - 465, The current job
condition for cicd-functional-tests-l1 allows other branch pushes and
workflow_dispatch; update the job's if expression (the block currently using
success() && !cancelled() && !startsWith(github.ref, 'refs/heads/pull-request/')
&& (github.event.inputs.test_suite ...)) to explicitly gate runs to main branch
pushes and scheduled runs only by adding a check such as requiring (github.ref
== 'refs/heads/main' || github.event_name == 'schedule') alongside the existing
conditions; ensure the combined logic still respects force_run_all and input
test_suite checks so only main pushes or schedule events (and forced runs) can
trigger cicd-functional-tests-l1.

Comment on lines +607 to +609
- cicd-functional-tests-l0
- cicd-functional-tests-l1
- cicd-functional-tests-l2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's read the workflow file to understand the context
cat -n .github/workflows/cicd-main.yml | head -650 | tail -100

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 4775


🏁 Script executed:

# Search for the l2 job definition
rg -n "cicd-functional-tests-l2:" .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 102


🏁 Script executed:

# Read the l2 job definition and its conditions
sed -n '522,560p' .github/workflows/cicd-main.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1828


🏁 Script executed:

# Verify the skip condition for l2 job
grep -A 8 "cicd-functional-tests-l2:" .github/workflows/cicd-main.yml | grep -A 8 "if:"

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 53


The final status job treats intentionally skipped tiers as failures.

Nemo_CICD_Test depends on cicd-functional-tests-l0/l1/l2, but its result query counts every completed non-success job as a failure. On a normal push event, cicd-functional-tests-l2 is skipped by design (triggered only on schedule or workflow_dispatch events), causing the summary job to fail even when the workflow is functioning correctly.

Suggested filter for real failures only
-          FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || echo 0
+          FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success" and .conclusion != "skipped" and .conclusion != "neutral")] | length') || echo 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cicd-main.yml around lines 607 - 609, The Nemo_CICD_Test
summary job is treating intentionally skipped functional-test tiers
(cicd-functional-tests-l0, cicd-functional-tests-l1, cicd-functional-tests-l2)
as failures; update the result-aggregation query used by Nemo_CICD_Test to only
count completed jobs with conclusion != "success" AND conclusion != "skipped"
(or explicitly filter out conclusion == "skipped") so that skipped tiers do not
increment the failure count. Locate the query/step in the Nemo_CICD_Test job
that inspects job conclusions and add the additional filter (exclude "skipped")
to the conditions that mark a job as a failure.

Comment thread scripts/run_ci_tests.sh
Comment on lines +95 to +98
--tier)
TIER="${2:-}"
shift 2
;;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate --tier before shift 2.

bash scripts/run_ci_tests.sh --tier currently exits with a raw shift error instead of a usage message. Please guard the new flag the same way you would any required-value option.

Suggested fix
     --tier)
-      TIER="${2:-}"
+      if [[ $# -lt 2 || -z "${2:-}" ]]; then
+        echo "Missing value for --tier" >&2
+        usage
+        exit 2
+      fi
+      TIER="${2}"
       shift 2
       ;;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
--tier)
TIER="${2:-}"
shift 2
;;
--tier)
if [[ $# -lt 2 || -z "${2:-}" ]]; then
echo "Missing value for --tier" >&2
usage
exit 2
fi
TIER="${2}"
shift 2
;;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/run_ci_tests.sh` around lines 95 - 98, The --tier case currently
assigns TIER="${2:-}" then does shift 2 which will error if no argument was
supplied; modify the --tier branch to first check that "$2" is non-empty (and
optionally not another flag) and print the usage/error message if missing, then
only set TIER="$2" and perform shift 2; update the branch handling for the
--tier case (reference the --tier case label, the TIER variable, and the shift
2) to validate the value before shifting.

Three fixes from CodeRabbit review:

1. L1 trigger condition: replace !startsWith(pull-request/) with an
   explicit allowlist (github.ref == 'refs/heads/main' || schedule ||
   workflow_dispatch) so L1 does not accidentally run on arbitrary
   feature branches.

2. Nemo_CICD_Test rollup: exclude jobs with conclusion == "skipped"
   from the failure count and listing query. L1/L2 are intentionally
   skipped on PRs and should not fail the overall gate.

3. Timeout type: use fromJSON(matrix.timeout || '30') instead of
   matrix.timeout || 30 so the value is passed as a number, not a
   string, in all three tier job blocks.

Signed-off-by: Yu Yao <yaoyu.094@gmail.com>
@yaoyu-33
Copy link
Copy Markdown
Contributor Author

/ok to test 9572c11

@yaoyu-33 yaoyu-33 added area:misc Cross-cutting utilities, logging, helpers, and other changes needs-review PR is ready for code review and waiting on a reviewer and removed area:misc Cross-cutting utilities, logging, helpers, and other changes needs-review PR is ready for code review and waiting on a reviewer labels Mar 10, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g
Copy link
Copy Markdown
Contributor

ko3n1g commented Mar 17, 2026

/ok to test b064a25

Replace blunt SKIPPING_IS_ALLOWED flag with per-tier EXPECT_L0/L1/L2
booleans derived from the L0/L1/L2 if-conditions. Fail when any job in
an expected tier is unexpectedly skipped, instead of silently passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ko3n1g
Copy link
Copy Markdown
Contributor

ko3n1g commented Mar 17, 2026

/ok to test 6078f65

ko3n1g and others added 2 commits March 17, 2026 10:31
Signed-off-by: oliver könig <okoenig@nvidia.com>
Remove `-x` from bash shell options to suppress command tracing noise.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ko3n1g
Copy link
Copy Markdown
Contributor

ko3n1g commented Mar 17, 2026

/ok to test 404eab4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:misc Cross-cutting utilities, logging, helpers, and other changes needs-more-tests Requires additional L0 and L1 test coverage before merge needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants