CI Pipeline

Overview

The CI pipeline maps test categories (st, ut-py, ut-cpp) × hardware tiers to GitHub Actions jobs. See testing.md for full test organization and hardware classification.

Design principles:

Merge by runner, not by language — Python and C++ unit tests share setup cost and run as steps within a single job per runner tier (ut, ut-a2a3, ut-a5).
Runner matches hardware tier — no-hardware tests run on ubuntu-latest; platform-specific tests run on self-hosted runners with the matching label (a2a3, a5).
--platform is the only filter — pytest uses --platform + the requires_hardware marker; ctest uses label -LE exclusion. No -m st, no -m "not requires_hardware".
sim = no hardware — a2a3sim/a5sim jobs run on github-hosted runners alongside unit tests.
Skip irrelevant platforms for scene tests — detect-changes gates st-sim-* and st-onboard-* so pure-a5 PRs skip a2a3 scene-test runs and vice versa. UT jobs (ut, ut-a2a3, ut-a5) are unconditional — unit tests cover shared contracts and the cost of a falsely-skipped regression outweighs the savings.

Full Job Matrix

The complete test-type × hardware-tier matrix. Empty cells have no tests yet; only non-empty jobs exist in ci.yml.

Category	github-hosted (no hardware)	a2a3 runner	a5 runner
ut (py + cpp)	`ut`	`ut-a2a3`	`ut-a5`
st	`st-sim-a2a3`, `st-sim-a5`	`st-onboard-a2a3`	`st-onboard-a5`

GitHub Actions Jobs

PullRequest
  ├── pre-commit             (ubuntu-latest)
  ├── packaging-matrix       (ubuntu + macOS)
  ├── ut                     (ubuntu + macOS)        — Python + C++ UT, no hardware [always]
  ├── detect-changes         (ubuntu-latest)         — outputs a{2a3,5}_changed flags
  ├── st-sim-a2a3            (ubuntu + macOS)        — gated by a2a3_changed
  ├── st-sim-a5              (ubuntu + macOS)        — gated by a5_changed
  ├── ut-a2a3                (a2a3 self-hosted)      — Python + C++ UT, a2a3 hardware [always]
  ├── st-onboard-a2a3        (a2a3 self-hosted)      — gated by a2a3_changed
  ├── ut-a5                  (a5 self-hosted)        — Python + C++ UT, a5 hardware [always]
  └── st-onboard-a5          (a5 self-hosted)        — gated by a5_changed

Job	Runner	What it runs
`ut`	`ubuntu-latest`, `macos-latest`	`pytest tests/ut` + `ctest -LE requires_hardware`
`st-sim-a2a3`	`ubuntu-latest`, `macos-latest`	`pytest examples tests/st --platform a2a3sim`
`st-sim-a5`	`ubuntu-latest`, `macos-latest`	`pytest examples tests/st --platform a5sim`
`ut-a2a3`	a2a3 self-hosted	`pytest tests/ut --platform a2a3` + `ctest -L "^requires_hardware(_a2a3)?$" --resource-spec-file ...`
`st-onboard-a2a3`	a2a3 self-hosted	`pytest examples tests/st --platform a2a3 --device ...`
`ut-a5`	a5 self-hosted	`pytest tests/ut --platform a5` + `ctest -L "^requires_hardware(_a5)?$"`
`st-onboard-a5`	a5 self-hosted	`pytest examples tests/st --platform a5 --device ...`

Parallel ST runs on hardware

For self-hosted jobs with multiple NPUs, pass a --device range (and optionally pytest's -x for fail-fast) to get the full dispatcher benefit — device bin-packing for L3, xdist fanout for L2, and a shared ChipWorker per (runtime, device):

# Recommended CI invocation
pytest examples tests/st --platform a2a3 --device 4-7 -x

# Same for a5
pytest examples tests/st --platform a5 --device 0-7 -x

-x (--exitfirst) is appropriate for CI, where aborting on first failure saves runner minutes. Local development usually wants the opposite (let every failure surface) — just drop the flag. The short form is the same in both pytest and standalone on purpose; see testing.md §CLI Design Principles.

pytest-xdist is pulled in via the test extra. See testing.md §Parallel Test Execution for the full hierarchy, fail-fast semantics, and the profiling-vs-parallelism trade-off.

Sim jobs on CPU-constrained runners

Sim jobs (st-sim-a2a3, st-sim-a5) run on ubuntu-latest, which typically has 2 vCPUs. --device 0-15 is still the right choice for the pool size (some L3 cases need several virtual ids), but the default --max-parallel auto caps the in-flight subprocess count to min(nproc, len(--device)) — on a 2-core runner that becomes 2, avoiding CPU thrashing:

# Sim: --max-parallel auto resolves to 2 on ubuntu-latest
pytest examples tests/st --platform a2a3sim --device 0-15

# Or pin explicitly if your runner has a different CPU count
pytest examples tests/st --platform a2a3sim --device 0-15 --max-parallel 2

On hardware jobs the auto default is len(--device) because each subprocess is device-bound (host CPU mostly waits on the NPU), so hardware runners do not need --max-parallel manually.

Scheduling constraints

Sim scene tests and no-hardware unit tests run on github-hosted runners (no hardware).
detect-changes computes two flags (a2a3_changed, a5_changed) from the PR diff. Each flag is false only when every changed file is in the opposite platform's tree (src/{arch}/, examples/{arch}/, tests/{st,device_tests}/{arch}/) or in the NON_CODE list (docs/, .docs/, .claude/, KNOWN_ISSUES.md, .gitignore, README.md, .pre-commit-config.yaml). Anything else — shared C++ (src/common/), Python (python/, simpler_setup/), build files (CMakeLists.txt, pyproject.toml), test infra (tests/ut/, tests/lint/), tooling (tools/) — flips both flags to true.
Gated jobs (scene tests only): st-sim-{a2a3,a5}, st-onboard-{a2a3,a5} run iff their platform's flag is true.
Unconditional jobs (all UT): ut, ut-a2a3, ut-a5 always run. The gating regex intentionally does not include tests/ut/ — unit tests exercise shared contracts (nanobind bindings, RuntimeBuilder, ring buffers, etc.) and the risk of silently skipping a regression outweighs the CI minutes saved. A consequence: self-hosted runners (a2a3, a5) are always busy for at least the UT job, even on doc-only PRs that skip all scene tests.

Hardware Classification

Three hardware tiers, applied to all test categories. See testing.md for the full table including per-category mechanisms (pytest markers, ctest labels, folder structure).

Tier	CI Runner	Job examples
No hardware	`ubuntu-latest`	`ut`, `st-sim-*`
Platform-specific (a2a3)	`[self-hosted, a2a3]`	`ut-a2a3`, `st-onboard-a2a3`
Platform-specific (a5)	`[self-hosted, a5]`	`ut-a5`, `st-onboard-a5`

Test Sources

`tests/ut/` — Python unit tests (ut-py)

Python unit tests. Run via pytest, filtered by --platform + requires_hardware marker.

File	Content	Hardware?
`test_task_interface.py`	nanobind extension API tests	No
`test_runtime_builder.py` (mocked classes)	RuntimeBuilder discovery, error handling, build logic	No
`test_runtime_builder.py::TestRuntimeBuilderIntegration`	Real compilation across platform × runtime	Yes (`@pytest.mark.requires_hardware`)

`tests/ut/cpp/` — C++ unit tests (ut-cpp)

GoogleTest-based tests for pure C++ modules. Run via ctest, filtered by label -LE exclusion.

Runner	Command
No hardware	`ctest --test-dir tests/ut/cpp/build -LE requires_hardware`
a2a3	`ctest --test-dir tests/ut/cpp/build -L "^requires_hardware(_a2a3)?$"`
a5	`ctest --test-dir tests/ut/cpp/build -L "^requires_hardware(_a5)?$"`

`examples/` — Small examples (sim + onboard)

Small, fast examples that run on both simulation and real hardware. Organized as examples/{arch}/{runtime}/{name}/. Discovered and executed by pytest via each example's test_*.py (@scene_test format).

`tests/st/` — Scene tests (onboard-biased)

Large-scale, feature-rich hardware tests. Too slow or using instructions unsupported by the simulator. Organized as tests/st/{arch}/{runtime}/{name}/. Platform compatibility is declared per test via @scene_test(platforms=[...]).

Shared structure

Both examples/ and tests/st/ cases follow the same layout:

{name}/
  test_{name}.py                 # @scene_test class (generate_args, compute_golden)
  kernels/
    orchestration/*.cpp
    aic/*.cpp                    # optional
    aiv/*.cpp                    # optional

Cases are discovered by pytest via test_*.py files. Each test module ends with if __name__ == "__main__": SceneTestCase.run_module(__name__) so it can also run standalone as python test_*.py -p <platform>.

Selection Scheme

A single --platform flag controls hardware/non-hardware splitting across all three categories.

ut-py (pytest marker)

@pytest.mark.requires_hardware                  # any hardware
class TestRuntimeBuilderIntegration:
    ...

@pytest.mark.requires_hardware("a2a3")          # a2a3 specifically
class TestA2A3Feature:
    ...

Selection:

# No hardware (no-hw tests run, requires_hardware tests skip)
pytest tests/ut

# Hardware (no-hw tests skip, hw + platform-specific tests run)
pytest tests/ut --platform a2a3

ut-cpp (ctest label)

# any hardware
set_tests_properties(test_runtime_integration PROPERTIES LABELS "requires_hardware")
# a2a3-specific
set_tests_properties(test_a2a3_feature PROPERTIES LABELS "requires_hardware_a2a3")

Selection uses -LE (label exclude) on no-hw runner and -L (label include) on device runners:

ctest -LE requires_hardware                 # no-hardware runner: only unlabeled
ctest -L "^requires_hardware(_a2a3)?$"      # a2a3 runner: hw + a2a3-specific
ctest -L "^requires_hardware(_a5)?$"        # a5 runner: hw + a5-specific

st (`@scene_test`)

@scene_test(level=2, platforms=["a2a3sim", "a2a3"], runtime="tensormap_and_ringbuffer")
class TestVectorExample(SceneTestCase):
    ...

`--platform`	Behavior
`a2a3sim`	Run if `"a2a3sim"` in `platforms`
`a2a3`	Run if `"a2a3"` in `platforms`
(none)	Auto-parametrize over all `*sim` entries in `platforms`

No --platform means "run all sims" — tests with no sim in their platforms list are skipped. No additional markers are used.

Platform notes

macOS libomp collision: on macOS, the root conftest.py sets KMP_DUPLICATE_LIB_OK=TRUE before import pytest to work around a duplicate-libomp abort triggered by homebrew numpy and pip torch coexisting in one Python process (see troubleshooting/macos-libomp-collision.md). Standalone python test_*.py bypasses conftest — rely on the env var being exported by the shell or tools/verify_packaging.sh.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CI Pipeline

Overview

Full Job Matrix

GitHub Actions Jobs

Parallel ST runs on hardware

Sim jobs on CPU-constrained runners

Scheduling constraints

Hardware Classification

Test Sources

`tests/ut/` — Python unit tests (ut-py)

`tests/ut/cpp/` — C++ unit tests (ut-cpp)

`examples/` — Small examples (sim + onboard)

`tests/st/` — Scene tests (onboard-biased)

Shared structure

Selection Scheme

ut-py (pytest marker)

ut-cpp (ctest label)

st (`@scene_test`)

Platform notes

FilesExpand file tree

ci.md

Latest commit

History

ci.md

File metadata and controls

CI Pipeline

Overview

Full Job Matrix

GitHub Actions Jobs

Parallel ST runs on hardware

Sim jobs on CPU-constrained runners

Scheduling constraints

Hardware Classification

Test Sources

tests/ut/ — Python unit tests (ut-py)

tests/ut/cpp/ — C++ unit tests (ut-cpp)

examples/ — Small examples (sim + onboard)

tests/st/ — Scene tests (onboard-biased)

Shared structure

Selection Scheme

ut-py (pytest marker)

ut-cpp (ctest label)

st (@scene_test)

Platform notes

`tests/ut/` — Python unit tests (ut-py)

`tests/ut/cpp/` — C++ unit tests (ut-cpp)

`examples/` — Small examples (sim + onboard)

`tests/st/` — Scene tests (onboard-biased)

st (`@scene_test`)