
[CI] Fix flaky Ameba example build (parallel-target configure race) #72283

Merged
mergify[bot] merged 1 commit into project-chip:master from woody-apple:ci/ameba-parallel-build-race on Jun 1, 2026


@woody-apple
Contributor

Problem

The Ameba example workflow (.github/workflows/examples-ameba.yaml) builds three targets in a single build_examples.py invocation:

--target ameba-amebad-all-clusters
--target ameba-amebad-all-clusters-minimal
--target ameba-amebad-light

build_examples.py runs the generate() step for these targets in parallel (Context.Generate() uses a ThreadPoolExecutor). Each Ameba target's generate() shells out to the Realtek SDK's $AMEBA_PATH/project/realtek_amebaD_va0_example/GCC-RELEASE/build.sh, which configures one SDK tree shared across all applications — it is not scoped to the per-target output_dir.
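The failure mode can be modeled in a few lines of Python. This is a minimal sketch under one stated assumption: that the SDK configure step resets and then creates a fixed directory (here linux) inside the shared tree. configure_shared_tree, run_sequential and run_parallel are hypothetical helpers, not code from build.sh or build_examples.py, and the threading.Barrier exists only to force the bad interleaving deterministically.

```python
import os
import shutil
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def configure_shared_tree(sdk_root, sync=None):
    """Hypothetical model of a configure step that owns a fixed dir in a shared tree.

    This is not the real Realtek build.sh; it only mimics a reset-then-mkdir pattern.
    """
    work_dir = os.path.join(sdk_root, "linux")
    shutil.rmtree(work_dir, ignore_errors=True)  # reset leftovers from a previous configure
    if sync is not None:
        # Force the worst-case interleaving: every worker has reset, none has created yet.
        sync.wait(timeout=10)
    try:
        os.mkdir(work_dir)  # behaves like `mkdir linux` without -p
    except FileExistsError:
        return "mkdir: cannot create directory 'linux': File exists"
    return "ok"


def run_sequential(n, sdk_root):
    # One target at a time: each reset/mkdir pair completes before the next starts.
    return [configure_shared_tree(sdk_root) for _ in range(n)]


def run_parallel(n, sdk_root):
    # All targets share sdk_root, just as the Ameba targets share one SDK tree.
    sync = threading.Barrier(n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(configure_shared_tree, sdk_root, sync) for _ in range(n)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(run_sequential(3, root))
    with tempfile.TemporaryDirectory() as root:
        print(run_parallel(3, root))
```

Run one after another, all three configures succeed; run concurrently against the same root, exactly one succeeds and the other two hit "File exists", which is consistent with a failure that moves between targets from run to run.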

The existing @lock_output_dir decorator keys its lock on self.output_dir, which is distinct per target, so it provides no mutual exclusion for the shared SDK tree. Concurrent generation therefore races on it:

mkdir: cannot create directory 'linux': File exists
CMake Error at asdk/config.cmake:20 (configure_file):
    No such file or directory  (asdk/CMakeLists.txt:72)

A racing configure leaves no build.ninja behind, so the subsequent compile fails with "ninja: error: loading 'build.ninja'". The failing target rotates between runs (nondeterministic) while master stays green, which points to a timing-dependent parallel-configure race rather than a real build break.
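Why a lock keyed on self.output_dir gives no protection can be shown with a hypothetical keyed-lock registry. KeyedLocks here is an illustration, not the real OutDirLock: locks keyed on distinct per-target strings never contend, while a constant key shared by every target does. The key string "ameba-sdk-generate" is a placeholder; the actual value of AMEBA_SDK_GENERATE_LOCK_KEY is not given in this PR text.

```python
import threading


class KeyedLocks:
    """Hypothetical keyed-lock registry: one re-entrant lock per string key.

    A sketch for illustration only; it is not the real OutDirLock implementation.
    """

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, key):
        # Create the per-key lock on first use; the guard keeps creation thread-safe.
        with self._guard:
            return self._locks.setdefault(key, threading.RLock())


def free_for_other_thread(locks, key):
    """Return True if a different thread could acquire the lock for key right now."""
    result = []

    def probe():
        lock = locks.lock_for(key)
        acquired = lock.acquire(blocking=False)
        if acquired:
            lock.release()
        result.append(acquired)

    worker = threading.Thread(target=probe)
    worker.start()
    worker.join()
    return result[0]


if __name__ == "__main__":
    locks = KeyedLocks()
    # Keyed on output_dir: holding one target's key does not block another target.
    with locks.lock_for("out/ameba-amebad-light"):
        print(free_for_other_thread(locks, "out/ameba-amebad-all-clusters"))  # True
    # Keyed on a constant shared key: every target contends for the same lock.
    with locks.lock_for("ameba-sdk-generate"):
        print(free_for_other_thread(locks, "ameba-sdk-generate"))  # False
```

The per-target keys behave like three unrelated locks, which is why the existing decorator never serializes access to the one tree they all touch.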

Fix

Serialize only the Realtek SDK configure step across Ameba targets by acquiring the shared OutDirLock on a constant key (AMEBA_SDK_GENERATE_LOCK_KEY) around the build.sh invocation in AmebaBuilder.generate() (scripts/build/builders/ameba.py). The lock infrastructure already exists; this just locks on a key that is the same for all three targets (the shared SDK tree) instead of the per-target output_dir.
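A minimal Python sketch of the locking pattern described above, assuming a string-keyed lock object whose lock_dir(key) result is usable as a context manager, and a lock of None for non-parallel builds. The contextlib.nullcontext() fallback matches the automated review summary of the change. AmebaBuilderSketch, KeyedLocks, PeakCounter and the key's string value are hypothetical; this is not the actual diff to scripts/build/builders/ameba.py.

```python
import contextlib
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder value; only the constant's name comes from the PR.
AMEBA_SDK_GENERATE_LOCK_KEY = "ameba-sdk-generate"


class KeyedLocks:
    """Minimal stand-in for a string-keyed lock such as OutDirLock (assumed API)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_dir(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.RLock())


class PeakCounter:
    """Records the largest number of threads inside a section at the same time."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._active = 0
        self.peak = 0

    @contextlib.contextmanager
    def inside(self):
        with self._mutex:
            self._active += 1
            self.peak = max(self.peak, self._active)
        try:
            yield
        finally:
            with self._mutex:
                self._active -= 1


class AmebaBuilderSketch:
    """Hypothetical builder; only the locking pattern mirrors the PR description."""

    def __init__(self, output_dir, lock, counter):
        self.output_dir = output_dir  # distinct per target, so useless as the SDK lock key
        self.lock = lock              # None when builds are not run in parallel
        self.counter = counter

    def _run_sdk_build_sh(self):
        # Stand-in for shelling out to GCC-RELEASE/build.sh in the shared SDK tree.
        with self.counter.inside():
            time.sleep(0.05)

    def generate(self):
        if self.lock is not None:
            sdk_lock = self.lock.lock_dir(AMEBA_SDK_GENERATE_LOCK_KEY)
        else:
            sdk_lock = contextlib.nullcontext()
        with sdk_lock:
            self._run_sdk_build_sh()


def generate_all(output_dirs, lock):
    counter = PeakCounter()
    builders = [AmebaBuilderSketch(d, lock, counter) for d in output_dirs]
    with ThreadPoolExecutor(max_workers=len(builders)) as pool:
        for future in [pool.submit(b.generate) for b in builders]:
            future.result()  # re-raise any error from a worker
    return counter.peak
```

With the shared key, peak concurrency inside the configure stand-in is 1 even with three workers. Removing the lock while keeping several targets would typically let it reach 3.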

Compilation (ninja in _build) still runs fully in parallel, each in its own output_dir, so CI throughput is preserved — only the short, contended configure step is serialized.
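The throughput claim rests on the shared lock covering only the configure phase. The following sketch, using hypothetical build_target and build_all helpers, runs a lock-protected configure phase followed by an unlocked compile phase. The threading.Barrier in the compile phase is purely a test device: it can only be passed when every target is compiling at once, so finishing at all shows that compilation still overlaps. If compile were also held under the SDK lock, the barrier would time out with BrokenBarrierError.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def build_target(name, sdk_lock, compile_barrier, log, log_lock):
    # Phase 1: configure the shared SDK tree, one target at a time.
    with sdk_lock:
        with log_lock:
            log.append(("configure", name))
    # Phase 2: compile in the target's own output_dir with no shared lock held.
    # Barrier.wait() only returns once every target has reached this point together.
    compile_barrier.wait(timeout=10)
    with log_lock:
        log.append(("compile", name))
    return name


def build_all(names):
    sdk_lock = threading.Lock()
    barrier = threading.Barrier(len(names))
    log, log_lock = [], threading.Lock()
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = [pool.submit(build_target, n, sdk_lock, barrier, log, log_lock) for n in names]
        done = [f.result() for f in futures]
    return done, log


if __name__ == "__main__":
    targets = ["ameba-amebad-all-clusters", "ameba-amebad-all-clusters-minimal", "ameba-amebad-light"]
    finished, events = build_all(targets)
    for phase, target in events:
        print(phase, target)
```

All three configure entries are logged before any compile entry, because no worker leaves the barrier until every worker has finished its serialized configure.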

Testing

  • python3 -m py_compile scripts/build/builders/ameba.py — passes.
  • AST parse / style checks pass (lines within the repo's 132-char limit; no trailing whitespace).
  • The change uses only the pre-existing OutDirLock.lock_dir() API (re-entrant, keyed by string, and a no-op when the lock is None), so non-parallel single-target builds are unaffected.
  • CI: this PR re-runs the Build example - Ameba workflow, which exercises the exact three-target parallel build that was flaking. Several green runs (vs. the previous rotating failures) validate the fix.

…ealtek SDK dir

The Ameba example workflow builds three targets (light, all-clusters,
all-clusters-minimal) in a single build_examples.py invocation, which
generates them in parallel. Each Ameba target's generate() step shells
out to the Realtek SDK's
$AMEBA_PATH/project/realtek_amebaD_va0_example/GCC-RELEASE/build.sh,
which configures a single SDK tree that is shared across all
applications rather than scoped to the per-target output_dir.

The existing @lock_output_dir decorator keys its lock on self.output_dir,
which is distinct per target, so it provides no mutual exclusion for the
shared SDK tree. Concurrent generation therefore races on that tree:

  - mkdir: cannot create directory 'linux': File exists
  - CMake Error at asdk/config.cmake:20 (configure_file):
        No such file or directory  (asdk/CMakeLists.txt:72)

A racing configure leaves no build.ninja behind, so the subsequent
compile fails with 'ninja: error: loading build.ninja'. The failing
target rotates between runs (nondeterministic); master is green because
the race is timing-dependent.

Fix: serialize only the Realtek SDK configure step across Ameba targets
by acquiring the shared OutDirLock on a constant key
(AMEBA_SDK_GENERATE_LOCK_KEY) around the build.sh invocation in
generate(). Compilation (ninja in _build) still runs in parallel, each
in its own output_dir, so CI throughput is preserved.
woody-apple marked this pull request as ready for review May 31, 2026 22:16
Copilot AI review requested due to automatic review settings May 31, 2026 22:16
pullapprove bot requested a review from andy31415 May 31, 2026 22:16

gemini-code-assist bot left a comment


Code Review

This pull request serializes the Realtek SDK configure step across all Ameba targets to prevent race conditions during parallel generation. It introduces a shared lock key (AMEBA_SDK_GENERATE_LOCK_KEY) and uses contextlib.nullcontext() as a fallback when locking is not active. There are no review comments, so we have no feedback to provide.


Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@codecov

codecov Bot commented May 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.52%. Comparing base (8a162c6) to head (89e7209).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #72283   +/-   ##
=======================================
  Coverage   55.52%   55.52%           
=======================================
  Files        1630     1630           
  Lines      111127   111127           
  Branches    13418    13418           
=======================================
  Hits        61706    61706           
  Misses      49421    49421           


andy31415 added the sdk-maintainer-approved label Jun 1, 2026
mergify[bot] merged commit 98c5265 into project-chip:master Jun 1, 2026
78 checks passed

Labels

  • review - pending
  • scripts
  • sdk-maintainer-approved: PR marked by `matter-sdk-maintainers` as suitable for MERGE - meets guideline & sufficient reviews.


4 participants