Clean up CI annotations and surface sccache stats from cibuildwheel #1879

Open
leofang wants to merge 10 commits into NVIDIA:main from leofang:ci/suppress-pip-warnings

@leofang
Member

@leofang leofang commented Apr 7, 2026

Summary

Reduce CI annotation noise and surface sccache stats from cibuildwheel containers.

Changes

  • PIP_CACHE_DIR=/tmp/pip-cache in container.env for Linux container CI jobs — redirects pip cache to a writable location (the default /github/home/.cache/pip is not writable when running as root on self-hosted runners).
  • Bump actions/cache from v4.2.3 (node20) to v5.0.4 (node24) in fetch_ctk.
  • Bump JamesIves/github-pages-deploy-action from v4.7.3 (node20) to v4.8.0 (node24) in doc_preview.
  • Bump marocchino/sticky-pull-request-comment from v2.9.2 (node20) to v3.0.3 (node24) in doc_preview.
  • Disable sccache-action annotation (disable_annotations: 'true') — the host-side stats show 0% because compilation happens inside cibuildwheel's container.
  • New .github/actions/sccache-summary composite action — dumps sccache stats JSON from inside the cibuildwheel container to the host via /host/ mount, then writes a formatted table to GITHUB_STEP_SUMMARY. Applied to all 3 cibuildwheel steps (cuda.bindings, cuda.core, cuda.core prev CTK). Inspired by NVIDIA/cccl PR #3621.
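The summary half of that composite action could look roughly like the Python sketch below. It is an illustration only, not the action's actual implementation: the file name `sccache-stats.json`, the helper `append_summary`, and the assumed JSON layout (a top-level `stats` object carrying `compile_requests` and `requests_executed`) are all assumptions.

```python
import json
import os


def append_summary(stats_file, label, summary_file):
    """Append a small Markdown table of sccache counters to a job summary file."""
    with open(stats_file, encoding="utf-8") as f:
        # Assumed layout: {"stats": {"compile_requests": ..., ...}}
        stats = json.load(f).get("stats", {})

    lines = [
        f"### sccache stats: {label}",
        "",
        "| Counter | Value |",
        "| --- | --- |",
    ]
    for key in ("compile_requests", "requests_executed"):
        lines.append(f"| {key} | {stats.get(key, 0)} |")

    # GITHUB_STEP_SUMMARY is a file path; appending keeps earlier summaries intact.
    with open(summary_file, "a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


# Hypothetical wiring: only runs when both the summary target and the dump exist.
summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
if summary_path and os.path.exists("sccache-stats.json"):
    append_summary("sccache-stats.json", "cuda.bindings", summary_path)
```

Appending rather than overwriting matters here because the action is applied to three cibuildwheel steps, and each should add its own table to the same job summary.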

Annotation impact

| Category | Before | After | Status |
| --- | --- | --- | --- |
| pip cache errors | ~40 | 0 | Fixed |
| sccache stats notices | ~18 | 0 | Replaced with accurate job summary |
| Node.js 20 warnings | ~54 | ~19 | Reduced (bumped 3 actions) |
| pip root-user errors | ~40 | ~40 | Outside our control (details) |

Remaining Node.js 20 warnings (no node24 releases yet)

  • mozilla-actions/sccache-action@0.0.9
  • ilammy/msvc-dev-cmd@v1
  • conda-incubator/setup-miniconda@v3.3.0
  • actions/upload-pages-artifact@v4.0.0 (composite; internally pins upload-artifact@v4.6.2 on node20)

Test plan

  • Verify pip cache annotations are eliminated
  • Verify sccache stats appear in job summaries (Linux build jobs)
  • Verify actions/cache v5 works with self-hosted runners
  • Verify doc preview deployment still works
  • Verify sticky PR comment still works

-- Leo's bot

Set PIP_CACHE_DIR=/tmp/pip-cache and PIP_ROOT_USER_ACTION=ignore in the
container env for test-wheel-linux and coverage-linux jobs. These jobs
run as root in Ubuntu containers where /github/home/.cache/pip is not
writable, causing ~54 harmless warnings per CI run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@copy-pr-bot
Contributor

copy-pr-bot bot commented Apr 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@leofang
Member Author

leofang commented Apr 7, 2026

/ok to test 1a0a753

@github-actions

github-actions bot commented Apr 7, 2026

@leofang leofang self-assigned this Apr 8, 2026
@leofang leofang added the CI/CD CI/CD infrastructure label Apr 8, 2026
The pip root-user warning was still appearing because actions/setup-python
runs its internal "pip upgrade" on the host runner, not inside the
container. Container-level env vars are invisible to host-side actions.
Moving PIP_ROOT_USER_ACTION to the job-level env block makes it available
to both host-side actions (setup-python) and container-side run steps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang
Member Author

leofang commented Apr 8, 2026

/ok to test c938310

@leofang leofang added the P2 Low priority - Nice to have label Apr 8, 2026
The pip root-user warning originates from actions/setup-python's
internal ensurepip call, which deliberately strips all PIP_* env vars
(CPython design, see python/cpython#139363). Neither container.env
nor job-level env can suppress it. See PR comment for full analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang
Member Author

leofang commented Apr 8, 2026

Why PIP_ROOT_USER_ACTION=ignore is not included in this PR

We investigated suppressing the WARNING: Running pip as the 'root' user... annotations that appear on every Linux container test job. Here's what we found:

Root cause

The warning comes from actions/setup-python's internal Python installation process, specifically the ensurepip call in the installer script (nix-setup-template.sh from actions/python-versions):

echo "Upgrading pip..."
export PIP_ROOT_USER_ACTION=ignore    # ← they already try to suppress it!
./python -m ensurepip                  # ← THE PROBLEM: ensurepip strips all PIP_* env vars
./python -m pip install --upgrade ...  # ← this one correctly respects the env var

CPython's ensurepip module (Lib/ensurepip/__init__.py) calls _disable_pip_configuration_settings() which deliberately strips all PIP_* environment variables before spawning pip as a subprocess. This is by design — see python/cpython#139363. Setting PIP_ROOT_USER_ACTION=ignore at any level (container env, job env, step env) has no effect on this particular pip invocation.
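To illustrate why the variable never reaches pip, here is a minimal Python sketch of the stripping behaviour described above. It is not CPython's code: `strip_pip_settings` and the sample environment are hypothetical, and the `PIP_CONFIG_FILE` override reflects our understanding of what ensurepip does rather than something verified in this PR.

```python
import os


def strip_pip_settings(env):
    """Return a copy of env without PIP_* variables, roughly as ensurepip does."""
    cleaned = {k: v for k, v in env.items() if not k.startswith("PIP_")}
    # Assumption: ensurepip also points pip at an empty config file.
    cleaned["PIP_CONFIG_FILE"] = os.devnull
    return cleaned


# Hypothetical job environment with the suppression variable set.
job_env = {
    "PATH": "/usr/bin",
    "PIP_ROOT_USER_ACTION": "ignore",
    "PIP_CACHE_DIR": "/tmp/pip-cache",
}
child_env = strip_pip_settings(job_env)
print("PIP_ROOT_USER_ACTION" in child_env)  # False
```

Whatever level the variable is set at, it is already gone by the time the pip subprocess starts, so the root-user check runs with its default behaviour.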

Additionally, actions/setup-python routes all stderr from the installation script through core.error(), which is why pip's WARNING: level message gets the misleading ##[error] prefix and becomes a failure-level annotation.
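The effect can be sketched as follows, assuming each stderr line is forwarded as an `::error::` workflow command (the command that `core.error()` emits). `to_error_annotations` is a hypothetical helper for illustration, not code from actions/setup-python.

```python
def to_error_annotations(stderr_text):
    """Turn each non-empty stderr line into a GitHub Actions ::error:: command."""
    return [
        f"::error::{line}"
        for line in stderr_text.splitlines()
        if line.strip()
    ]


# The message is only a pip warning, but its severity is lost in the forwarding.
stderr = "WARNING: Running pip as the 'root' user...\n"
for command in to_error_annotations(stderr):
    print(command)
```

Because the forwarding ignores the `WARNING:` prefix, nothing downstream can tell this line apart from a genuine failure message.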

What we tried

  1. PIP_ROOT_USER_ACTION in container.env — the env var was confirmed present in the Docker container (docker create -e "PIP_ROOT_USER_ACTION=ignore"), but ensurepip strips it.
  2. PIP_ROOT_USER_ACTION at job-level env — same result; the env var is visible to shell steps but ensurepip strips it before invoking pip.

Prior art

This is a well-known issue with multiple reports and no fix merged.

Decision

Since this warning is entirely out of our control (originates from ensurepip inside actions/setup-python), we decided not to include PIP_ROOT_USER_ACTION changes in this PR. The fix needs to happen upstream in actions/python-versions (PR #369).

-- Leo's bot

- Update actions/cache from v4.2.3 (node20) to v5.0.4 (node24) in
  fetch_ctk to eliminate Node.js 20 deprecation warnings. All runners
  are on v2.332+ (v5 requires >= 2.327.1).
- Set disable_annotations on sccache-action to suppress the cache
  stats notice annotations and job summaries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang
Member Author

leofang commented Apr 8, 2026

/ok to test 38416ad

leofang and others added 2 commits April 8, 2026 13:07
- JamesIves/github-pages-deploy-action: v4.7.3 (node20) → v4.8.0 (node24)
- marocchino/sticky-pull-request-comment: v2.9.2 (node20) → v3.0.3 (node24)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sccache stats were previously lost because cibuildwheel runs compilation
inside a manylinux container with its own sccache server instance, while
sccache-action on the host sees 0 hits.

Fix by dumping sccache stats JSON from inside the container to the host
filesystem (via /host/ mount), then reading it in a new composite action
that writes a formatted table to GITHUB_STEP_SUMMARY.

Inspired by NVIDIA/cccl PR #3621.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang leofang changed the title from "Suppress pip cache and root-user warnings in Linux container CI jobs" to "Clean up CI annotations and surface sccache stats from cibuildwheel" Apr 8, 2026
@leofang leofang added P1 Medium priority - Should do and removed P2 Low priority - Nice to have labels Apr 8, 2026
@leofang
Member Author

leofang commented Apr 8, 2026

/ok to test 5984786

leofang and others added 4 commits April 8, 2026 17:19
- Use cache_hits + cache_misses (per language) as the denominator
  instead of compile_requests, which includes non-compilation calls
  (linker invocations, etc). This matches sccache's own hit rate.
- Add build-step input to reference the cibuildwheel step name in
  the summary for easier navigation to full stats.
- Remove intermediate summary file; only write to GITHUB_STEP_SUMMARY.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
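
The denominator change in the commit above can be sketched in Python as follows. This is a sketch under assumptions, not the action's code: `hit_rate` is a hypothetical helper, and the `cache_hits.counts` / `cache_misses.counts` per-language layout is assumed from sccache's JSON stats output.

```python
def hit_rate(stats):
    """Hit rate in percent, using hits + misses (summed over languages) as the denominator."""
    hits = sum(stats.get("cache_hits", {}).get("counts", {}).values())
    misses = sum(stats.get("cache_misses", {}).get("counts", {}).values())
    total = hits + misses
    # Guard against a build that performed no cacheable compilations.
    return 100.0 * hits / total if total else 0.0


# Made-up numbers: compile_requests also counts non-compilation calls.
sample = {
    "compile_requests": 120,
    "cache_hits": {"counts": {"C/C++": 60, "CUDA": 30}},
    "cache_misses": {"counts": {"C/C++": 5, "CUDA": 5}},
}
print(f"{hit_rate(sample):.1f}%")  # 90.0%
```

With `compile_requests` as the denominator the same sample would report 75%, which understates cache effectiveness because 20 of the 120 requests were never cacheable compilations.
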
GHA has no construct to reference a previous step's name dynamically
(steps context only exposes outcome/conclusion/outputs). The label
input already identifies which build produced the stats, so the
hardcoded build-step name is redundant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang
Member Author

leofang commented Apr 8, 2026

/ok to test 3b0730e

Labels

CI/CD CI/CD infrastructure P1 Medium priority - Should do

2 participants