ci: declare workflow-level `contents: read` on 6 workflows by arpitjain099 · Pull Request #2042 · NVIDIA/NeMo-Retriever

arpitjain099 · 2026-05-15T05:46:40Z

Pins the default GITHUB_TOKEN to contents: read on 6 workflows in .github/workflows/ that don't call a GitHub API beyond the initial checkout.
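For reference, here is a minimal sketch of the kind of top-level block being added. It assumes a simple workflow shape. The workflow name, trigger, and job are illustrative placeholders, not copied from the NeMo-Retriever files.

```yaml
name: example-ci            # hypothetical workflow name

on:
  pull_request:

# Workflow-level default for every job's GITHUB_TOKEN: read repository
# contents only. Once any scope is listed, all unlisted scopes are set to
# none (metadata stays read-only).
permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # actions/checkout authenticates with GITHUB_TOKEN by default and
      # only needs contents: read to clone the repository.
      - uses: actions/checkout@v4
```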

Why

CVE-2025-30066 (the March 2025 tj-actions/changed-files supply-chain compromise) exposed runner secrets, including GITHUB_TOKEN, by printing them into workflow logs. Pinning permissions per workflow caps the token's runtime authority regardless of the repo or org default. It also protects against drift if that default is ever widened, and it is credited per file by the OpenSSF Scorecard Token-Permissions check.

YAML validated locally with yaml.safe_load on each touched file.

Pins the default GITHUB_TOKEN to contents: read on the workflows in .github/workflows/ that don't call a GitHub API beyond the initial checkout. The other workflows in this directory are left without an explicit permissions block because they need write scopes that a maintainer is better placed to declare.

Motivation: CVE-2025-30066 (the March 2025 tj-actions/changed-files compromise) exposed runner secrets, including GITHUB_TOKEN, by printing them into workflow logs. Per-workflow caps bound the token's runtime authority regardless of the repo or org default. They also protect against drift if that default is ever widened, and they are credited per file by the OpenSSF Scorecard Token-Permissions check.

YAML validated locally with yaml.safe_load.

Signed-off-by: Arpit Jain <arpitjain099@gmail.com>
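For the workflows that do need write scopes, one common pattern is a read-only workflow default plus a job-level override for the single job that needs more. The sketch below is hypothetical: the workflow name, the `tag` input, and the job names are illustrative, not taken from this repository. Job-level `permissions` replace the workflow-level block for that job rather than adding to it.

```yaml
name: example-release       # hypothetical workflow name

on:
  workflow_dispatch:
    inputs:
      tag:
        description: Tag to create   # hypothetical input
        required: true

# Default for every job: read-only token.
permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
  tag:
    needs: test
    runs-on: ubuntu-latest
    # Overrides the workflow-level block for this job only.
    permissions:
      contents: write
    steps:
      # checkout persists the token as a git credential, so the push below
      # authenticates with this job's contents: write token.
      - uses: actions/checkout@v4
      - name: Push tag
        env:
          # Passed through env so the input is never spliced into the script.
          TAG: ${{ inputs.tag }}
        run: |
          git tag "$TAG"
          git push origin "$TAG"
```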

copy-pr-bot · 2026-05-15T05:46:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-05-15T05:50:16Z

Greptile Summary

This PR pins the GITHUB_TOKEN to contents: read at the workflow level across six CI/CD workflows that only need repository checkout access, capping runtime authority for the token irrespective of the organization default.

Six workflows (main CI, PR validation, ARM Docker build, library-mode integration tests, retriever unit tests, and nightly builds) each receive a top-level permissions: contents: read block. All external-service operations (Docker push to NGC, PyPI/Artifactory publish) rely on repository secrets rather than GITHUB_TOKEN, so restricting the token scope does not break any publishing step.
Pre-existing issues visible in the touched files: all third-party action references use mutable version tags rather than full SHA pins, which is the same supply-chain attack vector this PR addresses; and docker-build-arm.yml still uses echo to write the HF token to disk, unlike the safer env-variable pattern already in reusable-docker-build.yml.

Confidence Score: 4/5

Safe to merge — the permission additions are correct and do not break any existing workflow step.

The six `permissions: contents: read` blocks are correctly scoped: every operation that actually needs the GITHUB_TOKEN (checkout) is covered, and all publishing steps use external credentials unaffected by the token scope. The two observations, mutable action tags and the direct-echo pattern for the HF token, are pre-existing and not introduced by this PR, but they remain unaddressed in the files being touched.

docker-build-arm.yml retains the echo "${{ secrets.HF_ACCESS_TOKEN }}" shell-interpolation pattern that the reusable Docker build workflow has already replaced with a safer env-variable approach.

Important Files Changed

Filename	Overview
.github/workflows/ci-main.yml	Adds workflow-level `permissions: contents: read`; no functional changes. Pre-existing action mutable-tag references are the only concern.
.github/workflows/ci-pull-request.yml	Adds workflow-level `permissions: contents: read`; all downstream jobs only need checkout and Docker operations that use external credentials.
.github/workflows/docker-build-arm.yml	Adds workflow-level `permissions: contents: read`; pre-existing `echo` of secret into a file is a latent shell-interpolation concern unrelated to this PR's change.
.github/workflows/integration-test-library-mode.yml	Adds workflow-level `permissions: contents: read`; all steps are checkout, Python setup, and pytest against external NIM endpoints authenticated via secret env vars.
.github/workflows/retriever-unit-tests.yml	Adds workflow-level `permissions: contents: read`; only requires checkout and running unit tests locally.
.github/workflows/scheduled-nightly.yml	Adds workflow-level `permissions: contents: read`; Docker push and PyPI/Artifactory publish use external credentials (DOCKER_PASSWORD, ARTIFACTORY_*), not GITHUB_TOKEN, so the restricted scope does not break publishing.
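The following hypothetical sketch shows why a publish step keeps working under `contents: read`. The registry login uses a repository secret, not GITHUB_TOKEN. It assumes the push goes to NGC's `nvcr.io` registry with an NGC API key stored in DOCKER_PASSWORD. The workflow name, schedule, and image name are placeholders, and the real scheduled-nightly.yml may be structured differently.

```yaml
name: example-nightly       # hypothetical workflow name

on:
  schedule:
    - cron: "0 2 * * *"     # placeholder schedule

# Read-only token: enough for checkout, irrelevant to the registry login.
permissions:
  contents: read

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Registry credentials come from repository secrets, so the token's
      # scope has no bearing on whether the login succeeds.
      - uses: docker/login-action@v3
        with:
          registry: nvcr.io
          # NGC API-key logins use the literal username $oauthtoken.
          username: $oauthtoken
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Build and push (placeholder image name)
        run: |
          docker build -t nvcr.io/example-org/example-image:nightly .
          docker push nvcr.io/example-org/example-image:nightly
```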

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph before["Before PR"]
        BT["GITHUB_TOKEN\n(org/repo default permissions)"]
    end

    subgraph after["After PR — all 6 workflows"]
        AT["GITHUB_TOKEN\ncontents: read only\n(all other scopes → none)"]
    end

    subgraph operations["Token-independent operations (unchanged)"]
        D["Docker push\n(DOCKER_PASSWORD secret)"]
        P["PyPI / Artifactory publish\n(ARTIFACTORY_* secrets)"]
        A["Artifact upload/download\n(ACTIONS_RUNTIME_TOKEN)"]
        C["Repository checkout\n(contents: read ✓)"]
    end

    after --> C
    after -.->|"not needed"| D
    after -.->|"not needed"| P
    after -.->|"not needed"| A

Comments Outside Diff (2)

.github/workflows/ci-main.yml, lines 17-22

Third-party actions pinned to mutable version tags

All six modified workflows reference third-party actions using mutable tags (e.g., actions/checkout@v4, actions/setup-python@v3, pre-commit/action@v3.0.1, astral-sh/setup-uv@v6, docker/setup-qemu-action@v3) rather than immutable commit SHAs. A mutable tag can be force-pushed to point at a different — potentially malicious — commit at any time, which is exactly the supply-chain attack vector this PR is hardening against. The same pattern appears in the other five touched workflows (ci-pull-request.yml, docker-build-arm.yml, integration-test-library-mode.yml, retriever-unit-tests.yml, scheduled-nightly.yml). Each action reference should be pinned to a full 40-character SHA, e.g. actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2.
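Here is a minimal sketch of the difference within a single job. It reuses the SHA quoted above for actions/checkout v4.2.2. The setup-python line is deliberately left on its tag to show the mutable form; its SHA would have to be looked up from the action's releases rather than guessed.

```yaml
jobs:
  lint:                     # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      # Immutable: a commit SHA cannot be moved the way a tag can.
      # The trailing comment records which release the SHA corresponds to.
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      # Mutable: resolves to whatever commit the v3 tag points at when the job runs.
      - uses: actions/setup-python@v3
```

To keep pins from going stale, Dependabot's `github-actions` ecosystem can update SHA-pinned references. It bumps the trailing version comment along with the SHA.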


.github/workflows/docker-build-arm.yml, lines 28-32

Secret interpolated directly into shell via echo

echo "${{ secrets.HF_ACCESS_TOKEN }}" > file splices the secret value into the script text before the shell runs. If the token contains characters the shell treats specially inside double quotes (such as $, backticks, backslashes, or a double quote), the shell interprets them, which can produce unexpected behavior or partial exposure. Compare this with reusable-docker-build.yml (lines 79-84), which injects the secret as an environment variable (env: HF_ACCESS_TOKEN: ${{ secrets.HF_ACCESS_TOKEN }}) and then writes it with printf '%s' "${HF_ACCESS_TOKEN}". That approach keeps the secret out of the script text, so the shell never parses it as syntax, and the quoted expansion is not subject to word splitting.
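Here is a minimal sketch of the env-variable pattern described above, as it might look in docker-build-arm.yml. The step name and the output path `./hf_token` are hypothetical, since the file's actual target path is not shown here.

```yaml
steps:
  - name: Write HF token to a file   # hypothetical step name
    env:
      # The expression is evaluated into an environment variable, so the
      # secret never becomes part of the script text the shell parses.
      HF_ACCESS_TOKEN: ${{ secrets.HF_ACCESS_TOKEN }}
    run: |
      # Before: echo "${{ secrets.HF_ACCESS_TOKEN }}" > file
      # Quoted expansion: written verbatim, no word splitting, no trailing newline.
      printf '%s' "${HF_ACCESS_TOKEN}" > ./hf_token
```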



Reviews (1): Last reviewed commit: "ci: declare workflow-level contents: rea..."

arpitjain099 requested review from a team as code owners May 15, 2026 05:46

arpitjain099 requested a review from jdye64 May 15, 2026 05:46

arpitjain099 wants to merge 1 commit into NVIDIA:main from arpitjain099:chore/declare-workflow-perms-readonly
