Skip to content

ci: add Platform testnet sync status reporting#3958

Open
infraclaw-dash wants to merge 7 commits into
dashpay:v3.1-devfrom
infraclaw-dash:infraclaw/latest-core-testnet-sync-worker
Open

ci: add Platform testnet sync status reporting#3958
infraclaw-dash wants to merge 7 commits into
dashpay:v3.1-devfrom
infraclaw-dash:infraclaw/latest-core-testnet-sync-worker

Conversation

@infraclaw-dash

@infraclaw-dash infraclaw-dash commented Jun 24, 2026

Copy link
Copy Markdown

Summary

  • add a persistent worker harness for latest-public-Core testnet Platform sync
  • add systemd service/timer/install assets under ops/latest-core-testnet-sync
  • add a lightweight GitHub status receiver for manual/repository-dispatch final status reporting
  • document the last-completed-status contract and long-running worker model

Notes

The long-running sync is intentionally not a GitHub-hosted scheduled workflow. The worker publishes only final completed commit statuses:

  • Sync Passed
  • Build Failed
  • Sync Failed

No running status is published, so the previous completed result remains visible while a new sync is active.

Validation

  • node --check .github/scripts/latest-core-testnet-sync/run.cjs
  • bash -n ops/latest-core-testnet-sync/run-worker.sh
  • bash -n ops/latest-core-testnet-sync/install-worker.sh
  • parsed .github/workflows/latest-core-testnet-sync-status.yml with PyYAML
  • systemd-analyze verify ops/latest-core-testnet-sync/latest-core-testnet-sync.service ops/latest-core-testnet-sync/latest-core-testnet-sync.timer
  • dry-run success path maps to Sync Passed
  • dry-run build failure maps to Build Failed

Summary by CodeRabbit

  • New Features

    • Added a new status update workflow for Platform testnet sync results.
    • Exposed a new badge in the README to show the latest sync state.
  • Documentation

    • Added documentation describing the supported sync result states, required fields, and how final outcomes are reported.
    • Clarified that detailed diagnostics are available via an external link, while the badge reflects only the final status.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 411e3253-ba43-4938-8e2b-ec6c0b998f18

📥 Commits

Reviewing files that changed from the base of the PR and between 06c6405 and 3546112.

📒 Files selected for processing (3)
  • .github/workflows/platform-testnet-sync-status.yml
  • README.md
  • docs/PLATFORM_TESTNET_SYNC.md
✅ Files skipped from review due to trivial changes (1)
  • README.md

📝 Walkthrough

Walkthrough

Adds a platform-testnet-sync-status GitHub Actions workflow that accepts a repository_dispatch or manual trigger with sync result metadata, validates the payload, verifies the target commit, and writes a GitHub commit status. A contract document and README badge are added alongside.

Changes

Platform Testnet Sync Status Reporter

Layer / File(s) Summary
Workflow triggers, validation, and commit status creation
.github/workflows/platform-testnet-sync-status.yml
Defines repository_dispatch (platform-testnet-sync-completed) and workflow_dispatch triggers with status, target_sha, target_url, core_version, platform_sha, and completed_at inputs. Validates payload shape, maps status values to labels, enforces a 40-hex SHA, optionally validates target_url as https://github.com, verifies the commit exists, creates the commit status, writes a job summary, and calls core.setFailed for non-passing statuses.
Contract documentation and README badge
docs/PLATFORM_TESTNET_SYNC.md, README.md
PLATFORM_TESTNET_SYNC.md defines allowed status values, dispatch payload fields, validation rules, and external worker expectations. README.md adds a Platform Sync badge linking to the new doc.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit hops through the workflow queue,
Dispatching statuses — green, red, or blue.
The SHA must be forty hexes just right,
The badge on the README gleams in the light.
🐇✅ Sync passed! The testnet aligns tonight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: adding Platform testnet sync status reporting via CI.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@thepastaclaw

thepastaclaw commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

✅ Review complete (commit 3546112)

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/scripts/latest-core-testnet-sync/run.cjs:
- Around line 123-129: `LATEST_CORE_TESTNET_PHASE_TIMEOUT_MINUTES` is passed
through `Number(...)` without validation, so invalid input can become `NaN` and
break the timeout flow; update the timeout handling in `run.cjs` around
`runCommand` and the later `setTimeout` usage to parse the env var safely,
reject non-finite or non-positive values, and fall back to the existing default
(30) when the value is missing or invalid.
- Around line 79-114: The timeout handling in runPhase rejects immediately after
sending SIGTERM, which can allow later cleanup in main() to finish while the
child is still running. Update the runPhase lifecycle in run.cjs so the timeout
path waits for the spawned child process to actually exit (or escalates if
needed) before rejecting, using the existing child.on('close') flow and the
SIGTERM timeout callback.

In @.github/workflows/latest-core-testnet-sync-status.yml:
- Around line 53-71: The payload handling in the workflow entrypoint needs to
guard against missing repository_dispatch client payload and avoid defaulting to
context.sha. Update the logic around the dispatch payload parsing and the
payload.status access so repository_dispatch without client_payload fails
cleanly instead of throwing, and require an explicit target SHA by validating
target_sha or platform_sha before continuing. Keep the fix localized to the
payload selection and targetSha resolution in the workflow script.
- Line 48: The workflow step using actions/github-script is pinned to a mutable
version tag, so update the github-script reference in the
latest-core-testnet-sync-status workflow to a specific commit SHA instead of
`@v7`. Locate the action usage in the workflow step and replace the tag with the
repository’s approved full SHA pin to prevent supply-chain drift.

In `@ops/latest-core-testnet-sync/install-worker.sh`:
- Around line 27-36: The installer currently creates PLATFORM_REPO_DIR and then
copies service assets from inside it without ensuring that path already contains
a writable git checkout. Update install-worker.sh to either locate the systemd
files relative to the installer script itself and add an explicit validation
that PLATFORM_REPO_DIR is an existing writable checkout, or make the installer
provision that checkout before copy/install steps. Reference the existing
install -d and install -m steps, plus the run-worker.sh expectation for
platform-sync, so the source path and runtime checkout requirements stay
consistent.

In `@ops/latest-core-testnet-sync/latest-core-testnet-sync.service`:
- Around line 10-12: The service unit hardcodes the repo path instead of
honoring PLATFORM_REPO_DIR from its environment file. Update the
latest-core-testnet-sync.service definition so WorkingDirectory and the
ExecStart script path are derived from the environment-backed repo directory
used by latest-core-testnet-sync.env and install-worker.sh, or otherwise remove
the configurable path assumption; use the unit’s existing EnvironmentFile and
the run-worker.sh launch to keep the path consistent.

In `@ops/latest-core-testnet-sync/run-worker.sh`:
- Around line 18-30: The repo-prep steps in run-worker.sh can fail before
run.cjs starts, which skips the final failure status and leaves the old result
visible. Move the git fetch/git checkout/git reset flow into run.cjs so it uses
the same status handling, or add a shell trap around the repo setup in
run-worker.sh that reports sync_failed on any early exit. Keep the existing
TARGET_SHA/run ID setup and make sure the failure path still posts the final
status.
- Around line 20-22: The sync script leaves behind untracked build artifacts
because the current git reset flow only restores tracked files. Update the
cleanup sequence in run-worker.sh around the existing git fetch/git checkout/git
reset logic to also remove untracked files and directories before each run,
using the same checkout context so every sync starts from a fully clean
workspace.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8aaaeede-69bc-4d57-b7b8-d975ae10c56c

📥 Commits

Reviewing files that changed from the base of the PR and between 87f6b73 and 1749712.

📒 Files selected for processing (10)
  • .github/scripts/latest-core-testnet-sync/run.cjs
  • .github/workflows/latest-core-testnet-sync-status.yml
  • .gitignore
  • docs/NIGHTLY_LATEST_CORE_TESTNET_SYNC.md
  • ops/latest-core-testnet-sync/README.md
  • ops/latest-core-testnet-sync/install-worker.sh
  • ops/latest-core-testnet-sync/latest-core-testnet-sync.env.example
  • ops/latest-core-testnet-sync/latest-core-testnet-sync.service
  • ops/latest-core-testnet-sync/latest-core-testnet-sync.timer
  • ops/latest-core-testnet-sync/run-worker.sh

Comment thread .github/scripts/platform-testnet-sync/run.cjs Outdated
Comment thread .github/scripts/latest-core-testnet-sync/run.cjs Outdated
Comment thread .github/workflows/latest-core-testnet-sync-status.yml Outdated
Comment thread .github/workflows/platform-testnet-sync-status.yml
Comment thread ops/latest-core-testnet-sync/install-worker.sh Outdated
Comment thread ops/latest-core-testnet-sync/latest-core-testnet-sync.service Outdated
Comment thread ops/latest-core-testnet-sync/run-worker.sh Outdated
Comment thread ops/latest-core-testnet-sync/run-worker.sh Outdated

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

PR adds a self-contained nightly Core-testnet sync worker (Node harness, systemd units, install script, dispatch workflow, docs). No consensus or cryptographic code is touched. Verification confirms three in-scope blocking issues — stderr contamination of version parsing, unbounded in-memory log accumulation across multi-hour phases, and GITHUB_TOKEN being forwarded into phase subprocesses that execute target-branch code — plus several reliability and security suggestions worth tightening before this runs unattended.

🔴 3 blocking | 🟡 5 suggestion(s) | 💬 1 nitpick(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [BLOCKING] .github/scripts/latest-core-testnet-sync/run.cjs:92-95: stderr is captured into the same buffer used to parse the Core version
  `runCommand` appends both stdout and stderr to the single `output` buffer (lines 86–96), and `resolveLatestCoreVersion` then calls `firstLine(result.output)` (line 130) to extract the version. The documented contract says `LATEST_CORE_TESTNET_CORE_VERSION_COMMAND` prints the version on stdout, but any diagnostic written to stderr before the tag (a `set -x` line, a warning from a git/curl helper, a deprecation notice) will appear first in the merged buffer and be silently adopted as `LATEST_CORE_VERSION`. The metadata is then written, the status is published, and the run reports success against a bogus Core version. The stderr stream should still be logged, but it must not be mixed into the captured stdout used for parsing.
- [BLOCKING] .github/scripts/latest-core-testnet-sync/run.cjs:84-96: Phase output is retained in memory for the entire 16+ hour run
  `runCommand` appends every stdout and stderr chunk to the `output` string for every phase (lines 88, 94), including the documented long-running Core-sync and Platform-sync phases that run for many hours with verbose docker/cargo/dashmate output. Logs are already streamed to per-phase log files, but the Node process keeps a full duplicate in memory until the phase exits. A verbose build or sync can grow the heap to hundreds of MB or more, eventually OOM-killing the worker before `publishCommitStatus` runs — leaving the GitHub commit status stale and the systemd unit silently failed. Only the short `resolve-core-version` path actually consumes `result.output`; the long phases should stream without retaining the full log in memory.
- [BLOCKING] .github/scripts/latest-core-testnet-sync/run.cjs:250-256: GITHUB_TOKEN is forwarded into every phase subprocess
  `phaseEnv` is built by spreading `process.env` (which is populated from `EnvironmentFile=/etc/latest-core-testnet-sync.env` and includes `GITHUB_TOKEN`/`GH_TOKEN`) and is then handed to `spawn('bash', ['-c', ...])` for the operator-defined Core-sync, Platform-build, and Platform-sync commands. `run-worker.sh` first does `git reset --hard origin/${PLATFORM_BRANCH}`, so those commands execute scripts from the freshly checked-out branch. Anyone able to land a commit on the tracked branch (or any compromised build dependency invoked during the phases) can read the token from the environment and forge `Latest public Core testnet sync` commit statuses on arbitrary commits — the exact signal this PR introduces for developer trust. The same exposure is compounded by `runCommand` persisting all stdout/stderr to `${runDir}/${name}.log`, so any `set -x` or `env` invocation also writes the token to disk under `/var/log/latest-core-testnet-sync/`. The token is only needed by `publishCommitStatus`, which reads from `process.env` directly; strip `GITHUB_TOKEN`/`GH_TOKEN` from `phaseEnv` before passing it to `spawn`.
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:72-82: Timeout only signals the bash wrapper, not the phase's process tree
  `spawn('bash', ['-c', ...])` is launched without `detached: true`, and the timeout handler calls `child.kill('SIGTERM')` (line 80), which delivers the signal only to the bash wrapper. The phase commands fork docker/dashmate/cargo subprocesses that will become orphans when bash exits, continuing to hold ports, disk, CPU, and lock files. With a 1440-minute default timeout, a real timeout means the next nightly run collides with leftovers from the previous one. Detach the child into its own process group (`spawn(..., { detached: true })`) and signal the whole group on timeout (`process.kill(-child.pid, 'SIGTERM')`, escalating to `SIGKILL` after a grace period).
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:138-219: GitHub `fetch` calls have no timeout or AbortSignal
  Both `resolveLatestCoreVersion` (GET `/releases`, line 138) and `publishCommitStatus` (POST `/statuses/...`, line 210) call the global `fetch` with no `signal`. The systemd unit sets `TimeoutStartSec=0` and Node imposes no default request timeout, so a hung TCP connection to GitHub blocks the worker indefinitely. For the resolve call this delays the run start; for the publish call a completed sync sits silent forever, and the next nightly run is then blocked by the `flock` in `run-worker.sh` (which exits 0 without publishing anything). Wire `AbortSignal.timeout(...)` (or equivalent) into both fetches.
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:285-301: Publish failure in `finally` silently drops the run outcome
  If `publishCommitStatus` throws inside the `finally` block (transient 5xx, expired token, network blip), the rejection is swallowed by the outer `main().catch` which only sets `process.exitCode = 1`. The metadata file already records `status: 'sync_passed'`, but no GitHub commit status is created and there is no retry — the visible status panel goes stale until the next nightly run, which itself may fail to publish for the same reason. Wrap the publish call in retry-with-backoff and log a clear, distinguishable error when the final status could not be reported, so operators can act before the next run.

In `ops/latest-core-testnet-sync/run-worker.sh`:
- [SUGGESTION] ops/latest-core-testnet-sync/run-worker.sh:1-30: Per-run log/state directories accumulate without retention
  `run-worker.sh` creates `${LATEST_CORE_TESTNET_LOG_DIR}/${RUN_ID}` per nightly run, and `run.cjs` writes per-phase log files and metadata under it. Nothing prunes old run directories, so over a year of daily runs you accumulate 365+ directories of multi-MB logs under `/var/log/latest-core-testnet-sync/`. Either add a retention step here (e.g. `find ... -mtime +N -delete`) or document the operator's responsibility in `ops/latest-core-testnet-sync/README.md`.

In `.github/workflows/latest-core-testnet-sync-status.yml`:
- [SUGGESTION] .github/workflows/latest-core-testnet-sync-status.yml:70-88: Dispatch workflow accepts arbitrary `target_sha`/`target_url` without validation
  The reporter posts a commit status to whatever `target_sha` and `target_url` are supplied via `repository_dispatch` `client_payload` (or `workflow_dispatch` input). Because the job runs with `statuses: write`, any credential able to fire the `latest-core-testnet-sync-completed` dispatch event can publish a `success`/`failure` status under the `Latest public Core testnet sync` context against any commit SHA, with an arbitrary `target_url` that maintainers will click from the commit page. Worst case is a forged `Sync Passed` status linking to a phishing page. Mitigations: validate `target_sha` is a 40-hex commit that exists in the repo (e.g. `repos.getCommit`), restrict `target_url` to a known origin (e.g. `https://github.com/...`), and document who is expected to hold the dispatch credential in `docs/NIGHTLY_LATEST_CORE_TESTNET_SYNC.md`.

Comment thread .github/scripts/platform-testnet-sync/run.cjs Outdated
Comment thread .github/scripts/latest-core-testnet-sync/run.cjs Outdated
Comment thread .github/scripts/platform-testnet-sync/run.cjs Outdated
Comment thread .github/scripts/platform-testnet-sync/run.cjs Outdated
Comment thread .github/scripts/latest-core-testnet-sync/run.cjs Outdated
Comment thread .github/scripts/platform-testnet-sync/run.cjs Outdated
Comment thread ops/latest-core-testnet-sync/run-worker.sh Outdated
Comment thread .github/workflows/latest-core-testnet-sync-status.yml Outdated
Comment thread .github/scripts/latest-core-testnet-sync/run.cjs Outdated

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

All nine prior findings are addressed in the latest delta: stderr/version corruption, unbounded output buffering, GITHUB_TOKEN propagation, process-group signaling, fetch timeouts, log retention, dispatch payload validation, and the shadowed body identifier are all fixed; the publish-failure-in-finally case is partially addressed via retries but the metadata divergence remains. The latest delta introduces two new low-severity issues: a phase-attribution bug for resolve-core-version failures, and a shell-interpolation hardening gap on PLATFORM_BRANCH. No blocking issues.

🟡 2 suggestion(s) | 💬 1 nitpick(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:397-400: resolve-core-version failures are recorded as workspace-prepare failures
  `prepareWorkspace` sets `metadata.phase = 'workspace-prepare'` and never clears it on success. The next statement awaits `resolveLatestCoreVersion`, which can fail (bad `LATEST_CORE_TESTNET_CORE_VERSION_COMMAND`, GitHub releases fetch error, no public release, etc.). The catch at line 440 sets `metadata.failure_phase = metadata.phase || 'resolve-core-version'`, so any such failure is recorded as `failure_phase=workspace-prepare`. The `|| 'resolve-core-version'` fallback only fires when `prepareWorkspace` was skipped via `LATEST_CORE_TESTNET_SKIP_WORKSPACE_PREP=1`. Set the phase before resolving the Core version so the run-tracking contract points operators at the right phase.
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:444-449: Metadata claims sync_passed even when status publish fails after retries (carried forward, partial fix)
  Prior finding #6 is only partially addressed. `publishCommitStatus` now retries up to `STATUS_PUBLISH_ATTEMPTS` times, which is a clear improvement, but if every retry fails the throw propagates to `main().catch`, leaving `run-metadata.json` on disk with `status: sync_passed` (or `build_failed`/`sync_failed`) while no GitHub status was published and the previous run's status remains visible. Operators that read `run-metadata.json` to decide whether the latest sync passed will see a green result while GitHub disagrees. Record the publish outcome in the metadata (e.g. `status_published_at`, `status_publish_error`) after the final `publishCommitStatus` call so the on-disk record and GitHub stay aligned.

Comment thread .github/scripts/platform-testnet-sync/run.cjs Outdated
Comment thread .github/scripts/platform-testnet-sync/run.cjs Outdated
Comment thread .github/scripts/platform-testnet-sync/run.cjs Outdated

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Prior phase-tracking and publish-outcome metadata findings are resolved. One prior carry-forward (PLATFORM_BRANCH quoting) plus two new in-scope hardening items: target_sha resolution can still report a stale/inherited commit after workspace reset, and the new status-receiver workflow exposes a workflow_dispatch override that lets any repo writer forge a 'Sync Passed' status. All three are suggestions, no blockers.

🟡 2 suggestion(s)

1 carried-forward finding(s) already raised on this PR; not re-posting as new inline comments.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:233-237: target_sha re-capture still prefers inherited env vars over the checked-out HEAD
  After `prepareWorkspace` resets the workspace to `origin/$PLATFORM_BRANCH`, it re-derives `targetSha` via `resolveTargetSha()`, which returns `process.env.TARGET_SHA || process.env.GITHUB_SHA || git rev-parse HEAD`. If either env var is set on the worker (systemd env file, prior run leakage into a long-lived shell, or future CI invocation), the script runs phase commands against the freshly reset branch tip but publishes the commit status against the inherited SHA — the worker's contract to report the actually-tested commit is silently broken. Worse, if `prepareWorkspace` fails before this line, the catch path publishes 'Sync Failed' against whatever stale SHA was captured at line 382. Resolve `target_sha` directly from `git rev-parse HEAD` after the reset, and only fall back to inherited env vars for the pre-prepare initial value.

In `.github/workflows/latest-core-testnet-sync-status.yml`:
- [SUGGESTION] .github/workflows/latest-core-testnet-sync-status.yml:7-36: workflow_dispatch lets any repo writer forge a sync status
  The receiver exposes both `repository_dispatch` (driven by the operator-only worker credential) and `workflow_dispatch` with free-form `status` and `target_sha` inputs. The job carries `statuses: write`, so anyone with repo write access can manually publish 'Sync Passed' against any existing SHA without the worker ever running. If maintainers, branch protection, or merge automation treat the `Latest public Core testnet sync` context as evidence the testnet sync actually ran, this becomes a trust-laundering primitive that bypasses the operator-credential boundary the rest of the pipeline carefully maintains. Either drop the `workflow_dispatch` trigger, or gate it behind an environment with required reviewers so the manual path requires an operator approval distinct from generic repo write.

Comment on lines +233 to +237
const targetSha = resolveTargetSha();
metadata.target_sha = targetSha;
metadata.platform_sha = targetSha;
process.env.TARGET_SHA = targetSha;
writeMetadata(metadata);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: target_sha re-capture still prefers inherited env vars over the checked-out HEAD

After prepareWorkspace resets the workspace to origin/$PLATFORM_BRANCH, it re-derives targetSha via resolveTargetSha(), which returns process.env.TARGET_SHA || process.env.GITHUB_SHA || git rev-parse HEAD. If either env var is set on the worker (systemd env file, prior run leakage into a long-lived shell, or future CI invocation), the script runs phase commands against the freshly reset branch tip but publishes the commit status against the inherited SHA — the worker's contract to report the actually-tested commit is silently broken. Worse, if prepareWorkspace fails before this line, the catch path publishes 'Sync Failed' against whatever stale SHA was captured at line 382. Resolve target_sha directly from git rev-parse HEAD after the reset, and only fall back to inherited env vars for the pre-prepare initial value.

Suggested change
const targetSha = resolveTargetSha();
metadata.target_sha = targetSha;
metadata.platform_sha = targetSha;
process.env.TARGET_SHA = targetSha;
writeMetadata(metadata);
const targetSha = gitOutput(['rev-parse', 'HEAD']);
metadata.target_sha = targetSha;
metadata.platform_sha = targetSha;
process.env.TARGET_SHA = targetSha;
writeMetadata(metadata);

source: ['claude', 'codex']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in f6783a6target_sha re-capture still prefers inherited env vars over the checked-out HEAD no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

Comment on lines +7 to +36
workflow_dispatch:
inputs:
status:
description: Final completed run status
required: true
type: choice
options:
- sync_passed
- build_failed
- sync_failed
target_sha:
description: Platform commit SHA to report against
required: false
type: string
target_url:
description: URL for run logs or status details
required: false
type: string
core_version:
description: Public Core version used by the run
required: false
type: string
platform_sha:
description: Platform commit SHA built and synced
required: false
type: string
completed_at:
description: Completion timestamp, preferably ISO 8601 UTC
required: false
type: string

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: workflow_dispatch lets any repo writer forge a sync status

The receiver exposes both repository_dispatch (driven by the operator-only worker credential) and workflow_dispatch with free-form status and target_sha inputs. The job carries statuses: write, so anyone with repo write access can manually publish 'Sync Passed' against any existing SHA without the worker ever running. If maintainers, branch protection, or merge automation treat the Latest public Core testnet sync context as evidence the testnet sync actually ran, this becomes a trust-laundering primitive that bypasses the operator-credential boundary the rest of the pipeline carefully maintains. Either drop the workflow_dispatch trigger, or gate it behind an environment with required reviewers so the manual path requires an operator approval distinct from generic repo write.

source: ['codex']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in this update — workflow_dispatch lets any repo writer forge a sync status no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

Comment thread ops/latest-core-testnet-sync/README.md Outdated
@@ -0,0 +1,79 @@
# Latest Public Core Testnet Sync Worker

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why it exists? who needs that? When I, as a user or developer, should use that tool?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
ops/latest-core-testnet-sync/run-worker.sh (1)

13-24: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Take the worker lock before pruning logs.

A second timer invocation still runs find ... rm -rf before failing flock; with short retention or an unusually long run, it can delete the active run directory before exiting.

Proposed fix
 mkdir -p "${LATEST_CORE_TESTNET_STATE_DIR}" "${LATEST_CORE_TESTNET_LOG_DIR}"
 
-find "${LATEST_CORE_TESTNET_LOG_DIR}" \
-  -mindepth 1 \
-  -maxdepth 1 \
-  -type d \
-  -mtime +"${LATEST_CORE_TESTNET_LOG_RETENTION_DAYS}" \
-  -exec rm -rf {} +
-
 exec 9>"${LATEST_CORE_TESTNET_STATE_DIR}/run.lock"
 if ! flock -n 9; then
   echo "latest Core testnet sync is already running"
   exit 0
 fi
+
+find "${LATEST_CORE_TESTNET_LOG_DIR}" \
+  -mindepth 1 \
+  -maxdepth 1 \
+  -type d \
+  -mtime +"${LATEST_CORE_TESTNET_LOG_RETENTION_DAYS}" \
+  -exec rm -rf {} +
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ops/latest-core-testnet-sync/run-worker.sh` around lines 13 - 24, The lock
acquisition in run-worker.sh happens after the log-pruning find/rm -rf block,
which allows a concurrent timer invocation to delete the active run directory
before flock fails. Move the run.lock open and flock -n 9 check to the start of
the worker flow, before any pruning, and keep the existing latest Core testnet
sync log cleanup logic gated behind the lock so only the active worker can
prune.
ops/latest-core-testnet-sync/install-worker.sh (1)

50-57: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Force root ownership on existing env files.

chmod 0600 is not enough if /etc/latest-core-testnet-sync.env already exists with a non-root owner; that owner can still read or modify GITHUB_TOKEN and command configuration.

Proposed fix
 if [[ ! -f /etc/latest-core-testnet-sync.env ]]; then
   install -m 0600 \
     "${SCRIPT_DIR}/latest-core-testnet-sync.env.example" \
     /etc/latest-core-testnet-sync.env
   echo "Created /etc/latest-core-testnet-sync.env; fill it before enabling the timer."
 else
+  chown root:root /etc/latest-core-testnet-sync.env
   chmod 0600 /etc/latest-core-testnet-sync.env
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ops/latest-core-testnet-sync/install-worker.sh` around lines 50 - 57, The
existing env-file handling in install-worker.sh only tightens permissions with
chmod, but it does not ensure /etc/latest-core-testnet-sync.env is owned by
root. Update the existing-file branch so the setup enforces root ownership for
the env file (using the install/ownership path in the script) before setting
restrictive permissions, and keep the creation path in the same install flow for
latest-core-testnet-sync.env.example.
♻️ Duplicate comments (1)
.github/scripts/latest-core-testnet-sync/run.cjs (1)

468-483: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don’t throw from the finally block.

Biome flags Line 481, and this also skips the final status log path. Record the publish failure in metadata, then handle the nonzero exit after finally.

Proposed fix
   let finalStatus = 'sync_passed';
+  let statusPublishError;
@@
     } catch (error) {
       metadata.status_publish_error = error.message;
       metadata.status_publish_failed_at = now();
       writeMetadata(metadata);
-      throw error;
+      statusPublishError = error;
     }
     writeMetadata(metadata);
   }
 
   console.log(`${STATUS_CONTEXT}: ${STATUS_LABELS[finalStatus]}`);
 
-  if (finalStatus !== 'sync_passed') {
+  if (statusPublishError) {
+    console.error(`Failed to publish final commit status: ${statusPublishError.message}`);
+    process.exitCode = 1;
+  } else if (finalStatus !== 'sync_passed') {
     process.exitCode = 1;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/latest-core-testnet-sync/run.cjs around lines 468 - 483, The
publish-status flow in the `publishCommitStatus`/metadata update block should
not rethrow from the cleanup path, because that bypasses the final status
logging and causes Biome to flag the `finally`-style handling. Keep recording
the failure details in `metadata.status_publish_error` and
`metadata.status_publish_failed_at`, ensure `writeMetadata(metadata)` still
runs, and move the nonzero exit/error propagation to after the final status log
path so the script can complete cleanup before exiting.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ops/latest-core-testnet-sync/install-worker.sh`:
- Around line 28-38: The current preflight in install-worker.sh only checks that
the repo root is a writable git checkout, but it can still miss root-owned Git
internals and a missing or non-executable run-worker.sh entrypoint. Update the
validation near the existing PLATFORM_REPO_DIR and PLATFORM_SYNC_USER checks to
also verify write access to the Git data paths used by the service, such as
.git/objects and .git/refs, and confirm the
ops/latest-core-testnet-sync/run-worker.sh script exists and is executable in
${PLATFORM_REPO_DIR}. Keep the checks alongside the existing preflight logic so
failures are caught before the unit starts.

---

Outside diff comments:
In `@ops/latest-core-testnet-sync/install-worker.sh`:
- Around line 50-57: The existing env-file handling in install-worker.sh only
tightens permissions with chmod, but it does not ensure
/etc/latest-core-testnet-sync.env is owned by root. Update the existing-file
branch so the setup enforces root ownership for the env file (using the
install/ownership path in the script) before setting restrictive permissions,
and keep the creation path in the same install flow for
latest-core-testnet-sync.env.example.

In `@ops/latest-core-testnet-sync/run-worker.sh`:
- Around line 13-24: The lock acquisition in run-worker.sh happens after the
log-pruning find/rm -rf block, which allows a concurrent timer invocation to
delete the active run directory before flock fails. Move the run.lock open and
flock -n 9 check to the start of the worker flow, before any pruning, and keep
the existing latest Core testnet sync log cleanup logic gated behind the lock so
only the active worker can prune.

---

Duplicate comments:
In @.github/scripts/latest-core-testnet-sync/run.cjs:
- Around line 468-483: The publish-status flow in the
`publishCommitStatus`/metadata update block should not rethrow from the cleanup
path, because that bypasses the final status logging and causes Biome to flag
the `finally`-style handling. Keep recording the failure details in
`metadata.status_publish_error` and `metadata.status_publish_failed_at`, ensure
`writeMetadata(metadata)` still runs, and move the nonzero exit/error
propagation to after the final status log path so the script can complete
cleanup before exiting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9249bc37-cdf7-4e25-91ef-a9fd94a1c30c

📥 Commits

Reviewing files that changed from the base of the PR and between 1749712 and 06c6405.

📒 Files selected for processing (8)
  • .github/scripts/latest-core-testnet-sync/run.cjs
  • .github/workflows/latest-core-testnet-sync-status.yml
  • docs/NIGHTLY_LATEST_CORE_TESTNET_SYNC.md
  • ops/latest-core-testnet-sync/README.md
  • ops/latest-core-testnet-sync/install-worker.sh
  • ops/latest-core-testnet-sync/latest-core-testnet-sync.env.example
  • ops/latest-core-testnet-sync/latest-core-testnet-sync.service
  • ops/latest-core-testnet-sync/run-worker.sh
✅ Files skipped from review due to trivial changes (3)
  • docs/NIGHTLY_LATEST_CORE_TESTNET_SYNC.md
  • ops/latest-core-testnet-sync/README.md
  • ops/latest-core-testnet-sync/latest-core-testnet-sync.env.example
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/latest-core-testnet-sync-status.yml
  • ops/latest-core-testnet-sync/latest-core-testnet-sync.service

Comment on lines +28 to +38
if [[ ! -d "${PLATFORM_REPO_DIR}/.git" ]]; then
echo "PLATFORM_REPO_DIR must be an existing git checkout: ${PLATFORM_REPO_DIR}" >&2
echo "Clone dashpay/platform there before running this installer." >&2
exit 1
fi

if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -w "${PLATFORM_REPO_DIR}"; then
echo "PLATFORM_REPO_DIR must be writable by ${PLATFORM_SYNC_USER}: ${PLATFORM_REPO_DIR}" >&2
echo "Run: chown -R ${PLATFORM_SYNC_USER}:${PLATFORM_SYNC_GROUP} ${PLATFORM_REPO_DIR}" >&2
exit 1
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Validate the runtime entrypoint and Git write paths.

This preflight can pass with a writable repo root while .git/objects or .git/refs remains root-owned, and it does not ensure the service’s ops/latest-core-testnet-sync/run-worker.sh entrypoint exists/executable in ${PLATFORM_REPO_DIR}. The unit can then fail immediately or during workspace prep.

Proposed fix
 if [[ ! -d "${PLATFORM_REPO_DIR}/.git" ]]; then
   echo "PLATFORM_REPO_DIR must be an existing git checkout: ${PLATFORM_REPO_DIR}" >&2
   echo "Clone dashpay/platform there before running this installer." >&2
   exit 1
 fi
 
-if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -w "${PLATFORM_REPO_DIR}"; then
+WORKER_ENTRYPOINT="${PLATFORM_REPO_DIR}/ops/latest-core-testnet-sync/run-worker.sh"
+if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -x "${WORKER_ENTRYPOINT}"; then
+  echo "Worker entrypoint must exist and be executable by ${PLATFORM_SYNC_USER}: ${WORKER_ENTRYPOINT}" >&2
+  exit 1
+fi
+
+if ! runuser -u "${PLATFORM_SYNC_USER}" -- bash -c '
+  set -euo pipefail
+  repo=$1
+  test -w "$repo"
+  test -w "$repo/.git"
+  test -w "$repo/.git/objects"
+  test -w "$repo/.git/refs"
+' _ "${PLATFORM_REPO_DIR}"; then
   echo "PLATFORM_REPO_DIR must be writable by ${PLATFORM_SYNC_USER}: ${PLATFORM_REPO_DIR}" >&2
   echo "Run: chown -R ${PLATFORM_SYNC_USER}:${PLATFORM_SYNC_GROUP} ${PLATFORM_REPO_DIR}" >&2
   exit 1
 fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [[ ! -d "${PLATFORM_REPO_DIR}/.git" ]]; then
echo "PLATFORM_REPO_DIR must be an existing git checkout: ${PLATFORM_REPO_DIR}" >&2
echo "Clone dashpay/platform there before running this installer." >&2
exit 1
fi
if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -w "${PLATFORM_REPO_DIR}"; then
echo "PLATFORM_REPO_DIR must be writable by ${PLATFORM_SYNC_USER}: ${PLATFORM_REPO_DIR}" >&2
echo "Run: chown -R ${PLATFORM_SYNC_USER}:${PLATFORM_SYNC_GROUP} ${PLATFORM_REPO_DIR}" >&2
exit 1
fi
if [[ ! -d "${PLATFORM_REPO_DIR}/.git" ]]; then
echo "PLATFORM_REPO_DIR must be an existing git checkout: ${PLATFORM_REPO_DIR}" >&2
echo "Clone dashpay/platform there before running this installer." >&2
exit 1
fi
WORKER_ENTRYPOINT="${PLATFORM_REPO_DIR}/ops/latest-core-testnet-sync/run-worker.sh"
if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -x "${WORKER_ENTRYPOINT}"; then
echo "Worker entrypoint must exist and be executable by ${PLATFORM_SYNC_USER}: ${WORKER_ENTRYPOINT}" >&2
exit 1
fi
if ! runuser -u "${PLATFORM_SYNC_USER}" -- bash -c '
set -euo pipefail
repo=$1
test -w "$repo"
test -w "$repo/.git"
test -w "$repo/.git/objects"
test -w "$repo/.git/refs"
' _ "${PLATFORM_REPO_DIR}"; then
echo "PLATFORM_REPO_DIR must be writable by ${PLATFORM_SYNC_USER}: ${PLATFORM_REPO_DIR}" >&2
echo "Run: chown -R ${PLATFORM_SYNC_USER}:${PLATFORM_SYNC_GROUP} ${PLATFORM_REPO_DIR}" >&2
exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ops/latest-core-testnet-sync/install-worker.sh` around lines 28 - 38, The
current preflight in install-worker.sh only checks that the repo root is a
writable git checkout, but it can still miss root-owned Git internals and a
missing or non-executable run-worker.sh entrypoint. Update the validation near
the existing PLATFORM_REPO_DIR and PLATFORM_SYNC_USER checks to also verify
write access to the Git data paths used by the service, such as .git/objects and
.git/refs, and confirm the ops/latest-core-testnet-sync/run-worker.sh script
exists and is executable in ${PLATFORM_REPO_DIR}. Keep the checks alongside the
existing preflight logic so failures are caught before the unit starts.

@infraclaw-dash infraclaw-dash changed the title ci: add latest Core testnet sync worker ci: add Platform testnet sync worker Jun 24, 2026
@ktechmidas ktechmidas changed the title ci: add Platform testnet sync worker ci: add automated testnet sync worker Jun 24, 2026

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Incremental review reconciling 3 prior findings against head f6783a6. The JSON.stringify shell-quoting issue (prior-3) is now FIXED via env-var indirection plus validatePlatformBranch allow-list. The target_sha-from-env (prior-1) and workflow_dispatch trust-boundary (prior-2) issues are STILL VALID and carried forward. Two new in-scope issues were validated against the worker scripts: pre-lock log pruning can delete the active run directory, and install-worker.sh does not enforce root ownership on a pre-existing token-bearing env file.

🔴 2 blocking | 🟡 3 suggestion(s) | 💬 1 nitpick(s)

6 additional finding(s) omitted (not in diff).

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `.github/workflows/latest-core-testnet-sync-status.yml`:
- [BLOCKING] .github/workflows/latest-core-testnet-sync-status.yml:7-36: workflow_dispatch bypasses the operator-only worker trust boundary
  The workflow exposes workflow_dispatch with free-form status/target_sha inputs while the job carries statuses: write. Any user with repo write access can manually run it and publish 'Sync Passed' against any commit reachable in the repo, forging the 'Latest public Core testnet sync' context without the persistent worker ever running. The SHA format check and getCommit lookup only confirm the commit exists — they do not enforce that the worker produced the result. If this context is (or becomes) a required check, the manual path is a status-forgery vector. Remove the workflow_dispatch trigger or gate it behind a protected environment that requires operator approval.

In `ops/latest-core-testnet-sync/install-worker.sh`:
- [BLOCKING] ops/latest-core-testnet-sync/install-worker.sh:50-57: Existing /etc/latest-core-testnet-sync.env keeps prior ownership
  When /etc/latest-core-testnet-sync.env already exists, the installer only re-applies 0600 perms and does not chown to root:root. If the file was previously written by a non-root account (e.g. created manually before the installer ran), that account retains read/write access to GITHUB_TOKEN and can modify the phase commands EnvironmentFile-loaded by the service that later runs as root. The create path also relies on install's default ownership rather than explicitly forcing root:root. Force root:root ownership in both branches so the secret-bearing env file is never owned by a less-privileged account.

In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:250-254: target_sha can still be inherited from env after the workspace reset
  After prepareWorkspace fetches/checks out/resets to origin/$PLATFORM_BRANCH, it re-derives the SHA via resolveTargetSha(), which returns process.env.TARGET_SHA || process.env.GITHUB_SHA || git rev-parse HEAD. If either env var is present in the worker environment (systemd EnvironmentFile, prior shell, CI re-invocation), metadata.target_sha/platform_sha and the published commit status will point at the inherited SHA even though the phases actually ran against the freshly reset branch tip. After the reset, take the SHA from git directly so metadata describes the commit that was actually tested. resolveTargetSha additionally never validates the value is a 40-char hex SHA, so a malformed env value silently propagates to the statuses API (which then 422s and burns the retry budget).
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:374-389: publishCommitStatus retries on permanent HTTP failures
  publishCommitStatusOnce throws a generic `HTTP <status> <body>` error for every non-2xx, and publishCommitStatus retries up to STATUS_PUBLISH_ATTEMPTS times with exponential backoff for any error. 401 (bad token), 403 (missing scopes), 404 (repo/SHA missing), and 422 (malformed SHA) are deterministic failures that the retries cannot recover, so the worker just delays surfacing the real error. Inspect the response status and only retry transient classes (5xx, network errors, 429); fail fast on the rest.

In `ops/latest-core-testnet-sync/run-worker.sh`:
- [SUGGESTION] ops/latest-core-testnet-sync/run-worker.sh:13-24: Log pruning runs before the worker lock is acquired
  The destructive `find ... -mtime +RETENTION -exec rm -rf {} +` block executes before `flock -n 9`. A second timer invocation can therefore prune log directories while another sync is in flight, then exit when flock fails. If LATEST_CORE_TESTNET_LOG_RETENTION_DAYS is set low or a run exceeds the retention window, the active run's log directory (LATEST_CORE_TESTNET_SYNC_RUN_DIR under LATEST_CORE_TESTNET_LOG_DIR) can be removed mid-run, breaking the orchestrator's appendLog/writeMetadata calls. Move pruning inside the flock-held critical section so only the active worker performs cleanup.

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Cumulative incremental review for head 06c6405 against prior reviewed commit f6783a6. The latest delta is a scope-narrowing rename (core-sync → core-readiness) and does not touch any consensus-critical code. All six prior findings remain STILL VALID at HEAD: two blocking trust-boundary/permission issues (workflow_dispatch status forgery, env file ownership), three suggestion-level operational correctness issues, and one nitpick. One additional new suggestion is added for the initial target_sha being resolved from pre-reset HEAD, which is adjacent to but distinct from prior-3.

🔴 2 blocking | 🟡 3 suggestion(s) | 💬 1 nitpick(s)

6 additional finding(s) omitted (not in diff).

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `.github/workflows/latest-core-testnet-sync-status.yml`:
- [BLOCKING] .github/workflows/latest-core-testnet-sync-status.yml:7-36: workflow_dispatch bypasses the operator-only worker trust boundary
  The workflow exposes a manual `workflow_dispatch` path that accepts caller-provided `status` and `target_sha` while the job carries `statuses: write`. Any repo user with workflow_dispatch permission can publish `Sync Passed` against any commit reachable in the repository without the persistent worker ever running. The 40-hex regex and `getCommit` lookup only prove the SHA exists; they do not prove a sync ran. If maintainers, branch protection, or downstream automation treat the `Latest public Core testnet sync` context as evidence of a real sync, this manual path forges that signal. Either drop `workflow_dispatch` entirely (the persistent worker uses `repository_dispatch`) or gate the job behind a protected environment requiring operator approval.

In `ops/latest-core-testnet-sync/install-worker.sh`:
- [BLOCKING] ops/latest-core-testnet-sync/install-worker.sh:50-57: Existing /etc/latest-core-testnet-sync.env keeps prior ownership
  When `/etc/latest-core-testnet-sync.env` already exists, the installer only re-applies mode `0600` and leaves the current owner intact. If the file was created or copied by a non-root account before this installer ran (or rotated in by hand), that account retains read access to `GITHUB_TOKEN` and write access to the phase commands the systemd service consumes. Because this file carries the statuses-write token and controls the commands the worker runs as root-installed unit, the installer should force `chown root:root` in both the create and existing-file paths.

In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:142-254: target_sha can be reported for a commit that was never tested
  `resolveTargetSha()` (line 142) prefers `process.env.TARGET_SHA || process.env.GITHUB_SHA` over the working-tree HEAD, and it is called twice: once in `main()` at line 399 *before* the workspace is reset, and again inside `prepareWorkspace` at line 250 *after* the reset. Two failure modes follow: (1) if `TARGET_SHA`/`GITHUB_SHA` is present in the worker environment, both calls return the inherited SHA even though phases run against `origin/$PLATFORM_BRANCH`, so the published commit status points at a commit that was not actually tested; (2) if neither env var is set and `prepareWorkspace` fails (e.g. `git fetch` errors before the reset), the `finally` block publishes `Sync Failed` against the stale pre-fetch HEAD. Resolve the SHA strictly from `git rev-parse HEAD` after a successful reset, and use that single value for both metadata and the published status.
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:374-389: publishCommitStatus retries permanent HTTP failures
  `publishCommitStatusOnce` throws a generic `Error` for every non-2xx response, and `publishCommitStatus` retries on any thrown error up to `STATUS_PUBLISH_ATTEMPTS`. Permanent failures such as 401, 403, 404, and 422 are deterministic and cannot recover, so the worker burns the entire retry budget (with exponential backoff up to 8 seconds on attempt 3) before surfacing what is really a misconfiguration. Inspect the response status and only retry on transient categories — network errors, 429, 502, 503, 504 — while failing fast on permanent HTTP errors.

In `ops/latest-core-testnet-sync/run-worker.sh`:
- [SUGGESTION] ops/latest-core-testnet-sync/run-worker.sh:13-24: Log pruning runs before the worker lock is acquired
  The destructive `find ... -mtime +RETENTION -exec rm -rf {} +` block runs before `flock -n 9` acquires `run.lock`. A second timer invocation triggered while another sync is in flight will execute the prune first, then exit because flock fails. If retention is low or a sync exceeds the retention window, that prune can remove the active run directory and break `appendLog`/`writeMetadata` mid-run. Acquire the flock first and perform pruning inside the single-worker critical section.

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Cumulative review at head 07c7e22. The latest delta is a rename from latest-core-testnet-sync to platform-testnet-sync; no behavior change. All six prior findings remain present in the renamed files and are carried forward. Two additional blocking security findings surfaced this round: the worker harness itself lives in the mutable Platform checkout that the same worker resets to the tested branch (so anyone who can land code on PLATFORM_BRANCH can modify the script that holds GITHUB_TOKEN), and phase commands inherit access to the parent Node process environment via /proc/$PPID/environ, defeating the child-env GITHUB_TOKEN/GH_TOKEN scrub. One symmetry suggestion is added for target_url validation.

🔴 3 blocking | 🟡 4 suggestion(s) | 💬 1 nitpick(s)

1 carried-forward finding(s) already raised on this PR; not re-posting as new inline comments.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `ops/platform-testnet-sync/platform-testnet-sync.service`:
- [BLOCKING] ops/platform-testnet-sync/platform-testnet-sync.service:10-11: Worker harness executes from the same mutable checkout it resets to the tested branch
  The unit loads `/etc/platform-testnet-sync.env` (containing `GITHUB_TOKEN` with `statuses: write`) and then runs `ops/platform-testnet-sync/run-worker.sh` and `.github/scripts/platform-testnet-sync/run.cjs` from `PLATFORM_REPO_DIR`. That directory is the same checkout that `prepareWorkspace` later resets to `origin/$PLATFORM_BRANCH` (run.cjs:240). Consequently the harness that receives the token is itself branch-controlled code: any commit that lands on `$PLATFORM_BRANCH` can rewrite `run-worker.sh` or `run.cjs` to read `process.env.GITHUB_TOKEN` before any scrubbing happens. The trusted credential-handling code must live outside the mutable checkout — install it as root-owned files under `/usr/local/lib/platform-testnet-sync/` (or similar) and let only the phase commands execute from the checked-out branch.

In `.github/scripts/platform-testnet-sync/run.cjs`:
- [BLOCKING] .github/scripts/platform-testnet-sync/run.cjs:422-429: Phase commands can recover GITHUB_TOKEN from parent process /proc/$PPID/environ
  `sanitizePhaseEnv` and the explicit deletes at run.cjs:428-429 only strip `GITHUB_TOKEN`/`GH_TOKEN` from the child phase environment. The long-lived Node parent still holds those secrets in `process.env` because they were inherited from systemd's `EnvironmentFile` and are needed later for `publishCommitStatus`. On a standard Linux host, a same-UID child can read `/proc/$PPID/environ` (and scan other same-UID `/proc/*/environ`), so any code executed during `baseline-readiness`, `platform-build`, or `platform-sync` — including branch-controlled build scripts — can exfiltrate the token despite the child-env scrub. Move status publishing into an isolated helper (separate UID/container/process) that is invoked after phases finish, or hold the token in a file readable only after phases complete so it never lives in the credential-holding process's environment while untrusted code runs.
- [SUGGESTION] .github/scripts/platform-testnet-sync/run.cjs:142-254: target_sha can describe a commit that was never tested
  `resolveTargetSha` prefers `TARGET_SHA` and `GITHUB_SHA` over `git rev-parse HEAD`, and it runs at main() before workspace preparation (run.cjs:399) and again after the reset inside `prepareWorkspace` (run.cjs:250). If either env var is inherited by the worker, the initial metadata and any failure-path commit status published from the `finally` block point at that inherited SHA even though no phase has actually run against it. If `prepareWorkspace` fails before the reset and `TARGET_SHA` is unset, the published status describes the stale pre-fetch checkout HEAD. Resolve the tested SHA strictly from git after a successful reset, and avoid publishing a commit-specific status before a target commit has actually been selected.
- [SUGGESTION] .github/scripts/platform-testnet-sync/run.cjs:366-388: Status publisher retries permanent HTTP failures
  `publishCommitStatusOnce` throws the same generic `Error` for every non-2xx response, and `publishCommitStatus` retries every thrown error up to `STATUS_PUBLISH_ATTEMPTS`. Permanent failures such as 401, 403, 404, and 422 cannot recover with retry, so the worker burns the retry budget (plus exponential backoff sleeps) before surfacing what is actually a token/repository/SHA misconfiguration. Distinguish transient failures (network errors, 429, 5xx) from deterministic HTTP errors and fail fast on the latter.
- [SUGGESTION] .github/scripts/platform-testnet-sync/run.cjs:339-353: Worker does not validate target_url symmetrically with the receiver workflow
  The status workflow rejects any `target_url` whose origin is not `https://github.com` (workflow lines 87-101), but the worker forwards `metadata.target_url` and `PLATFORM_TESTNET_SYNC_TARGET_URL` verbatim to the createCommitStatus API. The env file is root-owned so the immediate risk is bounded, but the two reporting paths drift: a manual workflow_dispatch with a non-github URL is refused, while a misconfigured or accidentally typo'd env value can publish a commit status whose link points anywhere (http://, third-party CDN, internal phishing). Apply the same `parsedUrl.origin === 'https://github.com'` (or an explicit allowlist) check in the worker before forwarding.

In `ops/platform-testnet-sync/install-worker.sh`:
- [BLOCKING] ops/platform-testnet-sync/install-worker.sh:50-57: Pre-existing env file keeps prior ownership
  When `/etc/platform-testnet-sync.env` already exists, the installer only reapplies `chmod 0600` and leaves whatever ownership the file came with. If the file was created or copied by a non-root account before the installer ran, that account retains read access to `GITHUB_TOKEN` and write access to the phase commands that the systemd unit executes. Because the file carries the statuses-write token and controls the worker's behavior, the root-run installer should force `root:root` ownership in both the create and existing-file paths.

In `ops/platform-testnet-sync/run-worker.sh`:
- [SUGGESTION] ops/platform-testnet-sync/run-worker.sh:13-24: Log pruning runs before the worker lock is acquired
  The destructive `find ... -mtime +RETENTION -exec rm -rf {} +` block at lines 13-18 runs before `flock -n 9` acquires `run.lock` at line 20. A second timer invocation while a sync is in flight prunes first and only then exits because the lock is held. If retention is configured low or a sync exceeds the retention window, the prune can remove the active run directory, breaking `appendLog`/`writeMetadata` in the running worker. Acquire the lock before pruning so only the active worker performs retention cleanup.

Comment on lines +10 to +11
EnvironmentFile=/etc/platform-testnet-sync.env
ExecStart=/bin/bash -lc 'cd "${PLATFORM_REPO_DIR:-/opt/dash-platform}" && exec ops/platform-testnet-sync/run-worker.sh'

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking: Worker harness executes from the same mutable checkout it resets to the tested branch

The unit loads /etc/platform-testnet-sync.env (containing GITHUB_TOKEN with statuses: write) and then runs ops/platform-testnet-sync/run-worker.sh and .github/scripts/platform-testnet-sync/run.cjs from PLATFORM_REPO_DIR. That directory is the same checkout that prepareWorkspace later resets to origin/$PLATFORM_BRANCH (run.cjs:240). Consequently the harness that receives the token is itself branch-controlled code: any commit that lands on $PLATFORM_BRANCH can rewrite run-worker.sh or run.cjs to read process.env.GITHUB_TOKEN before any scrubbing happens. The trusted credential-handling code must live outside the mutable checkout — install it as root-owned files under /usr/local/lib/platform-testnet-sync/ (or similar) and let only the phase commands execute from the checked-out branch.

source: ['codex']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 3546112Worker harness executes from the same mutable checkout it resets to the tested branch no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

Comment on lines +422 to +429
const phaseEnv = {
...process.env,
CORE_VERSION: metadata.core_version,
PLATFORM_SHA: metadata.platform_sha,
PLATFORM_TESTNET_SYNC_RUN_DIR: runDir,
};
delete phaseEnv.GITHUB_TOKEN;
delete phaseEnv.GH_TOKEN;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking: Phase commands can recover GITHUB_TOKEN from parent process /proc/$PPID/environ

sanitizePhaseEnv and the explicit deletes at run.cjs:428-429 only strip GITHUB_TOKEN/GH_TOKEN from the child phase environment. The long-lived Node parent still holds those secrets in process.env because they were inherited from systemd's EnvironmentFile and are needed later for publishCommitStatus. On a standard Linux host, a same-UID child can read /proc/$PPID/environ (and scan other same-UID /proc/*/environ), so any code executed during baseline-readiness, platform-build, or platform-sync — including branch-controlled build scripts — can exfiltrate the token despite the child-env scrub. Move status publishing into an isolated helper (separate UID/container/process) that is invoked after phases finish, or hold the token in a file readable only after phases complete so it never lives in the credential-holding process's environment while untrusted code runs.

source: ['codex']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 3546112Phase commands can recover GITHUB_TOKEN from parent process /proc/$PPID/environ no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

Comment on lines +50 to +57
if [[ ! -f /etc/platform-testnet-sync.env ]]; then
install -m 0600 \
"${SCRIPT_DIR}/platform-testnet-sync.env.example" \
/etc/platform-testnet-sync.env
echo "Created /etc/platform-testnet-sync.env; fill it before enabling the timer."
else
chmod 0600 /etc/platform-testnet-sync.env
fi

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking: Pre-existing env file keeps prior ownership

When /etc/platform-testnet-sync.env already exists, the installer only reapplies chmod 0600 and leaves whatever ownership the file came with. If the file was created or copied by a non-root account before the installer ran, that account retains read access to GITHUB_TOKEN and write access to the phase commands that the systemd unit executes. Because the file carries the statuses-write token and controls the worker's behavior, the root-run installer should force root:root ownership in both the create and existing-file paths.

Suggested change
if [[ ! -f /etc/platform-testnet-sync.env ]]; then
install -m 0600 \
"${SCRIPT_DIR}/platform-testnet-sync.env.example" \
/etc/platform-testnet-sync.env
echo "Created /etc/platform-testnet-sync.env; fill it before enabling the timer."
else
chmod 0600 /etc/platform-testnet-sync.env
fi
if [[ ! -f /etc/platform-testnet-sync.env ]]; then
install -o root -g root -m 0600 \
"${SCRIPT_DIR}/platform-testnet-sync.env.example" \
/etc/platform-testnet-sync.env
echo "Created /etc/platform-testnet-sync.env; fill it before enabling the timer."
else
chown root:root /etc/platform-testnet-sync.env
chmod 0600 /etc/platform-testnet-sync.env
fi

source: ['claude', 'codex']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 3546112Pre-existing env file keeps prior ownership no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

Comment on lines +142 to +254
function resolveTargetSha() {
return process.env.TARGET_SHA
|| process.env.GITHUB_SHA
|| gitOutput(['rev-parse', 'HEAD']);
}

async function runCommand(name, command, env, timeoutMinutes, options = {}) {
if (!command || !command.trim()) {
throw new Error(`Missing command for phase: ${name}`);
}

const { captureStdout = false } = options;
const logFileName = `${name}.log`;
appendLog(logFileName, `$ ${command}\n\n`);

const timeoutMs = timeoutMinutes * 60 * 1000;
const startedAt = Date.now();

return new Promise((resolve, reject) => {
const child = spawn('bash', ['-c', `set -euo pipefail\n${command}`], {
cwd: repoRoot,
env,
stdio: ['ignore', 'pipe', 'pipe'],
detached: true,
});

let timedOut = false;
let settled = false;
let killTimeout;
let stdout = '';

const timeout = setTimeout(() => {
timedOut = true;
killProcessGroup(child, 'SIGTERM');
killTimeout = setTimeout(() => {
if (!settled) {
killProcessGroup(child, 'SIGKILL');
}
}, 30_000);
}, timeoutMs);

child.stdout.on('data', (chunk) => {
process.stdout.write(chunk);
if (captureStdout) {
stdout += chunk.toString();
}
appendLog(logFileName, chunk.toString());
});

child.stderr.on('data', (chunk) => {
process.stderr.write(chunk);
appendLog(logFileName, chunk.toString());
});

child.on('error', (error) => {
settled = true;
clearTimeout(timeout);
clearTimeout(killTimeout);
reject(error);
});

child.on('close', (code, signal) => {
settled = true;
clearTimeout(timeout);
clearTimeout(killTimeout);
const durationSeconds = Math.round((Date.now() - startedAt) / 1000);
appendLog(logFileName, `\nexit_code=${code} signal=${signal || ''} duration_seconds=${durationSeconds}\n`);

if (timedOut) {
reject(new Error(`${name} timed out after ${timeoutMinutes} minutes`));
return;
}

if (code === 0) {
resolve({ stdout, durationSeconds });
return;
}

reject(new Error(`${name} exited with code ${code}${signal ? ` (${signal})` : ''}`));
});
});
}

async function prepareWorkspace(metadata, timeoutMinutes) {
if (process.env.PLATFORM_TESTNET_SYNC_SKIP_WORKSPACE_PREP === '1') {
return;
}

const branch = process.env.PLATFORM_BRANCH || 'v3.1-dev';
metadata.phase = 'workspace-prepare';
writeMetadata(metadata);
validatePlatformBranch(branch);

await runCommand(
'workspace-prepare',
[
'git fetch --prune origin "$PLATFORM_BRANCH"',
'git checkout "$PLATFORM_BRANCH"',
'git reset --hard "origin/$PLATFORM_BRANCH"',
'git clean -ffdx -e .platform-testnet-sync/',
].join('\n'),
{
...sanitizePhaseEnv(process.env),
PLATFORM_BRANCH: branch,
},
timeoutMinutes,
);

const targetSha = resolveTargetSha();
metadata.target_sha = targetSha;
metadata.platform_sha = targetSha;
process.env.TARGET_SHA = targetSha;
writeMetadata(metadata);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: target_sha can describe a commit that was never tested

resolveTargetSha prefers TARGET_SHA and GITHUB_SHA over git rev-parse HEAD, and it runs at main() before workspace preparation (run.cjs:399) and again after the reset inside prepareWorkspace (run.cjs:250). If either env var is inherited by the worker, the initial metadata and any failure-path commit status published from the finally block point at that inherited SHA even though no phase has actually run against it. If prepareWorkspace fails before the reset and TARGET_SHA is unset, the published status describes the stale pre-fetch checkout HEAD. Resolve the tested SHA strictly from git after a successful reset, and avoid publishing a commit-specific status before a target commit has actually been selected.

source: ['claude', 'codex']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 3546112target_sha can describe a commit that was never tested no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

Comment thread ops/platform-testnet-sync/run-worker.sh Outdated
Comment on lines +13 to +24
find "${PLATFORM_TESTNET_SYNC_LOG_DIR}" \
-mindepth 1 \
-maxdepth 1 \
-type d \
-mtime +"${PLATFORM_TESTNET_SYNC_LOG_RETENTION_DAYS}" \
-exec rm -rf {} +

exec 9>"${PLATFORM_TESTNET_SYNC_STATE_DIR}/run.lock"
if ! flock -n 9; then
echo "Platform testnet sync is already running"
exit 0
fi

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Log pruning runs before the worker lock is acquired

The destructive find ... -mtime +RETENTION -exec rm -rf {} + block at lines 13-18 runs before flock -n 9 acquires run.lock at line 20. A second timer invocation while a sync is in flight prunes first and only then exits because the lock is held. If retention is configured low or a sync exceeds the retention window, the prune can remove the active run directory, breaking appendLog/writeMetadata in the running worker. Acquire the lock before pruning so only the active worker performs retention cleanup.

Suggested change
find "${PLATFORM_TESTNET_SYNC_LOG_DIR}" \
-mindepth 1 \
-maxdepth 1 \
-type d \
-mtime +"${PLATFORM_TESTNET_SYNC_LOG_RETENTION_DAYS}" \
-exec rm -rf {} +
exec 9>"${PLATFORM_TESTNET_SYNC_STATE_DIR}/run.lock"
if ! flock -n 9; then
echo "Platform testnet sync is already running"
exit 0
fi
exec 9>"${PLATFORM_TESTNET_SYNC_STATE_DIR}/run.lock"
if ! flock -n 9; then
echo "Platform testnet sync is already running"
exit 0
fi
find "${PLATFORM_TESTNET_SYNC_LOG_DIR}" \
-mindepth 1 \
-maxdepth 1 \
-type d \
-mtime +"${PLATFORM_TESTNET_SYNC_LOG_RETENTION_DAYS}" \
-exec rm -rf {} +

source: ['claude', 'codex']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 3546112Log pruning runs before the worker lock is acquired no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

Comment on lines +366 to +388
if (!response.ok) {
const errorBody = await response.text();
throw new Error(`Unable to publish commit status: HTTP ${response.status} ${errorBody}`);
}

return 'published';
}

async function publishCommitStatus(status, metadata) {
let lastError;
for (let attempt = 1; attempt <= STATUS_PUBLISH_ATTEMPTS; attempt += 1) {
try {
return await publishCommitStatusOnce(status, metadata);
} catch (error) {
lastError = error;
console.error(`Commit status publish attempt ${attempt} failed: ${error.message}`);
if (attempt < STATUS_PUBLISH_ATTEMPTS) {
await sleep(2 ** attempt * 1000);
}
}
}

throw lastError;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Status publisher retries permanent HTTP failures

publishCommitStatusOnce throws the same generic Error for every non-2xx response, and publishCommitStatus retries every thrown error up to STATUS_PUBLISH_ATTEMPTS. Permanent failures such as 401, 403, 404, and 422 cannot recover with retry, so the worker burns the retry budget (plus exponential backoff sleeps) before surfacing what is actually a token/repository/SHA misconfiguration. Distinguish transient failures (network errors, 429, 5xx) from deterministic HTTP errors and fail fast on the latter.

source: ['claude', 'codex']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 3546112Status publisher retries permanent HTTP failures no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

Comment on lines +339 to +353
const targetUrl = metadata.target_url
|| process.env.PLATFORM_TESTNET_SYNC_TARGET_URL
|| (process.env.GITHUB_RUN_ID
? `${process.env.GITHUB_SERVER_URL || 'https://github.com'}/${repository}/actions/runs/${process.env.GITHUB_RUN_ID}`
: undefined);

const body = {
state: status === 'sync_passed' ? 'success' : 'failure',
context: STATUS_CONTEXT,
description: statusDescription(status, metadata),
};

if (targetUrl) {
body.target_url = targetUrl;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Worker does not validate target_url symmetrically with the receiver workflow

The status workflow rejects any target_url whose origin is not https://github.com (workflow lines 87-101), but the worker forwards metadata.target_url and PLATFORM_TESTNET_SYNC_TARGET_URL verbatim to the createCommitStatus API. The env file is root-owned so the immediate risk is bounded, but the two reporting paths drift: a manual workflow_dispatch with a non-github URL is refused, while a misconfigured or accidentally typo'd env value can publish a commit status whose link points anywhere (http://, third-party CDN, internal phishing). Apply the same parsedUrl.origin === 'https://github.com' (or an explicit allowlist) check in the worker before forwarding.

source: ['claude']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 3546112Worker does not validate target_url symmetrically with the receiver workflow no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

Comment on lines +334 to +337
const repository = process.env.GITHUB_REPOSITORY || 'dashpay/platform';
if (!repository) {
throw new Error('GITHUB_REPOSITORY is required to publish commit status');
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 Nitpick: Unreachable !repository guard after default fallback

const repository = process.env.GITHUB_REPOSITORY || 'dashpay/platform' always produces a non-empty string, so the following if (!repository) throw is dead code. The fallback also silently routes status posts to dashpay/platform when GITHUB_REPOSITORY is unset, which can mask a worker configured for a fork or staging clone. Either require GITHUB_REPOSITORY and drop the default, or drop the dead guard — keeping both is misleading.

Suggested change
const repository = process.env.GITHUB_REPOSITORY || 'dashpay/platform';
if (!repository) {
throw new Error('GITHUB_REPOSITORY is required to publish commit status');
}
const repository = process.env.GITHUB_REPOSITORY;
if (!repository) {
throw new Error('GITHUB_REPOSITORY is required to publish commit status');
}

source: ['claude', 'codex']

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 3546112Unreachable !repository guard after default fallback no longer present.

Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.

@infraclaw-dash infraclaw-dash changed the title ci: add automated testnet sync worker ci: add Platform testnet sync status reporting Jun 24, 2026

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Cumulative PR is now narrow: a status receiver workflow, README badge, and docs. The previously flagged worker harness (run.cjs, systemd unit, install/run shell, env example) was deleted in 3546112, resolving 8 of 9 prior findings. One prior issue remains: the surviving workflow_dispatch entry on platform-testnet-sync-status.yml lets any actor with workflow-dispatch privilege forge a 'Sync Passed' commit status on an arbitrary in-repo SHA, since payload validation only proves the SHA exists. Two independent Codex passes re-flagged this; carrying forward as blocking.

🔴 1 blocking

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `.github/workflows/platform-testnet-sync-status.yml`:
- [BLOCKING] .github/workflows/platform-testnet-sync-status.yml:7-40: workflow_dispatch path can forge 'Platform testnet sync' commit statuses
  The job is granted statuses: write and accepts workflow_dispatch input directly for status, target_sha, target_url, core_version, platform_sha, and completed_at. The only checks performed are the enum match on status, a 40-hex regex on the SHA, repos.getCommit (which only proves the SHA exists in the repo), and an origin restriction on target_url. None of those prove the external sync worker actually ran against that commit. Any account with workflow-dispatch privilege on this repo (broader than the operator credential the design intends) can therefore publish state=success, context='Platform testnet sync' on an arbitrary commit. If branch protection, dashboards, or release automation treat this context as authoritative — and docs/PLATFORM_TESTNET_SYNC.md presents it that way — this is a status-forgery path that bypasses the worker boundary. Either drop the workflow_dispatch trigger (rely on repository_dispatch from the operator-held credential only), or gate the manual path behind a protected environment with a required reviewer and a restricted actor allowlist. Note: per the file-handling constraint, suggestion field is null; do not interpret that as agreement to leave the trigger in place.

Comment on lines +7 to +40
workflow_dispatch:
inputs:
status:
description: Final completed run status
required: true
type: choice
options:
- sync_passed
- build_failed
- sync_failed
target_sha:
description: Platform commit SHA to report against
required: false
type: string
target_url:
description: URL for run logs or status details
required: false
type: string
core_version:
description: Public Core version used by the run
required: false
type: string
platform_sha:
description: Platform commit SHA built and synced
required: false
type: string
completed_at:
description: Completion timestamp, preferably ISO 8601 UTC
required: false
type: string

permissions:
contents: read
statuses: write

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking: workflow_dispatch path can forge 'Platform testnet sync' commit statuses

The job is granted statuses: write and accepts workflow_dispatch input directly for status, target_sha, target_url, core_version, platform_sha, and completed_at. The only checks performed are the enum match on status, a 40-hex regex on the SHA, repos.getCommit (which only proves the SHA exists in the repo), and an origin restriction on target_url. None of those prove the external sync worker actually ran against that commit. Any account with workflow-dispatch privilege on this repo (broader than the operator credential the design intends) can therefore publish state=success, context='Platform testnet sync' on an arbitrary commit. If branch protection, dashboards, or release automation treat this context as authoritative — and docs/PLATFORM_TESTNET_SYNC.md presents it that way — this is a status-forgery path that bypasses the worker boundary. Either drop the workflow_dispatch trigger (rely on repository_dispatch from the operator-held credential only), or gate the manual path behind a protected environment with a required reviewer and a restricted actor allowlist. Note: per the file-handling constraint, suggestion field is null; do not interpret that as agreement to leave the trigger in place.

source: ['codex']

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants