ci: add Platform testnet sync status reporting#3958
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds a ChangesPlatform Testnet Sync Status Reporter
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
✅ Review complete (commit 3546112) |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/scripts/latest-core-testnet-sync/run.cjs:
- Around line 123-129: `LATEST_CORE_TESTNET_PHASE_TIMEOUT_MINUTES` is passed
through `Number(...)` without validation, so invalid input can become `NaN` and
break the timeout flow; update the timeout handling in `run.cjs` around
`runCommand` and the later `setTimeout` usage to parse the env var safely,
reject non-finite or non-positive values, and fall back to the existing default
(30) when the value is missing or invalid.
- Around line 79-114: The timeout handling in runPhase rejects immediately after
sending SIGTERM, which can allow later cleanup in main() to finish while the
child is still running. Update the runPhase lifecycle in run.cjs so the timeout
path waits for the spawned child process to actually exit (or escalates if
needed) before rejecting, using the existing child.on('close') flow and the
SIGTERM timeout callback.
In @.github/workflows/latest-core-testnet-sync-status.yml:
- Around line 53-71: The payload handling in the workflow entrypoint needs to
guard against missing repository_dispatch client payload and avoid defaulting to
context.sha. Update the logic around the dispatch payload parsing and the
payload.status access so repository_dispatch without client_payload fails
cleanly instead of throwing, and require an explicit target SHA by validating
target_sha or platform_sha before continuing. Keep the fix localized to the
payload selection and targetSha resolution in the workflow script.
- Line 48: The workflow step using actions/github-script is pinned to a mutable
version tag, so update the github-script reference in the
latest-core-testnet-sync-status workflow to a specific commit SHA instead of
`@v7`. Locate the action usage in the workflow step and replace the tag with the
repository’s approved full SHA pin to prevent supply-chain drift.
In `@ops/latest-core-testnet-sync/install-worker.sh`:
- Around line 27-36: The installer currently creates PLATFORM_REPO_DIR and then
copies service assets from inside it without ensuring that path already contains
a writable git checkout. Update install-worker.sh to either locate the systemd
files relative to the installer script itself and add an explicit validation
that PLATFORM_REPO_DIR is an existing writable checkout, or make the installer
provision that checkout before copy/install steps. Reference the existing
install -d and install -m steps, plus the run-worker.sh expectation for
platform-sync, so the source path and runtime checkout requirements stay
consistent.
In `@ops/latest-core-testnet-sync/latest-core-testnet-sync.service`:
- Around line 10-12: The service unit hardcodes the repo path instead of
honoring PLATFORM_REPO_DIR from its environment file. Update the
latest-core-testnet-sync.service definition so WorkingDirectory and the
ExecStart script path are derived from the environment-backed repo directory
used by latest-core-testnet-sync.env and install-worker.sh, or otherwise remove
the configurable path assumption; use the unit’s existing EnvironmentFile and
the run-worker.sh launch to keep the path consistent.
In `@ops/latest-core-testnet-sync/run-worker.sh`:
- Around line 18-30: The repo-prep steps in run-worker.sh can fail before
run.cjs starts, which skips the final failure status and leaves the old result
visible. Move the git fetch/git checkout/git reset flow into run.cjs so it uses
the same status handling, or add a shell trap around the repo setup in
run-worker.sh that reports sync_failed on any early exit. Keep the existing
TARGET_SHA/run ID setup and make sure the failure path still posts the final
status.
- Around line 20-22: The sync script leaves behind untracked build artifacts
because the current git reset flow only restores tracked files. Update the
cleanup sequence in run-worker.sh around the existing git fetch/git checkout/git
reset logic to also remove untracked files and directories before each run,
using the same checkout context so every sync starts from a fully clean
workspace.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8aaaeede-69bc-4d57-b7b8-d975ae10c56c
📒 Files selected for processing (10)
.github/scripts/latest-core-testnet-sync/run.cjs.github/workflows/latest-core-testnet-sync-status.yml.gitignoredocs/NIGHTLY_LATEST_CORE_TESTNET_SYNC.mdops/latest-core-testnet-sync/README.mdops/latest-core-testnet-sync/install-worker.shops/latest-core-testnet-sync/latest-core-testnet-sync.env.exampleops/latest-core-testnet-sync/latest-core-testnet-sync.serviceops/latest-core-testnet-sync/latest-core-testnet-sync.timerops/latest-core-testnet-sync/run-worker.sh
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
PR adds a self-contained nightly Core-testnet sync worker (Node harness, systemd units, install script, dispatch workflow, docs). No consensus or cryptographic code is touched. Verification confirms three in-scope blocking issues — stderr contamination of version parsing, unbounded in-memory log accumulation across multi-hour phases, and GITHUB_TOKEN being forwarded into phase subprocesses that execute target-branch code — plus several reliability and security suggestions worth tightening before this runs unattended.
🔴 3 blocking | 🟡 5 suggestion(s) | 💬 1 nitpick(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [BLOCKING] .github/scripts/latest-core-testnet-sync/run.cjs:92-95: stderr is captured into the same buffer used to parse the Core version
`runCommand` appends both stdout and stderr to the single `output` buffer (lines 86–96), and `resolveLatestCoreVersion` then calls `firstLine(result.output)` (line 130) to extract the version. The documented contract says `LATEST_CORE_TESTNET_CORE_VERSION_COMMAND` prints the version on stdout, but any diagnostic written to stderr before the tag (a `set -x` line, a warning from a git/curl helper, a deprecation notice) will appear first in the merged buffer and be silently adopted as `LATEST_CORE_VERSION`. The metadata is then written, the status is published, and the run reports success against a bogus Core version. The stderr stream should still be logged, but it must not be mixed into the captured stdout used for parsing.
- [BLOCKING] .github/scripts/latest-core-testnet-sync/run.cjs:84-96: Phase output is retained in memory for the entire 16+ hour run
`runCommand` appends every stdout and stderr chunk to the `output` string for every phase (lines 88, 94), including the documented long-running Core-sync and Platform-sync phases that run for many hours with verbose docker/cargo/dashmate output. Logs are already streamed to per-phase log files, but the Node process keeps a full duplicate in memory until the phase exits. A verbose build or sync can grow the heap to hundreds of MB or more, eventually OOM-killing the worker before `publishCommitStatus` runs — leaving the GitHub commit status stale and the systemd unit silently failed. Only the short `resolve-core-version` path actually consumes `result.output`; the long phases should stream without retaining the full log in memory.
- [BLOCKING] .github/scripts/latest-core-testnet-sync/run.cjs:250-256: GITHUB_TOKEN is forwarded into every phase subprocess
`phaseEnv` is built by spreading `process.env` (which is populated from `EnvironmentFile=/etc/latest-core-testnet-sync.env` and includes `GITHUB_TOKEN`/`GH_TOKEN`) and is then handed to `spawn('bash', ['-c', ...])` for the operator-defined Core-sync, Platform-build, and Platform-sync commands. `run-worker.sh` first does `git reset --hard origin/${PLATFORM_BRANCH}`, so those commands execute scripts from the freshly checked-out branch. Anyone able to land a commit on the tracked branch (or any compromised build dependency invoked during the phases) can read the token from the environment and forge `Latest public Core testnet sync` commit statuses on arbitrary commits — the exact signal this PR introduces for developer trust. The same exposure is compounded by `runCommand` persisting all stdout/stderr to `${runDir}/${name}.log`, so any `set -x` or `env` invocation also writes the token to disk under `/var/log/latest-core-testnet-sync/`. The token is only needed by `publishCommitStatus`, which reads from `process.env` directly; strip `GITHUB_TOKEN`/`GH_TOKEN` from `phaseEnv` before passing it to `spawn`.
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:72-82: Timeout only signals the bash wrapper, not the phase's process tree
`spawn('bash', ['-c', ...])` is launched without `detached: true`, and the timeout handler calls `child.kill('SIGTERM')` (line 80), which delivers the signal only to the bash wrapper. The phase commands fork docker/dashmate/cargo subprocesses that will become orphans when bash exits, continuing to hold ports, disk, CPU, and lock files. With a 1440-minute default timeout, a real timeout means the next nightly run collides with leftovers from the previous one. Detach the child into its own process group (`spawn(..., { detached: true })`) and signal the whole group on timeout (`process.kill(-child.pid, 'SIGTERM')`, escalating to `SIGKILL` after a grace period).
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:138-219: GitHub `fetch` calls have no timeout or AbortSignal
Both `resolveLatestCoreVersion` (GET `/releases`, line 138) and `publishCommitStatus` (POST `/statuses/...`, line 210) call the global `fetch` with no `signal`. The systemd unit sets `TimeoutStartSec=0` and Node imposes no default request timeout, so a hung TCP connection to GitHub blocks the worker indefinitely. For the resolve call this delays the run start; for the publish call a completed sync sits silent forever, and the next nightly run is then blocked by the `flock` in `run-worker.sh` (which exits 0 without publishing anything). Wire `AbortSignal.timeout(...)` (or equivalent) into both fetches.
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:285-301: Publish failure in `finally` silently drops the run outcome
If `publishCommitStatus` throws inside the `finally` block (transient 5xx, expired token, network blip), the rejection is swallowed by the outer `main().catch` which only sets `process.exitCode = 1`. The metadata file already records `status: 'sync_passed'`, but no GitHub commit status is created and there is no retry — the visible status panel goes stale until the next nightly run, which itself may fail to publish for the same reason. Wrap the publish call in retry-with-backoff and log a clear, distinguishable error when the final status could not be reported, so operators can act before the next run.
In `ops/latest-core-testnet-sync/run-worker.sh`:
- [SUGGESTION] ops/latest-core-testnet-sync/run-worker.sh:1-30: Per-run log/state directories accumulate without retention
`run-worker.sh` creates `${LATEST_CORE_TESTNET_LOG_DIR}/${RUN_ID}` per nightly run, and `run.cjs` writes per-phase log files and metadata under it. Nothing prunes old run directories, so over a year of daily runs you accumulate 365+ directories of multi-MB logs under `/var/log/latest-core-testnet-sync/`. Either add a retention step here (e.g. `find ... -mtime +N -delete`) or document the operator's responsibility in `ops/latest-core-testnet-sync/README.md`.
In `.github/workflows/latest-core-testnet-sync-status.yml`:
- [SUGGESTION] .github/workflows/latest-core-testnet-sync-status.yml:70-88: Dispatch workflow accepts arbitrary `target_sha`/`target_url` without validation
The reporter posts a commit status to whatever `target_sha` and `target_url` are supplied via `repository_dispatch` `client_payload` (or `workflow_dispatch` input). Because the job runs with `statuses: write`, any credential able to fire the `latest-core-testnet-sync-completed` dispatch event can publish a `success`/`failure` status under the `Latest public Core testnet sync` context against any commit SHA, with an arbitrary `target_url` that maintainers will click from the commit page. Worst case is a forged `Sync Passed` status linking to a phishing page. Mitigations: validate `target_sha` is a 40-hex commit that exists in the repo (e.g. `repos.getCommit`), restrict `target_url` to a known origin (e.g. `https://github.com/...`), and document who is expected to hold the dispatch credential in `docs/NIGHTLY_LATEST_CORE_TESTNET_SYNC.md`.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
All nine prior findings are addressed in the latest delta: stderr/version corruption, unbounded output buffering, GITHUB_TOKEN propagation, process-group signaling, fetch timeouts, log retention, dispatch payload validation, and the shadowed body identifier are all fixed; the publish-failure-in-finally case is partially addressed via retries but the metadata divergence remains. The latest delta introduces two new low-severity issues: a phase-attribution bug for resolve-core-version failures, and a shell-interpolation hardening gap on PLATFORM_BRANCH. No blocking issues.
🟡 2 suggestion(s) | 💬 1 nitpick(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:397-400: resolve-core-version failures are recorded as workspace-prepare failures
`prepareWorkspace` sets `metadata.phase = 'workspace-prepare'` and never clears it on success. The next statement awaits `resolveLatestCoreVersion`, which can fail (bad `LATEST_CORE_TESTNET_CORE_VERSION_COMMAND`, GitHub releases fetch error, no public release, etc.). The catch at line 440 sets `metadata.failure_phase = metadata.phase || 'resolve-core-version'`, so any such failure is recorded as `failure_phase=workspace-prepare`. The `|| 'resolve-core-version'` fallback only fires when `prepareWorkspace` was skipped via `LATEST_CORE_TESTNET_SKIP_WORKSPACE_PREP=1`. Set the phase before resolving the Core version so the run-tracking contract points operators at the right phase.
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:444-449: Metadata claims sync_passed even when status publish fails after retries (carried forward, partial fix)
Prior finding #6 is only partially addressed. `publishCommitStatus` now retries up to `STATUS_PUBLISH_ATTEMPTS` times, which is a clear improvement, but if every retry fails the throw propagates to `main().catch`, leaving `run-metadata.json` on disk with `status: sync_passed` (or `build_failed`/`sync_failed`) while no GitHub status was published and the previous run's status remains visible. Operators that read `run-metadata.json` to decide whether the latest sync passed will see a green result while GitHub disagrees. Record the publish outcome in the metadata (e.g. `status_published_at`, `status_publish_error`) after the final `publishCommitStatus` call so the on-disk record and GitHub stay aligned.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Prior phase-tracking and publish-outcome metadata findings are resolved. One prior carry-forward (PLATFORM_BRANCH quoting) plus two new in-scope hardening items: target_sha resolution can still report a stale/inherited commit after workspace reset, and the new status-receiver workflow exposes a workflow_dispatch override that lets any repo writer forge a 'Sync Passed' status. All three are suggestions, no blockers.
🟡 2 suggestion(s)
1 carried-forward finding(s) already raised on this PR; not re-posting as new inline comments.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:233-237: target_sha re-capture still prefers inherited env vars over the checked-out HEAD
After `prepareWorkspace` resets the workspace to `origin/$PLATFORM_BRANCH`, it re-derives `targetSha` via `resolveTargetSha()`, which returns `process.env.TARGET_SHA || process.env.GITHUB_SHA || git rev-parse HEAD`. If either env var is set on the worker (systemd env file, prior run leakage into a long-lived shell, or future CI invocation), the script runs phase commands against the freshly reset branch tip but publishes the commit status against the inherited SHA — the worker's contract to report the actually-tested commit is silently broken. Worse, if `prepareWorkspace` fails before this line, the catch path publishes 'Sync Failed' against whatever stale SHA was captured at line 382. Resolve `target_sha` directly from `git rev-parse HEAD` after the reset, and only fall back to inherited env vars for the pre-prepare initial value.
In `.github/workflows/latest-core-testnet-sync-status.yml`:
- [SUGGESTION] .github/workflows/latest-core-testnet-sync-status.yml:7-36: workflow_dispatch lets any repo writer forge a sync status
The receiver exposes both `repository_dispatch` (driven by the operator-only worker credential) and `workflow_dispatch` with free-form `status` and `target_sha` inputs. The job carries `statuses: write`, so anyone with repo write access can manually publish 'Sync Passed' against any existing SHA without the worker ever running. If maintainers, branch protection, or merge automation treat the `Latest public Core testnet sync` context as evidence the testnet sync actually ran, this becomes a trust-laundering primitive that bypasses the operator-credential boundary the rest of the pipeline carefully maintains. Either drop the `workflow_dispatch` trigger, or gate it behind an environment with required reviewers so the manual path requires an operator approval distinct from generic repo write.
| const targetSha = resolveTargetSha(); | ||
| metadata.target_sha = targetSha; | ||
| metadata.platform_sha = targetSha; | ||
| process.env.TARGET_SHA = targetSha; | ||
| writeMetadata(metadata); |
There was a problem hiding this comment.
🟡 Suggestion: target_sha re-capture still prefers inherited env vars over the checked-out HEAD
After prepareWorkspace resets the workspace to origin/$PLATFORM_BRANCH, it re-derives targetSha via resolveTargetSha(), which returns process.env.TARGET_SHA || process.env.GITHUB_SHA || git rev-parse HEAD. If either env var is set on the worker (systemd env file, prior run leakage into a long-lived shell, or future CI invocation), the script runs phase commands against the freshly reset branch tip but publishes the commit status against the inherited SHA — the worker's contract to report the actually-tested commit is silently broken. Worse, if prepareWorkspace fails before this line, the catch path publishes 'Sync Failed' against whatever stale SHA was captured at line 382. Resolve target_sha directly from git rev-parse HEAD after the reset, and only fall back to inherited env vars for the pre-prepare initial value.
| const targetSha = resolveTargetSha(); | |
| metadata.target_sha = targetSha; | |
| metadata.platform_sha = targetSha; | |
| process.env.TARGET_SHA = targetSha; | |
| writeMetadata(metadata); | |
| const targetSha = gitOutput(['rev-parse', 'HEAD']); | |
| metadata.target_sha = targetSha; | |
| metadata.platform_sha = targetSha; | |
| process.env.TARGET_SHA = targetSha; | |
| writeMetadata(metadata); |
source: ['claude', 'codex']
There was a problem hiding this comment.
Resolved in f6783a6 — target_sha re-capture still prefers inherited env vars over the checked-out HEAD no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
| workflow_dispatch: | ||
| inputs: | ||
| status: | ||
| description: Final completed run status | ||
| required: true | ||
| type: choice | ||
| options: | ||
| - sync_passed | ||
| - build_failed | ||
| - sync_failed | ||
| target_sha: | ||
| description: Platform commit SHA to report against | ||
| required: false | ||
| type: string | ||
| target_url: | ||
| description: URL for run logs or status details | ||
| required: false | ||
| type: string | ||
| core_version: | ||
| description: Public Core version used by the run | ||
| required: false | ||
| type: string | ||
| platform_sha: | ||
| description: Platform commit SHA built and synced | ||
| required: false | ||
| type: string | ||
| completed_at: | ||
| description: Completion timestamp, preferably ISO 8601 UTC | ||
| required: false | ||
| type: string |
There was a problem hiding this comment.
🟡 Suggestion: workflow_dispatch lets any repo writer forge a sync status
The receiver exposes both repository_dispatch (driven by the operator-only worker credential) and workflow_dispatch with free-form status and target_sha inputs. The job carries statuses: write, so anyone with repo write access can manually publish 'Sync Passed' against any existing SHA without the worker ever running. If maintainers, branch protection, or merge automation treat the Latest public Core testnet sync context as evidence the testnet sync actually ran, this becomes a trust-laundering primitive that bypasses the operator-credential boundary the rest of the pipeline carefully maintains. Either drop the workflow_dispatch trigger, or gate it behind an environment with required reviewers so the manual path requires an operator approval distinct from generic repo write.
source: ['codex']
There was a problem hiding this comment.
Resolved in this update — workflow_dispatch lets any repo writer forge a sync status no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
| @@ -0,0 +1,79 @@ | |||
| # Latest Public Core Testnet Sync Worker | |||
There was a problem hiding this comment.
why it exists? who needs that? When I, as a user or developer, should use that tool?
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ops/latest-core-testnet-sync/run-worker.sh (1)
13-24: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick winTake the worker lock before pruning logs.
A second timer invocation still runs
find ... rm -rfbefore failingflock; with short retention or an unusually long run, it can delete the active run directory before exiting.Proposed fix
mkdir -p "${LATEST_CORE_TESTNET_STATE_DIR}" "${LATEST_CORE_TESTNET_LOG_DIR}" -find "${LATEST_CORE_TESTNET_LOG_DIR}" \ - -mindepth 1 \ - -maxdepth 1 \ - -type d \ - -mtime +"${LATEST_CORE_TESTNET_LOG_RETENTION_DAYS}" \ - -exec rm -rf {} + - exec 9>"${LATEST_CORE_TESTNET_STATE_DIR}/run.lock" if ! flock -n 9; then echo "latest Core testnet sync is already running" exit 0 fi + +find "${LATEST_CORE_TESTNET_LOG_DIR}" \ + -mindepth 1 \ + -maxdepth 1 \ + -type d \ + -mtime +"${LATEST_CORE_TESTNET_LOG_RETENTION_DAYS}" \ + -exec rm -rf {} +🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/latest-core-testnet-sync/run-worker.sh` around lines 13 - 24, The lock acquisition in run-worker.sh happens after the log-pruning find/rm -rf block, which allows a concurrent timer invocation to delete the active run directory before flock fails. Move the run.lock open and flock -n 9 check to the start of the worker flow, before any pruning, and keep the existing latest Core testnet sync log cleanup logic gated behind the lock so only the active worker can prune.ops/latest-core-testnet-sync/install-worker.sh (1)
50-57: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winForce root ownership on existing env files.
chmod 0600is not enough if/etc/latest-core-testnet-sync.envalready exists with a non-root owner; that owner can still read or modifyGITHUB_TOKENand command configuration.Proposed fix
if [[ ! -f /etc/latest-core-testnet-sync.env ]]; then install -m 0600 \ "${SCRIPT_DIR}/latest-core-testnet-sync.env.example" \ /etc/latest-core-testnet-sync.env echo "Created /etc/latest-core-testnet-sync.env; fill it before enabling the timer." else + chown root:root /etc/latest-core-testnet-sync.env chmod 0600 /etc/latest-core-testnet-sync.env fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/latest-core-testnet-sync/install-worker.sh` around lines 50 - 57, The existing env-file handling in install-worker.sh only tightens permissions with chmod, but it does not ensure /etc/latest-core-testnet-sync.env is owned by root. Update the existing-file branch so the setup enforces root ownership for the env file (using the install/ownership path in the script) before setting restrictive permissions, and keep the creation path in the same install flow for latest-core-testnet-sync.env.example.
♻️ Duplicate comments (1)
.github/scripts/latest-core-testnet-sync/run.cjs (1)
468-483: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winDon’t throw from the
finallyblock.Biome flags Line 481, and this also skips the final status log path. Record the publish failure in metadata, then handle the nonzero exit after
finally.Proposed fix
let finalStatus = 'sync_passed'; + let statusPublishError; @@ } catch (error) { metadata.status_publish_error = error.message; metadata.status_publish_failed_at = now(); writeMetadata(metadata); - throw error; + statusPublishError = error; } writeMetadata(metadata); } console.log(`${STATUS_CONTEXT}: ${STATUS_LABELS[finalStatus]}`); - if (finalStatus !== 'sync_passed') { + if (statusPublishError) { + console.error(`Failed to publish final commit status: ${statusPublishError.message}`); + process.exitCode = 1; + } else if (finalStatus !== 'sync_passed') { process.exitCode = 1; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/scripts/latest-core-testnet-sync/run.cjs around lines 468 - 483, The publish-status flow in the `publishCommitStatus`/metadata update block should not rethrow from the cleanup path, because that bypasses the final status logging and causes Biome to flag the `finally`-style handling. Keep recording the failure details in `metadata.status_publish_error` and `metadata.status_publish_failed_at`, ensure `writeMetadata(metadata)` still runs, and move the nonzero exit/error propagation to after the final status log path so the script can complete cleanup before exiting.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ops/latest-core-testnet-sync/install-worker.sh`:
- Around line 28-38: The current preflight in install-worker.sh only checks that
the repo root is a writable git checkout, but it can still miss root-owned Git
internals and a missing or non-executable run-worker.sh entrypoint. Update the
validation near the existing PLATFORM_REPO_DIR and PLATFORM_SYNC_USER checks to
also verify write access to the Git data paths used by the service, such as
.git/objects and .git/refs, and confirm the
ops/latest-core-testnet-sync/run-worker.sh script exists and is executable in
${PLATFORM_REPO_DIR}. Keep the checks alongside the existing preflight logic so
failures are caught before the unit starts.
---
Outside diff comments:
In `@ops/latest-core-testnet-sync/install-worker.sh`:
- Around line 50-57: The existing env-file handling in install-worker.sh only
tightens permissions with chmod, but it does not ensure
/etc/latest-core-testnet-sync.env is owned by root. Update the existing-file
branch so the setup enforces root ownership for the env file (using the
install/ownership path in the script) before setting restrictive permissions,
and keep the creation path in the same install flow for
latest-core-testnet-sync.env.example.
In `@ops/latest-core-testnet-sync/run-worker.sh`:
- Around line 13-24: The lock acquisition in run-worker.sh happens after the
log-pruning find/rm -rf block, which allows a concurrent timer invocation to
delete the active run directory before flock fails. Move the run.lock open and
flock -n 9 check to the start of the worker flow, before any pruning, and keep
the existing latest Core testnet sync log cleanup logic gated behind the lock so
only the active worker can prune.
---
Duplicate comments:
In @.github/scripts/latest-core-testnet-sync/run.cjs:
- Around line 468-483: The publish-status flow in the
`publishCommitStatus`/metadata update block should not rethrow from the cleanup
path, because that bypasses the final status logging and causes Biome to flag
the `finally`-style handling. Keep recording the failure details in
`metadata.status_publish_error` and `metadata.status_publish_failed_at`, ensure
`writeMetadata(metadata)` still runs, and move the nonzero exit/error
propagation to after the final status log path so the script can complete
cleanup before exiting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9249bc37-cdf7-4e25-91ef-a9fd94a1c30c
📒 Files selected for processing (8)
.github/scripts/latest-core-testnet-sync/run.cjs.github/workflows/latest-core-testnet-sync-status.ymldocs/NIGHTLY_LATEST_CORE_TESTNET_SYNC.mdops/latest-core-testnet-sync/README.mdops/latest-core-testnet-sync/install-worker.shops/latest-core-testnet-sync/latest-core-testnet-sync.env.exampleops/latest-core-testnet-sync/latest-core-testnet-sync.serviceops/latest-core-testnet-sync/run-worker.sh
✅ Files skipped from review due to trivial changes (3)
- docs/NIGHTLY_LATEST_CORE_TESTNET_SYNC.md
- ops/latest-core-testnet-sync/README.md
- ops/latest-core-testnet-sync/latest-core-testnet-sync.env.example
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/latest-core-testnet-sync-status.yml
- ops/latest-core-testnet-sync/latest-core-testnet-sync.service
| if [[ ! -d "${PLATFORM_REPO_DIR}/.git" ]]; then | ||
| echo "PLATFORM_REPO_DIR must be an existing git checkout: ${PLATFORM_REPO_DIR}" >&2 | ||
| echo "Clone dashpay/platform there before running this installer." >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -w "${PLATFORM_REPO_DIR}"; then | ||
| echo "PLATFORM_REPO_DIR must be writable by ${PLATFORM_SYNC_USER}: ${PLATFORM_REPO_DIR}" >&2 | ||
| echo "Run: chown -R ${PLATFORM_SYNC_USER}:${PLATFORM_SYNC_GROUP} ${PLATFORM_REPO_DIR}" >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Validate the runtime entrypoint and Git write paths.
This preflight can pass with a writable repo root while .git/objects or .git/refs remains root-owned, and it does not ensure the service’s ops/latest-core-testnet-sync/run-worker.sh entrypoint exists/executable in ${PLATFORM_REPO_DIR}. The unit can then fail immediately or during workspace prep.
Proposed fix
if [[ ! -d "${PLATFORM_REPO_DIR}/.git" ]]; then
echo "PLATFORM_REPO_DIR must be an existing git checkout: ${PLATFORM_REPO_DIR}" >&2
echo "Clone dashpay/platform there before running this installer." >&2
exit 1
fi
-if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -w "${PLATFORM_REPO_DIR}"; then
+WORKER_ENTRYPOINT="${PLATFORM_REPO_DIR}/ops/latest-core-testnet-sync/run-worker.sh"
+if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -x "${WORKER_ENTRYPOINT}"; then
+ echo "Worker entrypoint must exist and be executable by ${PLATFORM_SYNC_USER}: ${WORKER_ENTRYPOINT}" >&2
+ exit 1
+fi
+
+if ! runuser -u "${PLATFORM_SYNC_USER}" -- bash -c '
+ set -euo pipefail
+ repo=$1
+ test -w "$repo"
+ test -w "$repo/.git"
+ test -w "$repo/.git/objects"
+ test -w "$repo/.git/refs"
+' _ "${PLATFORM_REPO_DIR}"; then
echo "PLATFORM_REPO_DIR must be writable by ${PLATFORM_SYNC_USER}: ${PLATFORM_REPO_DIR}" >&2
echo "Run: chown -R ${PLATFORM_SYNC_USER}:${PLATFORM_SYNC_GROUP} ${PLATFORM_REPO_DIR}" >&2
exit 1
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [[ ! -d "${PLATFORM_REPO_DIR}/.git" ]]; then | |
| echo "PLATFORM_REPO_DIR must be an existing git checkout: ${PLATFORM_REPO_DIR}" >&2 | |
| echo "Clone dashpay/platform there before running this installer." >&2 | |
| exit 1 | |
| fi | |
| if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -w "${PLATFORM_REPO_DIR}"; then | |
| echo "PLATFORM_REPO_DIR must be writable by ${PLATFORM_SYNC_USER}: ${PLATFORM_REPO_DIR}" >&2 | |
| echo "Run: chown -R ${PLATFORM_SYNC_USER}:${PLATFORM_SYNC_GROUP} ${PLATFORM_REPO_DIR}" >&2 | |
| exit 1 | |
| fi | |
| if [[ ! -d "${PLATFORM_REPO_DIR}/.git" ]]; then | |
| echo "PLATFORM_REPO_DIR must be an existing git checkout: ${PLATFORM_REPO_DIR}" >&2 | |
| echo "Clone dashpay/platform there before running this installer." >&2 | |
| exit 1 | |
| fi | |
| WORKER_ENTRYPOINT="${PLATFORM_REPO_DIR}/ops/latest-core-testnet-sync/run-worker.sh" | |
| if ! runuser -u "${PLATFORM_SYNC_USER}" -- test -x "${WORKER_ENTRYPOINT}"; then | |
| echo "Worker entrypoint must exist and be executable by ${PLATFORM_SYNC_USER}: ${WORKER_ENTRYPOINT}" >&2 | |
| exit 1 | |
| fi | |
| if ! runuser -u "${PLATFORM_SYNC_USER}" -- bash -c ' | |
| set -euo pipefail | |
| repo=$1 | |
| test -w "$repo" | |
| test -w "$repo/.git" | |
| test -w "$repo/.git/objects" | |
| test -w "$repo/.git/refs" | |
| ' _ "${PLATFORM_REPO_DIR}"; then | |
| echo "PLATFORM_REPO_DIR must be writable by ${PLATFORM_SYNC_USER}: ${PLATFORM_REPO_DIR}" >&2 | |
| echo "Run: chown -R ${PLATFORM_SYNC_USER}:${PLATFORM_SYNC_GROUP} ${PLATFORM_REPO_DIR}" >&2 | |
| exit 1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ops/latest-core-testnet-sync/install-worker.sh` around lines 28 - 38, The
current preflight in install-worker.sh only checks that the repo root is a
writable git checkout, but it can still miss root-owned Git internals and a
missing or non-executable run-worker.sh entrypoint. Update the validation near
the existing PLATFORM_REPO_DIR and PLATFORM_SYNC_USER checks to also verify
write access to the Git data paths used by the service, such as .git/objects and
.git/refs, and confirm the ops/latest-core-testnet-sync/run-worker.sh script
exists and is executable in ${PLATFORM_REPO_DIR}. Keep the checks alongside the
existing preflight logic so failures are caught before the unit starts.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Incremental review reconciling 3 prior findings against head f6783a6. The JSON.stringify shell-quoting issue (prior-3) is now FIXED via env-var indirection plus validatePlatformBranch allow-list. The target_sha-from-env (prior-1) and workflow_dispatch trust-boundary (prior-2) issues are STILL VALID and carried forward. Two new in-scope issues were validated against the worker scripts: pre-lock log pruning can delete the active run directory, and install-worker.sh does not enforce root ownership on a pre-existing token-bearing env file.
🔴 2 blocking | 🟡 3 suggestion(s) | 💬 1 nitpick(s)
6 additional finding(s) omitted (not in diff).
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `.github/workflows/latest-core-testnet-sync-status.yml`:
- [BLOCKING] .github/workflows/latest-core-testnet-sync-status.yml:7-36: workflow_dispatch bypasses the operator-only worker trust boundary
The workflow exposes workflow_dispatch with free-form status/target_sha inputs while the job carries statuses: write. Any user with repo write access can manually run it and publish 'Sync Passed' against any commit reachable in the repo, forging the 'Latest public Core testnet sync' context without the persistent worker ever running. The SHA format check and getCommit lookup only confirm the commit exists — they do not enforce that the worker produced the result. If this context is (or becomes) a required check, the manual path is a status-forgery vector. Remove the workflow_dispatch trigger or gate it behind a protected environment that requires operator approval.
In `ops/latest-core-testnet-sync/install-worker.sh`:
- [BLOCKING] ops/latest-core-testnet-sync/install-worker.sh:50-57: Existing /etc/latest-core-testnet-sync.env keeps prior ownership
When /etc/latest-core-testnet-sync.env already exists, the installer only re-applies 0600 perms and does not chown to root:root. If the file was previously written by a non-root account (e.g. created manually before the installer ran), that account retains read/write access to GITHUB_TOKEN and can modify the phase commands EnvironmentFile-loaded by the service that later runs as root. The create path also relies on install's default ownership rather than explicitly forcing root:root. Force root:root ownership in both branches so the secret-bearing env file is never owned by a less-privileged account.
In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:250-254: target_sha can still be inherited from env after the workspace reset
After prepareWorkspace fetches/checks out/resets to origin/$PLATFORM_BRANCH, it re-derives the SHA via resolveTargetSha(), which returns process.env.TARGET_SHA || process.env.GITHUB_SHA || git rev-parse HEAD. If either env var is present in the worker environment (systemd EnvironmentFile, prior shell, CI re-invocation), metadata.target_sha/platform_sha and the published commit status will point at the inherited SHA even though the phases actually ran against the freshly reset branch tip. After the reset, take the SHA from git directly so metadata describes the commit that was actually tested. resolveTargetSha additionally never validates the value is a 40-char hex SHA, so a malformed env value silently propagates to the statuses API (which then 422s and burns the retry budget).
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:374-389: publishCommitStatus retries on permanent HTTP failures
publishCommitStatusOnce throws a generic `HTTP <status> <body>` error for every non-2xx, and publishCommitStatus retries up to STATUS_PUBLISH_ATTEMPTS times with exponential backoff for any error. 401 (bad token), 403 (missing scopes), 404 (repo/SHA missing), and 422 (malformed SHA) are deterministic failures that the retries cannot recover, so the worker just delays surfacing the real error. Inspect the response status and only retry transient classes (5xx, network errors, 429); fail fast on the rest.
In `ops/latest-core-testnet-sync/run-worker.sh`:
- [SUGGESTION] ops/latest-core-testnet-sync/run-worker.sh:13-24: Log pruning runs before the worker lock is acquired
The destructive `find ... -mtime +RETENTION -exec rm -rf {} +` block executes before `flock -n 9`. A second timer invocation can therefore prune log directories while another sync is in flight, then exit when flock fails. If LATEST_CORE_TESTNET_LOG_RETENTION_DAYS is set low or a run exceeds the retention window, the active run's log directory (LATEST_CORE_TESTNET_SYNC_RUN_DIR under LATEST_CORE_TESTNET_LOG_DIR) can be removed mid-run, breaking the orchestrator's appendLog/writeMetadata calls. Move pruning inside the flock-held critical section so only the active worker performs cleanup.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Cumulative incremental review for head 06c6405 against prior reviewed commit f6783a6. The latest delta is a scope-narrowing rename (core-sync → core-readiness) and does not touch any consensus-critical code. All six prior findings remain STILL VALID at HEAD: two blocking trust-boundary/permission issues (workflow_dispatch status forgery, env file ownership), three suggestion-level operational correctness issues, and one nitpick. One additional new suggestion is added for the initial target_sha being resolved from pre-reset HEAD, which is adjacent to but distinct from prior-3.
🔴 2 blocking | 🟡 3 suggestion(s) | 💬 1 nitpick(s)
6 additional finding(s) omitted (not in diff).
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `.github/workflows/latest-core-testnet-sync-status.yml`:
- [BLOCKING] .github/workflows/latest-core-testnet-sync-status.yml:7-36: workflow_dispatch bypasses the operator-only worker trust boundary
The workflow exposes a manual `workflow_dispatch` path that accepts caller-provided `status` and `target_sha` while the job carries `statuses: write`. Any repo user with workflow_dispatch permission can publish `Sync Passed` against any commit reachable in the repository without the persistent worker ever running. The 40-hex regex and `getCommit` lookup only prove the SHA exists; they do not prove a sync ran. If maintainers, branch protection, or downstream automation treat the `Latest public Core testnet sync` context as evidence of a real sync, this manual path forges that signal. Either drop `workflow_dispatch` entirely (the persistent worker uses `repository_dispatch`) or gate the job behind a protected environment requiring operator approval.
In `ops/latest-core-testnet-sync/install-worker.sh`:
- [BLOCKING] ops/latest-core-testnet-sync/install-worker.sh:50-57: Existing /etc/latest-core-testnet-sync.env keeps prior ownership
When `/etc/latest-core-testnet-sync.env` already exists, the installer only re-applies mode `0600` and leaves the current owner intact. If the file was created or copied by a non-root account before this installer ran (or rotated in by hand), that account retains read access to `GITHUB_TOKEN` and write access to the phase commands the systemd service consumes. Because this file carries the statuses-write token and controls the commands the worker runs as root-installed unit, the installer should force `chown root:root` in both the create and existing-file paths.
In `.github/scripts/latest-core-testnet-sync/run.cjs`:
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:142-254: target_sha can be reported for a commit that was never tested
`resolveTargetSha()` (line 142) prefers `process.env.TARGET_SHA || process.env.GITHUB_SHA` over the working-tree HEAD, and it is called twice: once in `main()` at line 399 *before* the workspace is reset, and again inside `prepareWorkspace` at line 250 *after* the reset. Two failure modes follow: (1) if `TARGET_SHA`/`GITHUB_SHA` is present in the worker environment, both calls return the inherited SHA even though phases run against `origin/$PLATFORM_BRANCH`, so the published commit status points at a commit that was not actually tested; (2) if neither env var is set and `prepareWorkspace` fails (e.g. `git fetch` errors before the reset), the `finally` block publishes `Sync Failed` against the stale pre-fetch HEAD. Resolve the SHA strictly from `git rev-parse HEAD` after a successful reset, and use that single value for both metadata and the published status.
- [SUGGESTION] .github/scripts/latest-core-testnet-sync/run.cjs:374-389: publishCommitStatus retries permanent HTTP failures
`publishCommitStatusOnce` throws a generic `Error` for every non-2xx response, and `publishCommitStatus` retries on any thrown error up to `STATUS_PUBLISH_ATTEMPTS`. Permanent failures such as 401, 403, 404, and 422 are deterministic and cannot recover, so the worker burns the entire retry budget (with exponential backoff up to 8 seconds on attempt 3) before surfacing what is really a misconfiguration. Inspect the response status and only retry on transient categories — network errors, 429, 502, 503, 504 — while failing fast on permanent HTTP errors.
In `ops/latest-core-testnet-sync/run-worker.sh`:
- [SUGGESTION] ops/latest-core-testnet-sync/run-worker.sh:13-24: Log pruning runs before the worker lock is acquired
The destructive `find ... -mtime +RETENTION -exec rm -rf {} +` block runs before `flock -n 9` acquires `run.lock`. A second timer invocation triggered while another sync is in flight will execute the prune first, then exit because flock fails. If retention is low or a sync exceeds the retention window, that prune can remove the active run directory and break `appendLog`/`writeMetadata` mid-run. Acquire the flock first and perform pruning inside the single-worker critical section.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Cumulative review at head 07c7e22. The latest delta is a rename from latest-core-testnet-sync to platform-testnet-sync; no behavior change. All six prior findings remain present in the renamed files and are carried forward. Two additional blocking security findings surfaced this round: the worker harness itself lives in the mutable Platform checkout that the same worker resets to the tested branch (so anyone who can land code on PLATFORM_BRANCH can modify the script that holds GITHUB_TOKEN), and phase commands inherit access to the parent Node process environment via /proc/$PPID/environ, defeating the child-env GITHUB_TOKEN/GH_TOKEN scrub. One symmetry suggestion is added for target_url validation.
🔴 3 blocking | 🟡 4 suggestion(s) | 💬 1 nitpick(s)
1 carried-forward finding(s) already raised on this PR; not re-posting as new inline comments.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `ops/platform-testnet-sync/platform-testnet-sync.service`:
- [BLOCKING] ops/platform-testnet-sync/platform-testnet-sync.service:10-11: Worker harness executes from the same mutable checkout it resets to the tested branch
The unit loads `/etc/platform-testnet-sync.env` (containing `GITHUB_TOKEN` with `statuses: write`) and then runs `ops/platform-testnet-sync/run-worker.sh` and `.github/scripts/platform-testnet-sync/run.cjs` from `PLATFORM_REPO_DIR`. That directory is the same checkout that `prepareWorkspace` later resets to `origin/$PLATFORM_BRANCH` (run.cjs:240). Consequently the harness that receives the token is itself branch-controlled code: any commit that lands on `$PLATFORM_BRANCH` can rewrite `run-worker.sh` or `run.cjs` to read `process.env.GITHUB_TOKEN` before any scrubbing happens. The trusted credential-handling code must live outside the mutable checkout — install it as root-owned files under `/usr/local/lib/platform-testnet-sync/` (or similar) and let only the phase commands execute from the checked-out branch.
In `.github/scripts/platform-testnet-sync/run.cjs`:
- [BLOCKING] .github/scripts/platform-testnet-sync/run.cjs:422-429: Phase commands can recover GITHUB_TOKEN from parent process /proc/$PPID/environ
`sanitizePhaseEnv` and the explicit deletes at run.cjs:428-429 only strip `GITHUB_TOKEN`/`GH_TOKEN` from the child phase environment. The long-lived Node parent still holds those secrets in `process.env` because they were inherited from systemd's `EnvironmentFile` and are needed later for `publishCommitStatus`. On a standard Linux host, a same-UID child can read `/proc/$PPID/environ` (and scan other same-UID `/proc/*/environ`), so any code executed during `baseline-readiness`, `platform-build`, or `platform-sync` — including branch-controlled build scripts — can exfiltrate the token despite the child-env scrub. Move status publishing into an isolated helper (separate UID/container/process) that is invoked after phases finish, or hold the token in a file readable only after phases complete so it never lives in the credential-holding process's environment while untrusted code runs.
- [SUGGESTION] .github/scripts/platform-testnet-sync/run.cjs:142-254: target_sha can describe a commit that was never tested
`resolveTargetSha` prefers `TARGET_SHA` and `GITHUB_SHA` over `git rev-parse HEAD`, and it runs at main() before workspace preparation (run.cjs:399) and again after the reset inside `prepareWorkspace` (run.cjs:250). If either env var is inherited by the worker, the initial metadata and any failure-path commit status published from the `finally` block point at that inherited SHA even though no phase has actually run against it. If `prepareWorkspace` fails before the reset and `TARGET_SHA` is unset, the published status describes the stale pre-fetch checkout HEAD. Resolve the tested SHA strictly from git after a successful reset, and avoid publishing a commit-specific status before a target commit has actually been selected.
- [SUGGESTION] .github/scripts/platform-testnet-sync/run.cjs:366-388: Status publisher retries permanent HTTP failures
`publishCommitStatusOnce` throws the same generic `Error` for every non-2xx response, and `publishCommitStatus` retries every thrown error up to `STATUS_PUBLISH_ATTEMPTS`. Permanent failures such as 401, 403, 404, and 422 cannot recover with retry, so the worker burns the retry budget (plus exponential backoff sleeps) before surfacing what is actually a token/repository/SHA misconfiguration. Distinguish transient failures (network errors, 429, 5xx) from deterministic HTTP errors and fail fast on the latter.
- [SUGGESTION] .github/scripts/platform-testnet-sync/run.cjs:339-353: Worker does not validate target_url symmetrically with the receiver workflow
The status workflow rejects any `target_url` whose origin is not `https://github.com` (workflow lines 87-101), but the worker forwards `metadata.target_url` and `PLATFORM_TESTNET_SYNC_TARGET_URL` verbatim to the createCommitStatus API. The env file is root-owned so the immediate risk is bounded, but the two reporting paths drift: a manual workflow_dispatch with a non-github URL is refused, while a misconfigured or accidentally typo'd env value can publish a commit status whose link points anywhere (http://, third-party CDN, internal phishing). Apply the same `parsedUrl.origin === 'https://github.com'` (or an explicit allowlist) check in the worker before forwarding.
In `ops/platform-testnet-sync/install-worker.sh`:
- [BLOCKING] ops/platform-testnet-sync/install-worker.sh:50-57: Pre-existing env file keeps prior ownership
When `/etc/platform-testnet-sync.env` already exists, the installer only reapplies `chmod 0600` and leaves whatever ownership the file came with. If the file was created or copied by a non-root account before the installer ran, that account retains read access to `GITHUB_TOKEN` and write access to the phase commands that the systemd unit executes. Because the file carries the statuses-write token and controls the worker's behavior, the root-run installer should force `root:root` ownership in both the create and existing-file paths.
In `ops/platform-testnet-sync/run-worker.sh`:
- [SUGGESTION] ops/platform-testnet-sync/run-worker.sh:13-24: Log pruning runs before the worker lock is acquired
The destructive `find ... -mtime +RETENTION -exec rm -rf {} +` block at lines 13-18 runs before `flock -n 9` acquires `run.lock` at line 20. A second timer invocation while a sync is in flight prunes first and only then exits because the lock is held. If retention is configured low or a sync exceeds the retention window, the prune can remove the active run directory, breaking `appendLog`/`writeMetadata` in the running worker. Acquire the lock before pruning so only the active worker performs retention cleanup.
| EnvironmentFile=/etc/platform-testnet-sync.env | ||
| ExecStart=/bin/bash -lc 'cd "${PLATFORM_REPO_DIR:-/opt/dash-platform}" && exec ops/platform-testnet-sync/run-worker.sh' |
There was a problem hiding this comment.
🔴 Blocking: Worker harness executes from the same mutable checkout it resets to the tested branch
The unit loads /etc/platform-testnet-sync.env (containing GITHUB_TOKEN with statuses: write) and then runs ops/platform-testnet-sync/run-worker.sh and .github/scripts/platform-testnet-sync/run.cjs from PLATFORM_REPO_DIR. That directory is the same checkout that prepareWorkspace later resets to origin/$PLATFORM_BRANCH (run.cjs:240). Consequently the harness that receives the token is itself branch-controlled code: any commit that lands on $PLATFORM_BRANCH can rewrite run-worker.sh or run.cjs to read process.env.GITHUB_TOKEN before any scrubbing happens. The trusted credential-handling code must live outside the mutable checkout — install it as root-owned files under /usr/local/lib/platform-testnet-sync/ (or similar) and let only the phase commands execute from the checked-out branch.
source: ['codex']
There was a problem hiding this comment.
Resolved in 3546112 — Worker harness executes from the same mutable checkout it resets to the tested branch no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
| const phaseEnv = { | ||
| ...process.env, | ||
| CORE_VERSION: metadata.core_version, | ||
| PLATFORM_SHA: metadata.platform_sha, | ||
| PLATFORM_TESTNET_SYNC_RUN_DIR: runDir, | ||
| }; | ||
| delete phaseEnv.GITHUB_TOKEN; | ||
| delete phaseEnv.GH_TOKEN; |
There was a problem hiding this comment.
🔴 Blocking: Phase commands can recover GITHUB_TOKEN from parent process /proc/$PPID/environ
sanitizePhaseEnv and the explicit deletes at run.cjs:428-429 only strip GITHUB_TOKEN/GH_TOKEN from the child phase environment. The long-lived Node parent still holds those secrets in process.env because they were inherited from systemd's EnvironmentFile and are needed later for publishCommitStatus. On a standard Linux host, a same-UID child can read /proc/$PPID/environ (and scan other same-UID /proc/*/environ), so any code executed during baseline-readiness, platform-build, or platform-sync — including branch-controlled build scripts — can exfiltrate the token despite the child-env scrub. Move status publishing into an isolated helper (separate UID/container/process) that is invoked after phases finish, or hold the token in a file readable only after phases complete so it never lives in the credential-holding process's environment while untrusted code runs.
source: ['codex']
There was a problem hiding this comment.
Resolved in 3546112 — Phase commands can recover GITHUB_TOKEN from parent process /proc/$PPID/environ no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
| if [[ ! -f /etc/platform-testnet-sync.env ]]; then | ||
| install -m 0600 \ | ||
| "${SCRIPT_DIR}/platform-testnet-sync.env.example" \ | ||
| /etc/platform-testnet-sync.env | ||
| echo "Created /etc/platform-testnet-sync.env; fill it before enabling the timer." | ||
| else | ||
| chmod 0600 /etc/platform-testnet-sync.env | ||
| fi |
There was a problem hiding this comment.
🔴 Blocking: Pre-existing env file keeps prior ownership
When /etc/platform-testnet-sync.env already exists, the installer only reapplies chmod 0600 and leaves whatever ownership the file came with. If the file was created or copied by a non-root account before the installer ran, that account retains read access to GITHUB_TOKEN and write access to the phase commands that the systemd unit executes. Because the file carries the statuses-write token and controls the worker's behavior, the root-run installer should force root:root ownership in both the create and existing-file paths.
| if [[ ! -f /etc/platform-testnet-sync.env ]]; then | |
| install -m 0600 \ | |
| "${SCRIPT_DIR}/platform-testnet-sync.env.example" \ | |
| /etc/platform-testnet-sync.env | |
| echo "Created /etc/platform-testnet-sync.env; fill it before enabling the timer." | |
| else | |
| chmod 0600 /etc/platform-testnet-sync.env | |
| fi | |
| if [[ ! -f /etc/platform-testnet-sync.env ]]; then | |
| install -o root -g root -m 0600 \ | |
| "${SCRIPT_DIR}/platform-testnet-sync.env.example" \ | |
| /etc/platform-testnet-sync.env | |
| echo "Created /etc/platform-testnet-sync.env; fill it before enabling the timer." | |
| else | |
| chown root:root /etc/platform-testnet-sync.env | |
| chmod 0600 /etc/platform-testnet-sync.env | |
| fi |
source: ['claude', 'codex']
There was a problem hiding this comment.
Resolved in 3546112 — Pre-existing env file keeps prior ownership no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
| function resolveTargetSha() { | ||
| return process.env.TARGET_SHA | ||
| || process.env.GITHUB_SHA | ||
| || gitOutput(['rev-parse', 'HEAD']); | ||
| } | ||
|
|
||
| async function runCommand(name, command, env, timeoutMinutes, options = {}) { | ||
| if (!command || !command.trim()) { | ||
| throw new Error(`Missing command for phase: ${name}`); | ||
| } | ||
|
|
||
| const { captureStdout = false } = options; | ||
| const logFileName = `${name}.log`; | ||
| appendLog(logFileName, `$ ${command}\n\n`); | ||
|
|
||
| const timeoutMs = timeoutMinutes * 60 * 1000; | ||
| const startedAt = Date.now(); | ||
|
|
||
| return new Promise((resolve, reject) => { | ||
| const child = spawn('bash', ['-c', `set -euo pipefail\n${command}`], { | ||
| cwd: repoRoot, | ||
| env, | ||
| stdio: ['ignore', 'pipe', 'pipe'], | ||
| detached: true, | ||
| }); | ||
|
|
||
| let timedOut = false; | ||
| let settled = false; | ||
| let killTimeout; | ||
| let stdout = ''; | ||
|
|
||
| const timeout = setTimeout(() => { | ||
| timedOut = true; | ||
| killProcessGroup(child, 'SIGTERM'); | ||
| killTimeout = setTimeout(() => { | ||
| if (!settled) { | ||
| killProcessGroup(child, 'SIGKILL'); | ||
| } | ||
| }, 30_000); | ||
| }, timeoutMs); | ||
|
|
||
| child.stdout.on('data', (chunk) => { | ||
| process.stdout.write(chunk); | ||
| if (captureStdout) { | ||
| stdout += chunk.toString(); | ||
| } | ||
| appendLog(logFileName, chunk.toString()); | ||
| }); | ||
|
|
||
| child.stderr.on('data', (chunk) => { | ||
| process.stderr.write(chunk); | ||
| appendLog(logFileName, chunk.toString()); | ||
| }); | ||
|
|
||
| child.on('error', (error) => { | ||
| settled = true; | ||
| clearTimeout(timeout); | ||
| clearTimeout(killTimeout); | ||
| reject(error); | ||
| }); | ||
|
|
||
| child.on('close', (code, signal) => { | ||
| settled = true; | ||
| clearTimeout(timeout); | ||
| clearTimeout(killTimeout); | ||
| const durationSeconds = Math.round((Date.now() - startedAt) / 1000); | ||
| appendLog(logFileName, `\nexit_code=${code} signal=${signal || ''} duration_seconds=${durationSeconds}\n`); | ||
|
|
||
| if (timedOut) { | ||
| reject(new Error(`${name} timed out after ${timeoutMinutes} minutes`)); | ||
| return; | ||
| } | ||
|
|
||
| if (code === 0) { | ||
| resolve({ stdout, durationSeconds }); | ||
| return; | ||
| } | ||
|
|
||
| reject(new Error(`${name} exited with code ${code}${signal ? ` (${signal})` : ''}`)); | ||
| }); | ||
| }); | ||
| } | ||
|
|
||
| async function prepareWorkspace(metadata, timeoutMinutes) { | ||
| if (process.env.PLATFORM_TESTNET_SYNC_SKIP_WORKSPACE_PREP === '1') { | ||
| return; | ||
| } | ||
|
|
||
| const branch = process.env.PLATFORM_BRANCH || 'v3.1-dev'; | ||
| metadata.phase = 'workspace-prepare'; | ||
| writeMetadata(metadata); | ||
| validatePlatformBranch(branch); | ||
|
|
||
| await runCommand( | ||
| 'workspace-prepare', | ||
| [ | ||
| 'git fetch --prune origin "$PLATFORM_BRANCH"', | ||
| 'git checkout "$PLATFORM_BRANCH"', | ||
| 'git reset --hard "origin/$PLATFORM_BRANCH"', | ||
| 'git clean -ffdx -e .platform-testnet-sync/', | ||
| ].join('\n'), | ||
| { | ||
| ...sanitizePhaseEnv(process.env), | ||
| PLATFORM_BRANCH: branch, | ||
| }, | ||
| timeoutMinutes, | ||
| ); | ||
|
|
||
| const targetSha = resolveTargetSha(); | ||
| metadata.target_sha = targetSha; | ||
| metadata.platform_sha = targetSha; | ||
| process.env.TARGET_SHA = targetSha; | ||
| writeMetadata(metadata); |
There was a problem hiding this comment.
🟡 Suggestion: target_sha can describe a commit that was never tested
resolveTargetSha prefers TARGET_SHA and GITHUB_SHA over git rev-parse HEAD, and it runs at main() before workspace preparation (run.cjs:399) and again after the reset inside prepareWorkspace (run.cjs:250). If either env var is inherited by the worker, the initial metadata and any failure-path commit status published from the finally block point at that inherited SHA even though no phase has actually run against it. If prepareWorkspace fails before the reset and TARGET_SHA is unset, the published status describes the stale pre-fetch checkout HEAD. Resolve the tested SHA strictly from git after a successful reset, and avoid publishing a commit-specific status before a target commit has actually been selected.
source: ['claude', 'codex']
There was a problem hiding this comment.
Resolved in 3546112 — target_sha can describe a commit that was never tested no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
| find "${PLATFORM_TESTNET_SYNC_LOG_DIR}" \ | ||
| -mindepth 1 \ | ||
| -maxdepth 1 \ | ||
| -type d \ | ||
| -mtime +"${PLATFORM_TESTNET_SYNC_LOG_RETENTION_DAYS}" \ | ||
| -exec rm -rf {} + | ||
|
|
||
| exec 9>"${PLATFORM_TESTNET_SYNC_STATE_DIR}/run.lock" | ||
| if ! flock -n 9; then | ||
| echo "Platform testnet sync is already running" | ||
| exit 0 | ||
| fi |
There was a problem hiding this comment.
🟡 Suggestion: Log pruning runs before the worker lock is acquired
The destructive find ... -mtime +RETENTION -exec rm -rf {} + block at lines 13-18 runs before flock -n 9 acquires run.lock at line 20. A second timer invocation while a sync is in flight prunes first and only then exits because the lock is held. If retention is configured low or a sync exceeds the retention window, the prune can remove the active run directory, breaking appendLog/writeMetadata in the running worker. Acquire the lock before pruning so only the active worker performs retention cleanup.
| find "${PLATFORM_TESTNET_SYNC_LOG_DIR}" \ | |
| -mindepth 1 \ | |
| -maxdepth 1 \ | |
| -type d \ | |
| -mtime +"${PLATFORM_TESTNET_SYNC_LOG_RETENTION_DAYS}" \ | |
| -exec rm -rf {} + | |
| exec 9>"${PLATFORM_TESTNET_SYNC_STATE_DIR}/run.lock" | |
| if ! flock -n 9; then | |
| echo "Platform testnet sync is already running" | |
| exit 0 | |
| fi | |
| exec 9>"${PLATFORM_TESTNET_SYNC_STATE_DIR}/run.lock" | |
| if ! flock -n 9; then | |
| echo "Platform testnet sync is already running" | |
| exit 0 | |
| fi | |
| find "${PLATFORM_TESTNET_SYNC_LOG_DIR}" \ | |
| -mindepth 1 \ | |
| -maxdepth 1 \ | |
| -type d \ | |
| -mtime +"${PLATFORM_TESTNET_SYNC_LOG_RETENTION_DAYS}" \ | |
| -exec rm -rf {} + |
source: ['claude', 'codex']
There was a problem hiding this comment.
Resolved in 3546112 — Log pruning runs before the worker lock is acquired no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
| if (!response.ok) { | ||
| const errorBody = await response.text(); | ||
| throw new Error(`Unable to publish commit status: HTTP ${response.status} ${errorBody}`); | ||
| } | ||
|
|
||
| return 'published'; | ||
| } | ||
|
|
||
| async function publishCommitStatus(status, metadata) { | ||
| let lastError; | ||
| for (let attempt = 1; attempt <= STATUS_PUBLISH_ATTEMPTS; attempt += 1) { | ||
| try { | ||
| return await publishCommitStatusOnce(status, metadata); | ||
| } catch (error) { | ||
| lastError = error; | ||
| console.error(`Commit status publish attempt ${attempt} failed: ${error.message}`); | ||
| if (attempt < STATUS_PUBLISH_ATTEMPTS) { | ||
| await sleep(2 ** attempt * 1000); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| throw lastError; |
There was a problem hiding this comment.
🟡 Suggestion: Status publisher retries permanent HTTP failures
publishCommitStatusOnce throws the same generic Error for every non-2xx response, and publishCommitStatus retries every thrown error up to STATUS_PUBLISH_ATTEMPTS. Permanent failures such as 401, 403, 404, and 422 cannot recover with retry, so the worker burns the retry budget (plus exponential backoff sleeps) before surfacing what is actually a token/repository/SHA misconfiguration. Distinguish transient failures (network errors, 429, 5xx) from deterministic HTTP errors and fail fast on the latter.
source: ['claude', 'codex']
There was a problem hiding this comment.
Resolved in 3546112 — Status publisher retries permanent HTTP failures no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
| const targetUrl = metadata.target_url | ||
| || process.env.PLATFORM_TESTNET_SYNC_TARGET_URL | ||
| || (process.env.GITHUB_RUN_ID | ||
| ? `${process.env.GITHUB_SERVER_URL || 'https://github.com'}/${repository}/actions/runs/${process.env.GITHUB_RUN_ID}` | ||
| : undefined); | ||
|
|
||
| const body = { | ||
| state: status === 'sync_passed' ? 'success' : 'failure', | ||
| context: STATUS_CONTEXT, | ||
| description: statusDescription(status, metadata), | ||
| }; | ||
|
|
||
| if (targetUrl) { | ||
| body.target_url = targetUrl; | ||
| } |
There was a problem hiding this comment.
🟡 Suggestion: Worker does not validate target_url symmetrically with the receiver workflow
The status workflow rejects any target_url whose origin is not https://github.com (workflow lines 87-101), but the worker forwards metadata.target_url and PLATFORM_TESTNET_SYNC_TARGET_URL verbatim to the createCommitStatus API. The env file is root-owned so the immediate risk is bounded, but the two reporting paths drift: a manual workflow_dispatch with a non-github URL is refused, while a misconfigured or accidentally typo'd env value can publish a commit status whose link points anywhere (http://, third-party CDN, internal phishing). Apply the same parsedUrl.origin === 'https://github.com' (or an explicit allowlist) check in the worker before forwarding.
source: ['claude']
There was a problem hiding this comment.
Resolved in 3546112 — Worker does not validate target_url symmetrically with the receiver workflow no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
| const repository = process.env.GITHUB_REPOSITORY || 'dashpay/platform'; | ||
| if (!repository) { | ||
| throw new Error('GITHUB_REPOSITORY is required to publish commit status'); | ||
| } |
There was a problem hiding this comment.
💬 Nitpick: Unreachable !repository guard after default fallback
const repository = process.env.GITHUB_REPOSITORY || 'dashpay/platform' always produces a non-empty string, so the following if (!repository) throw is dead code. The fallback also silently routes status posts to dashpay/platform when GITHUB_REPOSITORY is unset, which can mask a worker configured for a fork or staging clone. Either require GITHUB_REPOSITORY and drop the default, or drop the dead guard — keeping both is misleading.
| const repository = process.env.GITHUB_REPOSITORY || 'dashpay/platform'; | |
| if (!repository) { | |
| throw new Error('GITHUB_REPOSITORY is required to publish commit status'); | |
| } | |
| const repository = process.env.GITHUB_REPOSITORY; | |
| if (!repository) { | |
| throw new Error('GITHUB_REPOSITORY is required to publish commit status'); | |
| } |
source: ['claude', 'codex']
There was a problem hiding this comment.
Resolved in 3546112 — Unreachable !repository guard after default fallback no longer present.
Auto-resolved by the review system based on the latest commit diff. If you believe this was closed in error, reopen the thread.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Cumulative PR is now narrow: a status receiver workflow, README badge, and docs. The previously flagged worker harness (run.cjs, systemd unit, install/run shell, env example) was deleted in 3546112, resolving 8 of 9 prior findings. One prior issue remains: the surviving workflow_dispatch entry on platform-testnet-sync-status.yml lets any actor with workflow-dispatch privilege forge a 'Sync Passed' commit status on an arbitrary in-repo SHA, since payload validation only proves the SHA exists. Two independent Codex passes re-flagged this; carrying forward as blocking.
🔴 1 blocking
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `.github/workflows/platform-testnet-sync-status.yml`:
- [BLOCKING] .github/workflows/platform-testnet-sync-status.yml:7-40: workflow_dispatch path can forge 'Platform testnet sync' commit statuses
The job is granted statuses: write and accepts workflow_dispatch input directly for status, target_sha, target_url, core_version, platform_sha, and completed_at. The only checks performed are the enum match on status, a 40-hex regex on the SHA, repos.getCommit (which only proves the SHA exists in the repo), and an origin restriction on target_url. None of those prove the external sync worker actually ran against that commit. Any account with workflow-dispatch privilege on this repo (broader than the operator credential the design intends) can therefore publish state=success, context='Platform testnet sync' on an arbitrary commit. If branch protection, dashboards, or release automation treat this context as authoritative — and docs/PLATFORM_TESTNET_SYNC.md presents it that way — this is a status-forgery path that bypasses the worker boundary. Either drop the workflow_dispatch trigger (rely on repository_dispatch from the operator-held credential only), or gate the manual path behind a protected environment with a required reviewer and a restricted actor allowlist. Note: per the file-handling constraint, suggestion field is null; do not interpret that as agreement to leave the trigger in place.
| workflow_dispatch: | ||
| inputs: | ||
| status: | ||
| description: Final completed run status | ||
| required: true | ||
| type: choice | ||
| options: | ||
| - sync_passed | ||
| - build_failed | ||
| - sync_failed | ||
| target_sha: | ||
| description: Platform commit SHA to report against | ||
| required: false | ||
| type: string | ||
| target_url: | ||
| description: URL for run logs or status details | ||
| required: false | ||
| type: string | ||
| core_version: | ||
| description: Public Core version used by the run | ||
| required: false | ||
| type: string | ||
| platform_sha: | ||
| description: Platform commit SHA built and synced | ||
| required: false | ||
| type: string | ||
| completed_at: | ||
| description: Completion timestamp, preferably ISO 8601 UTC | ||
| required: false | ||
| type: string | ||
|
|
||
| permissions: | ||
| contents: read | ||
| statuses: write |
There was a problem hiding this comment.
🔴 Blocking: workflow_dispatch path can forge 'Platform testnet sync' commit statuses
The job is granted statuses: write and accepts workflow_dispatch input directly for status, target_sha, target_url, core_version, platform_sha, and completed_at. The only checks performed are the enum match on status, a 40-hex regex on the SHA, repos.getCommit (which only proves the SHA exists in the repo), and an origin restriction on target_url. None of those prove the external sync worker actually ran against that commit. Any account with workflow-dispatch privilege on this repo (broader than the operator credential the design intends) can therefore publish state=success, context='Platform testnet sync' on an arbitrary commit. If branch protection, dashboards, or release automation treat this context as authoritative — and docs/PLATFORM_TESTNET_SYNC.md presents it that way — this is a status-forgery path that bypasses the worker boundary. Either drop the workflow_dispatch trigger (rely on repository_dispatch from the operator-held credential only), or gate the manual path behind a protected environment with a required reviewer and a restricted actor allowlist. Note: per the file-handling constraint, suggestion field is null; do not interpret that as agreement to leave the trigger in place.
source: ['codex']
Summary
ops/latest-core-testnet-syncNotes
The long-running sync is intentionally not a GitHub-hosted scheduled workflow. The worker publishes only final completed commit statuses:
Sync PassedBuild FailedSync FailedNo
runningstatus is published, so the previous completed result remains visible while a new sync is active.Validation
node --check .github/scripts/latest-core-testnet-sync/run.cjsbash -n ops/latest-core-testnet-sync/run-worker.shbash -n ops/latest-core-testnet-sync/install-worker.sh.github/workflows/latest-core-testnet-sync-status.ymlwith PyYAMLsystemd-analyze verify ops/latest-core-testnet-sync/latest-core-testnet-sync.service ops/latest-core-testnet-sync/latest-core-testnet-sync.timerSync PassedBuild FailedSummary by CodeRabbit
New Features
Documentation