Skip to content

Add ops cheat sheet and smoke/uptime monitoring scripts#96

Merged
graphite-app[bot] merged 1 commit into
mainfrom
alastair/ops-tooling
May 7, 2026
Merged

Add ops cheat sheet and smoke/uptime monitoring scripts#96
graphite-app[bot] merged 1 commit into
mainfrom
alastair/ops-tooling

Conversation

@alastairong1
Copy link
Copy Markdown
Contributor

@alastairong1 alastairong1 commented Apr 28, 2026

Summary

  • New docs/ops.md — journalctl + curl recipes for the deployed service, cache_warmer status interpretation, smoke-test usage
  • New scripts/smoke.sh — end-to-end correctness + latency probe; asserts on response shape (not just status), reports per-check latency, exits non-zero on FAIL
  • New scripts/uptimerobot-setup.sh — one-shot creation of the 3 baseline UptimeRobot monitors (liveness, /health/detailed status=ok, cache_warmer running=true)

Why

Ops knowledge for the preview environment was living outside the repo. This pulls it inline so:

  • New team members can debug the live service from journalctl recipes
  • Smoke tests can run post-deploy or on a cron
  • UptimeRobot setup is reproducible per-environment via one script

Forward-looking note

Test plan

  • bash -n scripts/smoke.sh (syntax check)
  • Run ./scripts/smoke.sh against preview once detailed-health lands; expect PASS on liveness + 401 checks
  • Read docs/ops.md end-to-end

Part of the deployed-state split tracked in #94.

Summary by CodeRabbit

  • Documentation

    • Added an operations guide with service health checks, log troubleshooting recipes, smoke-test usage, and monitoring recommendations.
  • Chores

    • Added a post-deploy smoke test tool that runs authenticated/unauthenticated probes, reports PASS/FAIL/WARN (including latency-based SLOW warnings), and exits non‑zero on failures.
    • Added an automated setup tool to provision external uptime monitors and optionally attach alert contacts.

@alastairong1 alastairong1 requested a review from findolor April 28, 2026 20:02
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds ops runbook documentation, a shell smoke-test runner for post-deploy validation, and an UptimeRobot provisioning script to create liveness and keyword monitors for the rest-api service.

Changes

Operational Monitoring and Smoke Testing Infrastructure

Layer / File(s) Summary
Operational Monitoring Strategy
docs/ops.md
Service health curl checks, /health/detailed expectations (cache_warmer), journalctl diagnostic queries for 429s/ERRORs/slow requests, smoke-test usage, and cron/monitoring example.
Smoke Test Implementation
scripts/smoke.sh
Bash smoke runner with env-driven configuration, probe() capturing HTTP status and latency, PASS/FAIL/WARN handling, unauthenticated and conditional authenticated probes, and exit-code summary.
UptimeRobot Monitoring Setup
scripts/uptimerobot-setup.sh
Creates three UptimeRobot monitors: /health liveness and two keyword checks on /health/detailed; supports API key, interval, optional alert contact, and friendly label overrides.

Sequence Diagram

sequenceDiagram
  participant Cron as Cron Scheduler
  participant Smoke as Smoke Test
  participant API as REST API
  participant Monitor as UptimeRobot
  participant Alert as Alert System

  Cron->>Smoke: Execute smoke tests
  Smoke->>API: Probe /health and /health/detailed
  API-->>Smoke: Responses + latency
  Smoke->>Smoke: Validate responses & latency
  Smoke-->>Alert: Fire on test FAIL

  Monitor->>API: HTTP health checks and keyword validation
  API-->>Monitor: Status responses
  alt Monitor detects failure
    Monitor-->>Alert: Trigger alert
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through logs and curl-filled nights,
Crafted probes to check the lights,
A script to sniff the API's mood,
Uptime robots to watch and brood,
All set — the garden stays polite.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add ops cheat sheet and smoke/uptime monitoring scripts' directly and clearly describes the main changes: three new additions (docs/ops.md, scripts/smoke.sh, scripts/uptimerobot-setup.sh) representing operational tooling for the service.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch alastair/ops-tooling

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@findolor findolor changed the base branch from main to graphite-base/96 April 30, 2026 11:48
@findolor findolor force-pushed the alastair/ops-tooling branch from 9a6514a to 8ff4b51 Compare April 30, 2026 11:48
@findolor findolor changed the base branch from graphite-base/96 to alastair/health-detailed-endpoint April 30, 2026 11:48
Copy link
Copy Markdown
Collaborator

findolor commented Apr 30, 2026


How to use the Graphite Merge Queue

Add the label add-to-gt-merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@findolor findolor changed the base branch from alastair/health-detailed-endpoint to graphite-base/96 May 1, 2026 06:59
@findolor findolor force-pushed the alastair/ops-tooling branch from 8ff4b51 to abee3da Compare May 4, 2026 07:40
@findolor findolor force-pushed the graphite-base/96 branch from 8b055a6 to dc10129 Compare May 4, 2026 07:40
@findolor findolor changed the base branch from graphite-base/96 to alastair/health-detailed-endpoint May 4, 2026 07:40
@findolor findolor force-pushed the alastair/ops-tooling branch from abee3da to 26859c3 Compare May 4, 2026 10:47
@findolor findolor force-pushed the alastair/health-detailed-endpoint branch from dc10129 to 6e0d829 Compare May 4, 2026 10:47
@graphite-app graphite-app Bot changed the base branch from alastair/health-detailed-endpoint to graphite-base/96 May 6, 2026 18:12
graphite-app Bot pushed a commit that referenced this pull request May 7, 2026
## Motivation

`GET /health` only proves that the API process is up. It does not show whether raindex local DB sync is configured, actively syncing, ready for reads, or failing.

The detailed endpoint gives monitoring and operators an on-demand snapshot without this API needing to know raindex local DB internals.

## Solution

- Add `GET /health/detailed` while keeping the existing `GET /health` liveness response unchanged.
- Bump `lib/rain.orderbook` to `0f2f3d19212e6682c70e52a335d66988a10083a6` and use the new `get_local_db_sync_snapshot()` API for raindex local DB sync status.
- Report app DB health plus raindex sync state, including configured/healthy flags, scheduler state, network readiness, orderbook readiness, sync phase, last synced block, update timestamp, and errors.
- Refactor the detailed health implementation to call `raindex.client().get_local_db_sync_snapshot().await` instead of opening/querying SQLite directly.
- Remove the direct `rusqlite` dependency from this crate.
- Add OpenAPI response types for raindex sync networks and orderbooks.
- Map raindex component status as `active`, `syncing`, `failure`, or `not_configured`; top-level API health remains `ok`, `degraded`, or `error`.
- Update local test fixtures for orderbook YAML spec version 5 and adjust small SDK API changes from the submodule bump.

## Chained PRs

- **#96** (`alastair/ops-tooling`) is stacked on this PR and should be merged after this PR.

## Dependent PRs

- **rainlanguage/raindex#2563** must merge before this PR. This PR bumps `lib/rain.orderbook` to a commit that depends on that raindex change.

## Checks

- [x] `nix develop -c cargo fmt`
- [x] `nix develop -c cargo test`
- [x] `nix develop -c rainix-rs-static`

Part of the deployed-state split tracked in #94.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Added a detailed health endpoint providing structured status information about the application's database connectivity and synchronization components, including component-level metrics and error details.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@findolor findolor force-pushed the alastair/ops-tooling branch from 26859c3 to 24c9c0c Compare May 7, 2026 08:31
@findolor findolor force-pushed the graphite-base/96 branch from 6e0d829 to 9b74421 Compare May 7, 2026 08:31
@findolor findolor changed the base branch from graphite-base/96 to main May 7, 2026 08:31
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
scripts/smoke.sh (1)

51-53: 💤 Low value

auth_header as an unquoted string breaks if API_KEY or API_SECRET contains spaces or special characters; prefer an array.

The # shellcheck disable=SC2086 suppression sidesteps the warning but doesn't eliminate the risk. Since args is already an array, extending that pattern here keeps the suppression comment unnecessary and handles pathological key values correctly.

♻️ Proposed refactor
-    local auth_header=""
+    local auth_args=()
     if [[ -n "$API_KEY" && -n "$API_SECRET" ]]; then
-        auth_header="-u $API_KEY:$API_SECRET"
+        auth_args=(-u "$API_KEY:$API_SECRET")
     fi
 
     local tmp
     tmp=$(mktemp)
-    # shellcheck disable=SC2086
     local result
-    result=$(curl -sS -X "$method" $auth_header \
+    result=$(curl -sS -X "$method" "${auth_args[@]}" \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/smoke.sh` around lines 51 - 53, The auth header is built as an
unquoted string (auth_header) which breaks on spaces/special chars; change it to
an array form consistent with args (e.g., build auth_args as an array or append
to the existing args array) and use that array when invoking curl/command so
each credential is a separate element; update the conditional that currently
sets auth_header="-u $API_KEY:$API_SECRET" to instead add ("-u"
"$API_KEY:$API_SECRET") to the args array (or create auth_args with those
elements) to avoid word-splitting and remove the SC2086 suppression.
scripts/uptimerobot-setup.sh (1)

75-76: ⚡ Quick win

Add a --max-time to the UptimeRobot API curl call.

Without it the script hangs indefinitely if the API is unreachable or slow to respond.

♻️ Proposed fix
-    resp=$(curl -sS -X POST "${args[@]}" \
-        https://api.uptimerobot.com/v2/newMonitor)
+    resp=$(curl -sS -X POST --max-time 15 "${args[@]}" \
+        https://api.uptimerobot.com/v2/newMonitor)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/uptimerobot-setup.sh` around lines 75 - 76, The curl invocation that
sets resp (the POST to https://api.uptimerobot.com/v2/newMonitor) can hang
indefinitely; add a timeout option (e.g., --max-time 30) to the curl arguments
so the request fails after a bounded number of seconds, making sure to include
the option in the same args array or inline curl invocation that constructs
resp.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/ops.md`:
- Around line 62-67: The grep regex in the docs snippet currently requires a
decimal point ("duration_ms\":[0-9]+\.[0-9]+") so integer-only durations are
missed; update the pattern to allow optional fractional parts (e.g. use
"duration_ms\":[0-9]+(?:\.[0-9]+)?" or "\d+(?:\.\d+)?") so matches include both
"duration_ms":5001 and "duration_ms":5001.23; adjust the grep expression in the
journalctl pipeline accordingly (leave the awk $2 > 5000 check unchanged).

In `@scripts/smoke.sh`:
- Line 128: The jq filter in the probe call probe "GET /v1/orders/token/{usdc}"
... uses ".orders | type == \"array\" and .pagination" which evaluates
.pagination against the .orders array and always fails; fix it by making the
type check apply only to .orders (wrap that expression in parentheses) and
separately test that the top-level .pagination exists (use a null-check like
.pagination != null), i.e. change the filter so it reads: (.orders | type ==
"array") and (.pagination != null) so .pagination is evaluated on the root
object.

---

Nitpick comments:
In `@scripts/smoke.sh`:
- Around line 51-53: The auth header is built as an unquoted string
(auth_header) which breaks on spaces/special chars; change it to an array form
consistent with args (e.g., build auth_args as an array or append to the
existing args array) and use that array when invoking curl/command so each
credential is a separate element; update the conditional that currently sets
auth_header="-u $API_KEY:$API_SECRET" to instead add ("-u"
"$API_KEY:$API_SECRET") to the args array (or create auth_args with those
elements) to avoid word-splitting and remove the SC2086 suppression.

In `@scripts/uptimerobot-setup.sh`:
- Around line 75-76: The curl invocation that sets resp (the POST to
https://api.uptimerobot.com/v2/newMonitor) can hang indefinitely; add a timeout
option (e.g., --max-time 30) to the curl arguments so the request fails after a
bounded number of seconds, making sure to include the option in the same args
array or inline curl invocation that constructs resp.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 53a31c5f-4c07-4f79-9891-310e4278ad0c

📥 Commits

Reviewing files that changed from the base of the PR and between 9b74421 and 24c9c0c.

📒 Files selected for processing (3)
  • docs/ops.md
  • scripts/smoke.sh
  • scripts/uptimerobot-setup.sh

Comment thread docs/ops.md
Comment on lines +62 to +67
journalctl -u rest-api --since '1 hour ago' --no-pager \
| grep 'request completed' \
| grep -oE 'duration_ms":[0-9]+\.[0-9]+' \
| awk -F: '$2 > 5000 { print }' \
| wc -l
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

grep pattern misses integer-only duration_ms values — slow requests will be silently under-counted.

The pattern duration_ms":[0-9]+\.[0-9]+ requires a decimal point, so requests like "duration_ms":5001 (no fractional part) won't match, and the slow-request count will be lower than the real value.

🐛 Proposed fix
-  | grep -oE 'duration_ms":[0-9]+\.[0-9]+' \
-  | awk -F: '$2 > 5000 { print }' \
+  | grep -oE 'duration_ms":[0-9]+(\.[0-9]+)?' \
+  | awk -F: '$2 > 5000 { print }' \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
journalctl -u rest-api --since '1 hour ago' --no-pager \
| grep 'request completed' \
| grep -oE 'duration_ms":[0-9]+\.[0-9]+' \
| awk -F: '$2 > 5000 { print }' \
| wc -l
```
journalctl -u rest-api --since '1 hour ago' --no-pager \
| grep 'request completed' \
| grep -oE 'duration_ms":[0-9]+(\.[0-9]+)?' \
| awk -F: '$2 > 5000 { print }' \
| wc -l
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ops.md` around lines 62 - 67, The grep regex in the docs snippet
currently requires a decimal point ("duration_ms\":[0-9]+\.[0-9]+") so
integer-only durations are missed; update the pattern to allow optional
fractional parts (e.g. use "duration_ms\":[0-9]+(?:\.[0-9]+)?" or
"\d+(?:\.\d+)?") so matches include both "duration_ms":5001 and
"duration_ms":5001.23; adjust the grep expression in the journalctl pipeline
accordingly (leave the awk $2 > 5000 check unchanged).

Comment thread scripts/smoke.sh
# 3. Authenticated endpoints — only run if creds are set
if [[ -n "$API_KEY" && -n "$API_SECRET" ]]; then
probe "GET /v1/tokens" GET "/v1/tokens" 200 '.tokens | type == "array"'
probe "GET /v1/orders/token/{usdc}" GET "/v1/orders/token/$USDC_BASE" 200 '.orders | type == "array" and .pagination'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Broken jq filter — this probe will always FAIL even on a healthy endpoint.

After the pipe |, the input is the .orders array. Evaluating .pagination on an array returns null, so type == "array" and .pagination becomes true and null which is false in jq. jq -e exits non-zero for false, marking the check as a shape mismatch regardless of the actual response.

🐛 Proposed fix
-    probe "GET /v1/orders/token/{usdc}"          GET "/v1/orders/token/$USDC_BASE" 200 '.orders | type == "array" and .pagination'
+    probe "GET /v1/orders/token/{usdc}"          GET "/v1/orders/token/$USDC_BASE" 200 '(.orders | type == "array") and (.pagination != null)'

To confirm the behaviour locally:

#!/bin/bash
# Expect: first prints "false", second prints "true"
echo '{"orders":[{"id":1}],"pagination":{"page":1}}' \
  | jq '.orders | type == "array" and .pagination'

echo '{"orders":[{"id":1}],"pagination":{"page":1}}' \
  | jq '(.orders | type == "array") and (.pagination != null)'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/smoke.sh` at line 128, The jq filter in the probe call probe "GET
/v1/orders/token/{usdc}" ... uses ".orders | type == \"array\" and .pagination"
which evaluates .pagination against the .orders array and always fails; fix it
by making the type check apply only to .orders (wrap that expression in
parentheses) and separately test that the top-level .pagination exists (use a
null-check like .pagination != null), i.e. change the filter so it reads:
(.orders | type == "array") and (.pagination != null) so .pagination is
evaluated on the root object.

@graphite-app
Copy link
Copy Markdown

graphite-app Bot commented May 7, 2026

Merge activity

## Summary
- New `docs/ops.md` — journalctl + curl recipes for the deployed service, cache_warmer status interpretation, smoke-test usage
- New `scripts/smoke.sh` — end-to-end correctness + latency probe; asserts on response shape (not just status), reports per-check latency, exits non-zero on FAIL
- New `scripts/uptimerobot-setup.sh` — one-shot creation of the 3 baseline UptimeRobot monitors (liveness, /health/detailed status=ok, cache_warmer running=true)

## Why
Ops knowledge for the preview environment was living outside the repo. This pulls it inline so:
- New team members can debug the live service from journalctl recipes
- Smoke tests can run post-deploy or on a cron
- UptimeRobot setup is reproducible per-environment via one script

## Forward-looking note
- The smoke + UptimeRobot scripts probe `/health/detailed.cache_warmer`, which lights up only after **#95** (detailed health endpoint) and the cache-warmer PR (TBD, see #94) merge. Until then those checks return FAIL/keyword-miss against `main` — that's expected.
- No application code touched in this PR; safe to merge ahead of the dependent endpoints.

## Test plan
- [ ] `bash -n scripts/smoke.sh` (syntax check)
- [ ] Run `./scripts/smoke.sh` against preview once detailed-health lands; expect PASS on liveness + 401 checks
- [ ] Read `docs/ops.md` end-to-end

Part of the deployed-state split tracked in #94.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Documentation**
  * Added comprehensive operations guide with service health check commands, troubleshooting queries, and monitoring recommendations.

* **Chores**
  * Added post-deployment smoke test script for validating service functionality and latency.
  * Added automated monitoring setup script for configuring external health checks.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@graphite-app graphite-app Bot force-pushed the alastair/ops-tooling branch from 24c9c0c to e441fa2 Compare May 7, 2026 09:35
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/smoke.sh (1)

128-128: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Broken jq filter — probe will always FAIL on a healthy endpoint.

This was flagged in a previous review and remains unresolved.

🐛 Proposed fix
-    probe "GET /v1/orders/token/{usdc}"          GET "/v1/orders/token/$USDC_BASE" 200 '.orders | type == "array" and .pagination'
+    probe "GET /v1/orders/token/{usdc}"          GET "/v1/orders/token/$USDC_BASE" 200 '(.orders | type == "array") and (.pagination != null)'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/smoke.sh` at line 128, The jq filter in the probe call for GET
"/v1/orders/token/$USDC_BASE" is malformed and always fails; update the
expression so it evaluates the two conditions separately and combines them, e.g.
change the filter from '.orders | type == "array" and .pagination' to '(.orders
| type == "array") and (.pagination != null)' in the probe invocation so it
correctly checks that .orders is an array and .pagination is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/smoke.sh`:
- Line 116: The probe for "GET /health/detailed" uses a truthy check '.status'
which accepts any non-null value (e.g., "degraded"); change the assertion to an
explicit equality check '.status == "ok"' so the probe only passes when the
detailed health status is exactly "ok" (update the probe entry for "GET
/health/detailed" accordingly).
- Around line 51-53: The current assignment auth_header="-u
$API_KEY:$API_SECRET" exposes credentials in process listings; instead, create a
temporary netrc file (use mktemp) containing a machine entry with login set to
$API_KEY and password set to $API_SECRET, update the curl invocation to remove
auth_header and pass --netrc-file <tempfile>, and ensure the tempfile is
securely permissioned and removed after use (trap cleanup on EXIT); modify any
references to auth_header in the script (e.g., the curl call that used
$auth_header) to use the --netrc-file option and remove the SC2086 suppression.

---

Duplicate comments:
In `@scripts/smoke.sh`:
- Line 128: The jq filter in the probe call for GET
"/v1/orders/token/$USDC_BASE" is malformed and always fails; update the
expression so it evaluates the two conditions separately and combines them, e.g.
change the filter from '.orders | type == "array" and .pagination' to '(.orders
| type == "array") and (.pagination != null)' in the probe invocation so it
correctly checks that .orders is an array and .pagination is present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1466c004-63fa-404f-a6d3-9a9020df49ad

📥 Commits

Reviewing files that changed from the base of the PR and between 24c9c0c and e441fa2.

📒 Files selected for processing (3)
  • docs/ops.md
  • scripts/smoke.sh
  • scripts/uptimerobot-setup.sh
✅ Files skipped from review due to trivial changes (1)
  • docs/ops.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/uptimerobot-setup.sh

Comment thread scripts/smoke.sh
Comment on lines +51 to +53
if [[ -n "$API_KEY" && -n "$API_SECRET" ]]; then
auth_header="-u $API_KEY:$API_SECRET"
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

API credentials exposed in the process table.

auth_header="-u $API_KEY:$API_SECRET" expands into the curl command line, making credentials visible in /proc/<pid>/cmdline and via ps aux on multi-user or shared CI systems. Prefer passing credentials through --netrc-file with a temporary file, or via a --config file written to a mktemp path.

🔒 Proposed fix using a temporary netrc file
-    local auth_header=""
-    if [[ -n "$API_KEY" && -n "$API_SECRET" ]]; then
-        auth_header="-u $API_KEY:$API_SECRET"
-    fi
+    local auth_args=()
+    local netrc_tmp=""
+    if [[ -n "$API_KEY" && -n "$API_SECRET" ]]; then
+        netrc_tmp=$(mktemp)
+        printf 'default login %s password %s\n' "$API_KEY" "$API_SECRET" > "$netrc_tmp"
+        auth_args=(--netrc-file "$netrc_tmp")
+    fi

Then update the curl invocation and cleanup:

-    result=$(curl -sS -X "$method" $auth_header \
+    result=$(curl -sS -X "$method" "${auth_args[@]}" \
         -o "$tmp" \
         -w '%{http_code} %{time_total}\n' \
         --max-time 30 \
         "$API_URL$path" 2>&1) || true
+    [[ -n "$netrc_tmp" ]] && rm -f "$netrc_tmp"

This also eliminates the SC2086 word-splitting suppression comment.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/smoke.sh` around lines 51 - 53, The current assignment
auth_header="-u $API_KEY:$API_SECRET" exposes credentials in process listings;
instead, create a temporary netrc file (use mktemp) containing a machine entry
with login set to $API_KEY and password set to $API_SECRET, update the curl
invocation to remove auth_header and pass --netrc-file <tempfile>, and ensure
the tempfile is securely permissioned and removed after use (trap cleanup on
EXIT); modify any references to auth_header in the script (e.g., the curl call
that used $auth_header) to use the --netrc-file option and remove the SC2086
suppression.

Comment thread scripts/smoke.sh

# 1. Public endpoints (no auth)
probe "GET /health" GET "/health" 200 '.status == "ok"'
probe "GET /health/detailed" GET "/health/detailed" 200 '.status'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

/health/detailed assertion is weaker than intended — passes for any non-null .status.

Line 115 uses .status == "ok" (boolean), but line 116 uses .status (truthy check only). A response with "status": "degraded" would still pass. If the intent is to verify the endpoint is healthy, use the same equality check.

🐛 Proposed fix
-probe "GET /health/detailed"                 GET "/health/detailed" 200 '.status'
+probe "GET /health/detailed"                 GET "/health/detailed" 200 '.status == "ok"'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
probe "GET /health/detailed" GET "/health/detailed" 200 '.status'
probe "GET /health/detailed" GET "/health/detailed" 200 '.status == "ok"'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/smoke.sh` at line 116, The probe for "GET /health/detailed" uses a
truthy check '.status' which accepts any non-null value (e.g., "degraded");
change the assertion to an explicit equality check '.status == "ok"' so the
probe only passes when the detailed health status is exactly "ok" (update the
probe entry for "GET /health/detailed" accordingly).

@graphite-app graphite-app Bot merged commit e441fa2 into main May 7, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants