Skip to content

fix: validate live sync payload before swapping snapshot (#159)#164

Open
greatjourney589 wants to merge 6 commits into
MkDev11:mainfrom
greatjourney589:fix/issue-159-live-sync-validation
Open

fix: validate live sync payload before swapping snapshot (#159)#164
greatjourney589 wants to merge 6 commits into
MkDev11:mainfrom
greatjourney589:fix/issue-159-live-sync-validation

Conversation

@greatjourney589
Copy link
Copy Markdown

@greatjourney589 greatjourney589 commented May 26, 2026

Summary

Hardens the live repo-sync path so a malformed but successful (HTTP 200) upstream
response can no longer wipe the in-memory snapshot or zero out persisted weights.
The untrusted third-party body is now parsed and validated into local maps
before the authoritative snapshot is touched; any failure is rejected wholesale
and handled like a failed fetch, leaving the last known-good snapshot and DB
weights untouched.

Validation added in refreshLiveIfStale (src/lib/repos-server.ts):

  • Shape check — reject anything that isn't a plain object (null, arrays,
    primitives), e.g. a GitHub {"message":"Not Found"} error object or a schema
    change to an array.
  • Per-entry validation — each key must match owner/name
    (REPO_FULL_NAME_RE) and each value must be a plain object; bad entries are
    skipped and counted.
  • Empty guard — if no valid owner/name entries survive, reject (covers {}
    from a broken upstream deploy or a truncated file).
  • Sanity floor — reject a snapshot that would collapse the known-good live
    universe below MIN_LIVE_RETAIN_FRACTION (50%), failing loud rather than
    silently zeroing repos.

The validated candidate is only swapped in synchronously (no awaits through the
DB transaction), so readers never observe a half-applied snapshot.

Related Issues

Fixes #159.

Type of Change

  • Bug fix
  • New feature
  • Enhancement
  • Refactor
  • Documentation
  • Other (describe):

Testing

  • pnpm build passes
  • Manual browser smoke test (for UI changes)
  • N/A — docs / config only

Checklist

  • Self-reviewed the diff
  • Follows existing code patterns and naming
  • No unrelated changes included
  • Documentation updated if behavior changed

Summary by CodeRabbit

  • Improvements
    • Stricter validation of upstream repository data; invalid entries are filtered out before being applied.
    • Added safeguards to prevent unsafe or excessive upstream changes from degrading repository state.
    • Atomic application of validated snapshots to the live in-memory view to ensure consistent updates.
    • Live-sync logs now report how many upstream entries were skipped.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fbf3109c-cc63-42f7-9564-d8de13000217

📥 Commits

Reviewing files that changed from the base of the PR and between b0b0a42 and 95b9db3.

📒 Files selected for processing (1)
  • src/lib/repos-server.ts

📝 Walkthrough

Walkthrough

The PR hardens live repo sync: it validates upstream JSON as an unknown/plain object, accepts only plain-object entries whose keys match a strict owner/name regex, rejects snapshots with zero valid entries or that shrink past a configured fraction, atomically swaps validated in-memory maps, and logs skipped upstream entries.

Changes

Live repo sync validation and atomic update

Layer / File(s) Summary
Validation constants and helpers
src/lib/repos-server.ts
Adds regex pattern for valid owner/name repo identifiers, MIN_LIVE_RETAIN_FRACTION constant for sanity-floor checks, and isPlainObject utility to guard against non-object HTTP payloads.
Fetch response parsing and per-entry validation
src/lib/repos-server.ts
Parses HTTP response as unknown, rejects non-plain-object payloads, iterates entries and validates each key/value pair (owner/name regex and plain-object value), skips invalid records, counts skipped entries, and fails the refresh if no valid entries are collected.
Sanity-floor check and atomic map swap
src/lib/repos-server.ts
Compares candidate snapshot size against previous in-memory size (or cold-start floor); rejects refreshes shrinking beyond the configured threshold. Then synchronously swaps validated candidate maps into liveByLc and liveConfigJsonByLc before the DB upsert transaction, and includes skipped-entry counts in the success log.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 A noisy payload drifts from far, untrusted and gray,
I sniff for plain objects and chase the wrong away.
Owner/name must match, each entry trimmed and kept,
A sanity-floor stands guard while old maps safely slept.
Now the hub wakes steady — no empty-night mishap day.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding payload validation before updating the live snapshot, which directly addresses the core security concern in issue #159.
Linked Issues check ✅ Passed The PR implements all coding requirements from #159: shape validation (non-object rejection), per-entry validation (owner/name regex), empty guard, sanity floor check (MIN_LIVE_RETAIN_FRACTION), atomic snapshot swaps, and failure-as-bad-fetch treatment with logging.
Out of Scope Changes check ✅ Passed All changes in refreshLiveIfStale are directly scoped to validating and securing the live sync payload processing path as specified in #159; no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

src/lib/repos-server.ts

ESLint skipped: missing config or dependency (missing-dependency). The ESLint configuration references a package that is not available in the sandbox.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/lib/repos-server.ts (1)

294-306: 💤 Low value

Consider logging the skipped count for observability.

The skipped count is computed but only appears in the error message when all entries fail validation. Logging it on success (when skipped > 0) would help diagnose partial upstream data quality issues without requiring intervention.

💡 Optional: Log skipped entries on success
       if (nextByLc.size === 0) {
         throw new Error(`no valid "owner/name" entries (${Object.keys(parsed).length} keys, ${skipped} skipped)`);
       }
+      if (skipped > 0) {
+        console.warn(`[repos] live sync: skipped ${skipped} invalid entries`);
+      }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/repos-server.ts` around lines 294 - 306, The loop currently tracks
skipped but only reports it when nextByLc.size === 0; update the code after the
loop to emit an info/debug log when skipped > 0 so partial-validation issues are
observable: reference the variables parsed, skipped, REPO_FULL_NAME_RE and
nextByLc and call the existing logger (or add one if none is available) to log a
short message like "skipped X invalid repo entries out of Y" including skipped
and Object.keys(parsed).length; keep the existing throw for the all-invalid case
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/lib/repos-server.ts`:
- Around line 294-306: The loop currently tracks skipped but only reports it
when nextByLc.size === 0; update the code after the loop to emit an info/debug
log when skipped > 0 so partial-validation issues are observable: reference the
variables parsed, skipped, REPO_FULL_NAME_RE and nextByLc and call the existing
logger (or add one if none is available) to log a short message like "skipped X
invalid repo entries out of Y" including skipped and Object.keys(parsed).length;
keep the existing throw for the all-invalid case unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 996b81d6-bf66-4dfe-aef1-cb2a83172445

📥 Commits

Reviewing files that changed from the base of the PR and between aab5b71 and b0b0a42.

📒 Files selected for processing (1)
  • src/lib/repos-server.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Live repo-weight sync trusts any HTTP-200 body — a malformed/empty upstream response zeroes every repo and empties the dashboard

2 participants