feat(daemon): provider cost guardrails — transition auditing, remote-provider lock, rollback#497
feat(daemon): provider cost guardrails — transition auditing, remote-provider lock, rollback#497
Conversation
…provider lock, rollback Closes #370. Adds server-side provider cost guardrails for extraction and synthesis provider changes: Provider transition auditing: - Detects provider transitions on config saves (POST /api/config) - Records transitions with timestamp, source, actor, and risk level - Retains last 100 transitions in persistent audit log - Logs warnings when audit truncation occurs Remote-provider locking: - allowRemoteProviders flag (default true) gates remote providers - validateProviderSafety rejects remote providers when lock is active - Implicit lock bypass (omitting the flag while selecting a remote provider) returns 400; explicit allowRemoteProviders: true is allowed - Runtime lock in loadPipelineConfig falls back to local provider when remote providers are disabled Rollback API: - POST /api/config/provider-safety/rollback rolls back the latest provider transition, restoring the previous provider - Clears stale model/endpoint/base_url fields at both flat (pipelineV2) and nested (extraction/synthesis) levels to prevent silent failures - Single-flight serialization prevents concurrent rollback corruption - Source file resolution with 404 on missing/deleted files - Each transition can only be rolled back once API: - GET /api/config/provider-safety returns snapshot + transitions - POST /api/config/provider-safety/rollback performs rollback - POST /api/config returns auditError on audit-write failures - All 100-entry audit retention and error codes documented in API.md
|
Hi @aaf2tbz - I'm an automated code reviewer powered by pr-reviewer. I'm taking a look at the feature work in |
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
12cb6a0e
[Automated Review — pr-reviewer bot] Found one blocking correctness/security issue (cost lock silently lifted), one warning-level auth gap on a new mutation endpoint, and two nitpicks.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The lock-lifting defect is directly provable from the diff: writeFileSync(filePath, content, 'utf-8') writes the caller-supplied YAML verbatim. When that YAML omits allowRemoteProviders (defaulting to true at read-time) but selects only local providers, the lockImplicitlyLifted branch logs a warning and falls through — the saved file no longer contains allowRemoteProviders: false. The prior lock state is gone from disk. This is directly visible in the new POST /api/config handler block and the readProviderSafetySnapshot default-to-true logic. The auth gap is provable from the diff by inspection of the new rollback route. Runtime repro is not required to confirm either finding.
…back on admin - Re-inject allowRemoteProviders: false into YAML when lock is active and incoming config omits the flag (prevents silent lock erosion) - Add admin permission check on POST /api/config/provider-safety/rollback - Add comments explaining single-flight serialization guarantee and rollback source-file fallback behavior - Add 3 regression tests for lock preservation
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
506fbad3
[Automated Review — pr-reviewer bot] PR #497 adds provider cost guardrails: transition auditing, remote-provider lock, and rollback. The core logic is sound and the test coverage is substantial. Three issues are flagged: one warning-level test gap that removes regression coverage for the key lock-preservation feature, one warning-level dead configuration field that misleads future readers, and one nitpick (duplicate import).
Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The key changed files (provider-safety.ts, misc-routes.ts, memory-config.ts, both test files) are fully present in the diff. The issues below are directly provable from the changed lines. Missing context: the auth middleware wiring (which permission level POST /api/config requires in team/hybrid mode) is not included, so I cannot confirm whether lifting the cost lock via a plain config write requires any elevated permission beyond what the endpoint already requires. This doesn't affect the issues flagged, but it limits confidence on the overall security posture.
Unmapped findings (not on changed lines):
- packages/daemon/src/memory-config.ts:71 -
DEFAULT_PIPELINE_V2.extraction.allowRemoteProviders: trueis dead configuration. InloadPipelineConfig, the final fallback forallowRemoteProvidersisd.allowRemoteProviders(the top-level field, line ~480), neverd.extraction.allowRemoteProviders. The extraction-level nested default is never consulted. Future maintainers reading the default constant may assume the extraction-level field has independent weight as a fallback and try to configure the lock there, when only the top-level field matters at runtime. RemoveallowRemoteProvidersfrom theextractionsub-object inDEFAULT_PIPELINE_V2, or add an explicit comment that the value is mirrored from the resolved top-level and carries no independent default semantics.
…duplicate import - Export preserveLockInYaml from provider-safety.ts so tests import the actual production function, not a local duplicate - Remove unused yaml import from misc-routes (moved to provider-safety) - Merge duplicate mkdtempSync imports in test file
…fix comment placement
- Use asRecord(parse(content)) ?? {} to handle null/empty YAML input
- Add recall permission check on GET /api/config/provider-safety
- Move single-flight serialization comment to adjacent rollback handler
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
afeaadc0
[Automated Review — pr-reviewer bot] PR #497 adds provider cost guardrails to the Signet daemon: transition auditing, remote-provider locking, and rollback. The implementation is well-structured and the stated goals are achieved. Two issues warrant attention before merge: a race condition in the single-flight rollback serialization that can corrupt the audit trail under concurrent requests, and a breaking type change in @signet/core.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The race condition is directly provable from the finally block ordering in misc-routes.ts lines 253–256 and the absence of a re-check after await _rollbackSignal.promise. The breaking change is explicit in packages/core/src/types.ts at the PipelineV2Config interface. No runtime trace or external module is needed to verify either finding.
…oviders, atomic config write - Replace if-guard with while loop via acquire() helper so multiple concurrent waiters re-check _rollbackSignal after resolution - Make PipelineV2Config.allowRemoteProviders optional (was required, breaking downstream packages) - Use atomicWriteText (temp+rename) for config writes in rollback, consistent with atomic audit writes
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
fec95486
[Automated Review — pr-reviewer bot] PR #497 adds provider cost guardrails, remote-provider locking, audit logging, and rollback. The implementation is large and generally well-structured, but there are three substantive issues: a missing permission gate on the forward-direction lock bypass, an unsafe cast pattern in preserveLockInYaml that can silently fail, and an unserialised TOCTOU window in the audit log for concurrent config saves.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - All three issues are directly provable from the changed lines. The missing permission check is visible in the POST /api/config diff (no checkPermission call before the lock-bypass path, unlike the rollback and GET endpoints which both add one). The preserveLockInYaml cast issue is explicit in provider-safety.ts lines 121-123. The audit TOCTOU is structural in appendProviderTransitions (read-then-write without a write-side lock), confirmed by the fact that _rollbackSignal only serialises the rollback handler.
…ve non-null assertion - Add admin permission check on POST /api/config for guarded YAML files only (not markdown), preventing unauthorized lock-bypass in team/hybrid mode without breaking dashboard saves in local mode - Replace bare 'as Record<string, unknown>' with asRecord() in preserveLockInYaml to handle non-object YAML values correctly - Replace resolve! non-null assertion with sentinel pattern - Document pre-existing appendProviderTransitions write race as known limitation
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
af627bc3
[Automated Review — pr-reviewer] This PR adds provider cost guardrails (transition auditing, remote-provider lock, rollback API) to the daemon. The implementation is generally sound, but there are three concrete issues: a missing error code in the API docs, silent YAML comment stripping during config-save lock preservation (no user warning unlike the rollback endpoint), and an undocumented breaking permission change for team/hybrid mode operators.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - All findings are directly traceable to changed lines in the diff. The API docs omission is explicit (the new Returns 400... block names only 400, not 403). The comment-stripping gap is provable by comparing the rollback endpoint note in docs/API.md against the absence of an equivalent note in the config-save path. The breaking permission change is confirmed by reading checkPermission applied to GUARDED_CONFIG_FILES in the new handler.
…for 403 - Add commentsStripped: true to POST /api/config response when lock preservation strips YAML comments via parse/stringify round-trip - Add conflict-detection warning in readProviderSafetySnapshot when top-level and extraction allowRemoteProviders disagree, matching loadPipelineConfig behavior - Document commentsStripped field and 403 error for guarded config saves in API.md
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
e52936b3
[Automated Review — pr-reviewer] PR #497 adds provider cost guardrails (transition auditing, remote-provider lock, rollback). Auth gating, atomic audit writes, and single-flight rollback serialization are correctly implemented. Four issues found across two files: one security gap (case-insensitive FS bypass on macOS/Windows), one correctness gap (rollback retry strips comments without audit indication), one documented-but-unfixed write race, and one dead-code duplication.
Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The case-insensitive FS bypass, audit-write race, and retry behavior are all directly provable from the changed lines in misc-routes.ts and provider-safety.ts. Confidence is medium rather than high because checkPermission internals (auth/policy.ts) and the full file-validation call stack are not in the diff, so edge cases in local-mode behavior of checkPermission cannot be fully verified.
…ry comment-stripped signal - Case-insensitive GUARDED_CONFIG_FILES matching prevents Agent.yaml from bypassing admin guard on macOS/Windows case-insensitive filesystems - Unified atomicWriteJson/atomicWriteText into single atomicWrite helper with prefix parameter (eliminates duplicate logic) - Added isRetry field to rollback result, surfaced as commentsStripped when a retry re-strips YAML comments without client indication - Documented pre-existing appendProviderTransitions race (no change)
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
31f4a525
[Automated Review — pr-reviewer] Reviewed PR #497 (provider cost guardrails). Found one blocking type error and one warning-level logic gap in rollback path resolution.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - Both issues are directly provable from the diff. The isRetry type mismatch is explicit in provider-safety.ts (return statement includes isRetry but declared return type omits it) and in misc-routes.ts (result.isRetry accessed on the narrowly-typed value). The resolveRollbackFilePath fallthrough is readable from the control flow in provider-safety.ts lines 248-266.
- Add isRetry: boolean to declared return type so TypeScript recognizes result.isRetry at the call site in misc-routes
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
9ef4123e
[Automated Review — pr-reviewer bot] PR #497 adds provider cost guardrails with transition auditing, remote-provider locking, and rollback. The implementation is thorough and multi-round reviewed. I found one security-adjacent audit integrity gap and several warnings, no blockers.
Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The diff is self-contained for all new logic. Two missing artifacts reduce certainty: (1) packages/daemon/src/auth/policy.ts — I cannot verify that checkPermission in local mode unconditionally returns allowed: true, which is load-bearing for the 403 guard on guarded config saves. (2) The isPipelineProvider export from @signet/core — I cannot confirm its accepted value set matches exactly the union in PipelineExtractionConfig.provider, which matters for isValidTransitionEntry soundness.
… in rollback, add synthesis rollback test - readProviderTransitions now logs non-ENOENT I/O errors instead of silently returning []; ENOENT is still treated as missing file - applyProviderRollback no longer creates empty extraction/synthesis sub-blocks when the config uses only flat keys (extractionProvider) - Removed unused ensureRecord helper (no callers after refactor) - Added synthesis-role rollback end-to-end test - Added regression test for empty sub-block avoidance - Audit concurrent write race acknowledged as pre-existing limitation
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
e0db5b33
Automated bot review (pr-reviewer) of PR #497: provider cost guardrails — transition auditing, remote-provider lock, rollback. After 10 rounds of review most structural concerns are resolved; three discussion-worthy findings remain.
Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The core logic (single-flight, atomic writes, lock preservation, validation ordering) is directly readable from the diff and is sound. Missing context: (1) whether any existing dashboard/SDK client reads and acts on the new commentsStripped or isRetry response fields; (2) how downstream consumers of PipelineSynthesisConfig behave when synthesis is pinned to 'ollama' vs. following extraction implicitly — that determination requires reading callers outside the diff.
…Retry, clean rollback response - Remove else branch in applyProviderRollback that created a synthesis: block when one did not exist in the config, preventing silent provider pinning after rollback - Remove commentsStripped from rollback response (conflicted with config-save semantics); isRetry via ...result spread is the canonical signal for retry detection - Remove unused isRetry local variable in rollback handler - Document isRetry field in API.md rollback response example - Remove redundant ?? 'ollama' on resolveExtractionFallbackProvider (default already covers it) - Add test: synthesis rollback with no existing synthesis block
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
689c8ce5
[Automated Review — pr-reviewer bot] PR #497 adds provider cost guardrails (audit, remote-provider lock, rollback). 11 prior review rounds addressed most issues. Two warning-level findings remain: a silent no-op rollback for synthesis when the current config lacks a synthesis block, and preserveLockInYaml fabricating a pipelineV2 section from nothing when the incoming config omits it. No blocking security issues found.
Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The synthesis no-op and preserveLockInYaml concerns are directly visible in the provider-safety.ts diff. Confidence is medium because checkPermission and authConfig mode semantics are referenced but their implementations are not in this diff — the permission-gating correctness relies on them behaving as documented.
… pipelineV2 - executeProviderRollback now detects when applyProviderRollback produces no content change (e.g. synthesis rollback on config with no synthesis block) and throws RollbackError(400) instead of silently consuming the audit entry - preserveLockInYaml returns content unchanged when incoming config has no memory or pipelineV2 section, preventing creation of a pipelineV2 block from nothing on partial config updates - Added tests for no-op rollback error and preserveLockInYaml guard - Updated ping-pong test to use realistic config state
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
70edf45b
[Automated Review — pr-reviewer] Large, well-structured PR across 12 review rounds. One unhandled-exception path in applyProviderRollback surfaces as a misleading 500 to callers; two lower-priority nits around isRetry semantics and a docs/code inconsistency. No security or data-corruption blockers found.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The applyProviderRollback throw path is directly visible in the added function (provider-safety.ts line ~320). The executeProviderRollback caller and the route-level catch block are both in the diff, making the 500-vs-400 divergence traceable end-to-end without runtime reproduction. The isRetry issue is similarly provable from the return expression alone.
Unmapped findings (not on changed lines):
- docs/API.md:401 - The rollback API docs state "Older entries are silently dropped" on truncation, but
appendProviderTransitionsemits alogger.warnwhen truncation occurs. Minor doc/code inconsistency; the implementation behaviour (logged warning) is preferable — the docs should be updated to match.
…try docs - applyProviderRollback now throws RollbackError(400) instead of plain Error for missing previous provider and missing pipelineV2 section, preventing misleading 500 responses - API.md: isRetry docs expanded to cover both retry-after-audit-failure and manual-provider-restore cases - API.md: audit truncation wording corrected from 'silently dropped' to 'dropped with a logged warning'
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
13318ab8
[Automated Review — pr-reviewer bot] After 13 revision rounds this PR is well-tested and addresses its stated goals. A few residual concerns are worth human attention before merge: one logic gap in the fallback rollback path, one ordering subtlety in the single-flight finally block, and a potentially confusing dual allowRemoteProviders field.
Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - All changed files are present in the diff. Single-flight lock ordering and YAML mutation logic are statically verifiable. The fallback rollback path behavior (resolveRollbackFilePath → executeProviderRollback with non-standard source) requires runtime tracing to fully confirm; the yaml library's treatment of undefined values during stringify is assumed correct but not runtime-verified.
…e-flight ordering - resolveRollbackFilePath now throws RollbackError(404) when a transition source doesn't match any known config file, instead of silently falling through to a potentially wrong fallback file - Added null-before-resolve ordering comment to the single-flight rollback finally block, explaining why _rollbackSignal must be cleared before resolve?.() to prevent microtask deadlock
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit:
fe37dfb2
[Automated Review — pr-reviewer bot] PR #497 adds provider cost guardrails (transition auditing, remote-provider lock, rollback). Found one blocking correctness bug in the audit-write-failure retry path and a related documentation inaccuracy in the inline comment. All other concerns from prior rounds appear resolved.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The retry-path bug is directly provable from the diff: executeProviderRollback in provider-safety.ts calls atomicWrite(filePath, nextContent, ".rollback-") (config written), then the audit write may fail. On retry the file now contains the output of a prior stringify() call — normalized YAML, no comments. applyProviderRollback on that content calls parse then mutates (setting the same provider value and undefined-assigning already-absent fields), then stringify. For normalized YAML input the yaml package round-trip is idempotent: stringify(parse(stringify(obj))) === stringify(obj). Therefore nextContent === beforeContent evaluates to true, the guard at line ~280 fires, and RollbackError(400) is thrown before any write. The audit entry is never consumed. The behavior is directly observable from the diff without runtime reproduction.
When executeProviderRollback writes the config but the subsequent audit write fails, the next retry now detects the idempotent YAML state (stringify round-trip produces identical content) and marks the audit entry consumed audit-only instead of throwing RollbackError(400). Moved markRolledBack above the no-op guard for correct scoping. Updated comments to reflect new retry semantics. Added regression test for retry-audit-consume path. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
All findings addressed through 15 review rounds.
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: gpt-5.3-codex | commit:
3b7d26a4
[Automated Review by pr-reviewer] Found concurrency and rollback-idempotency issues that can violate the PR’s stated rollback safety guarantees.
Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The rollback single-flight path uses await acquire() before setting _rollbackSignal, which introduces a check-then-set race under concurrent requests in the same event loop tick. The no-op rollback guard also relies on raw string equality (nextContent === beforeContent) after YAML parse/stringify, which can produce false negatives when formatting/comments change even if provider state is unchanged.
| snapshot: stripInternal(snapshot), | ||
| snapshotError, | ||
| transitions: transitions.map(stripActor), | ||
| latestRiskyTransition: latestRiskyTransition ? stripActor(latestRiskyTransition) : null, |
There was a problem hiding this comment.
Single-flight rollback serialization is still racy. await acquire() returns immediately when _rollbackSignal is null, but because this is awaited, two concurrent handlers can both pass the check before either sets _rollbackSignal, then both enter rollback. This can reintroduce concurrent rollback corruption. Use an atomic check+set pattern (no async gap between check and lock acquisition) or a proper mutex queue.
1. Replace async acquire-then-set single-flight with synchronous boolean flag. The old await acquire() had an async gap between null-check and signal assignment that allowed concurrent rollbacks under event-loop interleaving. Concurrent requests now get 409. 2. Replace raw string equality no-op guard (nextContent === beforeContent) with semantic provider-state comparison. YAML parse/stringify can change formatting/comments even when provider state is identical, causing false negatives on the no-op path. Now extracts current provider for the target role and compares against entry.from. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: gpt-5.3-codex | commit:
2c29a2bd
[Automated Review by pr-reviewer] Found a rollback correctness regression that can leave stale provider fields and diverges from the PR’s documented behavior.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The new no-op branch in executeProviderRollback short-circuits before applyProviderRollback runs, so stale model/endpoint fields are never cleared when provider already equals entry.from. This is directly visible in the new provider-safety.ts control flow and conflicts with the stated rollback semantics in docs/API.md.
R16's semantic no-op guard compared provider state and returned early before applyProviderRollback ran, leaving stale model/endpoint/base_url fields uncleared. Now always runs applyProviderRollback first (clears stale fields), then compares parsed JSON objects to detect semantic no-op. Preserves field clearing while avoiding unnecessary config writes on retry. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: gpt-5.3-codex | commit:
0d58ae7a
[Automated Review by pr-reviewer] Found a rollback correctness regression that diverges from the PR’s stated behavior and can consume audit entries incorrectly.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - In executeProviderRollback, the new semantic no-op branch marks entries rolledBack: true and returns success for any no-op, which includes the documented case of synthesis rollback when no synthesis block exists. This directly conflicts with the PR description/docs claiming a 400 with no audit consumption for that case.
R17 consumed audit entries for all semantic no-ops, including synthesis rollback on configs with no synthesis block. Now validates inside the no-op branch: if no provider exists for the target role, throws 400 without consuming the audit entry. Legitimate retries (provider already matches entry.from) still consume the entry. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: gpt-5.3-codex | commit:
6450e253
[Automated Review by pr-reviewer] Found one blocking correctness issue in rollback source-file resolution that can make valid transitions non-revertible.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The new rollback path resolves the target config file by case-sensitive suffix matching against fixed names, while config save accepts case variants and records that original filename in transition source. This mismatch is directly visible in the new resolve logic and can deterministically return 404 for existing files.
resolveRollbackFilePath used case-sensitive endsWith() to match transition source against CONFIG_FILE_CANDIDATES. Config saves can record case variants (e.g. Agent.yaml) that would fail the match, causing 404 on valid transitions. Now normalizes both sides to lowercase before comparison. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: gpt-5.3-codex | commit:
a1f66df3
[Automated Review by pr-reviewer] Found one blocking rollback correctness bug that can reject valid retries/manual-restore states for extraction configs using nested provider fields.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - In executeProviderRollback() the no-op branch computes extraction currentProvider using topLevelProvider ?? blockProvider, but topLevelProvider is forced to a string (String(... ?? "")) and is therefore never null. This masks nested extraction.provider and can throw 400 even when the nested provider already matches rollback target. The logic is directly visible in the changed provider-safety.ts code and conflicts with the PR’s retry/manual-restore handling claims.
Replaced String(x ?? '') coercion with typeof guards so topLevelProvider and blockProvider are null when absent instead of empty string. This ensures ?? correctly falls through to nested provider for extraction configs. While the bug was unreachable (applyProviderRollback always sets extractionProvider for extraction role, making isNoOp false for nested-only configs), the null semantics now match the canonical readProviderSafetySnapshot resolution pattern (flatExtraction ?? nestedExtraction). Added regression test for nested-only extraction rollback verifying the extractionProvider flat key is added. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: gpt-5.3-codex | commit:
a8090b3b
[Automated Review by pr-reviewer] I found a blocking rollback-state logic issue that can make subsequent rollbacks fail incorrectly.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - In executeProviderRollback, rollback-generated transitions are appended as normal unconsumed entries (providerTransitions) with a non-null from, and future selection logic treats any unrolled entry with from as rollback-eligible. Those entries have source api/config/provider-safety/rollback, which resolveRollbackFilePath rejects with 404 because it does not map to a config filename. This creates a reproducible stuck state after a successful rollback.
…lection Rollback-generated audit entries (source: api/config/provider-safety/rollback) were selectable as rollback targets. resolveRollbackFilePath can't resolve that source to a config file, causing 404 stuck state on subsequent rollbacks. Now excludes rollback-sourced entries from selection. Added regression test. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: gpt-5.3-codex | commit:
b9340a8f
[Automated Review by pr-reviewer] I found one blocking correctness issue in rollback source-file resolution that can strand valid transitions and make rollback fail on case-sensitive filesystems.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The bug is directly visible in the new resolution logic: transitions from guarded saves can record mixed-case filenames (e.g., Agent.yaml), but resolveRollbackFilePath canonicalizes to a candidate name (agent.yaml) before checking existsSync. On Linux/case-sensitive FS this yields a false 404 even though the recorded source file exists.
Case-insensitive match resolved to candidate casing (e.g. agent.yaml) instead of the actual recorded filename (e.g. Agent.yaml), causing false 404 on case-sensitive filesystems. Now extracts the actual filename from match.source preserving original casing. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: gpt-5.3-codex | commit:
d22149c3
[Automated Review by pr-reviewer] Found one blocking correctness issue: rollback target selection still includes rollback-generated audit entries in the route path, which can cause false 404s and prevent valid subsequent rollbacks.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The diff shows executeProviderRollback explicitly excludes source === "api/config/provider-safety/rollback", but resolveRollbackFilePath does not. The route calls resolveRollbackFilePath first and will fail early with 404 on rollback-sourced entries before executeProviderRollback can apply its skip logic. This is directly provable from the changed code paths.
…ath too R21 only excluded rollback-sourced entries in executeProviderRollback, but the route calls resolveRollbackFilePath first which had its own independent selection. Extracted shared isRollbackEligible() predicate used by both functions to prevent drift. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Automated review by pr-reviewer | model: gpt-5.3-codex | commit:
2384e48f
[Automated Review by pr-reviewer] I found one correctness issue in the new rollback endpoint request handling; otherwise the implementation largely matches the PR description and adds the claimed provider-safety controls.
Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The rollback route now accepts malformed JSON by swallowing parse errors and defaulting to an empty body, which can trigger an unintended rollback despite an invalid client payload. This is directly visible in the changed rollback handler logic, but I did not run the endpoint to reproduce behavior end-to-end.
Rollback endpoint swallowed JSON parse errors as empty object, allowing malformed requests to execute rollback with default role selection. Now returns 400 for invalid JSON on this state-mutating admin endpoint. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
|
avery, thank you for putting this together. there’s a lot of thoughtful work in here, and i can tell you really chased down the edge cases. i do think it’s a useful implementation, and i’m glad you explored it. i think my hesitation is mostly about timing and overlap, not that i think this is bad work. with #462 already in flight, this feels like it starts solving some of the same area from a different angle, and i worry merging this now would just add more moving parts right before we’re trying to simplify that whole provider/config path. i think it would probably slow #462 down even more, and then we’d likely end up reshaping parts of this again afterward anyway. so my preference would be to leave this open for now, get #462 merged first, and then come back to this idea once that foundation is in place. i still think there’s useful stuff here, and i’d expect we can absolutely pull pieces from this PR when we revisit it. so this is more “not yet” than “no”. i appreciate the work here a lot. |
Summary
Fixes #370 by adding server-side provider cost guardrails for extraction and synthesis provider changes. This closes the remaining financial-safety gap: provider-change visibility, explicit remote-provider locking, audit history, and rollback.
Changes
New files
packages/daemon/src/provider-safety.ts— Provider transition detection, audit storage, config validation, and rollback helperspackages/daemon/src/provider-safety.test.ts— Unit tests for validation, rollback, audit persistence, and edge cases (17 tests)packages/daemon/src/routes/misc-routes-provider-safety.test.ts— Serialization, lock-bypass, and integration tests (15 tests)Modified files
packages/core/src/types.ts— AddedallowRemoteProviderstoPipelineExtractionConfigandPipelineV2Configpackages/daemon/src/memory-config.ts— Integrated remote-provider lock at runtime: readsallowRemoteProvidersfrom config, falls back to local provider when lock is active, resets model/endpoint on lockpackages/daemon/src/routes/misc-routes.ts— Provider-safety endpoints and config-save validationdocs/API.md— Documented provider-safety API, rollback, error codes, and audit retentionProvider transition auditing
POST /api/config).daemon/provider-transitions.json)Remote-provider locking
allowRemoteProvidersflag (defaulttrue) gates paid/remote providersvalidateProviderSafetyrejects config saves that select remote providers while the lock is active (400)allowRemoteProviders: trueis allowedallowRemoteProviders(but selects only local providers), the lock is re-injected into the YAML before writing to prevent silent erosionloadPipelineConfigfalls back to local provider and resets model/endpoint when lock is activePipelineExtractionConfig.allowRemoteProviders(nested) andPipelineV2Config.allowRemoteProviders(top-level) both supported, with top-level taking precedence and conflict warning loggedRollback API
POST /api/config/provider-safety/rollback— rolls back the latest provider transition, restoring the previous provideradminpermission in team/hybrid modemodel,endpoint,base_urlfields at both flat (pipelineV2.extractionModel) and nested (pipelineV2.extraction.model) levels to prevent silent failuresloadPipelineConfig'srawobject)rolledBack: truepipelineV2section (structuredRollbackError, not 500)API endpoints
GET /api/config/provider-safety— Returns provider snapshot + transition history + latest risky transition. InternalallowRemoteProvidersExplicitfield is stripped from the response.POST /api/config/provider-safety/rollback— Performs rollback, returns rolled-back entry + new transition entries +isRetryflagPOST /api/config— Now returnsauditErrorfield when audit-write fails after config save succeeds (config is correct but rollback via audit trail is unavailable)validateProviderSafetyandreadProviderSafetySnapshotuse flat-wins precedence for extraction provider, matchingloadPipelineConfigValidation correctness
readProviderSafetySnapshotprovider precedence matchesloadPipelineConfigexactly: flat extraction keys win over nested (line 372 of memory-config.ts: "Flat extraction keys take precedence over nested keys")allowRemoteProvidersresolution matches between validation and runtime: top-level wins, extraction-level as fallbackextractionModel,extractionEndpoint,extractionBaseUrl) resolved atpipelineV2level, not document rootResilience
auditErrorin response without failing the saveType
feat— new user-facing feature (bumps minor)fix— bug fixrefactor— restructure without behavior changechore— build, deps, config, docsperf— performance improvementtest— test coveragePackages affected
@signet/core@signet/daemon@signet/cli/ dashboard@signet/sdk@signet/connector-*@signet/webPR Readiness (MANDATORY)
INDEX.md+dependencies.yaml)Testing
bun testpassesbun run typecheckpasses (pre-existing failures in connector-pi, connector-oh-my-pi)bun run lintpasses (pre-existing formatting issues across 1800+ files)AI disclosure
Assisted-bytags in commits)Review Rounds
Round 1 (12cb6a0): Initial implementation
Round 2 (506fbad): Fixed cost lock silently erased on non-remote config save (BLOCKING) —
preserveLockInYamlre-injectsallowRemoteProviders: falsewhen lock is active and incoming YAML omits it; added admin auth gate on rollback endpoint (WARNING); added explanatory comments for single-flight serialization and rollback source-file fallthrough (2 nitpicks); added 3 regression tests for lock preservationRound 3 (6f9c8cc): Moved
preserveLockInYamltoprovider-safety.tsand exported it so tests exercise the actual production function instead of a local duplicate; merged duplicatemkdtempSyncimport in test file.Round 4 (afeaadc): Fixed null dereference crash in
preserveLockInYamlwhen YAML parses to null/empty (useasRecordguard); addedrecallpermission check onGET /api/config/provider-safetyfor team/hybrid mode; moved single-flight serialization comment to adjacent rollback handler where the variable is used.Round 5 (fec9548): Fixed race condition in single-flight rollback serialization —
ifchanged towhileloop viaacquire()helper so multiple concurrent waiters re-check after resolution; madePipelineV2Config.allowRemoteProvidersoptional to matchPipelineExtractionConfig(breaking change fix); replaced non-atomicwriteFileSyncwithatomicWriteText(temp+rename) inexecuteProviderRollbackfor consistency with audit writes.Round 6 (af627bc): Added admin permission check on
POST /api/configfor guarded config files only (scoped to avoid breaking dashboard markdown saves in local mode); replaced bareas Recordcasts inpreserveLockInYamlwithasRecord()for non-object YAML value safety; replacedresolve!non-null assertion withresolve: (() => void) | undefined+resolve?.()sentinel pattern; documented pre-existingappendProviderTransitionsconcurrent write race as known limitation.Round 7 (e52936b): Added
commentsStripped: trueresponse field whenpreserveLockInYamlstrips YAML comments during lock preservation; documented in API.md. Added conflict-detection warning inreadProviderSafetySnapshotmatchingloadPipelineConfigbehavior when top-level and extractionallowRemoteProvidersconflict. Documented 403 error in API.md for guarded config saves without admin permission in team/hybrid mode.Round 8 (7cb8bad): Fixed case-insensitive filesystem bypass —
GUARDED_CONFIG_FILESnow uses case-insensitive matching (Agent.yamlno longer bypasses admin guard on macOS/Windows). UnifiedatomicWriteJsonandatomicWriteTextinto singleatomicWritehelper with prefix parameter. AddedcommentsStripped: trueto rollback response when retry is detected (provider already rolled back, comments re-stripped). Documented pre-existing concurrent audit write race (no code change needed, already annotated).Round 9 (9ef4123): Added
isRetry: booleantoexecuteProviderRollbackreturn type declaration (was missing, causing TypeScript error atresult.isRetryaccess). Non-standard source fallback inresolveRollbackFilePathevaluated and confirmed intentional (lock-of-locks safety net — documented in existing inline comment).Round 10 (e0db5b3): Fixed
readProviderTransitionsto log non-ENOENT I/O errors instead of silently returning empty array. RemovedensureRecordhelper —applyProviderRollbacknow uses directasRecord()checks so rollback on flat-key-only configs does not create emptyextraction:orsynthesis:sub-blocks. Added synthesis-role rollback end-to-end test. Added regression test for empty sub-block avoidance. Audit concurrent write race acknowledged as pre-existing limitation (no code change).Round 11 (689c8ce): Removed
elsebranch inapplyProviderRollbackthat created asynthesis:block when none existed (preventing silent provider pinning after rollback). RemovedcommentsStrippedfrom rollback response —isRetry(via...resultspread) is the sole retry signal. Removed unusedisRetrylocal variable. DocumentedisRetryin API.md rollback response. Removed redundant?? 'ollama'onresolveExtractionFallbackProvider. Added test for no-synthesis-block rollback. Non-atomicwriteFileSyncon config-save path noted as pre-existing (no change).Round 12 (70edf45):
executeProviderRollbacknow throwsRollbackError(400)whenapplyProviderRollbackproduces no content change (e.g. synthesis rollback on config with no synthesis block) — audit entry is not consumed.preserveLockInYamlreturns content unchanged when incoming config has nomemoryorpipelineV2section, preventing creation of a pipelineV2 block from nothing on partial config updates. Added tests for both cases.markRolledBackinner function and_rollbackSignalscoping noted as style-only (no change).Round 13 (13318ab):
applyProviderRollbacknow throwsRollbackError(400)instead of plainErrorfor missing previous provider and missing pipelineV2 section, preventing misleading 500 responses. API.mdisRetrydocs expanded to cover both retry-after-audit-failure and manual-provider-restore cases. API.md audit truncation wording corrected from "silently dropped" to "dropped with a logged warning".Round 14 (fe37dfb):
resolveRollbackFilePathnow throwsRollbackError(404)when a transition source doesn't match any known config file, instead of silently falling through to a potentially wrong fallback file. Added null-before-resolve ordering comment to single-flight finally block.PipelineExtractionConfig.allowRemoteProvidersredundancy with resolved top-level value noted as out-of-scope for this PR (would require@signet/corebreaking change).Round 15 (): Fixed audit-write-failure retry path — when rollback config write succeeds but audit write fails, the next retry now detects the idempotent YAML state (no-op guard) and marks the audit entry consumed audit-only instead of throwing RollbackError(400). Moved markRolledBack above the no-op guard for correct scoping. Updated comments to reflect new retry semantics. Added regression test for retry-audit-consume path.
Round 16: Fixed two concurrency/correctness issues: (1) Replaced async acquire-then-set single-flight pattern with synchronous
_rollbackInProgressboolean — the oldawait acquire()had an async gap between null-check and signal assignment that allowed concurrent rollbacks under event-loop interleaving. Concurrent requests now get 409 "Rollback already in progress" instead of racing. (2) Replaced raw string equality no-op detection (nextContent === beforeContent) with semantic comparison — always runsapplyProviderRollbackto clear stale model/endpoint/base_url fields, then compares parsed JSON objects to decide if config write is needed.Round 17: Bot flagged R16 no-op path skipped
applyProviderRollback, leaving stale model/endpoint/base_url fields uncleared. Fixed by always runningapplyProviderRollbackfirst (clears stale fields), then comparing parsed JSON objects (JSON.stringify(beforeRoot) === JSON.stringify(nextRoot)) to detect semantic no-op. This preserves stale-field clearing while avoiding unnecessary config writes on retry.Round 18: Bot flagged R17 no-op guard consumed audit entries for all semantic no-ops, including synthesis rollback on configs with no synthesis block. Fixed by adding provider-match validation inside the no-op branch: if no provider exists for the target role, throws 400 ("No {role} configuration found to roll back") without consuming the audit entry. Legitimate retries (provider already matches
entry.from) still consume the entry. Added regression test for absent-role-block 400 case. Updated existing retry test to use proper synthesis block.Round 19: Bot flagged case-sensitive suffix matching in
resolveRollbackFilePath— config saves can record case variants likeAgent.yamlbut the.endsWith()check is case-sensitive, causing 404 on case-insensitive filesystems. Fixed by normalizing bothmatch.sourceand candidate to lowercase before comparison.Round 20: Bot flagged
String(pipeline.extractionProvider ?? "")producing empty string instead of null, preventing??fallthrough to nested provider. Deep investigation confirmed: (1) for extraction,applyProviderRollbackalways setsextractionProvider, soisNoOpis false for nested-only configs — the bug was unreachable but latent. (2) For synthesis,topLevelProvideris always null, so??correctly falls through. Fixed the null semantics anyway: replacedString(x ?? "")with propertypeof === "string" && length > 0guards sotopLevelProvider/blockProviderarenullwhen absent, matching the canonicalreadProviderSafetySnapshotresolution pattern (flatExtraction ?? nestedExtraction). Added regression test for nested-only extraction rollback that verifiesextractionProviderflat key is added.Round 21: Bot flagged rollback-generated audit entries (source:
"api/config/provider-safety/rollback") being selectable as rollback targets.resolveRollbackFilePathcan't resolve that source to a config file → 404 stuck state on subsequent rollbacks. Fixed by excluding rollback-sourced entries from selection (candidate.source !== "api/config/provider-safety/rollback"). Added regression test: rollback succeeds, then second rollback throws 404 (no eligible transitions left, rollback entries skipped).Round 22: Bot flagged case-insensitive match resolving to candidate casing instead of actual filename.
match.sourcecould beAgent.yamlbutjoin(agentsDir, fromSource)usedagent.yaml→ false 404 on case-sensitive filesystems. Fixed by extracting the actual filename frommatch.source(preserving original casing) viamatch.source.slice(-fromSource.length).Round 23: Bot flagged
resolveRollbackFilePathdidn't exclude rollback-sourced entries — R21 only fixedexecuteProviderRollback. The route callsresolveRollbackFilePathfirst, so rollback-sourced entries could still be selected there, producing 404 before the other filter runs. Fixed by extracting sharedisRollbackEligible()predicate used by both functions.Round 24: Bot flagged malformed JSON body being swallowed as
{}on a state-mutating admin endpoint, allowing accidental rollback. Now returns 400 for invalid JSON instead of defaulting to empty body.