
Sync dev to main #170

Closed
tis24dev wants to merge 41 commits into main from dev

Conversation


@tis24dev tis24dev commented Mar 10, 2026

  • test(orchestrator): maximize coverage for network_apply
  • ci: bump the actions-updates group across 1 directory with 2 updates (#164)
  • backup/pve: clarify datastore skip logs
  • feat(webhook): optional Discord content fallback for embed-only messages
  • Revert "feat(webhook): optional Discord content fallback for embed-only messages"
  • Update go.mod
  • Harden restore decisions against untrusted backup metadata
  • Enforce staged backup checksum verification in restore/decrypt flows
  • fix(orchestrator): preserve staged network symlinks
  • fix(orchestrator): harden restore path validation against broken symlink escapes
  • fix(backup): sanitize datastore and storage path segments
  • fix(io): bound timed fs and input goroutine buildup
  • fix(orchestrator): preserve operational restore errors during path validation
  • Separate PBS override scan roots from real datastore identities.
  • fix(orchestrator): resolve symlinked base dirs before validating relative links
  • fix(safefs): harden timeout test cleanup and limiter capture
  • fix(input): preserve completed inflight reads across retries
  • fix(pbs): disambiguate datastore output keys across CLI and overrides
  • test(input): assert inflight cleanup after completed retry consumption
  • fix(safefs): return TimeoutError when deadline expires before run
  • fix(restore): validate compatibility before full restore fallback
  • fix(orchestrator): bound restore metadata reads
  • fix(orchestrator): validate staged network symlink targets
  • fix(orchestrator): validate symlink targets in firewall staged sync
  • fix(orchestrator): recognize full Proxmox system names
  • fix(orchestrator): validate raw backup checksums during discovery
  • test: add PBS datastore status filename collision regression coverage
  • fix(pbs): always apply datastore path overrides without CLI discovery

Summary by CodeRabbit

  • New Features

    • Archive analysis during restore to auto-detect backup type, cluster payload, and drive restore decisions.
    • Checksum/integrity verification for discovered and staged backups.
    • Filesystem-based PBS namespace/datastore discovery.
  • Bug Fixes

    • Stricter path and symlink safety during restore to prevent escapes.
    • Improved datastore override handling with deterministic, collision-safe output keys.
    • More robust checksum normalization and validation.
  • Documentation

    • Clarified PBS datastore path configuration and restore inventory behavior.
  • Chores

    • Updated Go toolchain and CI release actions.

tis24dev and others added 28 commits February 27, 2026 15:30
- Add network_apply_additional_test.go with extensive tests for rollback (arm/disarm), NIC repair CLI (overrides/conflicts), snapshot IP parsing, command selection, and error paths.
- Use FakeFS/FakeCommandRunner plus PATH stubs to deterministically exercise both success and failure branches.
- Bring network_apply.go coverage to ~99% with no production code changes.
ci: bump the actions-updates group across 1 directory with 2 updates (#164)

Bumps the actions-updates group with 2 updates in the / directory: [goreleaser/goreleaser-action](https://github.com/goreleaser/goreleaser-action) and [actions/attest-build-provenance](https://github.com/actions/attest-build-provenance).


Updates `goreleaser/goreleaser-action` from 6 to 7
- [Release notes](https://github.com/goreleaser/goreleaser-action/releases)
- [Commits](goreleaser/goreleaser-action@v6...v7)

Updates `actions/attest-build-provenance` from 3 to 4
- [Release notes](https://github.com/actions/attest-build-provenance/releases)
- [Changelog](https://github.com/actions/attest-build-provenance/blob/main/RELEASE.md)
- [Commits](actions/attest-build-provenance@v3...v4)

---
updated-dependencies:
- dependency-name: goreleaser/goreleaser-action
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-updates
- dependency-name: actions/attest-build-provenance
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-updates
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Make PVE datastore skip messages user-friendly: log disabled storages as SKIP,
offline storages as actionable WARNING, and always emit DEBUG skip details with
active/enabled/status and an explicit reason.
Fix detected datastores report header to match output column order.
Extend unit tests to cover runtime flag parsing and the new skip logging, and
make Makefile coverage targets honor the go.mod toolchain via GOTOOLCHAIN.
Adds per-endpoint opt-in Discord content text (WEBHOOK_<ENDPOINT>_DISCORD_CONTENT_ENABLED/..._DISCORD_CONTENT) to avoid “empty” messages when embeds aren’t rendered. Default remains unchanged (embed-only). Includes 2000-char truncation, docs/template updates, and tests.
Derive restore compatibility, cluster gating, and hostname warnings from archive-backed facts instead of candidate manifest metadata. Add restore archive analysis for internal backup metadata and category-based fallback, update restore planning and compatibility checks to consume trusted decision inputs, and add regression tests covering manifest spoofing for backup type and cluster mode.
Require checksum verification for staged backup archives before any restore or decrypt operation proceeds. Add strict SHA256 parsing and normalization, enforce checksum source agreement between sidecar and manifest metadata, reject raw backup candidates with no verifiable integrity source, centralize shared prepare-path verification for bundle and raw inputs, preserve source artifact checksum separately from the prepared plain archive checksum, and update regression tests to cover missing, mismatched, and legacy checksum scenarios.
Use Lstat in network staged apply so symlink entries are handled as symlinks instead of being dereferenced and copied as regular file contents. Reject symlinked staged network directories, extend the FS abstraction with Lstat, and add regression tests covering preserved symlinks and invalid symlinked stage roots.
fix(orchestrator): harden restore path validation against broken symlink escapes

Replace the restore path safety checks with a shared FS-aware resolver that walks paths component by component using Lstat and Readlink, correctly handling missing tails and rejecting intermediate symlink escapes. Apply the new validation to regular restore, safety restore, symlink and hardlink handling, and add regression tests for broken escaping symlinks, symlink loops, and symlinked restore roots.
Introduce a shared path-safe key for PBS datastore names and PVE storage names when building collector output paths, preventing path traversal and filename collisions from raw config values. Keep raw names unchanged in metadata and command invocations, update all affected PBS/PVE collector call sites, and add regression tests covering unsafe names while preserving the existing layout for already-safe names.
Prevent unbounded goroutine accumulation in timeout/cancel wrappers by limiting in-flight safefs operations and reusing a single in-flight read per reader/file descriptor in the input package. Keep public APIs unchanged, preserve current timeout semantics, and add regression coverage for limiter saturation, repeated timeouts, and race-safe cleanup of timed operations.
fix(orchestrator): preserve operational restore errors during path validation

Teach the restore path resolver to distinguish security violations from local operational failures, so only real root-escape conditions are reported as illegal paths. Keep broken-symlink and traversal hardening intact, allow in-root permission and ENOTDIR failures to surface through the normal restore flow, and add regression tests covering the new security vs operational error classification.
- add explicit PBS datastore metadata for source, CLI identity and output key
- derive stable path-based output keys for PBS_DATASTORE_PATH overrides
- keep existing output names for real CLI-discovered PBS datastores
- skip datastore show/status CLI calls for override-only entries
- use filesystem-only namespace discovery for override paths
- keep PXAR outputs isolated for colliding override basenames
- preserve distinct override entries in PBS datastore inventory
- exclude override-only inventory entries from datastore.cfg fallback restore
- add regression coverage for basename collisions and restore safety
- update PBS_DATASTORE_PATH documentation to reflect scan-root semantics
fix(orchestrator): resolve symlinked base dirs before validating relative links

resolvePathRelativeToBaseWithinRootFS now canonicalizes baseDir by resolving existing symlinks before validating relative link targets. This closes the bypass where an archive could create an apparently harmless parent symlink and then install a relative symlink that escapes destRoot once materialized by the kernel.
Make safefs timeout tests failure-safe by draining blocked workers from cleanup and separating worker cleanup from global state restore. Also capture limiter and fs operation hooks locally in runLimited/Stat/ReadDir/Statfs to avoid races with global resets while async work is still in flight.
Keep completed in-flight line and password reads attached to their state until
a caller consumes the buffered result, instead of clearing the inflight state
from the producer goroutine.

This fixes the race where a read could complete after a timeout, be cleaned up
before the next retry started, and leave the completed input unreachable.
Add deterministic tests for both retry-while-pending and completion-before-retry
cases to lock the behavior down.
Add a PBS-specific output-key resolver to guarantee stable, unique datastore filenames and directories across auto-detected datastores and PBS_DATASTORE_PATH overrides. This fixes real collisions between CLI datastore names and path-derived override keys, makes pbsDatastore.pathKey() source-aware when OutputKey is unset, and keeps inventory output_key values aligned with the actual files written by PBS collectors. Includes regression tests for CLI-vs-override collisions, override fallback behavior, and inventory consistency.
The new tests verify that line and password inflight reads remain attached
after timeout until the completed result is reused, and that state.inflight
is cleared immediately after that consumer receives the result.
update runLimited to re-check ctx.Err() after effectiveTimeout()
avoid calling run() when effectiveTimeout() returns 0 because the context deadline already elapsed
preserve the no-timeout path only for genuinely disabled timeouts
add a regression test for the expired-deadline edge case in internal/safefs
When restore archive analysis fails in the UI workflow, do not jump
immediately to full restore.
Build a best-effort RestoreDecisionInfo from the manifest, run the
existing compatibility check first, and only then fall back to
runFullRestoreWithUI.
Also add a regression test covering the analysis-error path to ensure
the compatibility warning is still shown before the full restore
fallback executes.
Reject oversized backup_metadata.txt entries during restore archive inspection by reading through a fixed limit before parsing. This prevents unbounded memory allocation from crafted archives and ensures truncated metadata caused by limit overflow is treated as invalid rather than partially trusted. Add regression tests covering oversized metadata rejection and the public restore analysis path.
Validate symlink targets during staged network apply before recreating them
under the destination tree. Reuse the existing path-security resolver to
ensure targets stay within the allowed destination root, rewrite safe
absolute targets to relative links, and reject absolute or relative targets
that escape the subtree. Add regression tests for safe rewrites and escaping
symlink targets.
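The validate-then-rewrite step can be sketched as below; `rewriteSymlinkTarget` is a hypothetical simplification of the resolver-backed check described above:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// rewriteSymlinkTarget validates a staged symlink target against destRoot and
// rewrites safe absolute targets to relative ones, so the link still resolves
// correctly if the tree is materialized somewhere else. Targets that escape
// the destination subtree, absolute or relative, are rejected.
func rewriteSymlinkTarget(destRoot, linkPath, target string) (string, error) {
	linkDir := filepath.Dir(linkPath)
	abs := target
	if !filepath.IsAbs(abs) {
		abs = filepath.Join(linkDir, target)
	}
	abs = filepath.Clean(abs)
	if abs != destRoot && !strings.HasPrefix(abs, destRoot+string(os.PathSeparator)) {
		return "", fmt.Errorf("symlink target %q escapes %q", target, destRoot)
	}
	if !filepath.IsAbs(target) {
		return target, nil // already relative and in-root: keep as-is
	}
	return filepath.Rel(linkDir, abs)
}

func main() {
	rel, err := rewriteSymlinkTarget("/stage", "/stage/etc/network/interfaces",
		"/stage/etc/network/interfaces.d/eth0")
	fmt.Println(rel, err)
	_, err = rewriteSymlinkTarget("/stage", "/stage/etc/link", "../../etc/passwd")
	fmt.Println(err)
}
```

Rewriting absolute in-root targets to relative ones is what keeps a link like `/stage/etc/...` valid after the staged tree is applied onto the live `/etc`.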
Validate symlink targets in syncDirExact before recreating them under the
destination subtree. Reuse the existing path-security resolver to ensure
firewall and SDN staged symlinks stay within the allowed destination root,
rewrite safe absolute targets to relative links, and reject absolute or
relative targets that escape the subtree. Add regression tests for safe
rewrites and escaping symlink targets.
Verify and fix parseSystemTypeString in compatibility.go.

The parser now recognizes space-separated and full-name variants such as
"Proxmox VE", "Proxmox Backup", and "Proxmox Backup Server", in addition
to the existing acronym and hyphenated forms.

Add regression tests to ensure these values map correctly to
SystemTypePVE and SystemTypePBS.
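The recognition logic can be sketched as below; `parseSystemType` is a hypothetical stand-in for `parseSystemTypeString`, and checking PBS substrings before PVE ones is the ordering assumption that keeps "Proxmox Backup Server" from being misread:

```go
package main

import (
	"fmt"
	"strings"
)

type SystemType int

const (
	SystemTypeUnknown SystemType = iota
	SystemTypePVE
	SystemTypePBS
)

// parseSystemType maps the many spellings seen in backup metadata to a
// SystemType. PBS variants are checked first; note that matching
// "proxmox backup" already covers "proxmox backup server" as a substring.
func parseSystemType(raw string) SystemType {
	n := strings.ToLower(strings.TrimSpace(raw))
	switch {
	case strings.Contains(n, "pbs"),
		strings.Contains(n, "proxmox-backup"),
		strings.Contains(n, "proxmox backup"):
		return SystemTypePBS
	case strings.Contains(n, "pve"),
		strings.Contains(n, "proxmox-ve"),
		strings.Contains(n, "proxmox ve"):
		return SystemTypePVE
	}
	return SystemTypeUnknown
}

func main() {
	fmt.Println(parseSystemType("Proxmox VE") == SystemTypePVE)
	fmt.Println(parseSystemType("Proxmox Backup Server") == SystemTypePBS)
}
```

Substring matching keeps the parser tolerant of acronym, hyphenated, and full-name variants without enumerating every exact string.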
Validate and normalize manifest and sidecar SHA256 values before
accepting local or rclone raw backup candidates.

Reject candidates when the manifest checksum is malformed, the .sha256
sidecar cannot be parsed, or both checksum sources disagree. Persist the
normalized checksum and its source on the candidate so downstream staged
integrity verification only works with pre-validated values.

Add coverage for local and rclone discovery, malformed and conflicting
checksums, and reuse of the candidate integrity expectation during
staged verification.
Adds a regression test for collectPBSCommands() covering colliding PBS datastore output keys. The test verifies that assignUniquePBSDatastoreOutputKeys() disambiguates colliding datastore status filenames and that two distinct datastore_*_status.json files are written with the correct datastore-specific content.
Make PBS datastore enumeration best-effort in getDatastoreList so configured PBSDatastorePaths overrides are still appended when proxmox-backup-manager is missing, the datastore list command fails, or its JSON output is invalid. Preserve context cancellation as a fatal error, keep unique output key assignment, and add regression tests for missing CLI, command failure, and parse failure fallback cases.

github-actions bot commented Mar 10, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 2 package(s) with unknown licenses.
See the Details below.

License Issues

.github/workflows/release.yml

Package                          Version  License  Issue Type
actions/attest-build-provenance  4.*.*    Null     Unknown License
goreleaser/goreleaser-action     7.*.*    Null     Unknown License
Allowed Licenses: MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, LicenseRef-scancode-google-patent-license-golang

OpenSSF Scorecard

Package                          Version  Score
actions/attest-build-provenance  4.*.*    Unknown
goreleaser/goreleaser-action     7.*.*    🟢 5

Details

Check                Score  Reason
Binary-Artifacts     🟢 10  no binaries found in the repo
Maintained           🟢 10  16 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10
Code-Review          🟢 3   Found 5/16 approved changesets -- score normalized to 3
Packaging            ⚠️ -1  packaging workflow not detected
Dangerous-Workflow   🟢 10  no dangerous workflow patterns detected
CII-Best-Practices   ⚠️ 0   no effort to earn an OpenSSF best practices badge detected
Token-Permissions    ⚠️ 0   detected GitHub workflow tokens with excessive permissions
License              🟢 10  license file detected
Fuzzing              ⚠️ 0   project is not fuzzed
Signed-Releases      ⚠️ -1  no releases found
Branch-Protection    ⚠️ -1  internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Security-Policy      ⚠️ 0   security policy file not detected
SAST                 ⚠️ 0   SAST tool is not run on all commits -- score normalized to 0
Pinned-Dependencies  🟢 5   dependency not pinned by hash detected -- score normalized to 5

Scanned Files

  • .github/workflows/release.yml


coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Widespread refactor: adds checksum normalization and integrity verification, root-aware path-security and symlink validation, deterministic PBS datastore output keys, centralized bundle preparation, inflight input deduplication, filesystem operation limiting, and extensive tests across orchestrator, backup, PBS, PVE, safefs, and input packages.

Changes

Cohort / File(s) Summary
CI & Tooling
/.github/workflows/release.yml, Makefile, go.mod, internal/config/templates/backup.env
Bump GoReleaser and attest actions, update go toolchain, add GOTOOLCHAIN extraction and make coverage targets respect it, and clarify PBS_DATASTORE_PATH comments.
Documentation
docs/CONFIGURATION.md, docs/RESTORE_GUIDE.md
Clarify PBS_DATASTORE_PATH semantics and note PBS_DATASTORE_PATH-derived inventory entries are diagnostic-only and not regenerated into datastore.cfg.
Checksum Utilities
internal/backup/checksum.go, internal/backup/checksum_*_test.go
Add NormalizeChecksum and ParseChecksumData; integrate normalization into VerifyChecksum and legacy loaders; update tests to use deterministic checksum helpers.
Collector helpers
internal/backup/collector.go, internal/backup/collector_helpers_extra_test.go
Add collectorPathKey for safe, collision-resistant path keys; include unit test for collision behavior.
PBS: model, assignment & collection
internal/backup/collector_pbs_datastore.go, internal/backup/collector_pbs.go, internal/backup/collector_pbs_datastore_inventory.go, internal/backup/collector_pbs_*.go, internal/pbs/namespaces.go
Expand pbsDatastore/definition/inventory with Source/CLIName/NormalizedPath/OutputKey; add deterministic unique OutputKey assignment and path-key utilities; adapt collection to use pathKey/OutputKey and skip CLI status for override-only datastores; add DiscoverNamespacesFromFilesystem helper.
PBS tests & collision coverage
internal/backup/collector_pbs_commands_coverage_test.go, internal/backup/collector_pbs_*_test.go, internal/pbs/namespaces_test.go
Large set of tests covering override handling, path-key separation for PXAR/namespace outputs, CLI vs override collisions, and filesystem namespace discovery.
PVE collector & tests
internal/backup/collector_pve.go, internal/backup/collector_pve_parse_test.go, internal/backup/collector_pve_test.go
Introduce pathKey() via collectorPathKey; switch output filenames/directories to path-safe keys; improve skip/ignore logging and runtime flag parsing; add tests for unsafe names and user-friendly skip messages.
Input coordination
internal/input/input.go, internal/input/input_test.go
Replace goroutine-per-call reads with per-reader/FD inflight deduplication state (line/password), preserve cancellation semantics, and add extensive tests simulating blocking reads and timeouts.
Path security subsystem
internal/orchestrator/path_security.go, internal/orchestrator/path_security_test.go, internal/orchestrator/deps.go, internal/orchestrator/deps_test.go
Add FS-aware, root-bound path resolution API (resolvePathWithinRootFS, resolvePathRelativeToBaseWithinRootFS, etc.), classify security vs operational errors, add Lstat to FS interface and tests.
Restore safety & extraction
internal/orchestrator/backup_safety.go, internal/orchestrator/restore.go, internal/orchestrator/*restore*_test.go
Switch extraction and symlink/hardlink handling to FS-backed resolution helpers, perform pre/post symlink validations within root FS, and add many path-security tests.
Restore decision & analysis
internal/orchestrator/restore_decision.go, internal/orchestrator/restore_decision_test.go
Add AnalyzeRestoreArchive and RestoreDecisionInfo (source, backup type, cluster payload, hostname) to derive restore decisions from archive contents and internal metadata; include tests.
Restore workflow integration
internal/orchestrator/restore_workflow_ui.go, internal/orchestrator/selective.go, internal/orchestrator/restore_plan.go, internal/orchestrator/restore_workflow_*.go
Use archive-analysis as primary source for categories/decision (with manifest fallback); PlanRestore now takes clusterBackup bool; update UI wiring and fallbacks.
Integrity & decrypt flow
internal/orchestrator/decrypt_integrity.go, internal/orchestrator/decrypt_*.go, internal/orchestrator/decrypt_*_test.go
Introduce stagedIntegrityExpectation and verification: resolve expectations from manifest/candidate/checksum files, verify staged archive integrity, add Integrity field to candidates/prepared bundles, and add unit tests for conflict and verification paths.
Bundle preparation refactor
internal/orchestrator/decrypt_prepare_common.go, internal/orchestrator/decrypt_tui.go, internal/orchestrator/decrypt_workflow_*.go
Extract preparePlainBundleCommon with archiveDecryptFunc callback used by TUI/UI flows; centralize staging, integrity verification, decryption, plain-archive resolution, and checksum population; update tests to compute checksums from payload bytes.
Backup discovery integrity
internal/orchestrator/backup_sources.go, internal/orchestrator/backup_sources_test.go
Compute manifest checksums, read local/rclone sidecar checksums, attach Integrity to decryptCandidate, track integrity_missing/unavailable counters, and extend tests for checksum fetch/validation scenarios.
Compatibility API
internal/orchestrator/compatibility.go, internal/orchestrator/compatibility_test.go
Add parseSystemTypeString helper; refactor DetectBackupType to use it; change ValidateCompatibility signature to accept (currentSystem, backupType SystemType) and update tests.
Network staged apply & symlink safety
internal/orchestrator/network_staged_apply.go, internal/orchestrator/network_staged_apply_test.go, internal/orchestrator/network_apply_additional_test.go
Add root-aware overlay copy variants, symlink validation/rewriting and post-creation verification within root, explicit errors for unsupported staged types, and many tests for safe/unsafe symlink behaviors and cleanup.
PBS staged apply inventory
internal/orchestrator/pbs_staged_apply.go, internal/orchestrator/pbs_staged_apply_additional_test.go
Add Origin and CLIName to staged inventory entries; ignore override-origin entries when generating datastore.cfg; add test ensuring override entries are skipped.
Safefs concurrency limiting
internal/safefs/safefs.go, internal/safefs/safefs_test.go
Introduce operationLimiter and generic runLimited[T] wrapper to cap concurrent FS operations and centralize timeout handling; refactor Stat/ReadDir/Statfs to use limiter and add concurrency/deadline tests.
Extensive tests added
internal/orchestrator/*_test.go, internal/safefs/*_test.go, internal/input/*_test.go, internal/backup/*_test.go, many others
Massive expansion of unit and integration-style tests across orchestrator, network, restore, integrity, safefs, input, and backup collectors to exercise new behaviors, security checks, and edge cases.

Sequence Diagram(s)

sequenceDiagram
    participant Archive as Restore Archive
    participant Analysis as AnalyzeRestoreArchive
    participant Metadata as MetadataParser
    participant Categories as CategoryDetector
    participant Workflow as RestoreWorkflow

    Archive->>Analysis: provide archive path + logger
    activate Analysis
    Analysis->>Archive: open & decompress archive
    Analysis->>Metadata: read internal metadata file
    Metadata-->>Analysis: parsed metadata (type, cluster, hostname)
    Analysis->>Categories: scan archive paths
    Categories-->>Analysis: available categories
    Analysis->>Workflow: return categories + RestoreDecisionInfo (or error → fallback)
    deactivate Analysis
sequenceDiagram
    participant Client as TUI/UI
    participant Common as preparePlainBundleCommon
    participant Stage as Staging FS
    participant Integrity as IntegrityChecker
    participant Decrypt as DecryptCallback

    Client->>Common: request bundle prep (candidate, logger, decrypt callback)
    activate Common
    Common->>Stage: create workdir & stage artifacts
    Common->>Integrity: verifyStagedArchiveIntegrity(staged archive)
    Integrity-->>Common: verified checksum / error
    Common->>Decrypt: invoke decrypt callback if needed
    Decrypt-->>Common: decrypted/plain archive
    Common->>Common: compute SHA256, update manifest, return preparedBundle
    deactivate Common
sequenceDiagram
    participant Collector as DatastoreCollector
    participant Sources as CLI+Overrides
    participant Keys as OutputKeyAssigner
    participant Writer as OutputWriter

    Collector->>Sources: enumerate CLI datastores + configured overrides
    Sources-->>Collector: merged datastore list
    Collector->>Keys: assignUniquePBSDatastoreOutputKeys(list)
    Keys-->>Collector: datastores with unique OutputKey
    Collector->>Writer: for each datastore, choose actions by Source/CLIName
    alt CLI-origin
        Collector->>Writer: run status/namespace commands and write outputs under OutputKey
    else Override-origin
        Collector->>Writer: skip status command, write path-derived outputs under distinct OutputKey
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I hopped through checksums, keys, and roots so deep,
I herded symlinks safe, where no paths creep,
Datastores wear neat, collision-proof tags,
Bundles verified, no secret nags,
Rejoice — the backup rabbit guards your sleep! 🥕


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/orchestrator/decrypt_workflow_ui.go (1)

177-184: ⚠️ Potential issue | 🔴 Critical

Guard cand before logging its fields.

Line 180 dereferences cand.Source and cand.IsRclone before preparePlainBundleCommon gets a chance to validate input. A nil candidate now panics instead of returning the normal error path.

Suggested fix
 func preparePlainBundleWithUI(ctx context.Context, cand *decryptCandidate, version string, logger *logging.Logger, ui interface {
 	PromptDecryptSecret(ctx context.Context, displayName, previousError string) (string, error)
 }) (bundle *preparedBundle, err error) {
+	if cand == nil {
+		return nil, fmt.Errorf("decrypt candidate not available")
+	}
 	done := logging.DebugStart(logger, "prepare plain bundle (ui)", "source=%v rclone=%v", cand.Source, cand.IsRclone)
 	defer func() { done(err) }()
 	return preparePlainBundleCommon(ctx, cand, version, logger, func(ctx context.Context, encryptedPath, outputPath, displayName string) error {
 		return decryptArchiveWithSecretPrompt(ctx, encryptedPath, outputPath, displayName, ui.PromptDecryptSecret)
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/decrypt_workflow_ui.go` around lines 177 - 184,
preparePlainBundleWithUI currently dereferences cand (cand.Source and
cand.IsRclone) inside logging.DebugStart which can panic if cand is nil; first
guard cand by checking if cand == nil and return the appropriate error before
calling logging.DebugStart (or assign safe local variables like source/isRclone
after the nil check), then call logging.DebugStart and proceed to invoke
preparePlainBundleCommon and decryptArchiveWithSecretPrompt unchanged; reference
preparePlainBundleWithUI, cand.Source, cand.IsRclone, preparePlainBundleCommon,
and decryptArchiveWithSecretPrompt to locate the fix.
internal/orchestrator/restore_workflow_ui.go (1)

93-117: ⚠️ Potential issue | 🔴 Critical

runFullRestoreWithUI is not a safe fallback here.

Line 117 routes into runFullRestoreWithUI, but that helper still calls extractPlainArchive on Line 952 regardless of dryRun, and this return skips the safety/cluster/PBS preflight later in runRestoreWorkflowWithUI. An archive-analysis failure can therefore turn a guarded selective restore into an unguarded full extract against the live system. Please abort here until the fallback path honors dry-run and runs the same preflight safeguards.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/restore_workflow_ui.go` around lines 93 - 117, The
current fallback directly calls runFullRestoreWithUI which can call
extractPlainArchive regardless of cfg.DryRun and skips the preflight logic in
runRestoreWorkflowWithUI; change this to abort the restore instead of invoking
runFullRestoreWithUI when fallbackToFullRestore is true by returning
ErrRestoreAborted (i.e., replace the runFullRestoreWithUI call with a return
ErrRestoreAborted); separately, make a follow-up change to ensure
runFullRestoreWithUI is updated to honor cfg.DryRun and to invoke the same
preflight/cluster/PBS checks (or a factored preflight helper used by
runRestoreWorkflowWithUI) before calling extractPlainArchive.
🧹 Nitpick comments (6)
internal/input/input_test.go (2)

241-248: Relax the sub-50ms deadlines to reduce CI flakiness.

These tests are validating timeout semantics, but 25ms/50ms windows are tight enough that scheduler jitter can make them intermittently fail on busy runners. I’d lift these into shared test constants with more headroom so failures stay behavioral rather than timing-sensitive.

Example cleanup
+const (
+	testContextTimeout = 150 * time.Millisecond
+	testWaitTimeout    = 2 * time.Second
+)
+
-ctx1, cancel1 := context.WithTimeout(context.Background(), 25*time.Millisecond)
+ctx1, cancel1 := context.WithTimeout(context.Background(), testContextTimeout)

Also applies to: 332-333, 354-365, 411-416, 449-466, 515-523, 551-555, 587-594

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/input/input_test.go` around lines 241 - 248, The test
TestReadLineWithContext_DeadlineReturnsDeadlineExceeded (and other
timing-sensitive tests) use very short deadlines like context.WithTimeout(...,
50*time.Millisecond) which causes CI flakiness; change these to use a shared
test timeout constant (e.g., testTimeout) with a larger value (e.g., 200-300ms)
and replace direct 25ms/50ms literals with that constant wherever used
(including similar tests referenced in the comment) so time windows have more
headroom while preserving the same timeout semantics.

346-351: Register unblock cleanup when the blocking fixtures are created.

These tests leave goroutines parked behind release/finish until the very end. If an earlier t.Fatalf fires first, those goroutines stay blocked and can bleed into later tests or race runs. Please move the unblocks into t.Cleanup right after the channels are created.

Example cleanup
 src := &blockingLineReader{
 	release:  make(chan struct{}),
 	finish:   make(chan struct{}),
 	returned: make(chan struct{}, 1),
 	payload:  "hello\n",
 }
+t.Cleanup(func() {
+	close(src.release)
+	close(src.finish)
+})

Also applies to: 404-408, 437-447, 505-513, 544-548, 575-583

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/input/input_test.go` around lines 346 - 351, The blocking test
fixtures create a blockingLineReader with channels (release, finish, returned)
and start goroutines that can remain blocked if the test fatals; immediately
after constructing the fixture (the blockingLineReader in input_test.go),
register a t.Cleanup that unblocks those goroutines by closing or otherwise
signaling the release and finish channels (and draining/closing returned if
needed) so they never remain parked across tests; update all similar fixtures
(the other blockingLineReader instances mentioned) to add the same t.Cleanup
right after creation to ensure cleanup on test failure.
internal/orchestrator/restore_errors_test.go (1)

801-802: Add a dedicated lstatErr injection hook.

ErrorInjectingFS.Lstat now satisfies the interface, but it still can't force Lstat failures the way this helper already can for OpenFile, Readlink, Link, and Symlink. That makes the new Lstat-based restore error paths harder to cover with this test double.

Suggested helper update
 type ErrorInjectingFS struct {
 	base        FS
 	mkdirAllErr error
 	openFileErr error
 	symlinkErr  error
 	readlinkErr error
 	linkErr     error
+	lstatErr    error
 }
 
-func (f *ErrorInjectingFS) Lstat(path string) (os.FileInfo, error) { return f.base.Lstat(path) }
+func (f *ErrorInjectingFS) Lstat(path string) (os.FileInfo, error) {
+	if f.lstatErr != nil {
+		return nil, f.lstatErr
+	}
+	return f.base.Lstat(path)
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/restore_errors_test.go` around lines 801 - 802, The
test double ErrorInjectingFS lacks a dedicated injection hook for Lstat; add a
new field (e.g. lstatErr error) to ErrorInjectingFS and modify the Lstat method
to check and return that injected error when non-nil (mirroring how
OpenFile/Readlink/Link/Symlink use their respective error hooks), so tests can
force Lstat failures via the lstatErr hook.
internal/orchestrator/compatibility.go (1)

64-79: Redundant case clause in switch statement.

Line 74 checks for "proxmox backup server", but this clause can never be reached: line 73 already matches "proxmox backup", which is a substring of "proxmox backup server". The check on line 74 is dead code.

♻️ Suggested fix
 	case strings.Contains(normalized, "pbs"),
 		strings.Contains(normalized, "proxmox-backup"),
-		strings.Contains(normalized, "proxmox backup"),
-		strings.Contains(normalized, "proxmox backup server"):
+		strings.Contains(normalized, "proxmox backup"):
 		return SystemTypePBS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/compatibility.go` around lines 64 - 79, In
parseSystemTypeString, the switch case for SystemTypePBS contains a redundant
strings.Contains check for "proxmox backup server" because "proxmox backup"
already matches that substring; remove the "proxmox backup server" clause from
the case list for SystemTypePBS (leaving the checks for "pbs", "proxmox-backup",
and "proxmox backup") so the dead code is eliminated while preserving the
intended behavior.
internal/orchestrator/decrypt_prepare_common.go (1)

23-31: Keep the downloaded bundle path in a local variable.

This mutates the caller-owned candidate to a temp path that cleanup() later deletes. After an error, the caller is left holding a dead local path instead of the original remote reference.

♻️ Suggested refactor
-	var rcloneCleanup func()
+	var (
+		rcloneCleanup func()
+		bundlePath    = cand.BundlePath
+	)
 	if cand.IsRclone && cand.Source == sourceBundle {
 		logger.Debug("Detected rclone backup, downloading...")
 		localPath, cleanup, err := downloadRcloneBackup(ctx, cand.BundlePath, logger)
 		if err != nil {
 			return nil, fmt.Errorf("failed to download rclone backup: %w", err)
 		}
 		rcloneCleanup = cleanup
-		cand.BundlePath = localPath
+		bundlePath = localPath
 	}
@@
-		logger.Info("Extracting bundle %s", filepath.Base(cand.BundlePath))
-		staged, err = extractBundleToWorkdirWithLogger(cand.BundlePath, workDir, logger)
+		logger.Info("Extracting bundle %s", filepath.Base(bundlePath))
+		staged, err = extractBundleToWorkdirWithLogger(bundlePath, workDir, logger)

Also applies to: 60-61

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/decrypt_prepare_common.go` around lines 23 - 31, The
code mutates the caller-owned cand.BundlePath to a temp localPath returned by
downloadRcloneBackup, which leaves the caller holding a deleted path after
cleanup; instead, keep the downloaded path in a local variable (e.g. localPath)
and set rcloneCleanup to cleanup without overwriting cand.BundlePath, or if you
must use cand for downstream logic, store the original path in a temp
(origBundlePath) and restore cand.BundlePath after cleanup/failure; update the
rclone handling around downloadRcloneBackup, rcloneCleanup, and cleanup
invocation so that cand.BundlePath is never permanently replaced by the
transient localPath (same change for the other occurrence around lines 60-61).
internal/orchestrator/network_apply_additional_test.go (1)

1631-1708: Isolate the rollbackNetworkFilesNow subtests.

These cases reuse the same FakeFS and fixed 20260201_123456 paths, and the “marker remove error is non-fatal” branch intentionally leaves the pending marker behind. The later branches then only pass because the implementation overwrites those files, so the coverage is order-dependent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/network_apply_additional_test.go` around lines 1631 -
1708, Tests share a single FakeFS and fixed timestamp which causes
order-dependent interference; for each t.Run create and use a fresh FakeFS (call
NewFakeFS and set restoreFS to it, with t.Cleanup removing its Root) and reset
restoreCmd as needed so each subtest gets an isolated filesystem and command
runner; ensure tests that expect a missing marker (like the "script write error
removes marker" case) point at their own FakeFS instance and that
removeFailFS/writeFailFS wrappers are constructed around that per-test FakeFS;
keep the FakeTime usage if fixed timestamps are required but do not reuse the
same FakeFS between subtests.
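The isolation pattern can be sketched with a generic stand-in; the repository's FakeFS has a richer API, and all names below are illustrative:

```go
package main

import "fmt"

// fakeFS is a minimal stand-in for the test double discussed above. The
// point is the constructor: each subtest calls newFakeFS so state never
// leaks between cases.
type fakeFS struct {
	files map[string][]byte
}

func newFakeFS() *fakeFS {
	return &fakeFS{files: make(map[string][]byte)}
}

func (f *fakeFS) WriteFile(path string, data []byte) {
	f.files[path] = append([]byte(nil), data...)
}

func (f *fakeFS) Exists(path string) bool {
	_, ok := f.files[path]
	return ok
}

func main() {
	// Simulates two subtests: the marker left behind by the first run is
	// invisible to the second because each gets a fresh instance.
	first := newFakeFS()
	first.WriteFile("/etc/network/.pending-marker", []byte("pending"))

	second := newFakeFS()
	fmt.Println(first.Exists("/etc/network/.pending-marker"))  // true
	fmt.Println(second.Exists("/etc/network/.pending-marker")) // false
}
```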
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/backup/collector_pbs_datastore_inventory.go`:
- Around line 499-501: When merging datastore definitions ensure CLI path wins:
change the assignment logic around entry.Path so that when path (the
CLI-provided path) is non-empty it is used if entry.Path is empty OR origin ==
pbsDatastoreSourceCLI; i.e. prefer the CLI path from the runtime source
(pbsDatastoreSourceCLI) instead of leaving an existing config path. Update the
block that currently checks entry.Path == "" && path != "" to use the condition
(path != "" && (entry.Path == "" || origin == pbsDatastoreSourceCLI)) so
entry.Path is overwritten for CLI-originated merges (or alternatively record
both paths explicitly if you need to preserve the config value).
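A hedged reconstruction of the merge rule; none of these names are confirmed against the repository, the condition is the point — a non-empty CLI path wins when the entry has no path yet or the merge originates from the CLI source:

```go
package main

import "fmt"

// sourceCLI stands in for pbsDatastoreSourceCLI; entry approximates the
// merged datastore record.
const sourceCLI = "cli"

type entry struct{ Path string }

// mergePath applies the suggested precedence: path != "" && (entry.Path == ""
// || origin == sourceCLI).
func mergePath(e *entry, path, origin string) {
	if path != "" && (e.Path == "" || origin == sourceCLI) {
		e.Path = path
	}
}

func main() {
	e := &entry{Path: "/cfg/datastore"} // value from datastore.cfg
	mergePath(e, "/runtime/datastore", sourceCLI)
	fmt.Println(e.Path) // /runtime/datastore — CLI overrides config

	e2 := &entry{Path: "/cfg/datastore"}
	mergePath(e2, "/other", "config")
	fmt.Println(e2.Path) // /cfg/datastore — non-CLI merge keeps existing path
}
```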

In `@internal/backup/collector_pbs_datastore.go`:
- Around line 779-795: The loop is accepting relative PBS_DATASTORE_PATH values
because normalizePBSDatastorePath only cleans the string; enforce that the
normalized path is absolute before adding to existing/datastores: after calling
normalizePBSDatastorePath(normalized) check path/filepath.IsAbs(normalized) (or
strings.HasPrefix(normalized, "/")) and if not, skip this override (or log a
warning) instead of appending; keep using buildPBSOverrideDisplayName,
buildPBSOverrideOutputKey, pbsDatastore and the existing map as before for valid
absolute paths.
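The absolute-path guard can be sketched like this (function name invented); `filepath.Clean` alone happily accepts relative input, so the explicit `IsAbs` check is what rejects it:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// normalizeOverridePath cleans the raw override and admits it only when the
// result is absolute; relative values are skipped (the caller would log a
// warning).
func normalizeOverridePath(raw string) (string, bool) {
	cleaned := filepath.Clean(raw)
	if !filepath.IsAbs(cleaned) {
		return "", false
	}
	return cleaned, true
}

func main() {
	for _, raw := range []string{"/mnt/datastore//main", "relative/datastore", "../escape"} {
		p, ok := normalizeOverridePath(raw)
		fmt.Printf("%-22s -> ok=%v path=%q\n", raw, ok, p)
	}
}
```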

In `@internal/input/input.go`:
- Around line 125-151: The code currently holds state.mu across the blocking
select, which prevents other callers from observing their ctx.Done(); fix by:
under state.mu lock create or read the local inflight (the *lineInflight
assigned to state.inflight) and then immediately unlock before doing the select
on ctx.Done() and inflight.done; after the select returns and before mutating
shared state (e.g. clearing state.inflight), re-acquire state.mu and only then
set state.inflight = nil if it still equals the captured inflight; apply the
same pattern for the other similar function range (lines manipulating
state.inflight, lineInflight, inflight.done, and reader.ReadString) so the lock
is not held while waiting.

In `@internal/orchestrator/backup_safety.go`:
- Around line 398-400: The current code treats any error from
resolvePathRelativeToBaseWithinRootFS as a root-escape and continues; instead
check the error with isPathSecurityError(err) and only log-and-continue for
security errors (keep the existing logger.Warning "Skipping symlink %s -> %s:
target escapes root" behavior for those). For non-security/operational errors
(permission/readlink/non-directory), surface them as restore failures:
log/return an error (or propagate) so rollback stops or reports failure rather
than silently dropping the symlink. Apply the same conditional handling for both
uses of resolvePathRelativeToBaseWithinRootFS (the occurrence around
target/linkTarget and the later occurrence).

In `@internal/orchestrator/backup_sources.go`:
- Around line 267-283: The checksum probe is using the leftover itemCtx (and its
cancel) so inspectRcloneChecksumFile(itemCtx, ...) can time out prematurely;
create a fresh context for the checksum fetch using context.WithTimeout(ctx,
timeout) (or a clearly named variable like checksumCtx, checksumCancel) and pass
that to inspectRcloneChecksumFile, call checksumCancel() when finished (defer
immediately after creating it) and avoid reusing the earlier itemCtx/cancel for
this call before you call resolveIntegrityExpectationValues; keep
resolveIntegrityExpectationValues using the original manifestChecksum and ensure
you still call the earlier cancel where appropriate.
- Around line 501-510: The checksum parsing functions read entire .sha256 files
into memory; update parseLocalChecksumFile and inspectRcloneChecksumFile to cap
the input before parsing by reading only a fixed small amount (e.g., limit to a
few KB or read only the first line) instead of buffering the whole file or using
CombinedOutput(), then pass that capped bytes/line to backup.ParseChecksumData;
ensure you handle truncated reads and return a clear error if the file exceeds
the cap or has no newline so the parser still receives a safe, bounded input.

In `@internal/orchestrator/decrypt_prepare_common.go`:
- Around line 79-86: Trim whitespace from manifestCopy.EncryptionMode before
lowercasing and using it to decide decryption: replace the current assignment of
currentEncryption := strings.ToLower(manifestCopy.EncryptionMode) with something
that trims first (e.g., currentEncryption :=
strings.ToLower(strings.TrimSpace(manifestCopy.EncryptionMode))) so branches
that check currentEncryption (the if currentEncryption == "age" check) behave
consistently with statusFromManifest and avoid mislabeling ciphertext as "none".
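The one-line normalization suggested above: trim before lowercasing, so a manifest value like `" Age "` still selects the decryption branch:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEncryptionMode mirrors the suggested fix: TrimSpace first, then
// ToLower, so padded or newline-terminated values compare equal to "age".
func normalizeEncryptionMode(mode string) string {
	return strings.ToLower(strings.TrimSpace(mode))
}

func main() {
	for _, raw := range []string{"age", " Age ", "AGE\n", "none"} {
		fmt.Printf("%q -> decrypt=%v\n", raw, normalizeEncryptionMode(raw) == "age")
	}
}
```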

In `@internal/orchestrator/network_staged_apply.go`:
- Around line 205-228: The code validates the symlink target before creating it
(using validateOverlaySymlinkTargetWithinRoot) but does not re-check the actual
created link, so a TOCTOU can let the final symlink escape destRoot; after
calling restoreFS.Symlink(validatedTarget, dest) (in the same block where
ensureDirExistsWithInheritedMeta and remove/IsDir checks occur), immediately
read back the created link (e.g., using restoreFS.Readlink or the same
extractSymlink logic from internal/orchestrator/restore.go) and verify it still
resolves within destRoot; if the readback validation fails, remove the created
symlink (restoreFS.Remove(dest)) and return an error, ensuring the
cleanup-on-failure guard mirrors extractSymlink’s behavior.
- Around line 235-248: The code computes a relative symlink target using
filepath.Dir(dest) which is lexical and wrong when dest's parent is a symlink;
instead use the resolved parent path. In the function handling symlink rewriting
(around resolvePathRelativeToBaseWithinRootFS and the variable resolved),
replace the call filepath.Rel(filepath.Dir(dest), resolved) with a relative
computation from the resolved parent (e.g., filepath.Rel(filepath.Dir(resolved),
resolved)) so the rewrittenTarget is computed against the materialized parent
directory rather than the lexical dest parent; preserve the same error wrapping
and return filepath.Clean(rewrittenTarget).

In `@internal/orchestrator/restore_decision.go`:
- Around line 90-98: collectRestoreArchiveFacts currently ignores errors
returned by the decompressor's Close, letting truncated archives appear valid;
update the handling around createDecompressionReader/reader so Close() errors
are surfaced: convert the function to use a named error return (or capture the
function's return values) and replace the simple defer that calls closer.Close()
with a defer that captures the Close() error and, if non-nil and no previous
error was returned, set/return that error from collectRestoreArchiveFacts (use
createDecompressionReader and reader as the anchors to locate the change).

In `@internal/safefs/safefs.go`:
- Around line 75-83: The timeout handling is inconsistent: when a context
deadline triggers you sometimes return context.DeadlineExceeded instead of the
sentinel timeout error; update the early guard in runLimited and the
<-ctx.Done() branches (including operationLimiter.acquire) to normalize
deadline-driven cancellations by checking ctx.Err() and if errors.Is(ctx.Err(),
context.DeadlineExceeded) return the package sentinel timeoutErr
(safefs.ErrTimeout) else return ctx.Err(); ensure both acquire (method on
operationLimiter) and runLimited return the same timeoutErr when the context
deadline is exceeded.

---

Outside diff comments:
In `@internal/orchestrator/decrypt_workflow_ui.go`:
- Around line 177-184: preparePlainBundleWithUI currently dereferences cand
(cand.Source and cand.IsRclone) inside logging.DebugStart which can panic if
cand is nil; first guard cand by checking if cand == nil and return the
appropriate error before calling logging.DebugStart (or assign safe local
variables like source/isRclone after the nil check), then call
logging.DebugStart and proceed to invoke preparePlainBundleCommon and
decryptArchiveWithSecretPrompt unchanged; reference preparePlainBundleWithUI,
cand.Source, cand.IsRclone, preparePlainBundleCommon, and
decryptArchiveWithSecretPrompt to locate the fix.

In `@internal/orchestrator/restore_workflow_ui.go`:
- Around line 93-117: The current fallback directly calls runFullRestoreWithUI
which can call extractPlainArchive regardless of cfg.DryRun and skips the
preflight logic in runRestoreWorkflowWithUI; change this to abort the restore
instead of invoking runFullRestoreWithUI when fallbackToFullRestore is true by
returning ErrRestoreAborted (i.e., replace the runFullRestoreWithUI call with a
return ErrRestoreAborted); separately, make a follow-up change to ensure
runFullRestoreWithUI is updated to honor cfg.DryRun and to invoke the same
preflight/cluster/PBS checks (or a factored preflight helper used by
runRestoreWorkflowWithUI) before calling extractPlainArchive.

---

Nitpick comments:
In `@internal/input/input_test.go`:
- Around line 241-248: The test
TestReadLineWithContext_DeadlineReturnsDeadlineExceeded (and other
timing-sensitive tests) use very short deadlines like context.WithTimeout(...,
50*time.Millisecond) which causes CI flakiness; change these to use a shared
test timeout constant (e.g., testTimeout) with a larger value (e.g., 200-300ms)
and replace direct 25ms/50ms literals with that constant wherever used
(including similar tests referenced in the comment) so time windows have more
headroom while preserving the same timeout semantics.
- Around line 346-351: The blocking test fixtures create a blockingLineReader
with channels (release, finish, returned) and start goroutines that can remain
blocked if the test fatals; immediately after constructing the fixture (the
blockingLineReader in input_test.go), register a t.Cleanup that unblocks those
goroutines by closing or otherwise signaling the release and finish channels
(and draining/closing returned if needed) so they never remain parked across
tests; update all similar fixtures (the other blockingLineReader instances
mentioned) to add the same t.Cleanup right after creation to ensure cleanup on
test failure.

In `@internal/orchestrator/compatibility.go`:
- Around line 64-79: In parseSystemTypeString, the switch case for SystemTypePBS
contains a redundant strings.Contains check for "proxmox backup server" because
"proxmox backup" already matches that substring; remove the "proxmox backup
server" clause from the case list for SystemTypePBS (leaving the checks for
"pbs", "proxmox-backup", and "proxmox backup") so the dead code is eliminated
while preserving the intended behavior.

In `@internal/orchestrator/decrypt_prepare_common.go`:
- Around line 23-31: The code mutates the caller-owned cand.BundlePath to a temp
localPath returned by downloadRcloneBackup, which leaves the caller holding a
deleted path after cleanup; instead, keep the downloaded path in a local
variable (e.g. localPath) and set rcloneCleanup to cleanup without overwriting
cand.BundlePath, or if you must use cand for downstream logic, store the
original path in a temp (origBundlePath) and restore cand.BundlePath after
cleanup/failure; update the rclone handling around downloadRcloneBackup,
rcloneCleanup, and cleanup invocation so that cand.BundlePath is never
permanently replaced by the transient localPath (same change for the other
occurrence around lines 60-61).

In `@internal/orchestrator/network_apply_additional_test.go`:
- Around line 1631-1708: Tests share a single FakeFS and fixed timestamp which
causes order-dependent interference; for each t.Run create and use a fresh
FakeFS (call NewFakeFS and set restoreFS to it, with t.Cleanup removing its
Root) and reset restoreCmd as needed so each subtest gets an isolated filesystem
and command runner; ensure tests that expect a missing marker (like the "script
write error removes marker" case) point at their own FakeFS instance and that
removeFailFS/writeFailFS wrappers are constructed around that per-test FakeFS;
keep the FakeTime usage if fixed timestamps are required but do not reuse the
same FakeFS between subtests.

In `@internal/orchestrator/restore_errors_test.go`:
- Around line 801-802: The test double ErrorInjectingFS lacks a dedicated
injection hook for Lstat; add a new field (e.g. lstatErr error) to
ErrorInjectingFS and modify the Lstat method to check and return that injected
error when non-nil (mirroring how OpenFile/Readlink/Link/Symlink use their
respective error hooks), so tests can force Lstat failures via the lstatErr
hook.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a65cccb0-799c-4972-81ab-380966da93c6

📥 Commits

Reviewing files that changed from the base of the PR and between b7d9670 and 160a442.

📒 Files selected for processing (66)
  • .github/workflows/release.yml
  • Makefile
  • docs/CONFIGURATION.md
  • docs/RESTORE_GUIDE.md
  • go.mod
  • internal/backup/checksum.go
  • internal/backup/checksum_legacy_test.go
  • internal/backup/checksum_test.go
  • internal/backup/collector.go
  • internal/backup/collector_helpers_extra_test.go
  • internal/backup/collector_pbs.go
  • internal/backup/collector_pbs_commands_coverage_test.go
  • internal/backup/collector_pbs_datastore.go
  • internal/backup/collector_pbs_datastore_inventory.go
  • internal/backup/collector_pbs_datastore_inventory_test.go
  • internal/backup/collector_pbs_extra_test.go
  • internal/backup/collector_pbs_test.go
  • internal/backup/collector_pve.go
  • internal/backup/collector_pve_parse_test.go
  • internal/backup/collector_pve_test.go
  • internal/config/templates/backup.env
  • internal/input/input.go
  • internal/input/input_test.go
  • internal/orchestrator/backup_safety.go
  • internal/orchestrator/backup_safety_test.go
  • internal/orchestrator/backup_sources.go
  • internal/orchestrator/backup_sources_test.go
  • internal/orchestrator/compatibility.go
  • internal/orchestrator/compatibility_test.go
  • internal/orchestrator/decrypt.go
  • internal/orchestrator/decrypt_integrity.go
  • internal/orchestrator/decrypt_integrity_test.go
  • internal/orchestrator/decrypt_integrity_test_helpers_test.go
  • internal/orchestrator/decrypt_prepare_common.go
  • internal/orchestrator/decrypt_test.go
  • internal/orchestrator/decrypt_tui.go
  • internal/orchestrator/decrypt_tui_test.go
  • internal/orchestrator/decrypt_workflow_test.go
  • internal/orchestrator/decrypt_workflow_ui.go
  • internal/orchestrator/deps.go
  • internal/orchestrator/deps_test.go
  • internal/orchestrator/network_apply_additional_test.go
  • internal/orchestrator/network_staged_apply.go
  • internal/orchestrator/network_staged_apply_test.go
  • internal/orchestrator/path_security.go
  • internal/orchestrator/path_security_test.go
  • internal/orchestrator/pbs_staged_apply.go
  • internal/orchestrator/pbs_staged_apply_additional_test.go
  • internal/orchestrator/restore.go
  • internal/orchestrator/restore_decision.go
  • internal/orchestrator/restore_decision_test.go
  • internal/orchestrator/restore_errors_test.go
  • internal/orchestrator/restore_firewall.go
  • internal/orchestrator/restore_firewall_additional_test.go
  • internal/orchestrator/restore_plan.go
  • internal/orchestrator/restore_plan_test.go
  • internal/orchestrator/restore_test.go
  • internal/orchestrator/restore_workflow_decision_test.go
  • internal/orchestrator/restore_workflow_more_test.go
  • internal/orchestrator/restore_workflow_ui.go
  • internal/orchestrator/restore_workflow_ui_helpers_test.go
  • internal/orchestrator/selective.go
  • internal/pbs/namespaces.go
  • internal/pbs/namespaces_test.go
  • internal/safefs/safefs.go
  • internal/safefs/safefs_test.go

Comment on lines +171 to +184
state := getPasswordState(fd)

state.mu.Lock()
defer state.mu.Unlock()
if state.inflight == nil {
inflight := &passwordInflight{
done: make(chan passwordResult, 1),
completed: make(chan struct{}),
}
state.inflight = inflight
go func() {
b, err := readPassword(fd)
inflight.done <- passwordResult{b: b, err: MapInputError(err)}
close(inflight.completed)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "input.go" | grep internal

Repository: tis24dev/proxsave

Length of output: 86


🏁 Script executed:

wc -l ./internal/input/input.go

Repository: tis24dev/proxsave

Length of output: 90


🏁 Script executed:

cat -n ./internal/input/input.go

Repository: tis24dev/proxsave

Length of output: 6770


Credentials not cleared in global per-FD cache on timeout; allows password replay to subsequent calls.

When ReadPasswordWithContext times out (lines 190-194), the function returns without clearing state.inflight, leaving the passwordInflight struct and its buffered channel containing the password bytes permanently in the global passwordStates map. A subsequent call to ReadPasswordWithContext on the same file descriptor will reuse the cached inflight (line 175 condition fails) and retrieve the previous password from the channel, exposing the credential to an unrelated operation. Additionally, the []byte is never explicitly zeroed. Passwords must be cleared from this cache on every exit path, or the cache must be prompt-scoped rather than globally keyed by fd alone.
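A hedged sketch of the remediation: drop the cached in-flight entry on every exit path and zero the secret before discarding it. The map and struct shapes below are invented; the repository's passwordStates is richer.

```go
package main

import "fmt"

type passwordInflight struct{ secret []byte }

var passwordStates = map[int]*passwordInflight{}

func zeroBytes(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

// releaseInflight must run on success, timeout, and cancellation alike so no
// completed secret survives in the per-FD cache.
func releaseInflight(fd int) {
	if in, ok := passwordStates[fd]; ok {
		zeroBytes(in.secret)
		delete(passwordStates, fd)
	}
}

func main() {
	passwordStates[0] = &passwordInflight{secret: []byte("hunter2")}
	leaked := passwordStates[0].secret
	releaseInflight(0)
	_, cached := passwordStates[0]
	fmt.Println(cached)    // false — no replay to the next caller
	fmt.Println(leaked[0]) // 0 — the secret bytes were zeroed in place
}
```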

tis24dev added 10 commits March 11, 2026 10:13
Add focused safefs tests for expired-deadline paths and make deadline-driven context cancellations consistently map to ErrTimeout/TimeoutError in runLimited and operationLimiter.acquire. This fixes inconsistent timeout classification where some context deadline expirations returned context.DeadlineExceeded instead of the safefs timeout sentinel, while preserving context.Canceled behavior.
Make restore archive inspection fail when the decompression reader reports an error on Close, instead of silently accepting the archive as valid. This updates inspectRestoreArchiveContents to preserve existing inspect errors while surfacing deferred decompressor failures, and adds targeted tests covering both close-error propagation and error-precedence behavior.
Update overlay symlink validation to rewrite absolute targets relative to the resolved destination parent instead of the lexical parent path. This fixes incorrect rewritten targets when the destination's parent directory is itself a symlink, and adds a regression test covering the symlinked-parent case.
Add post-creation symlink verification for network overlay and firewall sync paths, mirroring the existing restore symlink safety model. The new check reads back each created symlink, ensures its effective target still resolves within the destination root, removes the link on validation failure, and adds regression tests covering readback failures and post-create escape scenarios.
Trim manifest EncryptionMode before lowercasing in preparePlainBundleCommon so values like " age " still trigger decryption. This aligns decryption behavior with statusFromManifest and adds a regression test proving whitespace-padded age mode is decrypted instead of being misclassified as plain.
Stop reusing the raw-item metadata context for checksum probing in discoverRcloneBackups. The checksum fetch now gets its own per-command timeout budget, matching the documented RCLONE_TIMEOUT_CONNECTION behavior, and a regression test covers the case where a slow metadata read previously left too little time for the checksum fetch.
…ion errors

Distinguish path-security errors from operational failures during safety-backup symlink restore. Keep the existing warning-and-skip behavior for symlink targets that escape the restore root, but propagate non-security errors from resolvePathRelativeToBaseWithinRootFS so restore failures are reported instead of silently dropping symlinks. Add tests covering operational failures before and after symlink creation.
…() and inflight completion.

- lock only to create/read the current inflight operation
- release the mutex before waiting on cancellation or read completion
- re-acquire the mutex only to clear state.inflight if it still matches
  the captured inflight
- store completed line/password results on the inflight state and use done
  as a broadcast completion signal for concurrent waiters
- add regression tests for concurrent cancelled callers and preserve
  existing retry/inflight reuse behavior
Reject relative PBS_DATASTORE_PATH values when building the datastore list.
normalizePBSDatastorePath only cleans input, so relative paths could previously
be accepted as overrides.

Keep the existing override naming, output key, and deduplication logic for valid
absolute paths, and log a warning when skipping an invalid relative override.

Add a regression test covering relative PBS_DATASTORE_PATH entries.
Fix PBS datastore inventory merge so the runtime CLI path overrides datastore.cfg for the same datastore identity. Add a regression test covering config-vs-CLI path precedence during merge.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
internal/input/input.go (1)

166-168: ⚠️ Potential issue | 🟠 Major

Don’t retain completed passwords for later calls.

Line 176 scopes the cache only by fd, and Lines 198-215 intentionally leave a finished passwordInflight attached after a timed-out or cancelled caller exits. That lets a later, unrelated prompt on the same descriptor receive the previous secret, and it keeps the password bytes resident in memory until that happens. Please keep deduplication only while the read is still pending, or scope reuse to a prompt-specific token and zero abandoned results.

Also applies to: 176-215

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/input/input.go` around lines 166 - 168, The cache currently keyed
only by fd stores a completed passwordInflight and returns it to later callers;
instead ensure deduplication only while the read is pending by removing the
passwordInflight entry as soon as the read finishes (on success, timeout, or
cancel) unless there are live waiters tied to the same prompt token; change
reuse to require a prompt-specific token (e.g., include promptID/prompt string
in the key or require a matching token on Wait) and zero the password bytes
before discarding an abandoned result so completed secrets are not retained in
memory (update code paths around the passwordInflight struct, the map keyed by
fd, and the waiter/Done/Cancel handling to drop and zero the entry immediately
when no matching live waiter remains).
internal/orchestrator/backup_sources.go (1)

500-509: ⚠️ Potential issue | 🟠 Major

Bound checksum reads before parsing.

backup.ParseChecksumData only needs the first token, but these helpers still buffer the entire .sha256 payload via ReadFile / CombinedOutput. A corrupt or hostile checksum object can still turn discovery into an unbounded memory read here.

📏 Suggested fix
 import (
 	"context"
 	"errors"
 	"fmt"
+	"io"
 	"os/exec"
 	"path"
 	"path/filepath"
 	"sort"
 	"strings"
@@
 	"github.com/tis24dev/proxsave/internal/config"
 	"github.com/tis24dev/proxsave/internal/logging"
 )
+
+const maxChecksumBytes = 4 << 10
@@
 func parseLocalChecksumFile(checksumPath string) (string, error) {
-	data, err := restoreFS.ReadFile(checksumPath)
+	f, err := restoreFS.Open(checksumPath)
 	if err != nil {
-		return "", fmt.Errorf("read checksum file %s: %w", checksumPath, err)
+		return "", fmt.Errorf("open checksum file %s: %w", checksumPath, err)
+	}
+	defer f.Close()
+	data, err := io.ReadAll(io.LimitReader(f, maxChecksumBytes+1))
+	if err != nil {
+		return "", fmt.Errorf("read checksum file %s: %w", checksumPath, err)
+	}
+	if len(data) > maxChecksumBytes {
+		return "", fmt.Errorf("checksum file %s exceeds %d bytes", checksumPath, maxChecksumBytes)
 	}
 	checksum, err := backup.ParseChecksumData(data)
 	if err != nil {
 		return "", fmt.Errorf("parse checksum file %s: %w", checksumPath, err)
 	}
@@
 func inspectRcloneChecksumFile(ctx context.Context, remotePath string, logger *logging.Logger) (checksum string, err error) {
 	done := logging.DebugStart(logger, "inspect rclone checksum", "remote=%s", remotePath)
 	defer func() { done(err) }()
 	logging.DebugStep(logger, "inspect rclone checksum", "executing: rclone cat %s", remotePath)
 
 	cmd := exec.CommandContext(ctx, "rclone", "cat", remotePath)
-	output, err := cmd.CombinedOutput()
-	if err != nil {
-		return "", fmt.Errorf("rclone cat %s failed: %w (output: %s)", remotePath, err, strings.TrimSpace(string(output)))
+	stdout, err := cmd.StdoutPipe()
+	if err != nil {
+		return "", fmt.Errorf("pipe checksum file %s: %w", remotePath, err)
+	}
+	var stderr strings.Builder
+	cmd.Stderr = &stderr
+	if err := cmd.Start(); err != nil {
+		return "", fmt.Errorf("start rclone cat %s: %w", remotePath, err)
+	}
+	output, readErr := io.ReadAll(io.LimitReader(stdout, maxChecksumBytes+1))
+	if len(output) > maxChecksumBytes {
+		_ = cmd.Process.Kill()
+		_ = cmd.Wait()
+		return "", fmt.Errorf("checksum file %s exceeds %d bytes", remotePath, maxChecksumBytes)
+	}
+	if readErr != nil {
+		_ = cmd.Process.Kill()
+		_ = cmd.Wait()
+		return "", fmt.Errorf("read checksum file %s: %w", remotePath, readErr)
+	}
+	if err := cmd.Wait(); err != nil {
+		return "", fmt.Errorf("rclone cat %s failed: %w (output: %s)", remotePath, err, strings.TrimSpace(stderr.String()))
 	}
 	checksum, err = backup.ParseChecksumData(output)
 	if err != nil {
 		return "", fmt.Errorf("parse checksum file %s: %w", remotePath, err)
 	}

Also applies to: 512-523

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/backup_sources.go` around lines 500 - 509, The helper
parseLocalChecksumFile (and the similar helpers referenced around lines 512-523)
currently uses restoreFS.ReadFile which buffers the entire .sha256 payload
before calling backup.ParseChecksumData, allowing a malicious or corrupt
checksum file to cause unbounded memory usage; change the implementation to
stream only the needed prefix/token instead of ReadFile: open the file via
restoreFS.Open, wrap it in a buffered reader (bufio.Reader) and read only up to
a safe limit or read the first token (stopping at the first whitespace/newline)
and pass that trimmed token bytes/string to backup.ParseChecksumData; ensure you
close the file and apply the same approach to the other helper functions that
use CombinedOutput/ReadFile before parsing checksums.
🧹 Nitpick comments (4)
internal/backup/collector_pbs_datastore.go (1)

212-230: Consider adding a loop bound for defensive programming.

The for attempt := 0; ; attempt++ loop theoretically runs indefinitely if hash collisions persist. While practically impossible with SHA256-derived keys, adding a maximum iteration guard (e.g., 1000) would prevent pathological edge cases and improve code robustness.

🛡️ Optional defensive bound
-			for attempt := 0; ; attempt++ {
+			const maxAttempts = 1000
+			for attempt := 0; attempt < maxAttempts; attempt++ {
 				candidate := assignment.BaseKey
 				if !preferBase || attempt > 0 {
 					seed := assignment.Identity
 					if attempt > 0 {
 						seed = fmt.Sprintf("%s#%d", assignment.Identity, attempt)
 					}
 					candidate = fmt.Sprintf("%s_%s", assignment.BaseKey, pbsOutputKeyDigest(seed))
 				}

 				if owner, ok := usedKeys[candidate]; ok && owner != assignment.Identity {
 					continue
 				}

 				usedKeys[candidate] = assignment.Identity
 				identityKeys[assignment.Identity] = candidate
 				assignFn(&items[assignment.Index], candidate)
 				break
 			}
+			// If we exhausted attempts (shouldn't happen), fall back to index-based key
+			if _, assigned := identityKeys[assignment.Identity]; !assigned {
+				fallback := fmt.Sprintf("%s_%d", assignment.BaseKey, assignment.Index)
+				usedKeys[fallback] = assignment.Identity
+				identityKeys[assignment.Identity] = fallback
+				assignFn(&items[assignment.Index], fallback)
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backup/collector_pbs_datastore.go` around lines 212 - 230, The loop
in the key assignment (for attempt := 0; ; attempt++) can run unbounded; add a
defensive max attempts guard (e.g., const maxAttempts = 1000) and check attempt
>= maxAttempts each iteration to return or surface an error so we avoid infinite
loops on pathological collisions; update the block that uses assignment.BaseKey,
pbsOutputKeyDigest(seed), usedKeys, identityKeys and
assignFn(&items[assignment.Index], candidate) to break normally on success but
return or propagate an explicit error when the maxAttempts threshold is reached.
internal/safefs/safefs_test.go (1)

13-37: Consider thread-safe counter for errCalls.

The errCalls field is incremented without synchronization. While the current test usage appears single-threaded, context.Context.Err() is documented to be safe for concurrent use. If this type is reused in tests with concurrent access, it could cause data races.

♻️ Optional: Use atomic counter
+import "sync/atomic"
+
 type stagedDeadlineContext struct {
 	deadline time.Time
 	done     <-chan struct{}
-	errCalls int
+	errCalls atomic.Int32
 }

 func (c *stagedDeadlineContext) Err() error {
-	c.errCalls++
-	if c.errCalls == 1 {
+	if c.errCalls.Add(1) == 1 {
 		return nil
 	}
 	return context.DeadlineExceeded
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/safefs/safefs_test.go` around lines 13 - 37, The
stagedDeadlineContext.Err method increments errCalls without synchronization;
make errCalls thread-safe by replacing the plain int with an atomic counter
(e.g., int32/int64 used via sync/atomic or atomic.Int32/Int64) and update Err to
read/modify it using atomic operations (atomic.AddInt32/LoadInt32 or the
corresponding atomic type methods) so stagedDeadlineContext.Err() is safe for
concurrent calls.
internal/safefs/safefs.go (1)

153-158: Minor: effectiveTimeout is called twice per operation.

The TimeoutError is constructed with effectiveTimeout(ctx, timeout), then runLimited calls it again internally. While unlikely to cause issues in practice, caching the value could avoid the redundant calculation and ensure the error message always matches the actual timeout used.

♻️ Optional: Cache effective timeout
 func Stat(ctx context.Context, path string, timeout time.Duration) (fs.FileInfo, error) {
 	stat := osStat
-	return runLimited(ctx, timeout, &TimeoutError{Op: "stat", Path: path, Timeout: effectiveTimeout(ctx, timeout)}, func() (fs.FileInfo, error) {
+	effective := effectiveTimeout(ctx, timeout)
+	return runLimited(ctx, effective, &TimeoutError{Op: "stat", Path: path, Timeout: effective}, func() (fs.FileInfo, error) {
 		return stat(path)
 	})
 }

Apply similar changes to ReadDir and Statfs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/safefs/safefs.go` around lines 153 - 158, Compute effectiveTimeout
once into a local variable and reuse it when constructing the TimeoutError so
the error's Timeout field matches the value used by runLimited; e.g. in Stat
(and likewise in ReadDir and Statfs) assign eff := effectiveTimeout(ctx,
timeout) and pass eff into TimeoutError{... Timeout: eff} while keeping the same
runLimited call.
internal/orchestrator/restore_decision_test.go (1)

162-174: Minor: Loop variable shadows outer archivePath.

The loop variable archivePath shadows the outer variable of the same name (line 133). Consider renaming to entry or p for clarity.

Suggested rename
-	for _, archivePath := range archivePaths {
-		if archivePath == restoreDecisionMetadataPath {
+	for _, entry := range archivePaths {
+		if entry == restoreDecisionMetadataPath {
 			foundMeta = true
 		}
-		if archivePath == "var/lib/pve-cluster/config.db" {
+		if entry == "var/lib/pve-cluster/config.db" {
 			foundCluster = true
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/restore_decision_test.go` around lines 162 - 174, The
test loop in restore_decision_test.go uses a loop variable named archivePath
which shadows the outer archivePath; rename the inner loop variable (e.g., to
entry or p) in the for-range that iterates over archivePaths so you compare
entry to restoreDecisionMetadataPath and "var/lib/pve-cluster/config.db" while
keeping the outer archivePath intact (references: variables archivePaths,
restoreDecisionMetadataPath, foundMeta, foundCluster).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/orchestrator/decrypt_prepare_common.go`:
- Around line 80-103: The code currently derives plainArchivePath by stripping
“.age” from staged.ArchivePath while branching on manifestCopy.EncryptionMode,
which can lead to mismatches and input==output for decryptArchive; fix by
validating and ensuring distinct paths: when currentEncryption
(manifestCopy.EncryptionMode) == "age" assert staged.ArchivePath has a ".age"
suffix (return an error if not) and ensure plainArchivePath !=
staged.ArchivePath (if equal, generate a new unique output name in workDir,
e.g., append ".decrypted" or a random suffix); when currentEncryption != "age"
assert staged.ArchivePath does not end with ".age" (return an error if it does)
before calling copyFile; update the logic around decryptArchive,
plainArchiveName/plainArchivePath, and copyFile to use these validated/adjusted
paths so decrypt and copy never operate with identical input and output.

---

Duplicate comments:
In `@internal/input/input.go`:
- Around line 166-168: The cache currently keyed only by fd stores a completed
passwordInflight and returns it to later callers; instead ensure deduplication
only while the read is pending by removing the passwordInflight entry as soon as
the read finishes (on success, timeout, or cancel) unless there are live waiters
tied to the same prompt token; change reuse to require a prompt-specific token
(e.g., include promptID/prompt string in the key or require a matching token on
Wait) and zero the password bytes before discarding an abandoned result so
completed secrets are not retained in memory (update code paths around the
passwordInflight struct, the map keyed by fd, and the waiter/Done/Cancel
handling to drop and zero the entry immediately when no matching live waiter
remains).

In `@internal/orchestrator/backup_sources.go`:
- Around line 500-509: The helper parseLocalChecksumFile (and the similar
helpers referenced around lines 512-523) currently uses restoreFS.ReadFile which
buffers the entire .sha256 payload before calling backup.ParseChecksumData,
allowing a malicious or corrupt checksum file to cause unbounded memory usage;
change the implementation to stream only the needed prefix/token instead of
ReadFile: open the file via restoreFS.Open, wrap it in a buffered reader
(bufio.Reader) and read only up to a safe limit or read the first token
(stopping at the first whitespace/newline) and pass that trimmed token
bytes/string to backup.ParseChecksumData; ensure you close the file and apply
the same approach to the other helper functions that use CombinedOutput/ReadFile
before parsing checksums.

---

Nitpick comments:
In `@internal/backup/collector_pbs_datastore.go`:
- Around line 212-230: The loop in the key assignment (for attempt := 0; ;
attempt++) can run unbounded; add a defensive max attempts guard (e.g., const
maxAttempts = 1000) and check attempt >= maxAttempts each iteration to return or
surface an error so we avoid infinite loops on pathological collisions; update
the block that uses assignment.BaseKey, pbsOutputKeyDigest(seed), usedKeys,
identityKeys and assignFn(&items[assignment.Index], candidate) to break normally
on success but return or propagate an explicit error when the maxAttempts
threshold is reached.

In `@internal/orchestrator/restore_decision_test.go`:
- Around line 162-174: The test loop in restore_decision_test.go uses a loop
variable named archivePath which shadows the outer archivePath; rename the inner
loop variable (e.g., to entry or p) in the for-range that iterates over
archivePaths so you compare entry to restoreDecisionMetadataPath and
"var/lib/pve-cluster/config.db" while keeping the outer archivePath intact
(references: variables archivePaths, restoreDecisionMetadataPath, foundMeta,
foundCluster).

In `@internal/safefs/safefs_test.go`:
- Around line 13-37: The stagedDeadlineContext.Err method increments errCalls
without synchronization; make errCalls thread-safe by replacing the plain int
with an atomic counter (e.g., int32/int64 used via sync/atomic or
atomic.Int32/Int64) and update Err to read/modify it using atomic operations
(atomic.AddInt32/LoadInt32 or the corresponding atomic type methods) so
stagedDeadlineContext.Err() is safe for concurrent calls.

In `@internal/safefs/safefs.go`:
- Around line 153-158: Compute effectiveTimeout once into a local variable and
reuse it when constructing the TimeoutError so the error's Timeout field matches
the value used by runLimited; e.g. in Stat (and likewise in ReadDir and Statfs)
assign eff := effectiveTimeout(ctx, timeout) and pass eff into TimeoutError{...
Timeout: eff} while keeping the same runLimited call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b29ae230-1535-4f56-bef6-05f637508cf7

📥 Commits

Reviewing files that changed from the base of the PR and between 160a442 and 3801d8d.

📒 Files selected for processing (20)
  • internal/backup/collector_pbs_datastore.go
  • internal/backup/collector_pbs_datastore_inventory.go
  • internal/backup/collector_pbs_datastore_inventory_test.go
  • internal/backup/collector_pbs_test.go
  • internal/input/input.go
  • internal/input/input_test.go
  • internal/orchestrator/backup_safety.go
  • internal/orchestrator/backup_safety_test.go
  • internal/orchestrator/backup_sources.go
  • internal/orchestrator/backup_sources_test.go
  • internal/orchestrator/decrypt_prepare_common.go
  • internal/orchestrator/decrypt_test.go
  • internal/orchestrator/network_staged_apply.go
  • internal/orchestrator/network_staged_apply_test.go
  • internal/orchestrator/restore_decision.go
  • internal/orchestrator/restore_decision_test.go
  • internal/orchestrator/restore_firewall.go
  • internal/orchestrator/restore_firewall_additional_test.go
  • internal/safefs/safefs.go
  • internal/safefs/safefs_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/backup/collector_pbs_datastore_inventory_test.go

Replace unbounded checksum sidecar reads in backup discovery with bounded
streaming reads for both local files and rclone sources.

- cap checksum input to 4 KiB before parsing
- remove full-file ReadFile / CombinedOutput usage on .sha256 paths
- return clear errors for oversized or empty checksum files
- preserve compatibility with valid bounded checksum files without trailing newline
- add coverage for local and rclone oversized / bounded-input cases
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/orchestrator/backup_sources.go`:
- Around line 557-569: readBoundedChecksumLine can return closedEarly=true when
we intentionally close stdout after reading a valid first line, but the current
check (if waitErr != nil && !closedEarly) ignores any rclone failures when
closedEarly is true; change the logic after cmd.Wait() to treat waitErr as an
error unless stderr is empty (i.e., only ignore waitErr when closedEarly is true
AND strings.TrimSpace(stderr.String()) == ""), so that genuine rclone failures
(non-empty stderr) are surfaced even if closedEarly is true; update the error
branches that reference cmd.Wait(), closedEarly, stderr, and remotePath
accordingly and add a test where a fake rclone writes a valid checksum then
exits with code 1 to ensure failures are not masked.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4afc712-13a8-4c84-9705-d22efb7dde06

📥 Commits

Reviewing files that changed from the base of the PR and between 3801d8d and b9eb4d7.

📒 Files selected for processing (2)
  • internal/orchestrator/backup_sources.go
  • internal/orchestrator/backup_sources_test.go

Only ignore `rclone cat` wait errors when stdout was closed early after
reading a valid first checksum line and stderr is empty.

This prevents genuine rclone failures from being masked when the checksum
is read successfully but the command still exits non-zero with an error
message. Add a regression test for a fake rclone that prints a valid
checksum line and then exits with code 1.
Validate the staged archive path against the manifest encryption mode in
preparePlainBundleCommon before deriving the plain archive output path.

This rejects inconsistent inputs where `EncryptionMode=age` does not have
a `.age` archive suffix, or where a non-age archive still ends with
`.age`. It also ensures decryption never runs with identical input and
output paths by generating a unique output name when needed.

Add regression coverage for both mode/suffix mismatch cases and the
unique-path fallback for age archives.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
internal/orchestrator/backup_sources.go (1)

491-501: normalizeCandidateManifestChecksum mutates the manifest in-place.

The function both returns a normalized checksum and mutates manifest.SHA256 as a side effect. This dual behavior is subtle and could cause confusion. Consider either:

  1. Making it a pure function that only returns the normalized value
  2. Renaming to normalizeAndUpdateManifestChecksum to signal mutation

The current approach works but the mutation is easy to miss when reading call sites.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/backup_sources.go` around lines 491 - 501, The function
normalizeCandidateManifestChecksum currently mutates the input manifest.SHA256
while also returning the normalized value; either make it pure (do not assign
back to manifest.SHA256) and update all call sites to explicitly set
manifest.SHA256 when mutation is desired, or rename the function to
normalizeAndUpdateManifestChecksum to signal the side-effect; choose one
approach and apply consistently by modifying normalizeCandidateManifestChecksum
(or renaming it) and adjusting callers that rely on the in-place update so
behavior is explicit.
internal/orchestrator/decrypt_prepare_common.go (1)

73-82: Avoid mutating cand.BundlePath with an ephemeral download path.

cleanup() later removes the downloaded rclone bundle, but the caller-owned decryptCandidate is left pointing at that temp file. Keeping the local bundle path in a function-scoped variable avoids leaking invalid state back to the caller.

💡 Suggested refactor
+	bundlePath := cand.BundlePath
 	var rcloneCleanup func()
 	if cand.IsRclone && cand.Source == sourceBundle {
 		logger.Debug("Detected rclone backup, downloading...")
-		localPath, cleanup, err := downloadRcloneBackup(ctx, cand.BundlePath, logger)
+		localPath, cleanup, err := downloadRcloneBackup(ctx, bundlePath, logger)
 		if err != nil {
 			return nil, fmt.Errorf("failed to download rclone backup: %w", err)
 		}
 		rcloneCleanup = cleanup
-		cand.BundlePath = localPath
+		bundlePath = localPath
 	}
@@
 	switch cand.Source {
 	case sourceBundle:
-		logger.Info("Extracting bundle %s", filepath.Base(cand.BundlePath))
-		staged, err = extractBundleToWorkdirWithLogger(cand.BundlePath, workDir, logger)
+		logger.Info("Extracting bundle %s", filepath.Base(bundlePath))
+		staged, err = extractBundleToWorkdirWithLogger(bundlePath, workDir, logger)

Also applies to: 109-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/decrypt_prepare_common.go` around lines 73 - 82, The
code mutates the caller-owned decryptCandidate by setting cand.BundlePath to a
temporary rclone download path which is later removed by rcloneCleanup; instead,
keep the downloaded path in a function-local variable (e.g., localBundlePath)
returned by downloadRcloneBackup and use that local variable for downstream work
and cleanup, and do not assign it back to cand.BundlePath. Update both places
that currently set cand.BundlePath (the downloadRcloneBackup call handling and
the other block around the same logic) to use the local variable and ensure
rcloneCleanup still cleans that local path without mutating decryptCandidate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/orchestrator/decrypt_prepare_common.go`:
- Around line 32-50: archiveBase and plainArchiveName can be dot-segments like
"." or ".." after trimming ".age", which allows filepath.Join(workDir,
plainArchiveName) to escape the scratch dir; before joining, validate
plainArchiveName (and archiveBase earlier) to reject or treat ".", "..", empty
or whitespace-only values — e.g., detect plainArchiveName == "." ||
plainArchiveName == ".." || strings.TrimSpace(plainArchiveName) == "" — and call
createUniquePreparedArchivePath(workDir, archiveBase or plainArchiveName)
instead of performing filepath.Join; update the branch that handles
currentEncryption == "age" to perform this guard using the existing variables
stagedArchivePath, workDir, archiveBase, plainArchiveName and the helper
createUniquePreparedArchivePath.
- Around line 129-174: The code currently treats any non-"age" encryption as
plaintext and sets manifestCopy.EncryptionMode = "none"; change the logic in the
block using currentEncryption (computed from manifestCopy.EncryptionMode) so
that: if currentEncryption == "age" then use decryptArchive as before; else if
currentEncryption == "none" (or empty) then allow the copy path and keep
EncryptionMode="none"; otherwise return an error (e.g., "unsupported encryption
mode: %s") and call cleanup(), rather than proceeding and overwriting
manifestCopy.EncryptionMode. Update the branch that calls copyFile
(staged.ArchivePath vs plainArchivePath) to only run for explicit "none"/empty
modes and ensure manifestCopy.EncryptionMode is only set to "none" when no
decryption was required.

---

Nitpick comments:
In `@internal/orchestrator/backup_sources.go`:
- Around line 491-501: The function normalizeCandidateManifestChecksum currently
mutates the input manifest.SHA256 while also returning the normalized value;
either make it pure (do not assign back to manifest.SHA256) and update all call
sites to explicitly set manifest.SHA256 when mutation is desired, or rename the
function to normalizeAndUpdateManifestChecksum to signal the side-effect; choose
one approach and apply consistently by modifying
normalizeCandidateManifestChecksum (or renaming it) and adjusting callers that
rely on the in-place update so behavior is explicit.

In `@internal/orchestrator/decrypt_prepare_common.go`:
- Around line 73-82: The code mutates the caller-owned decryptCandidate by
setting cand.BundlePath to a temporary rclone download path which is later
removed by rcloneCleanup; instead, keep the downloaded path in a function-local
variable (e.g., localBundlePath) returned by downloadRcloneBackup and use that
local variable for downstream work and cleanup, and do not assign it back to
cand.BundlePath. Update both places that currently set cand.BundlePath (the
downloadRcloneBackup call handling and the other block around the same logic) to
use the local variable and ensure rcloneCleanup still cleans that local path
without mutating decryptCandidate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6d659d54-36c1-4947-a00d-5400bd816061

📥 Commits

Reviewing files that changed from the base of the PR and between b9eb4d7 and e752f47.

📒 Files selected for processing (4)
  • internal/orchestrator/backup_sources.go
  • internal/orchestrator/backup_sources_test.go
  • internal/orchestrator/decrypt_prepare_common.go
  • internal/orchestrator/decrypt_test.go

Comment on lines +32 to +50
	archiveBase := filepath.Base(stagedArchivePath)
	if archiveBase == "." || archiveBase == string(filepath.Separator) || strings.TrimSpace(archiveBase) == "" {
		return "", fmt.Errorf("invalid staged archive path %s", stagedArchivePath)
	}

	if currentEncryption == "age" {
		if !strings.HasSuffix(archiveBase, ".age") {
			return "", fmt.Errorf("encrypted archive %s is missing .age suffix", stagedArchivePath)
		}

		plainArchiveName := strings.TrimSuffix(archiveBase, ".age")
		if strings.TrimSpace(plainArchiveName) == "" {
			return createUniquePreparedArchivePath(workDir, archiveBase)
		}

		plainArchivePath := filepath.Join(workDir, plainArchiveName)
		if plainArchivePath == stagedArchivePath {
			return createUniquePreparedArchivePath(workDir, plainArchiveName)
		}

⚠️ Potential issue | 🟠 Major

Guard dot-segment archive names before joining with workDir.

filepath.Base can still return "..", and trimming .age can produce "." or ".." (for example, "..age" / "...age"). In both cases, filepath.Join(workDir, ...) resolves to the work dir itself or its parent, so a malformed staged name can escape the scratch directory.

💡 Suggested fix
 func resolvePreparedArchivePath(workDir, stagedArchivePath, currentEncryption string) (string, error) {
 	archiveBase := filepath.Base(stagedArchivePath)
-	if archiveBase == "." || archiveBase == string(filepath.Separator) || strings.TrimSpace(archiveBase) == "" {
+	if archiveBase == "." || archiveBase == ".." || archiveBase == string(filepath.Separator) || strings.TrimSpace(archiveBase) == "" {
 		return "", fmt.Errorf("invalid staged archive path %s", stagedArchivePath)
 	}
 
 	if currentEncryption == "age" {
 		if !strings.HasSuffix(archiveBase, ".age") {
 			return "", fmt.Errorf("encrypted archive %s is missing .age suffix", stagedArchivePath)
 		}
 
 		plainArchiveName := strings.TrimSuffix(archiveBase, ".age")
-		if strings.TrimSpace(plainArchiveName) == "" {
+		if strings.TrimSpace(plainArchiveName) == "" || plainArchiveName == "." || plainArchiveName == ".." {
 			return createUniquePreparedArchivePath(workDir, archiveBase)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/decrypt_prepare_common.go` around lines 32 - 50,
archiveBase and plainArchiveName can be dot-segments like "." or ".." after
trimming ".age", which allows filepath.Join(workDir, plainArchiveName) to escape
the scratch dir; before joining, validate plainArchiveName (and archiveBase
earlier) to reject or treat ".", "..", empty or whitespace-only values — e.g.,
detect plainArchiveName == "." || plainArchiveName == ".." ||
strings.TrimSpace(plainArchiveName) == "" — and call
createUniquePreparedArchivePath(workDir, archiveBase or plainArchiveName)
instead of performing filepath.Join; update the branch that handles
currentEncryption == "age" to perform this guard using the existing variables
stagedArchivePath, workDir, archiveBase, plainArchiveName and the helper
createUniquePreparedArchivePath.

Comment on lines +129 to +174
manifestCopy := *cand.Manifest
currentEncryption := strings.ToLower(strings.TrimSpace(manifestCopy.EncryptionMode))
logger.Info("Preparing archive %s for decryption (mode: %s)", manifestCopy.ArchivePath, statusFromManifest(&manifestCopy))

plainArchivePath, err := resolvePreparedArchivePath(workDir, staged.ArchivePath, currentEncryption)
if err != nil {
cleanup()
return nil, err
}

if currentEncryption == "age" {
if decryptArchive == nil {
cleanup()
return nil, fmt.Errorf("decrypt function not available")
}
displayName := cand.DisplayBase
if strings.TrimSpace(displayName) == "" {
displayName = filepath.Base(manifestCopy.ArchivePath)
}
if err := decryptArchive(ctx, staged.ArchivePath, plainArchivePath, displayName); err != nil {
cleanup()
return nil, err
}
} else if staged.ArchivePath != plainArchivePath {
if err := copyFile(restoreFS, staged.ArchivePath, plainArchivePath); err != nil {
cleanup()
return nil, fmt.Errorf("copy archive: %w", err)
}
}

archiveInfo, err := restoreFS.Stat(plainArchivePath)
if err != nil {
cleanup()
return nil, fmt.Errorf("stat decrypted archive: %w", err)
}

plainChecksum, err := backup.GenerateChecksum(ctx, logger, plainArchivePath)
if err != nil {
cleanup()
return nil, fmt.Errorf("generate checksum: %w", err)
}

manifestCopy.ArchivePath = plainArchivePath
manifestCopy.ArchiveSize = archiveInfo.Size()
manifestCopy.SHA256 = plainChecksum
manifestCopy.EncryptionMode = "none"

⚠️ Potential issue | 🟠 Major

Reject unsupported encryption modes instead of coercing them to none.

Lines 139-174 only decrypt "age", but every other value falls through the plain-archive path and is then rewritten to "none". A tampered manifest like "gpg" would therefore be treated as a prepared plain bundle even though no decryption happened.

💡 Suggested fix
 	manifestCopy := *cand.Manifest
 	currentEncryption := strings.ToLower(strings.TrimSpace(manifestCopy.EncryptionMode))
+	switch currentEncryption {
+	case "", "none":
+		currentEncryption = "none"
+	case "age":
+	default:
+		cleanup()
+		return nil, fmt.Errorf("unsupported encryption mode %q", manifestCopy.EncryptionMode)
+	}
 	logger.Info("Preparing archive %s for decryption (mode: %s)", manifestCopy.ArchivePath, statusFromManifest(&manifestCopy))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/decrypt_prepare_common.go` around lines 129-174: the code currently treats any non-"age" encryption mode as plaintext and unconditionally sets `manifestCopy.EncryptionMode = "none"`. Change the logic in the block that uses `currentEncryption` (computed from `manifestCopy.EncryptionMode`) so that: if `currentEncryption == "age"`, use `decryptArchive` as before; if `currentEncryption` is `"none"` or empty, allow the copy path and keep `EncryptionMode = "none"`; otherwise call `cleanup()` and return an error (e.g., `"unsupported encryption mode: %s"`) instead of proceeding and overwriting `manifestCopy.EncryptionMode`. Update the branch that calls `copyFile` (`staged.ArchivePath` vs `plainArchivePath`) so it runs only for explicit `"none"`/empty modes, and ensure `manifestCopy.EncryptionMode` is set to `"none"` only when no decryption was required.

@tis24dev tis24dev closed this Mar 11, 2026