
Sync dev to main#169

Closed
tis24dev wants to merge 28 commits into main from dev

Conversation


@tis24dev tis24dev commented Mar 10, 2026

  • test(orchestrator): maximize coverage for network_apply
  • ci: bump the actions-updates group across 1 directory with 2 updates (#164)
  • backup/pve: clarify datastore skip logs
  • feat(webhook): optional Discord content fallback for embed-only messages
  • Revert "feat(webhook): optional Discord content fallback for embed-only messages"
  • Update go.mod
  • Harden restore decisions against untrusted backup metadata
  • Enforce staged backup checksum verification in restore/decrypt flows
  • fix(orchestrator): preserve staged network symlinks
  • fix(orchestrator): harden restore path validation against broken symlink escapes
  • fix(backup): sanitize datastore and storage path segments
  • fix(io): bound timed fs and input goroutine buildup
  • fix(orchestrator): preserve operational restore errors during path validation
  • Separate PBS override scan roots from real datastore identities.
  • fix(orchestrator): resolve symlinked base dirs before validating relative links
  • fix(safefs): harden timeout test cleanup and limiter capture
  • fix(input): preserve completed inflight reads across retries
  • fix(pbs): disambiguate datastore output keys across CLI and overrides
  • test(input): assert inflight cleanup after completed retry consumption

Summary by CodeRabbit

  • New Features

    • Filesystem-based PBS datastore discovery and safer per-datastore output keys
    • Archive-driven restore analysis to detect backup type, cluster mode, and host
    • Improved archive preparation with unified decryption/preparation flow
  • Bug Fixes

    • Stronger path/symlink safety across restore and network file apply flows
    • Better checksum normalization and integrity verification for backups
    • Safer handling of storage/datastore names with special characters
  • Documentation

    • Clarified PBS datastore scan-root semantics and restore handling of override-only entries
  • Chores

    • Go toolchain bumped to 1.25.8 and workflow action updates

tis24dev and others added 19 commits February 27, 2026 15:30
- Add network_apply_additional_test.go with extensive tests for rollback (arm/disarm), NIC repair CLI (overrides/conflicts), snapshot IP parsing, command selection, and error paths.
- Use FakeFS/FakeCommandRunner plus PATH stubs to deterministically exercise both success and failure branches.
- Bring network_apply.go coverage to ~99% with no production code changes.
…164)

Bumps the actions-updates group with 2 updates in the / directory: [goreleaser/goreleaser-action](https://github.com/goreleaser/goreleaser-action) and [actions/attest-build-provenance](https://github.com/actions/attest-build-provenance).


Updates `goreleaser/goreleaser-action` from 6 to 7
- [Release notes](https://github.com/goreleaser/goreleaser-action/releases)
- [Commits](goreleaser/goreleaser-action@v6...v7)

Updates `actions/attest-build-provenance` from 3 to 4
- [Release notes](https://github.com/actions/attest-build-provenance/releases)
- [Changelog](https://github.com/actions/attest-build-provenance/blob/main/RELEASE.md)
- [Commits](actions/attest-build-provenance@v3...v4)

---
updated-dependencies:
- dependency-name: goreleaser/goreleaser-action
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-updates
- dependency-name: actions/attest-build-provenance
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-updates
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Make PVE datastore skip messages user-friendly: log disabled storages as SKIP,
offline storages as actionable WARNING, and always emit DEBUG skip details with
active/enabled/status and an explicit reason.
Fix detected datastores report header to match output column order.
Extend unit tests to cover runtime flag parsing and the new skip logging, and
make Makefile coverage targets honor the go.mod toolchain via GOTOOLCHAIN.
Adds per-endpoint opt-in Discord content text (WEBHOOK_<ENDPOINT>_DISCORD_CONTENT_ENABLED/..._DISCORD_CONTENT) to avoid “empty” messages when embeds aren’t rendered. Default remains unchanged (embed-only). Includes 2000-char truncation, docs/template updates, and tests.
Derive restore compatibility, cluster gating, and hostname warnings from archive-backed facts instead of candidate manifest metadata. Add restore archive analysis for internal backup metadata and category-based fallback, update restore planning and compatibility checks to consume trusted decision inputs, and add regression tests covering manifest spoofing for backup type and cluster mode.
Require checksum verification for staged backup archives before any restore or decrypt operation proceeds. Add strict SHA256 parsing and normalization, enforce checksum source agreement between sidecar and manifest metadata, reject raw backup candidates with no verifiable integrity source, centralize shared prepare-path verification for bundle and raw inputs, preserve source artifact checksum separately from the prepared plain archive checksum, and update regression tests to cover missing, mismatched, and legacy checksum scenarios.
Use Lstat in network staged apply so symlink entries are handled as symlinks instead of being dereferenced and copied as regular file contents. Reject symlinked staged network directories, extend the FS abstraction with Lstat, and add regression tests covering preserved symlinks and invalid symlinked stage roots.
…ink escapes

Replace the restore path safety checks with a shared FS-aware resolver that walks paths component by component using Lstat and Readlink, correctly handling missing tails and rejecting intermediate symlink escapes. Apply the new validation to regular restore, safety restore, symlink and hardlink handling, and add regression tests for broken escaping symlinks, symlink loops, and symlinked restore roots.
Introduce a shared path-safe key for PBS datastore names and PVE storage names when building collector output paths, preventing path traversal and filename collisions from raw config values. Keep raw names unchanged in metadata and command invocations, update all affected PBS/PVE collector call sites, and add regression tests covering unsafe names while preserving the existing layout for already-safe names.
Prevent unbounded goroutine accumulation in timeout/cancel wrappers by limiting in-flight safefs operations and reusing a single in-flight read per reader/file descriptor in the input package. Keep public APIs unchanged, preserve current timeout semantics, and add regression coverage for limiter saturation, repeated timeouts, and race-safe cleanup of timed operations.
…lidation

Teach the restore path resolver to distinguish security violations from local operational failures, so only real root-escape conditions are reported as illegal paths. Keep broken-symlink and traversal hardening intact, allow in-root permission and ENOTDIR failures to surface through the normal restore flow, and add regression tests covering the new security vs operational error classification.
- add explicit PBS datastore metadata for source, CLI identity and output key
- derive stable path-based output keys for PBS_DATASTORE_PATH overrides
- keep existing output names for real CLI-discovered PBS datastores
- skip datastore show/status CLI calls for override-only entries
- use filesystem-only namespace discovery for override paths
- keep PXAR outputs isolated for colliding override basenames
- preserve distinct override entries in PBS datastore inventory
- exclude override-only inventory entries from datastore.cfg fallback restore
- add regression coverage for basename collisions and restore safety
- update PBS_DATASTORE_PATH documentation to reflect scan-root semantics
…tive links

resolvePathRelativeToBaseWithinRootFS now canonicalizes baseDir by resolving existing symlinks before validating relative link targets. This closes the bypass where an archive could create an apparently harmless parent symlink and then install a relative symlink that escapes destRoot once materialized by the kernel.
Make safefs timeout tests failure-safe by draining blocked workers from cleanup and separating worker cleanup from global state restore. Also capture limiter and fs operation hooks locally in runLimited/Stat/ReadDir/Statfs to avoid races with global resets while async work is still in flight.
Keep completed in-flight line and password reads attached to their state until
a caller consumes the buffered result, instead of clearing the inflight state
from the producer goroutine.

This fixes the race where a read could complete after a timeout, be cleaned up
before the next retry started, and leave the completed input unreachable.
Add deterministic tests for both retry-while-pending and completion-before-retry
cases to lock the behavior down.
Add a PBS-specific output-key resolver to guarantee stable, unique datastore filenames and directories across auto-detected datastores and PBS_DATASTORE_PATH overrides. This fixes real collisions between CLI datastore names and path-derived override keys, makes pbsDatastore.pathKey() source-aware when OutputKey is unset, and keeps inventory output_key values aligned with the actual files written by PBS collectors. Includes regression tests for CLI-vs-override collisions, override fallback behavior, and inventory consistency.
The new tests verify that line and password inflight reads remain attached
after timeout until the completed result is reused, and that state.inflight
is cleared immediately after that consumer receives the result.

github-actions bot commented Mar 10, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 2 package(s) with unknown licenses.
See the Details below.

License Issues

.github/workflows/release.yml

Package | Version | License | Issue Type
actions/attest-build-provenance | 4.*.* | Null | Unknown License
goreleaser/goreleaser-action | 7.*.* | Null | Unknown License
Allowed Licenses: MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, LicenseRef-scancode-google-patent-license-golang

OpenSSF Scorecard

Package | Version | Score
actions/attest-build-provenance | 4.*.* | Unknown
goreleaser/goreleaser-action | 7.*.* | 🟢 5
Details
Check | Score | Reason
Code-Review | 🟢 3 | Found 5/16 approved changesets -- score normalized to 3
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Maintained | 🟢 10 | 16 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Packaging | ⚠️ -1 | packaging workflow not detected
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
License | 🟢 10 | license file detected
Fuzzing | ⚠️ 0 | project is not fuzzed
Signed-Releases | ⚠️ -1 | no releases found
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Security-Policy | ⚠️ 0 | security policy file not detected
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
Pinned-Dependencies | 🟢 5 | dependency not pinned by hash detected -- score normalized to 5

Scanned Files

  • .github/workflows/release.yml


coderabbitai bot commented Mar 10, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Adds extensive FS-aware path-security and symlink-safe restore logic, PBS datastore path normalization with deterministic output keys, archive-based restore analysis and decision flow, staged integrity verification and bundle preparation, per-reader input inflight tracking, safeFS operation limiting, numerous PBS/PVE collector updates, tests, and minor build/doc/toolchain bumps.

Changes

Cohort / File(s) Summary
Build & Tooling
.github/workflows/release.yml, go.mod, Makefile
Toolchain bumped to go1.25.8; GitHub Actions versions updated; Makefile added COVER_GOTOOLCHAIN propagation for coverage commands.
Checksum utilities & tests
internal/backup/checksum.go, internal/backup/checksum_*.go
Added NormalizeChecksum and ParseChecksumData; VerifyChecksum and legacy loader updated to use normalization; tests updated to use valid SHA256 fixtures.
Collector path keys & PBS datastores
internal/backup/collector.go, internal/backup/collector_helpers_extra_test.go, internal/backup/collector_pbs*.go, internal/backup/collector_pbs_datastore*.go, internal/backup/collector_pbs_*tests.go
Introduces collectorPathKey and deterministic per-datastore OutputKey, path normalization, CLI vs override source handling, collision disambiguation, and many tests validating path-safe outputs and override behavior.
PVE collector changes & tests
internal/backup/collector_pve.go, internal/backup/collector_pve_*.go
Adds pveStorageEntry.pathKey(), refactors metadata/header format, improves skip logging, and uses pathKey for metadata and list filenames; tests for unsafe names and user-friendly skip reasons added.
PBS namespace helper
internal/pbs/namespaces.go, internal/pbs/namespaces_test.go
Exports DiscoverNamespacesFromFilesystem helper with test.
Restore path security module & usage
internal/orchestrator/path_security.go, internal/orchestrator/*_safety.go, internal/orchestrator/restore.go, internal/orchestrator/*restore*_test.go
New FS-aware path resolution with symlink hop limits and error taxonomy (security vs operational); replaced inlined resolution in restore and safety code to use resolvePathWithinRootFS / resolvePathRelativeToBaseWithinRootFS; extensive tests for symlink escape scenarios.
Restore decision & workflow changes
internal/orchestrator/restore_decision.go, internal/orchestrator/selective.go, internal/orchestrator/restore_workflow_ui.go, internal/orchestrator/restore_plan.go, internal/orchestrator/restore_*tests.go
Adds AnalyzeRestoreArchive to infer backup type/cluster/hostname from tar + metadata; PlanRestore signature now takes clusterBackup bool; restore workflow uses archive-derived RestoreDecisionInfo with fallback from manifest; tests added/updated.
Decrypt & integrity flow
internal/orchestrator/decrypt_prepare_common.go, internal/orchestrator/decrypt_integrity*.go, internal/orchestrator/decrypt*.go
New preparePlainBundleCommon and archiveDecryptFunc; staged integrity expectation resolution, verifyStagedArchiveIntegrity, and integration into decrypt preparation; TUI/CLI delegates to common helper; tests added.
Input concurrency & tests
internal/input/input.go, internal/input/input_test.go
Per-reader and per-FD in-flight read tracking; ReadLineWithContext/ReadPasswordWithContext reuse single in-flight goroutine, preserve cancellation semantics; comprehensive tests added.
Backup discovery integrity checks
internal/orchestrator/backup_sources.go, internal/orchestrator/backup_sources_test.go
Introduces integrityMissing/integrityUnavailable counters, manifest SHA256 field propagation, rclone/local checksum parsing and expectation logic, and tests enforcing checksum-based candidate skipping.
Network staging & symlink-aware overlay
internal/orchestrator/network_staged_apply.go, internal/orchestrator/network_staged_apply_test.go, internal/orchestrator/network_apply_additional_test.go
Refactors overlay copy to be root-aware, adds symlink-preserving overlay handlers, prohibits symlinked staged directories, and includes large test suite for apply/rollback/NIC workflows.
FS abstraction additions & safefs limiter
internal/orchestrator/deps.go, internal/orchestrator/deps_test.go, internal/safefs/safefs.go, internal/safefs/safefs_test.go
Adds Lstat to FS interface and test fakes; introduced operationLimiter and runLimited to bound concurrent filesystem ops for Stat/ReadDir/Statfs with tests ensuring capacity behavior.
PBS staged-apply & templates/docs
internal/orchestrator/pbs_staged_apply.go, internal/config/templates/backup.env, docs/CONFIGURATION.md, docs/RESTORE_GUIDE.md
PBS datastore inventory now carries Origin/CLIName; staged apply ignores override-origin entries when regenerating datastore.cfg; docs and template comments clarify PBS_DATASTORE_PATH are filesystem scan roots and may yield path-derived output keys.
Various tests & minor updates
multiple internal/*_test.go files
Large number of new/updated tests across orchestrator, backup, safefs, input, network, decrypt and restore packages to exercise new features and edge cases.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Workflow as Restore Workflow
    participant Archive as Archive
    participant Analyzer as Archive Analyzer
    participant Decision as Decision Info
    participant Plan as Restore Plan

    User->>Workflow: Start restore with bundle
    Workflow->>Archive: Open prepared archive
    Workflow->>Analyzer: AnalyzeRestoreArchive(archivePath, logger)
    Analyzer->>Archive: Read tar entries & internal metadata
    Analyzer->>Decision: Build RestoreDecisionInfo (type, cluster, hostname)
    Decision-->>Analyzer: Return decision & categories
    Analyzer-->>Workflow: Return decision & available categories
    Workflow->>Plan: PlanRestore(decision.ClusterPayload, selectedCategories, systemType, mode)
    Plan-->>Workflow: Return restore plan
    Workflow->>User: Present prompts & proceed
sequenceDiagram
    participant Extract as Extract Operation
    participant SafeFS as Path Security FS
    participant Resolver as Path Resolver
    participant Validator as Symlink Validator

    Extract->>SafeFS: sanitizeRestoreEntryTargetWithFS(destRoot, entry)
    SafeFS->>Resolver: resolvePathWithinRootFS(destRoot, entry)
    Resolver->>Validator: Walk path components, detect symlinks
    Validator->>Validator: Resolve symlinks (hop limit)
    Validator->>Validator: Ensure resolved path inside root
    Validator-->>Resolver: Return resolved path or security/operational error
    Resolver-->>SafeFS: Return validation result
    SafeFS-->>Extract: Allow or reject extraction
sequenceDiagram
    participant Reader as Reader
    participant Input as Input Handler
    participant State as Inflight State Map
    participant Goroutine as Read Goroutine

    Reader->>Input: ReadLineWithContext(ctx)
    Input->>State: Check inflight for reader
    alt no existing inflight
        Input->>Goroutine: Spawn goroutine to perform ReadString
        Goroutine->>State: Deliver result via channel
        State->>Input: Store inflight op
    else reuse inflight
        Input->>State: Wait on existing inflight channel
    end
    Goroutine-->>Input: Return result/error
    Input->>State: Clear inflight
    Input-->>Reader: Return line or error

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I nibble paths, so safe and neat,
I hop through tar and checksum'd meat,
Keys for stores and symlinks tamed,
One read at once — no race proclaimed,
A rabbit cheers: your archives sleep, secure and sweet.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.21%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check | ❓ Inconclusive | The title "Sync dev to main" is vague and generic and does not convey the changeset's primary purpose or scope. | Consider a more descriptive title that highlights the main change, such as "Harden restore decision flow, file safety, and PBS datastore handling".
✅ Passed checks (1 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled.




coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (3)
internal/input/input.go (2)

90-106: Minor: Redundant Load before LoadOrStore.

The initial Load check is unnecessary since LoadOrStore performs an atomic load-or-store operation that handles both cases. This is a minor optimization opportunity.

🔧 Suggested simplification
 func getLineState(reader *bufio.Reader) *lineState {
-	if state, ok := lineStates.Load(reader); ok {
-		return state.(*lineState)
-	}
 	state := &lineState{}
 	actual, _ := lineStates.LoadOrStore(reader, state)
 	return actual.(*lineState)
 }

 func getPasswordState(fd int) *passwordState {
-	if state, ok := passwordStates.Load(fd); ok {
-		return state.(*passwordState)
-	}
 	state := &passwordState{}
 	actual, _ := passwordStates.LoadOrStore(fd, state)
 	return actual.(*passwordState)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/input/input.go` around lines 90 - 106, Both getLineState and
getPasswordState perform an unnecessary initial Load before calling LoadOrStore;
remove the redundant Load and simply create the new state
(lineState/passwordState), call lineStates.LoadOrStore(reader, state) /
passwordStates.LoadOrStore(fd, state), then return the actual value cast to
*lineState or *passwordState respectively so the atomic load-or-store handles
both existing and new cases.

125-152: Mutex held during blocking select limits concurrent consumption.

The mutex is held across the entire select block, which means if multiple goroutines call ReadLineWithContext on the same reader concurrently, they will serialize on the mutex rather than all waiting on the same in-flight result. This appears intentional per the docstring (at-most-one in-flight read for sequential retries), but it's worth noting this design doesn't support multiple concurrent consumers efficiently.

Additionally, entries in lineStates are never removed after use. For long-running processes with many transient readers, this could lead to gradual memory accumulation. Consider adding a cleanup mechanism if readers are short-lived.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/input/input.go` around lines 125 - 152, The mutex (state.mu) is held
while blocking on the select, serializing concurrent callers; fix by
creating/assigning the lineInflight under lock but releasing state.mu before the
select so multiple goroutines can wait on the same inflight.done channel
concurrently (capture a local inflight variable while holding the lock, then
unlock). Also add cleanup to avoid unbounded growth of lineStates: when clearing
state.inflight (where currently you set state.inflight = nil after receiving
res), remove the corresponding entry from the container map (lineStates) if the
reader is transient so entries are not retained forever; do this under the same
mutex to avoid races (e.g., in the branch that sets state.inflight = nil, lock,
remove the map entry, then unlock).
internal/orchestrator/compatibility_test.go (1)

19-19: Make this test prove the new API is argument-driven.

Because the fake host is also set up as PVE, this still passes if ValidateCompatibility accidentally reads DetectCurrentSystem() instead of honoring the currentSystem argument. Passing a deliberately conflicting value here would guard the refactor much better.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/compatibility_test.go` at line 19, The test currently
passes because the fake host is PVE; update the call to ValidateCompatibility so
it supplies a deliberately conflicting currentSystem argument (e.g., pass
SystemTypePBS as the currentSystem while the host is set up as PVE) to prove the
API uses the passed arguments rather than calling DetectCurrentSystem(); modify
the test assertion accordingly to expect an error from
ValidateCompatibility(SystemTypePBS, SystemTypePVE) (or the inverse conflict) so
the refactor is guarded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/backup/collector_pbs_datastore.go`:
- Around line 762-795: The current logic gates adding c.config.PBSDatastorePaths
on successful CLI enumeration (getDatastoreList) so overrides never get applied
when the proxmox-backup-manager CLI is missing or datastore list fails; change
the flow so CLI enumeration is best-effort (catch/log errors from
getDatastoreList but do not return early) and always run the override-appending
block that iterates c.config.PBSDatastorePaths (using normalizePBSDatastorePath,
buildPBSOverrideDisplayName, buildPBSOverrideOutputKey) and then call
assignUniquePBSDatastoreOutputKeys(datastores); alternatively move the
override-appending block out of any early-return path so overrides are appended
regardless of getDatastoreList outcome.

In `@internal/backup/collector_pbs.go`:
- Around line 350-353: The status snapshot filename is still generated from
ds.pathKey(), so assignUniquePBSDatastoreOutputKeys(datastores) has no effect
and later entries overwrite earlier ones; update the status snapshot generation
(where you currently call ds.pathKey()) to use the datastore's assigned output
key set by assignUniquePBSDatastoreOutputKeys (the field or accessor populated
by that function, e.g., datastore.OutputKey or datastore.outputKey()) when
creating the datastore_*_status.json filenames so collisions are avoided; apply
the same change to the other occurrences mentioned (lines ~400-403) and keep
clonePBSDatastores(datastores) + assignUniquePBSDatastoreOutputKeys(datastores)
as-is.

In `@internal/orchestrator/backup_sources.go`:
- Around line 263-267: The current guard only checks presence of checksum
strings (item.remoteChecksum and manifest.SHA256) but not their validity; update
the validation inside the candidate-filtering logic (the block referencing
item.remoteChecksum, manifest.SHA256, and logWarning) to normalize and parse the
checksum material before accepting a candidate: for manifest.SHA256,
trim/normalize and verify it is a valid 64-hex SHA256 string (reject and
increment integrityMissing/logWarning if malformed), and for local sidecar
checksum (item.remoteChecksum or wherever the .sha256 is read) parse/normalize
the value similarly (reject non-hex or wrong-length values); ensure the same
normalized checksum value and its source tag are stored on the candidate so
downstream staged integrity resolution only sees pre-validated checksums—apply
the identical validation logic to the other occurrence noted (lines around the
second block that checks these fields).

In `@internal/orchestrator/compatibility.go`:
- Around line 64-73: parseSystemTypeString currently only matches acronyms and
hyphenated names, causing inputs like "Proxmox VE" or "Proxmox Backup Server" to
return SystemTypeUnknown; update parseSystemTypeString to also detect
space-separated and full names by checking for strings.Contains(normalized,
"proxmox ve"), strings.Contains(normalized, "proxmox backup"), and
strings.Contains(normalized, "proxmox backup server") (and/or a generic "proxmox
backup" check) alongside the existing checks so SystemTypePVE and SystemTypePBS
are returned for those inputs; reference the parseSystemTypeString function and
the SystemTypePVE/SystemTypePBS constants when making the change.

In `@internal/orchestrator/network_staged_apply.go`:
- Around line 185-210: The code currently replays symlink targets verbatim
(restoreFS.Readlink -> restoreFS.Symlink) which can escape the real destination
subtree; change the logic to resolve and validate the link target before
creating it: after reading target from restoreFS.Readlink and before
restoreFS.Symlink, compute the effective absolute path the link would point to
when created under filepath.Dir(dest) (e.g., resolved :=
filepath.Clean(filepath.Join(filepath.Dir(dest), target)) and optionally
EvalSymlinks on the destination FS), ensure that this resolved path is inside
the allowed destination subtree root (reject with an error if
!strings.HasPrefix(resolved, destRoot+string(os.PathSeparator)) or equivalent
safe containment check), and for cases where the original target is absolute or
would escape, either rewrite the target to a safe relative target under the
destination subtree or return an error; apply these checks around the existing
functions (restoreFS.Readlink, ensureDirExistsWithInheritedMeta,
restoreFS.Lstat, restoreFS.Remove, restoreFS.Symlink) so Symlink is only called
with a validated/rewritten target.

In `@internal/orchestrator/restore_decision.go`:
- Around line 139-149: The code reads backup_metadata.txt with
io.ReadAll(tarReader) which can allocate unbounded memory; replace that call by
reading through an io.LimitedReader (or io.LimitReader) with a small configured
max bytes (e.g., a few KB) before calling parseRestoreDecisionMetadata, and if
the reader hits the limit treat it as invalid metadata (set metadataErr and
continue or return as the surrounding logic requires). Update the logic around
parseRestoreDecisionMetadata, metadataErr, and metadata to ensure overflowing
reads are rejected consistently and do not permit parsing of truncated/partial
data.

In `@internal/orchestrator/restore_workflow_ui.go`:
- Around line 79-84: When AnalyzeRestoreArchive returns an error, do not return
early into runFullRestoreWithUI; instead log the error (as already done) but
continue through the existing compatibility and cluster/PBS guard flow by
invoking ValidateCompatibility and the decision-driven checks before any
fallback. Modify the block around AnalyzeRestoreArchive so that on error you set
availableCategories/decisionInfo to nil or an appropriate default, then proceed
to call ValidateCompatibility and the later guard/decision logic, and only call
runFullRestoreWithUI as the final fallback after those checks pass or explicitly
allow it.

In `@internal/safefs/safefs.go`:
- Around line 97-105: The code calls effectiveTimeout and treats timeout==0 as
"no timeout", which also happens when the context deadline already elapsed;
update runLimited to distinguish these cases by checking ctx.Err() after
computing effectiveTimeout: if timeout==0 and ctx.Err()!=nil then return the
zero value and the provided TimeoutError (or wrap/convert ctx.Err() into a
TimeoutError) instead of calling run(), otherwise proceed with no-timeout path.
Reference runLimited, effectiveTimeout, TimeoutError and the run func when
implementing this check.

---

Nitpick comments:
In `@internal/input/input.go`:
- Around line 90-106: Both getLineState and getPasswordState perform an
unnecessary initial Load before calling LoadOrStore; remove the redundant Load
and simply create the new state (lineState/passwordState), call
lineStates.LoadOrStore(reader, state) / passwordStates.LoadOrStore(fd, state),
then return the actual value cast to *lineState or *passwordState respectively
so the atomic load-or-store handles both existing and new cases.
- Around line 125-152: The mutex (state.mu) is held while blocking on the
select, serializing concurrent callers. Fix this by creating/assigning the
lineInflight under the lock but releasing state.mu before the select, so
multiple goroutines can wait on the same inflight.done channel concurrently
(capture a local inflight variable while holding the lock, then unlock). Also
add cleanup to avoid unbounded growth of lineStates: where state.inflight is
currently set to nil after receiving res, also remove the corresponding entry
from the lineStates map when the reader is transient, so entries are not
retained forever. Do this under the same mutex to avoid races (in the branch
that clears state.inflight: lock, remove the map entry, then unlock).

In `@internal/orchestrator/compatibility_test.go`:
- Line 19: The test currently passes only because the fake host is PVE. Update
the call to ValidateCompatibility so it supplies a deliberately conflicting
currentSystem argument (e.g., pass SystemTypePBS as currentSystem while the
host is set up as PVE) to prove the API uses the passed arguments rather than
calling DetectCurrentSystem(); adjust the assertion to expect an error from
ValidateCompatibility(SystemTypePBS, SystemTypePVE) (or the inverse conflict)
so the refactor stays guarded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 831216f9-eb91-4ba0-a25b-53094afa0e68

📥 Commits

Reviewing files that changed from the base of the PR and between b7d9670 and fbec12f.

📒 Files selected for processing (63)
  • .github/workflows/release.yml
  • Makefile
  • docs/CONFIGURATION.md
  • docs/RESTORE_GUIDE.md
  • go.mod
  • internal/backup/checksum.go
  • internal/backup/checksum_legacy_test.go
  • internal/backup/checksum_test.go
  • internal/backup/collector.go
  • internal/backup/collector_helpers_extra_test.go
  • internal/backup/collector_pbs.go
  • internal/backup/collector_pbs_commands_coverage_test.go
  • internal/backup/collector_pbs_datastore.go
  • internal/backup/collector_pbs_datastore_inventory.go
  • internal/backup/collector_pbs_datastore_inventory_test.go
  • internal/backup/collector_pbs_test.go
  • internal/backup/collector_pve.go
  • internal/backup/collector_pve_parse_test.go
  • internal/backup/collector_pve_test.go
  • internal/config/templates/backup.env
  • internal/input/input.go
  • internal/input/input_test.go
  • internal/orchestrator/backup_safety.go
  • internal/orchestrator/backup_safety_test.go
  • internal/orchestrator/backup_sources.go
  • internal/orchestrator/backup_sources_test.go
  • internal/orchestrator/compatibility.go
  • internal/orchestrator/compatibility_test.go
  • internal/orchestrator/decrypt.go
  • internal/orchestrator/decrypt_integrity.go
  • internal/orchestrator/decrypt_integrity_test.go
  • internal/orchestrator/decrypt_integrity_test_helpers_test.go
  • internal/orchestrator/decrypt_prepare_common.go
  • internal/orchestrator/decrypt_test.go
  • internal/orchestrator/decrypt_tui.go
  • internal/orchestrator/decrypt_tui_test.go
  • internal/orchestrator/decrypt_workflow_test.go
  • internal/orchestrator/decrypt_workflow_ui.go
  • internal/orchestrator/deps.go
  • internal/orchestrator/deps_test.go
  • internal/orchestrator/network_apply_additional_test.go
  • internal/orchestrator/network_staged_apply.go
  • internal/orchestrator/network_staged_apply_test.go
  • internal/orchestrator/path_security.go
  • internal/orchestrator/path_security_test.go
  • internal/orchestrator/pbs_staged_apply.go
  • internal/orchestrator/pbs_staged_apply_additional_test.go
  • internal/orchestrator/restore.go
  • internal/orchestrator/restore_decision.go
  • internal/orchestrator/restore_decision_test.go
  • internal/orchestrator/restore_errors_test.go
  • internal/orchestrator/restore_plan.go
  • internal/orchestrator/restore_plan_test.go
  • internal/orchestrator/restore_test.go
  • internal/orchestrator/restore_workflow_decision_test.go
  • internal/orchestrator/restore_workflow_more_test.go
  • internal/orchestrator/restore_workflow_ui.go
  • internal/orchestrator/restore_workflow_ui_helpers_test.go
  • internal/orchestrator/selective.go
  • internal/pbs/namespaces.go
  • internal/pbs/namespaces_test.go
  • internal/safefs/safefs.go
  • internal/safefs/safefs_test.go

Comment on lines 762 to +795
  if len(c.config.PBSDatastorePaths) > 0 {
  	existing := make(map[string]struct{}, len(datastores))
  	for _, ds := range datastores {
  		if ds.Path != "" {
  			existing[ds.Path] = struct{}{}
+ 			if normalized := ds.normalizedPath(); normalized != "" {
+ 				existing[normalized] = struct{}{}
+ 			}
  		}
  	}
- 	validName := regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
  	for idx, override := range c.config.PBSDatastorePaths {
  		override = strings.TrimSpace(override)
  		if override == "" {
  			continue
  		}
- 		if _, ok := existing[override]; ok {
+ 		normalized := normalizePBSDatastorePath(override)
+ 		if normalized == "" {
  			continue
  		}
- 		name := filepath.Base(filepath.Clean(override))
- 		if name == "" || name == "." || name == string(os.PathSeparator) || !validName.MatchString(name) {
- 			name = fmt.Sprintf("datastore_%d", idx+1)
+ 		if _, ok := existing[normalized]; ok {
+ 			continue
  		}
+ 		existing[normalized] = struct{}{}
+ 		name := buildPBSOverrideDisplayName(normalized, idx)
  		datastores = append(datastores, pbsDatastore{
- 			Name:    name,
- 			Path:    override,
- 			Comment: "configured via PBS_DATASTORE_PATH",
+ 			Name:           name,
+ 			Path:           override,
+ 			Comment:        "configured via PBS_DATASTORE_PATH",
+ 			Source:         pbsDatastoreSourceOverride,
+ 			NormalizedPath: normalized,
+ 			OutputKey:      buildPBSOverrideOutputKey(normalized),
  		})
  	}
  }
+
+ assignUniquePBSDatastoreOutputKeys(datastores)


⚠️ Potential issue | 🟠 Major

Don't gate override scan roots on proxmox-backup-manager.

The override entries added here are never reached when the CLI is missing or datastore list fails, because getDatastoreList returns earlier. That breaks the override-only namespace/PXAR discovery path on exactly the hosts this fallback is meant to support. Make CLI enumeration best-effort, then append c.config.PBSDatastorePaths regardless.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backup/collector_pbs_datastore.go` around lines 762 - 795, The
current logic gates adding c.config.PBSDatastorePaths on successful CLI
enumeration (getDatastoreList) so overrides never get applied when the
proxmox-backup-manager CLI is missing or datastore list fails; change the flow
so CLI enumeration is best-effort (catch/log errors from getDatastoreList but do
not return early) and always run the override-appending block that iterates
c.config.PBSDatastorePaths (using normalizePBSDatastorePath,
buildPBSOverrideDisplayName, buildPBSOverrideOutputKey) and then call
assignUniquePBSDatastoreOutputKeys(datastores); alternatively move the
override-appending block out of any early-return path so overrides are appended
regardless of getDatastoreList outcome.

Comment on lines +350 to +353
if len(datastores) > 0 {
datastores = clonePBSDatastores(datastores)
assignUniquePBSDatastoreOutputKeys(datastores)
}

⚠️ Potential issue | 🟠 Major

Use the assigned datastore output key for status snapshots.

assignUniquePBSDatastoreOutputKeys(datastores) is effectively ignored here because the status filename is still derived from ds.pathKey(). If two entries collapse to the same path key, the later datastore_*_status.json overwrites the earlier one, which drops datastore metadata and defeats the collision handling this PR adds.

Also applies to: 400-403

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backup/collector_pbs.go` around lines 350 - 353, The status snapshot
filename is still generated from ds.pathKey(), so
assignUniquePBSDatastoreOutputKeys(datastores) has no effect and later entries
overwrite earlier ones; update the status snapshot generation (where you
currently call ds.pathKey()) to use the datastore's assigned output key set by
assignUniquePBSDatastoreOutputKeys (the field or accessor populated by that
function, e.g., datastore.OutputKey or datastore.outputKey()) when creating the
datastore_*_status.json filenames so collisions are avoided; apply the same
change to the other occurrences mentioned (lines ~400-403) and keep
clonePBSDatastores(datastores) + assignUniquePBSDatastoreOutputKeys(datastores)
as-is.

Comment on lines +79 to +84
availableCategories, decisionInfo, err := AnalyzeRestoreArchive(prepared.ArchivePath, logger)
if err != nil {
logger.Warning("Could not analyze categories: %v", err)
logger.Info("Falling back to full restore mode")
return runFullRestoreWithUI(ctx, ui, candidate, prepared, destRoot, logger, cfg.DryRun)
}

⚠️ Potential issue | 🔴 Critical

Don't bypass compatibility and cluster safeguards on analysis failure.

Line 83 returns runFullRestoreWithUI immediately. That path never calls ValidateCompatibility and skips the later decision-driven cluster/PBS guard flow in this function, so a partially unreadable archive can fall straight into a raw full restore on the wrong system or with live services still running. Please keep the best-effort compatibility and cluster checks before offering this fallback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/orchestrator/restore_workflow_ui.go` around lines 79 - 84, When
AnalyzeRestoreArchive returns an error, do not return early into
runFullRestoreWithUI; instead log the error (as already done) but continue
through the existing compatibility and cluster/PBS guard flow by invoking
ValidateCompatibility and the decision-driven checks before any fallback. Modify
the block around AnalyzeRestoreArchive so that on error you set
availableCategories/decisionInfo to nil or an appropriate default, then proceed
to call ValidateCompatibility and the later guard/decision logic, and only call
runFullRestoreWithUI as the final fallback after those checks pass or explicitly
allow it.

- update runLimited to re-check ctx.Err() after effectiveTimeout()
- avoid calling run() when effectiveTimeout() returns 0 because the context deadline already elapsed
- preserve the no-timeout path only for genuinely disabled timeouts
- add a regression test for the expired-deadline edge case in internal/safefs

When restore archive analysis fails in the UI workflow, do not jump immediately to full restore. Build a best-effort RestoreDecisionInfo from the manifest, run the existing compatibility check first, and only then fall back to runFullRestoreWithUI. Also add a regression test covering the analysis-error path to ensure the compatibility warning is still shown before the full restore fallback executes.

Reject oversized backup_metadata.txt entries during restore archive inspection by reading through a fixed limit before parsing. This prevents unbounded memory allocation from crafted archives and ensures truncated metadata caused by limit overflow is treated as invalid rather than partially trusted. Add regression tests covering oversized metadata rejection and the public restore analysis path.

Validate symlink targets during staged network apply before recreating them under the destination tree. Reuse the existing path-security resolver to ensure targets stay within the allowed destination root, rewrite safe absolute targets to relative links, and reject absolute or relative targets that escape the subtree. Add regression tests for safe rewrites and escaping symlink targets.

Validate symlink targets in syncDirExact before recreating them under the destination subtree. Reuse the existing path-security resolver to ensure firewall and SDN staged symlinks stay within the allowed destination root, rewrite safe absolute targets to relative links, and reject absolute or relative targets that escape the subtree. Add regression tests for safe rewrites and escaping symlink targets.
Verify and fix parseSystemTypeString in compatibility.go.

The parser now recognizes space-separated and full-name variants such as
"Proxmox VE", "Proxmox Backup", and "Proxmox Backup Server", in addition
to the existing acronym and hyphenated forms.

Add regression tests to ensure these values map correctly to
SystemTypePVE and SystemTypePBS.
Validate and normalize manifest and sidecar SHA256 values before
accepting local or rclone raw backup candidates.

Reject candidates when the manifest checksum is malformed, the .sha256
sidecar cannot be parsed, or both checksum sources disagree. Persist the
normalized checksum and its source on the candidate so downstream staged
integrity verification only works with pre-validated values.

Add coverage for local and rclone discovery, malformed and conflicting
checksums, and reuse of the candidate integrity expectation during
staged verification.
Adds a regression test for collectPBSCommands() covering colliding PBS datastore output keys. The test verifies that assignUniquePBSDatastoreOutputKeys() disambiguates colliding datastore status filenames and that two distinct datastore_*_status.json files are written with the correct datastore-specific content.

Make PBS datastore enumeration best-effort in getDatastoreList so configured PBSDatastorePaths overrides are still appended when proxmox-backup-manager is missing, the datastore list command fails, or its JSON output is invalid. Preserve context cancellation as a fatal error, keep unique output key assignment, and add regression tests for missing CLI, command failure, and parse failure fallback cases.
tis24dev closed this on Mar 10, 2026