Skip to content

fix(backups): preserve partial backups when qB export crashes#1573

Closed
s0up4200 wants to merge 4 commits intodevelopfrom
fix/backup-partial-export-warnings
Closed

fix(backups): preserve partial backups when qB export crashes#1573
s0up4200 wants to merge 4 commits intodevelopfrom
fix/backup-partial-export-warnings

Conversation

@s0up4200
Copy link
Copy Markdown
Collaborator

@s0up4200 s0up4200 commented Mar 8, 2026

Keep usable backup data when qBittorrent export hangs or drops mid-run: record skipped torrents as manifest warnings, finish the run as success with warnings when at least one torrent was exported, and surface that state in notifications/UI. Context: qui discussion #1554, qBittorrent issue #18185, qBittorrent issue #23907.

Summary by CodeRabbit

  • New Features

    • Backups can complete as partial runs when an instance crashes or is unavailable, preserving exported torrents and marking the run as completed with warnings.
    • Manifests and backup history now list skipped torrents with human-readable reasons; backup view shows a Warnings count and detailed list.
    • Notifications for successful backups show "completed with warnings" when applicable.
  • Documentation

    • Added docs describing partial backup behavior and how warnings are presented.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e8e4a92f-4fa7-446d-bc55-836d61e40735

📥 Commits

Reviewing files that changed from the base of the PR and between 4642faa and e2e99ef.

📒 Files selected for processing (3)
  • internal/backups/service.go
  • internal/backups/service_partial_test.go
  • web/src/pages/InstanceBackups.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • web/src/pages/InstanceBackups.tsx

Walkthrough

Adds partial-backup resilience: classify export failures (fatal / metadata unavailable / recoverable), record recoverable failures as per-torrent manifest warnings instead of aborting, persist on-disk manifests with DB fallback, surface warning summaries in notifications, and display warnings in the frontend.

Changes

Cohort / File(s) Summary
Docs
documentation/docs/features/backups.md
Added "Partial Backups" section describing interrupted exports, partial run recording, and manifest warning semantics.
Backend: backup service
internal/backups/service.go
Added exportFailureKind enum and classifier, ManifestWarning type and Warnings field on Manifest, plumbing for warningSummary/warnings in backupResult, helpers (classify/simplify/isRecoverable/buildSummary), manifest-on-disk loading with DB fallback, and DI hooks (list/export/probe/getCategories/getTags/getWebAPIVersion/now).
Backend: tests
internal/backups/service_export_test.go, internal/backups/service_partial_test.go
Added unit tests for classifyExportFailure and comprehensive partial-backup scenarios (recoverable failures, instance death timing, manifest file vs DB fallback, and generated warnings).
Backend: notifications
internal/services/notifications/service.go
EventBackupSucceeded title and details now reflect warnings when ErrorMessage present (title becomes "completed with warnings" and Warnings line appended).
Frontend / Types
web/src/pages/InstanceBackups.tsx, web/src/types/index.ts
Added BackupManifestWarning type and optional warnings on BackupManifest; UI shows warning-aware status badge, warnings count, and renders a Partial backup warnings Alert listing each skipped torrent and reason.

Sequence Diagram

sequenceDiagram
    participant Client as Web UI
    participant Service as Backup Service
    participant qBT as qBittorrent API
    participant Store as Manifest store
    participant Notify as Notifications

    Client->>Service: Start backup run
    Service->>qBT: probeInstance / listTorrents
    loop per torrent
        Service->>qBT: exportTorrent(hash)
        alt export succeeds
            qBT-->>Service: exported data
            Service->>Store: add item to manifest
        else recoverable failure
            qBT-->>Service: transient/network error
            Service->>Service: classify as recoverable
            Service->>Store: add ManifestWarning (hash, name, reason)
            Service->>Service: continue loop
        else fatal / metadata-unavailable
            qBT-->>Service: fatal or metadata-unavailable error
            Service->>Service: classify -> abort or mark metadata-unavailable
            Service-->>Client: stop run (may be partial)
        end
    end
    Service->>Store: finalize manifest (items + warnings + summary)
    Service->>Notify: emit EventBackupSucceeded (include warning summary if present)
    Notify-->>Client: display "completed" or "completed with warnings"
    Client->>Store: fetch manifest -> render items and warnings list
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly Related PRs

Suggested Labels

bugfix, backup & restore, area/backend, area/frontend, web

Poem

🐰 Exports tripped, some torrents stayed behind,
I hop and note each hash and name I find.
Warnings tucked into the manifest trail,
Partial saves endure where full runs fail.
A carrot for retries — steady and kind.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately and specifically summarizes the main change: implementing preservation of partial backups when qBittorrent export fails, which is the core objective of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/backup-partial-export-warnings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/backups/service.go (1)

746-795: ⚠️ Potential issue | 🟠 Major

Don’t stop iterating once the instance probe fails.

break backupLoop drops every remaining torrent, including later entries that already have a cached blob and no longer need qBittorrent. That means a mid-run crash can still discard usable backup data, and the run can fail even when recoverable cached exports exist later in the slice.

💡 Possible direction
- backupLoop:
+ instanceUnavailable := false
  for idx, torrent := range torrents {
    ...
-   if data == nil {
+   if data == nil {
+     if instanceUnavailable {
+       manifestWarnings = append(manifestWarnings, ManifestWarning{
+         Hash:   torrent.Hash,
+         Name:   torrent.Name,
+         Reason: "qBittorrent became unavailable before export",
+       })
+       s.updateProgress(j.runID, idx+1)
+       continue
+     }
+
      if err := waitForExportThrottle(ctx, exportThrottle); err != nil {
        return nil, err
      }
      ...
      if err != nil {
        switch classifyExportFailure(err) {
        ...
        case exportFailureRecoverable:
          ...
          if probeErr := s.probeInstance(ctx, j.instanceID); probeErr != nil {
-           if len(manifestItems) == 0 {
-             return nil, fmt.Errorf("backup aborted after qBittorrent export failure and instance became unavailable: %w", err)
-           }
-
-           ...
-           break backupLoop
+           instanceUnavailable = true
+           continue
          }
          continue
        }
      }
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backups/service.go` around lines 746 - 795, The current code calls
break backupLoop when s.probeInstance(ctx, j.instanceID) fails, which aborts
processing of later torrents (even those that can be satisfied from cache);
instead, stop aborting and allow the loop to continue: replace the break
backupLoop path with logic that marks the instance as unavailable (e.g., set a
local bool like instanceAvailable = false) and continue the for-loop so
subsequent iterations still check cached blobs without calling s.exportTorrent;
update places that call s.exportTorrent to short-circuit when instanceAvailable
is false, and keep the same logging/error behavior when manifestItems is empty
(return if no partial backup) while preserving the existing
manifestWarnings/updates.
🧹 Nitpick comments (2)
internal/backups/service_export_test.go (1)

29-41: Cover the real deadline error path here.

Using errors.New("context deadline exceeded") only exercises the substring fallback. It won’t catch regressions in the errors.Is(context.DeadlineExceeded) branch or wrapped deadline errors, which are the cases production code is most likely to see.

💡 Suggested rewrite
+import (
+	"context"
+	"fmt"
+)
+
 func TestClassifyExportFailure(t *testing.T) {
-	if kind := classifyExportFailure(errors.New("context deadline exceeded")); kind != exportFailureRecoverable {
-		t.Fatalf("expected deadline exceeded to be recoverable, got %v", kind)
-	}
-
-	if kind := classifyExportFailure(qbt.ErrTorrentMetdataNotDownloadedYet); kind != exportFailureMetadataUnavailable {
-		t.Fatalf("expected metadata-not-downloaded to be classified as metadata unavailable, got %v", kind)
-	}
-
-	if kind := classifyExportFailure(errors.New("status code: 400: bad request")); kind != exportFailureFatal {
-		t.Fatalf("expected 400 to be fatal, got %v", kind)
-	}
+	t.Parallel()
+
+	tests := []struct {
+		name string
+		err  error
+		want exportFailureKind
+	}{
+		{name: "deadline exceeded", err: context.DeadlineExceeded, want: exportFailureRecoverable},
+		{name: "wrapped deadline exceeded", err: fmt.Errorf("export: %w", context.DeadlineExceeded), want: exportFailureRecoverable},
+		{name: "metadata unavailable", err: qbt.ErrTorrentMetdataNotDownloadedYet, want: exportFailureMetadataUnavailable},
+		{name: "400 is fatal", err: errors.New("status code: 400: bad request"), want: exportFailureFatal},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			if kind := classifyExportFailure(tt.err); kind != tt.want {
+				t.Fatalf("expected %v, got %v", tt.want, kind)
+			}
+		})
+	}
 }
As per coding guidelines "Prefer table-driven test cases in Go tests".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backups/service_export_test.go` around lines 29 - 41, The test
TestClassifyExportFailure uses a plain errors.New("context deadline exceeded")
which only exercises string matching; change it to a table-driven test that
includes at least: context.DeadlineExceeded (use errors.Is path), a wrapped
deadline error (e.g., fmt.Errorf("wrap: %w", context.DeadlineExceeded)) to
ensure errors.Is works, the qbt.ErrTorrentMetdataNotDownloadedYet case, and the
400 response string case; update assertions to iterate the table and compare
expected exportFailure* results for the classifyExportFailure function.
internal/backups/service_partial_test.go (1)

194-199: This helper still pays the production export throttle.

NewService rewrites non-positive ExportThrottle values to the 200ms default, so passing 0 here doesn’t actually disable the ticker. These tests will sleep once per export and get slower as more cases are added.

💡 Suggested fix
  svc := NewService(store, &qb.SyncManager{}, nil, Config{
  	DataDir:        t.TempDir(),
  	WorkerCount:    1,
  	ExportThrottle: 0,
  }, nil)
+ svc.cfg.ExportThrottle = 0
  svc.now = func() time.Time { return time.Unix(1_700_000_000, 0).UTC() }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backups/service_partial_test.go` around lines 194 - 199, The test
currently passes ExportThrottle: 0 to NewService, but NewService treats
non-positive ExportThrottle as the 200ms default so the test still sleeps;
change the test to use a large positive duration (e.g., ExportThrottle:
time.Hour) in the Config passed to NewService so the export ticker won't fire
during the test (refer to NewService, Config.ExportThrottle and svc.now when
updating the test).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/src/pages/InstanceBackups.tsx`:
- Around line 1748-1758: The badge currently always renders run.status (e.g.,
"success") even when hasBackupWarnings(run) is true; change the Badge content so
that when hasBackupWarnings(run) and run.status === "success" you render an
explicit warning label such as "success (warnings)" or "success with warnings"
instead of run.status, while keeping the amber styling logic intact—update the
Badge usage in InstanceBackups.tsx (the Badge element that uses
statusVariants[run.status] and hasBackupWarnings(run)) to compute the displayed
text conditionally and ensure the fallback for non-warning statuses still
renders run.status; also update any nearby text consumers (e.g., the paragraph
showing run.errorMessage) if they rely on the badge label.

---

Outside diff comments:
In `@internal/backups/service.go`:
- Around line 746-795: The current code calls break backupLoop when
s.probeInstance(ctx, j.instanceID) fails, which aborts processing of later
torrents (even those that can be satisfied from cache); instead, stop aborting
and allow the loop to continue: replace the break backupLoop path with logic
that marks the instance as unavailable (e.g., set a local bool like
instanceAvailable = false) and continue the for-loop so subsequent iterations
still check cached blobs without calling s.exportTorrent; update places that
call s.exportTorrent to short-circuit when instanceAvailable is false, and keep
the same logging/error behavior when manifestItems is empty (return if no
partial backup) while preserving the existing manifestWarnings/updates.

---

Nitpick comments:
In `@internal/backups/service_export_test.go`:
- Around line 29-41: The test TestClassifyExportFailure uses a plain
errors.New("context deadline exceeded") which only exercises string matching;
change it to a table-driven test that includes at least:
context.DeadlineExceeded (use errors.Is path), a wrapped deadline error (e.g.,
fmt.Errorf("wrap: %w", context.DeadlineExceeded)) to ensure errors.Is works, the
qbt.ErrTorrentMetdataNotDownloadedYet case, and the 400 response string case;
update assertions to iterate the table and compare expected exportFailure*
results for the classifyExportFailure function.

In `@internal/backups/service_partial_test.go`:
- Around line 194-199: The test currently passes ExportThrottle: 0 to
NewService, but NewService treats non-positive ExportThrottle as the 200ms
default so the test still sleeps; change the test to use a large positive
duration (e.g., ExportThrottle: time.Hour) in the Config passed to NewService so
the export ticker won't fire during the test (refer to NewService,
Config.ExportThrottle and svc.now when updating the test).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b1b9659f-274f-4613-9b98-fdcf67258c62

📥 Commits

Reviewing files that changed from the base of the PR and between 7aadde7 and 474fa8c.

📒 Files selected for processing (7)
  • documentation/docs/features/backups.md
  • internal/backups/service.go
  • internal/backups/service_export_test.go
  • internal/backups/service_partial_test.go
  • internal/services/notifications/service.go
  • web/src/pages/InstanceBackups.tsx
  • web/src/types/index.ts

Comment thread web/src/pages/InstanceBackups.tsx
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
internal/backups/service.go (1)

1114-1134: Overly broad "eof" substring match may cause false positives.

Line 1129 checks for "eof" which is too permissive—it would match unrelated words like "geoffrey" or any error message containing those letters. The check is also redundant since:

  1. Line 1118 already handles io.EOF and io.ErrUnexpectedEOF directly
  2. Line 1128 already checks "unexpected eof"
♻️ Proposed fix: remove redundant and overly broad check
 	message := strings.ToLower(err.Error())
 	return strings.Contains(message, "timeout") ||
 		strings.Contains(message, "deadline exceeded") ||
 		strings.Contains(message, "connection reset by peer") ||
 		strings.Contains(message, "connection refused") ||
 		strings.Contains(message, "broken pipe") ||
 		strings.Contains(message, "unexpected eof") ||
-		strings.Contains(message, "eof") ||
 		strings.Contains(message, "status code: 500") ||
 		strings.Contains(message, "status code: 502") ||
 		strings.Contains(message, "status code: 503") ||
 		strings.Contains(message, "status code: 504")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backups/service.go` around lines 1114 - 1134, The
isRecoverableExportFailure function currently treats any error message
containing the substring "eof" as recoverable which is overly broad and
redundant (io.EOF/io.ErrUnexpectedEOF are already handled and "unexpected eof"
is explicitly checked); remove the standalone strings.Contains(message, "eof")
check from isRecoverableExportFailure so only the specific checks (errors.Is for
io.EOF/io.ErrUnexpectedEOF and the "unexpected eof" substring) remain,
preventing false positives while preserving intended behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/backups/service_partial_test.go`:
- Around line 209-232: The helper seedCachedTorrent has its context parameter
out of order and uses insecure file permissions; change the function signature
to accept ctx context.Context as the first parameter (seedCachedTorrent(t
*testing.T, ctx context.Context, svc *Service, store *models.BackupStore,
instanceID int, hash string, data string)), update every call site to pass ctx
as the first argument, and reduce the os.WriteFile permission from 0o644 to a
more restrictive mode such as 0o600 (keep os.MkdirAll at 0o755 unless otherwise
required) so the call to os.WriteFile uses 0o600 to satisfy gosec G306.

In `@web/src/pages/InstanceBackups.tsx`:
- Around line 1749-1761: The "Last backup" summary currently prints raw Status:
{lastRun.status}; replace that with the same presentation logic used in the
table by using backupStatusLabel(lastRun) and statusVariants[lastRun.status]
(and render the warning message when hasBackupWarnings(lastRun) is true) so
partial-success/warning-backed successes show the same label and warning text as
in the detailed row; update the Last backup card to use backupStatusLabel,
statusVariants, and hasBackupWarnings with lastRun (and show
lastRun.errorMessage when applicable).

---

Nitpick comments:
In `@internal/backups/service.go`:
- Around line 1114-1134: The isRecoverableExportFailure function currently
treats any error message containing the substring "eof" as recoverable which is
overly broad and redundant (io.EOF/io.ErrUnexpectedEOF are already handled and
"unexpected eof" is explicitly checked); remove the standalone
strings.Contains(message, "eof") check from isRecoverableExportFailure so only
the specific checks (errors.Is for io.EOF/io.ErrUnexpectedEOF and the
"unexpected eof" substring) remain, preventing false positives while preserving
intended behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f60e0258-9bb6-4fcd-952c-fb9d6830bcf6

📥 Commits

Reviewing files that changed from the base of the PR and between ea4819f and 4642faa.

📒 Files selected for processing (4)
  • internal/backups/service.go
  • internal/backups/service_export_test.go
  • internal/backups/service_partial_test.go
  • web/src/pages/InstanceBackups.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/backups/service_export_test.go

Comment thread internal/backups/service_partial_test.go Outdated
Comment thread web/src/pages/InstanceBackups.tsx
@s0up4200 s0up4200 closed this Apr 3, 2026
@s0up4200 s0up4200 deleted the fix/backup-partial-export-warnings branch April 3, 2026 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant