fix(backups): preserve partial backups when qB export crashes#1573
fix(backups): preserve partial backups when qB export crashes#1573
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds partial-backup resilience: classify export failures (fatal / metadata unavailable / recoverable), record recoverable failures as per-torrent manifest warnings instead of aborting, persist on-disk manifests with DB fallback, surface warning summaries in notifications, and display warnings in the frontend. Changes
Sequence DiagramsequenceDiagram
participant Client as Web UI
participant Service as Backup Service
participant qBT as qBittorrent API
participant Store as Manifest store
participant Notify as Notifications
Client->>Service: Start backup run
Service->>qBT: probeInstance / listTorrents
loop per torrent
Service->>qBT: exportTorrent(hash)
alt export succeeds
qBT-->>Service: exported data
Service->>Store: add item to manifest
else recoverable failure
qBT-->>Service: transient/network error
Service->>Service: classify as recoverable
Service->>Store: add ManifestWarning (hash, name, reason)
Service->>Service: continue loop
else fatal / metadata-unavailable
qBT-->>Service: fatal or metadata-unavailable error
Service->>Service: classify -> abort or mark metadata-unavailable
Service-->>Client: stop run (may be partial)
end
end
Service->>Store: finalize manifest (items + warnings + summary)
Service->>Notify: emit EventBackupSucceeded (include warning summary if present)
Notify-->>Client: display "completed" or "completed with warnings"
Client->>Store: fetch manifest -> render items and warnings list
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly Related PRs
Suggested Labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/backups/service.go (1)
746-795:⚠️ Potential issue | 🟠 MajorDon’t stop iterating once the instance probe fails.
break backupLoopdrops every remaining torrent, including later entries that already have a cached blob and no longer need qBittorrent. That means a mid-run crash can still discard usable backup data, and the run can fail even when recoverable cached exports exist later in the slice.💡 Possible direction
- backupLoop: + instanceUnavailable := false for idx, torrent := range torrents { ... - if data == nil { + if data == nil { + if instanceUnavailable { + manifestWarnings = append(manifestWarnings, ManifestWarning{ + Hash: torrent.Hash, + Name: torrent.Name, + Reason: "qBittorrent became unavailable before export", + }) + s.updateProgress(j.runID, idx+1) + continue + } + if err := waitForExportThrottle(ctx, exportThrottle); err != nil { return nil, err } ... if err != nil { switch classifyExportFailure(err) { ... case exportFailureRecoverable: ... if probeErr := s.probeInstance(ctx, j.instanceID); probeErr != nil { - if len(manifestItems) == 0 { - return nil, fmt.Errorf("backup aborted after qBittorrent export failure and instance became unavailable: %w", err) - } - - ... - break backupLoop + instanceUnavailable = true + continue } continue } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backups/service.go` around lines 746 - 795, The current code calls break backupLoop when s.probeInstance(ctx, j.instanceID) fails, which aborts processing of later torrents (even those that can be satisfied from cache); instead, stop aborting and allow the loop to continue: replace the break backupLoop path with logic that marks the instance as unavailable (e.g., set a local bool like instanceAvailable = false) and continue the for-loop so subsequent iterations still check cached blobs without calling s.exportTorrent; update places that call s.exportTorrent to short-circuit when instanceAvailable is false, and keep the same logging/error behavior when manifestItems is empty (return if no partial backup) while preserving the existing manifestWarnings/updates.
🧹 Nitpick comments (2)
internal/backups/service_export_test.go (1)
29-41: Cover the real deadline error path here.Using
errors.New("context deadline exceeded")only exercises the substring fallback. It won’t catch regressions in theerrors.Is(context.DeadlineExceeded)branch or wrapped deadline errors, which are the cases production code is most likely to see.As per coding guidelines "Prefer table-driven test cases in Go tests".💡 Suggested rewrite
+import ( + "context" + "fmt" +) + func TestClassifyExportFailure(t *testing.T) { - if kind := classifyExportFailure(errors.New("context deadline exceeded")); kind != exportFailureRecoverable { - t.Fatalf("expected deadline exceeded to be recoverable, got %v", kind) - } - - if kind := classifyExportFailure(qbt.ErrTorrentMetdataNotDownloadedYet); kind != exportFailureMetadataUnavailable { - t.Fatalf("expected metadata-not-downloaded to be classified as metadata unavailable, got %v", kind) - } - - if kind := classifyExportFailure(errors.New("status code: 400: bad request")); kind != exportFailureFatal { - t.Fatalf("expected 400 to be fatal, got %v", kind) - } + t.Parallel() + + tests := []struct { + name string + err error + want exportFailureKind + }{ + {name: "deadline exceeded", err: context.DeadlineExceeded, want: exportFailureRecoverable}, + {name: "wrapped deadline exceeded", err: fmt.Errorf("export: %w", context.DeadlineExceeded), want: exportFailureRecoverable}, + {name: "metadata unavailable", err: qbt.ErrTorrentMetdataNotDownloadedYet, want: exportFailureMetadataUnavailable}, + {name: "400 is fatal", err: errors.New("status code: 400: bad request"), want: exportFailureFatal}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + if kind := classifyExportFailure(tt.err); kind != tt.want { + t.Fatalf("expected %v, got %v", tt.want, kind) + } + }) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backups/service_export_test.go` around lines 29 - 41, The test TestClassifyExportFailure uses a plain errors.New("context deadline exceeded") which only exercises string matching; change it to a table-driven test that includes at least: context.DeadlineExceeded (use errors.Is path), a wrapped deadline error (e.g., fmt.Errorf("wrap: %w", context.DeadlineExceeded)) to ensure errors.Is works, the qbt.ErrTorrentMetdataNotDownloadedYet case, and the 400 response string case; update assertions to iterate the table and compare expected exportFailure* results for the classifyExportFailure function.internal/backups/service_partial_test.go (1)
194-199: This helper still pays the production export throttle.
NewServicerewrites non-positiveExportThrottlevalues to the 200ms default, so passing0here doesn’t actually disable the ticker. These tests will sleep once per export and get slower as more cases are added.💡 Suggested fix
svc := NewService(store, &qb.SyncManager{}, nil, Config{ DataDir: t.TempDir(), WorkerCount: 1, ExportThrottle: 0, }, nil) + svc.cfg.ExportThrottle = 0 svc.now = func() time.Time { return time.Unix(1_700_000_000, 0).UTC() }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backups/service_partial_test.go` around lines 194 - 199, The test currently passes ExportThrottle: 0 to NewService, but NewService treats non-positive ExportThrottle as the 200ms default so the test still sleeps; change the test to use a large positive duration (e.g., ExportThrottle: time.Hour) in the Config passed to NewService so the export ticker won't fire during the test (refer to NewService, Config.ExportThrottle and svc.now when updating the test).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@web/src/pages/InstanceBackups.tsx`:
- Around line 1748-1758: The badge currently always renders run.status (e.g.,
"success") even when hasBackupWarnings(run) is true; change the Badge content so
that when hasBackupWarnings(run) and run.status === "success" you render an
explicit warning label such as "success (warnings)" or "success with warnings"
instead of run.status, while keeping the amber styling logic intact—update the
Badge usage in InstanceBackups.tsx (the Badge element that uses
statusVariants[run.status] and hasBackupWarnings(run)) to compute the displayed
text conditionally and ensure the fallback for non-warning statuses still
renders run.status; also update any nearby text consumers (e.g., the paragraph
showing run.errorMessage) if they rely on the badge label.
---
Outside diff comments:
In `@internal/backups/service.go`:
- Around line 746-795: The current code calls break backupLoop when
s.probeInstance(ctx, j.instanceID) fails, which aborts processing of later
torrents (even those that can be satisfied from cache); instead, stop aborting
and allow the loop to continue: replace the break backupLoop path with logic
that marks the instance as unavailable (e.g., set a local bool like
instanceAvailable = false) and continue the for-loop so subsequent iterations
still check cached blobs without calling s.exportTorrent; update places that
call s.exportTorrent to short-circuit when instanceAvailable is false, and keep
the same logging/error behavior when manifestItems is empty (return if no
partial backup) while preserving the existing manifestWarnings/updates.
---
Nitpick comments:
In `@internal/backups/service_export_test.go`:
- Around line 29-41: The test TestClassifyExportFailure uses a plain
errors.New("context deadline exceeded") which only exercises string matching;
change it to a table-driven test that includes at least:
context.DeadlineExceeded (use errors.Is path), a wrapped deadline error (e.g.,
fmt.Errorf("wrap: %w", context.DeadlineExceeded)) to ensure errors.Is works, the
qbt.ErrTorrentMetdataNotDownloadedYet case, and the 400 response string case;
update assertions to iterate the table and compare expected exportFailure*
results for the classifyExportFailure function.
In `@internal/backups/service_partial_test.go`:
- Around line 194-199: The test currently passes ExportThrottle: 0 to
NewService, but NewService treats non-positive ExportThrottle as the 200ms
default so the test still sleeps; change the test to use a large positive
duration (e.g., ExportThrottle: time.Hour) in the Config passed to NewService so
the export ticker won't fire during the test (refer to NewService,
Config.ExportThrottle and svc.now when updating the test).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b1b9659f-274f-4613-9b98-fdcf67258c62
📒 Files selected for processing (7)
documentation/docs/features/backups.mdinternal/backups/service.gointernal/backups/service_export_test.gointernal/backups/service_partial_test.gointernal/services/notifications/service.goweb/src/pages/InstanceBackups.tsxweb/src/types/index.ts
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/backups/service.go (1)
1114-1134: Overly broad "eof" substring match may cause false positives.Line 1129 checks for
"eof"which is too permissive—it would match unrelated words like "geoffrey" or any error message containing those letters. The check is also redundant since:
- Line 1118 already handles
io.EOFandio.ErrUnexpectedEOFdirectly- Line 1128 already checks
"unexpected eof"♻️ Proposed fix: remove redundant and overly broad check
message := strings.ToLower(err.Error()) return strings.Contains(message, "timeout") || strings.Contains(message, "deadline exceeded") || strings.Contains(message, "connection reset by peer") || strings.Contains(message, "connection refused") || strings.Contains(message, "broken pipe") || strings.Contains(message, "unexpected eof") || - strings.Contains(message, "eof") || strings.Contains(message, "status code: 500") || strings.Contains(message, "status code: 502") || strings.Contains(message, "status code: 503") || strings.Contains(message, "status code: 504")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backups/service.go` around lines 1114 - 1134, The isRecoverableExportFailure function currently treats any error message containing the substring "eof" as recoverable which is overly broad and redundant (io.EOF/io.ErrUnexpectedEOF are already handled and "unexpected eof" is explicitly checked); remove the standalone strings.Contains(message, "eof") check from isRecoverableExportFailure so only the specific checks (errors.Is for io.EOF/io.ErrUnexpectedEOF and the "unexpected eof" substring) remain, preventing false positives while preserving intended behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/backups/service_partial_test.go`:
- Around line 209-232: The helper seedCachedTorrent has its context parameter
out of order and uses insecure file permissions; change the function signature
to accept ctx context.Context as the first parameter (seedCachedTorrent(t
*testing.T, ctx context.Context, svc *Service, store *models.BackupStore,
instanceID int, hash string, data string)), update every call site to pass ctx
as the first argument, and reduce the os.WriteFile permission from 0o644 to a
more restrictive mode such as 0o600 (keep os.MkdirAll at 0o755 unless otherwise
required) so the call to os.WriteFile uses 0o600 to satisfy gosec G306.
In `@web/src/pages/InstanceBackups.tsx`:
- Around line 1749-1761: The "Last backup" summary currently prints raw Status:
{lastRun.status}; replace that with the same presentation logic used in the
table by using backupStatusLabel(lastRun) and statusVariants[lastRun.status]
(and render the warning message when hasBackupWarnings(lastRun) is true) so
partial-success/warning-backed successes show the same label and warning text as
in the detailed row; update the Last backup card to use backupStatusLabel,
statusVariants, and hasBackupWarnings with lastRun (and show
lastRun.errorMessage when applicable).
---
Nitpick comments:
In `@internal/backups/service.go`:
- Around line 1114-1134: The isRecoverableExportFailure function currently
treats any error message containing the substring "eof" as recoverable which is
overly broad and redundant (io.EOF/io.ErrUnexpectedEOF are already handled and
"unexpected eof" is explicitly checked); remove the standalone
strings.Contains(message, "eof") check from isRecoverableExportFailure so only
the specific checks (errors.Is for io.EOF/io.ErrUnexpectedEOF and the
"unexpected eof" substring) remain, preventing false positives while preserving
intended behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f60e0258-9bb6-4fcd-952c-fb9d6830bcf6
📒 Files selected for processing (4)
internal/backups/service.gointernal/backups/service_export_test.gointernal/backups/service_partial_test.goweb/src/pages/InstanceBackups.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/backups/service_export_test.go
Keep usable backup data when qBittorrent export hangs or drops mid-run: record skipped torrents as manifest warnings, finish the run as success with warnings when at least one torrent was exported, and surface that state in notifications/UI. Context: qui discussion #1554, qBittorrent issue #18185, qBittorrent issue #23907.
Summary by CodeRabbit
New Features
Documentation