fix(backup): harden tenant restore preview and lookup handling #920

Merged
viettranx merged 11 commits into nextlevelbuilder:dev from badgerbees:fix/tenant-restore-hardening
Apr 16, 2026

badgerbees (Contributor) commented Apr 16, 2026

Summary

This PR hardens the tenant backup and restore flow end to end. It fixes the original tenant backup scope bug by making the backup registry scope-aware, including root-table handling for tenants and coverage for tenant_users. It then rebuilds tenant restore mode=new so that it creates a fresh tenant from archive metadata instead of replaying the archived tenant row.
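One way to picture the table filtering behind this is a per-mode predicate that skips the archived tenants row. The sketch below is illustrative only: the name shouldRestoreTable comes from this PR, but the signature, the string mode values, and the assumption that upsert still restores the tenants row are hypothetical.

```go
package main

import "fmt"

// shouldRestoreTable is a hypothetical stand-in for the helper named in this PR.
// The (table, mode string) signature is an assumption made for illustration.
func shouldRestoreTable(table, mode string) bool {
	if table != "tenants" {
		// Tenant-scoped tables such as tenant_users are always restored.
		return true
	}
	switch mode {
	case "new", "replace":
		// new builds a fresh tenant from archive metadata; replace keeps the
		// existing tenants row in place. Neither replays the archived row.
		return false
	default:
		// Assumption: upsert still writes the archived tenants row.
		return true
	}
}

func main() {
	for _, mode := range []string{"upsert", "replace", "new"} {
		fmt.Printf("mode=%s restores tenants row: %v\n", mode, shouldRestoreTable("tenants", mode))
	}
}
```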

The restore flow now validates the target slug up front, including in dry-run mode, so preview runs catch slug collisions before any real work starts. The HTTP restore auth path now also distinguishes a missing tenant from a real store or database failure: it returns 404 only when the tenant truly does not exist and logs unexpected lookup failures as server errors.
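The 404-versus-500 split can be sketched with a sentinel error and errors.Is. This is a minimal illustration, not the handler from this PR: errTenantNotFound, errStoreDown, lookupFunc, and statusForLookup are all hypothetical names, and the real store may signal a missing tenant differently.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
)

// Hypothetical sentinel errors; the real store may report these differently.
var (
	errTenantNotFound = errors.New("tenant not found")
	errStoreDown      = errors.New("store unavailable")
)

// lookupFunc abstracts the tenant store call (hypothetical shape).
type lookupFunc func(slug string) (id string, err error)

// statusForLookup maps a lookup outcome to an HTTP status code.
func statusForLookup(lookup lookupFunc, slug string) int {
	_, err := lookup(slug)
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errTenantNotFound):
		// Only a genuinely missing tenant becomes a 404.
		return http.StatusNotFound
	default:
		// Anything else is an unexpected failure: log it and return a 500.
		log.Printf("tenant lookup failed for %q: %v", slug, err)
		return http.StatusInternalServerError
	}
}

func main() {
	missing := func(string) (string, error) {
		return "", fmt.Errorf("lookup: %w", errTenantNotFound)
	}
	broken := func(string) (string, error) { return "", errStoreDown }

	fmt.Println(statusForLookup(missing, "acme")) // 404
	fmt.Println(statusForLookup(broken, "acme"))  // 500
}
```

Matching with errors.Is rather than comparing error strings keeps the 404 path working even when the store wraps the sentinel with extra context.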

On top of the backend changes, the CLI restore path, web restore form, request builder, and localized restore copy were updated to match the new mode=new contract. Focused regression tests were added for tenant restore helper behavior, tenant table scoping, and tenant lookup error handling.

The branch also picked up follow-up hardening from CI: hook-store GetByID now respects tenant scope in both PostgreSQL and SQLite, hook integration tests explicitly opt into the loopback bypass used by SSRF-safe HTTP handlers, and the affected hook tests were tightened so they match the current uniqueness and dispatcher behavior instead of relying on brittle assumptions. That keeps the integration suite aligned with the actual store contract and the test-only security bypass the HTTP hook handler expects.
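A test-only loopback bypass of this kind might look roughly like the following. The flag name allowLoopbackForTest and the use of atomic.Bool come from this PR's commit notes; the setter, the isBlockedHost helper, and the exact set of blocked ranges are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net"
	"sync/atomic"
)

// allowLoopbackForTest is read by outbound dialers, possibly concurrently,
// so it is an atomic.Bool rather than a plain bool.
var allowLoopbackForTest atomic.Bool

// setAllowLoopbackForTest is a hypothetical setter that tests call to opt in.
func setAllowLoopbackForTest(v bool) { allowLoopbackForTest.Store(v) }

// isBlockedHost reports whether an outbound request to the given IP literal
// should be refused. Hypothetical helper; unparseable input is refused.
func isBlockedHost(host string) bool {
	ip := net.ParseIP(host)
	if ip == nil {
		return true
	}
	if ip.IsLoopback() {
		// Loopback is allowed only when a test has explicitly opted in.
		return !allowLoopbackForTest.Load()
	}
	return ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified()
}

func main() {
	fmt.Println(isBlockedHost("127.0.0.1")) // true: bypass is off by default
	setAllowLoopbackForTest(true)
	fmt.Println(isBlockedHost("127.0.0.1")) // false: test opted in
	fmt.Println(isBlockedHost("10.0.0.1"))  // true: private ranges stay blocked
}
```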

Type

  • Feature
  • Bug fix
  • Hotfix (targeting main)
  • Refactor
  • Docs
  • CI/CD

Target Branch

dev

Checklist

  • go build ./... passes
  • go build -tags sqliteonly ./... passes (if Go changes)
  • go vet ./... passes
  • Tests pass: go test -race ./...
  • Web UI builds: cd ui/web && pnpm build (if UI changes)
  • No hardcoded secrets or credentials
  • SQL queries use parameterized $1, $2 (no string concat)
  • New user-facing strings added to all 3 locales (en/vi/zh)
  • Migration version bumped in internal/upgrade/version.go (if new migration)

Test Plan

  • go test ./internal/backup -run 'TestLoadTenantRestoreRow|TestShouldRestoreTable|TestEnsureTenantSlugAvailable'
  • go test ./internal/http -run 'TestResolveTenant|TestResolveRestoreTargetNewModeUsesSlug'
  • go test ./cmd -run '^$'
  • cd ui/web && pnpm build

viettranx and others added 11 commits April 13, 2026 11:54
fix(security): cross-group session leak + auto-inject scoping + vault graph + UI fixes
Release: vault enrich filter, stop bug, graph, tests, security fixes
…SSRF flag

- Replace mode no longer deletes the tenants row (FK safe vs excluded
  diagnostic tables: traces, activity_logs, usage_snapshots, spans,
  embedding_cache, pairing_requests, paired_devices,
  channel_pending_messages, cron_run_logs). Metadata is preserved in place.
- shouldRestoreTable now excludes tenants for both new and replace modes.
- CLI: add validateTenantRestoreFlags guardrail. mode=new requires
  --new-tenant-slug and rejects --tenant/--tenant-id; upsert/replace warn
  on stray --new-tenant-slug; invalid --mode values rejected. TAB in help
  text fixed; flag descriptions clarified.
- HTTP: resolveRestoreTarget rejects tenant_id for mode=new regardless
  of tenant_slug (matches CLI contract). New i18n key
  MsgRestoreNewModeRejectsTenantID (en/vi/zh).
- security/ssrf: allowLoopbackForTest switched to atomic.Bool so
  concurrent reads from outbound dialers are race-safe.
- Polish: vi backup.json key order matches en/zh; TenantRestoreOptions.Mode
  doc comment documents upsert/replace/new semantics including clone
  behavior for new.
- Tests: unit coverage for validator (12 cases), HTTP guardrails
  (3 cases), shouldRestoreTable replace branch. Integration test
  tests/integration/tenant_restore_replace_test.go regression-guards
  the FK fix using activity_logs seed + DeleteTenantDataForTest helper.
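The CLI guardrail rules listed above can be sketched as a small validator. Only the function name validateTenantRestoreFlags and the rules themselves come from the commit message; the restoreFlags struct, the return shape, and the message wording are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// restoreFlags is a hypothetical container for the CLI flags named above.
type restoreFlags struct {
	Mode          string // --mode: upsert, replace, or new
	Tenant        string // --tenant
	TenantID      string // --tenant-id
	NewTenantSlug string // --new-tenant-slug
}

// validateTenantRestoreFlags returns non-fatal warnings and a fatal error.
// It encodes only the rules stated in the commit message.
func validateTenantRestoreFlags(f restoreFlags) (warnings []string, err error) {
	switch f.Mode {
	case "new":
		if f.NewTenantSlug == "" {
			return nil, errors.New("mode=new requires --new-tenant-slug")
		}
		if f.Tenant != "" || f.TenantID != "" {
			return nil, errors.New("mode=new rejects --tenant and --tenant-id")
		}
	case "upsert", "replace":
		if f.NewTenantSlug != "" {
			// A stray slug is ignored with a warning rather than rejected.
			warnings = append(warnings,
				fmt.Sprintf("--new-tenant-slug is ignored for mode=%s", f.Mode))
		}
	default:
		return nil, fmt.Errorf("invalid --mode %q (want upsert, replace, or new)", f.Mode)
	}
	return warnings, nil
}

func main() {
	warnings, err := validateTenantRestoreFlags(restoreFlags{
		Mode: "replace", Tenant: "acme", NewTenantSlug: "acme-copy",
	})
	fmt.Println(warnings, err)

	_, err = validateTenantRestoreFlags(restoreFlags{Mode: "new", TenantID: "t1"})
	fmt.Println(err)
}
```

Returning warnings separately from the error lets the CLI print them and still proceed, which matches the "warn on stray --new-tenant-slug" behavior described above.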
# Conflicts:
#	internal/store/pg/hooks.go
#	internal/store/pg/hooks_test.go
#	internal/store/sqlitestore/hooks.go
@viettranx viettranx merged commit f5917b0 into nextlevelbuilder:dev Apr 16, 2026
1 of 2 checks passed