Cross-environment restore (Docker ↔ K8s) breaks due to different PostgreSQL role ownership #127

@fatih-acar

Summary

When a backup is created in one environment and restored into the other (Docker → Kubernetes or Kubernetes → Docker), the PostgreSQL restore can leave the prefect database with object ownership and grants that reference a role that does not exist (or is not the application user) in the target cluster, breaking task-manager startup.

The root cause is that the two environments ship with different default PostgreSQL usernames:

  • Docker Compose: connects as postgres (superuser)
  • Kubernetes (Helm chart): connects as prefect

These defaults are reflected in src/internal/app/app_config.go:17-19:

defaultPostgresDatabase = "prefect"
defaultPostgresUsername = "postgres"   // Docker-flavored default
defaultPostgresPassword = "prefect"

pg_dump -Fc embeds object ownership and ACL statements that reference the source role. On restore via pg_restore --clean --create (src/internal/app/backup_taskmanager.go:102 / :115), the restore into the target cluster either:

  • fails because the source role does not exist, or
  • silently restores objects owned by a role the application user cannot access, leading to permission errors at runtime.
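
One way to confirm which roles a given dump references, before restoring it, is to inspect the archive's table of contents. The sketch below assumes the text printed by pg_restore --list has already been captured, and that the owner is the last whitespace-separated field of each entry line (which would not hold for role names containing spaces). The function name dumpOwners and the sample listing are hypothetical and only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// dumpOwners returns the distinct owner names found in the text produced by
// `pg_restore --list`. Assumption: the owner is the last field of each entry.
func dumpOwners(listing string) []string {
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(listing))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Header and comment lines in the listing start with ';'.
		if line == "" || strings.HasPrefix(line, ";") {
			continue
		}
		fields := strings.Fields(line)
		// An entry has at least: id; tableoid oid TYPE schema name owner.
		if len(fields) < 7 {
			continue
		}
		seen[fields[len(fields)-1]] = true
	}
	owners := make([]string, 0, len(seen))
	for o := range seen {
		owners = append(owners, o)
	}
	sort.Strings(owners)
	return owners
}

func main() {
	// Made-up listing in the shape pg_restore --list prints.
	listing := `;
; Selected TOC Entries:
;
3; 2615 2200 SCHEMA - public postgres
215; 1259 16386 TABLE public flow_run postgres
3350; 0 16386 TABLE DATA public flow_run postgres
`
	fmt.Println(dumpOwners(listing))
}
```

If the returned owners differ from the user the target deployment connects as, the restore is exposed to the failure modes listed above.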

Steps to reproduce

  1. Deploy Infrahub via Docker Compose. Let task-manager run and create flow data.
  2. infrahub-backup create → produces a backup whose prefect.dump references role postgres.
  3. Deploy Infrahub on Kubernetes (where the Prefect DB user is prefect).
  4. infrahub-backup restore against the K8s deployment.
  5. Observe restore errors / task-manager pod errors about missing role postgres or permission denied on prefect schema objects.

The reverse direction (K8s backup → Docker restore) fails symmetrically: objects end up owned by prefect while the runtime connects as postgres.

Expected behavior

A backup taken from one environment should be restorable into the other without manual SQL fix-ups, regardless of which default role each platform ships with.

Proposed approaches (to discuss)

  • Remap ownership during restore. Use pg_restore --no-owner --no-privileges --role=<target-user> so the target user owns all restored objects and the source role is irrelevant. This is the simplest fix and matches the typical PostgreSQL cross-cluster migration pattern.
  • Capture and persist the source role in backup metadata, and on restore either (a) create the role if missing or (b) translate ownership to the target role.
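  • Normalize at backup time. Dump with --no-owner --no-acl so the dump file is portable by construction. This trades away fine-grained ownership information that we don't actually use today. Caveat: the pg_dump documentation describes --no-owner as ignored for archive (non-text) output such as -Fc, so with the current dump format ownership would still need to be dropped at restore time.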
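
The metadata-based approach in the second bullet could look roughly like the following. This is a minimal sketch, not the existing backup metadata format: the backupMetadata struct, its postgres_role field, and planOwnership are all hypothetical names introduced here for discussion.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backupMetadata is a hypothetical metadata record; only the field relevant
// to this issue is shown.
type backupMetadata struct {
	PostgresRole string `json:"postgres_role"`
}

type ownershipAction string

const (
	actionNone       ownershipAction = "none"        // roles already match
	actionCreateRole ownershipAction = "create-role" // option (a)
	actionRemap      ownershipAction = "remap"       // option (b)
)

// planOwnership decides how restore should treat ownership, given the role
// recorded at backup time and the role the target deployment connects as.
func planOwnership(sourceRole, targetRole string, allowCreateRole bool) ownershipAction {
	switch {
	case sourceRole != "" && sourceRole == targetRole:
		return actionNone
	case sourceRole != "" && allowCreateRole:
		return actionCreateRole
	default:
		// Also covers older backups with no recorded role.
		return actionRemap
	}
}

func main() {
	raw, err := json.Marshal(backupMetadata{PostgresRole: "postgres"})
	if err != nil {
		panic(err)
	}
	var meta backupMetadata
	if err := json.Unmarshal(raw, &meta); err != nil {
		panic(err)
	}
	fmt.Println(string(raw), planOwnership(meta.PostgresRole, "prefect", false))
}
```

Defaulting to remap when no role was recorded keeps backups created before such a field existed restorable.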

Likely best to combine: dump with --no-owner --no-acl and restore with --no-owner --no-privileges --role=<target> to be defensive.
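
As a starting point for the restore side, the argument list could be built along these lines. This is a sketch under assumptions: restoreArgs is a hypothetical helper, not the current code in backup_taskmanager.go, and the connection database and target role are assumed to come from the existing configuration. The pg_restore flags themselves (--no-owner, --no-privileges, --role, --clean, --create, --dbname) are standard.

```go
package main

import "fmt"

// restoreArgs builds pg_restore arguments that ignore the ownership and
// grants recorded in the dump. connectDB is the database to connect to;
// with --create, pg_restore connects there first and then creates the
// database named in the archive.
func restoreArgs(connectDB, targetRole string) []string {
	args := []string{
		"--clean", "--create", // same mode as the current restore
		"--no-owner",      // do not restore ownership from the source role
		"--no-privileges", // do not restore GRANT/REVOKE statements
	}
	if targetRole != "" {
		// Perform the restore as the target role (SET ROLE after connecting).
		args = append(args, "--role="+targetRole)
	}
	return append(args, "--dbname="+connectDB)
}

func main() {
	fmt.Println(restoreArgs("postgres", "prefect"))
}
```

The caller would still append the dump file path (or stream the archive on stdin) as it does today. Note that --role only works if the connecting user is allowed to switch to that role.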

Affected code

  • src/internal/app/backup_taskmanager.go:14-31 (stream dump)
  • src/internal/app/backup_taskmanager.go:33-64 (file dump)
  • src/internal/app/backup_taskmanager.go:66-126 (restore)
  • src/internal/app/app_config.go:13-20 (default credentials)

Labels: type: bug (Something isn't working)