Summary
Implement a Dagster-orchestrated export/import pipeline for safely migrating LadybugDB databases across version upgrades. The pipeline uses LadybugDB's native EXPORT DATABASE / IMPORT DATABASE commands, with system backups to S3 as a safety net, health check 503 during import to drain traffic, and background tasks with SSE for monitoring long-running operations.
Status: Approved
Problem Statement: Current State
LadybugDB uses an embedded single-file database format (.lbug). When the LadybugDB Python package is upgraded to a new version, the on-disk format may be incompatible with existing databases. Today there is no migration path:
- Existing .lbug files may fail to open with a new version
- No version tracking on databases tells us which LadybugDB version created them
- No automated export/import pipeline exists to migrate data between versions
- A failed upgrade could leave databases in an unrecoverable state
- Shared repositories (SEC) and all user databases must be migrated together or not at all
The current version is 0.13.0 (set via LADYBUG_INTERNAL_VERSION in the Dockerfile).
Problem Statement: Desired State
A fully automated, Dagster-orchestrated pipeline that:
- Pre-deploy: Creates system backups to S3 + exports all databases to local Parquet on each instance
- Deploy: New container with new LadybugDB version (EBS volume persists exported files)
- Post-deploy: Imports all databases from Parquet into fresh .lbug files with the new version
- Has rollback at every stage (.pre-migration files on disk, system backups in S3, old Docker image in ECR)
- Is version-agnostic — can be run with identical source and target versions as a validation/integrity check
Problem Statement: Why Now?
The platform is on LadybugDB 0.13.0. Upstream Kuzu releases (which LadybugDB forks) regularly change the on-disk format. Without a migration pipeline, we cannot upgrade LadybugDB, which blocks performance improvements, bug fixes, and new features from upstream.
Proposed Solution: Approach
Dagster-orchestrated, instance-level export/import via Graph API endpoints.
Two Dagster jobs orchestrate the migration across the fleet:
- Export job (pre-deploy): Discovers all instances from DynamoDB → calls POST /migration/export on each → system backup to S3, then EXPORT DATABASE to local Parquet, writes migration.json manifest
- Import job (post-deploy): Discovers all instances → calls POST /migration/import on each → reads manifest, creates fresh empty databases, runs IMPORT DATABASE from Parquet
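The per-instance export step can be sketched as follows. This is a minimal illustration, not the actual implementation: the function names and manifest fields are assumptions; only the EXPORT DATABASE statement shape and the migration.json file name come from the design.

```python
import json
from pathlib import Path

def export_statement(export_dir: str) -> str:
    # Kuzu-style export statement; exact option syntax may differ
    # between LadybugDB versions.
    return f"EXPORT DATABASE '{export_dir}' (format=\"parquet\");"

def write_manifest(manifest_path: Path, databases: list,
                   source_version: str, target_version: str) -> dict:
    """Record what was exported so the import job can verify and replay it.

    Field names here are hypothetical; the real manifest schema lives in
    graph_api/models/migration.py.
    """
    manifest = {
        "source_version": source_version,
        "target_version": target_version,
        "databases": databases,
        "status": "exported",
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest
```

The import job would read the manifest back, create a fresh empty database per entry, and run `IMPORT DATABASE '<dir>';` against it.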
Key design decisions:
- Background tasks with SSE — export/import can take 30+ minutes for large databases
- Health check 503 during import — same pattern as S3 ATTACH replica warmup; ALB stops routing traffic
- System backups — new BackupType.SYSTEM (hidden from customers) provides off-disk S3 safety net
- EBS persistence — exported Parquet files survive container swaps
- Lazy connections — new Graph API starts without trying to open incompatible .lbug files
- Replicas don't need migration — SEC replicas S3 ATTACH whatever backup is published; only writers have downtime
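Because export/import run as background tasks surfaced over SSE, the Dagster jobs need to consume an event stream until the task reaches a terminal state. A rough sketch of the client-side parsing, assuming the standard `data:`-prefixed SSE wire format; the task payload shape (`status`, `progress`) is an assumption, not the actual schema:

```python
import json

def iter_sse_events(lines):
    """Yield one JSON payload per SSE event from an iterable of raw lines."""
    buf = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            buf.append(line[len("data:"):].strip())
        elif line == "" and buf:
            # A blank line terminates an event; join multi-line data fields.
            yield json.loads("\n".join(buf))
            buf = []

def wait_for_completion(lines) -> dict:
    """Consume events until the task reports a terminal state."""
    last = {}
    for event in iter_sse_events(lines):
        last = event
        if event.get("status") in ("completed", "failed"):
            break
    return last
```

In practice the Dagster op would wrap this around an HTTP response streamed from the task's monitor_url.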
Components Affected
- Graph API (/robosystems/graph_api/)
- Dagster (/robosystems/dagster/)
- Models (/robosystems/models/)
- Routers (/robosystems/routers/)
Key Changes
New Files
| File | Description |
| --- | --- |
| graph_api/models/migration.py | Pydantic models for export/import responses, manifest, status |
| graph_api/core/migration_service.py | Export/import logic, 503 flag, system backup + EXPORT/IMPORT DATABASE |
| graph_api/routers/migration.py | /migration/export, /migration/import, /migration/status endpoints |
| dagster/jobs/migration.py | Dagster export + import jobs with DynamoDB fleet discovery + SSE monitoring |
Modified Files
| File | Change |
| --- | --- |
| graph_api/app.py | Include migration router |
| graph_api/client/client.py | Add migration_export(), migration_import(), migration_status() client methods |
| graph_api/core/ladybug/pool.py | Version incompatibility detection in _create_new_connection() |
| graph_api/core/task_manager.py | Add migration_task_manager instance |
| graph_api/core/task_sse.py | Add TaskType.MIGRATION SSE messages |
| graph_api/routers/tasks.py | Register migration manager in UnifiedTaskManager |
| graph_api/routers/health.py | Add is_migration_in_progress() check returning 503 |
| models/iam/graph_backup.py | Add SYSTEM to BackupType enum |
| routers/graphs/backups/backup.py | Filter backup_type != "system" from customer list |
| dagster/definitions.py | Register migration jobs |
Data Model Changes
BackupType enum (models/iam/graph_backup.py):
- Add SYSTEM = "system" — used for pre-migration safety backups, hidden from customer-facing endpoints
No database migrations required — BackupType is stored as a string column.
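The enum change and the customer-facing filter are small; a sketch, assuming a string-valued enum (the existing members shown here are illustrative — only SYSTEM is the addition from this design):

```python
from enum import Enum

class BackupType(str, Enum):
    # Existing members are placeholders for illustration.
    MANUAL = "manual"
    SCHEDULED = "scheduled"
    # New: pre-migration safety backups, hidden from customers.
    SYSTEM = "system"

def customer_visible(backups: list) -> list:
    """Filter the customer-facing list endpoint would apply."""
    return [b for b in backups if b.get("backup_type") != BackupType.SYSTEM.value]
```

Because the column stores the raw string, adding a member requires no schema change.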
API Changes
# Graph API (instance-level, called by Dagster)
POST /migration/export?source_version=0.13.0&target_version=0.14.0
→ 200 {"task_id": "migration_migration_a1b2c3d4", "monitor_url": "/tasks/migration_.../monitor"}
POST /migration/import
→ 200 {"task_id": "migration_migration_e5f6g7h8", "monitor_url": "/tasks/migration_.../monitor"}
GET /migration/status
→ 200 {"migration_pending": true, "migration_in_progress": false, "manifest": {...}, "pre_migration_files": [...]}
# Health endpoint behavior during import:
GET /health
→ 503 {"status": "migrating", "message": "Version migration in progress - not ready for traffic"}
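The health gate can be as simple as a process-level flag that the migration service sets around IMPORT DATABASE and the health endpoint checks. A hedged sketch of that behavior (the real code lives in graph_api/routers/health.py and migration_service.py; the flag and helper names here are assumptions):

```python
# Process-level flag toggled by the migration service around IMPORT DATABASE.
_migration_in_progress = False

def set_migration_in_progress(active: bool) -> None:
    global _migration_in_progress
    _migration_in_progress = active

def is_migration_in_progress() -> bool:
    return _migration_in_progress

def health_response():
    """Return (status_code, body) the way /health would respond."""
    if is_migration_in_progress():
        # ALB marks the target unhealthy and stops routing traffic to it.
        return 503, {
            "status": "migrating",
            "message": "Version migration in progress - not ready for traffic",
        }
    return 200, {"status": "healthy"}
```

This mirrors the existing S3 ATTACH replica-warmup pattern: the instance stays up but advertises itself as not ready.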
Implementation Plan
Phase 1: Core models and service
- graph_api/models/migration.py — Pydantic models
- graph_api/core/migration_service.py — export/import logic with 503 flag
- Add migration_task_manager to graph_api/core/task_manager.py
- Add TaskType.MIGRATION SSE messages to graph_api/core/task_sse.py
Phase 2: Endpoints and health check
- graph_api/routers/migration.py — HTTP endpoints
- Include migration router in graph_api/app.py
- Add is_migration_in_progress() check to graph_api/routers/health.py (503 pattern)
- Register migration manager in graph_api/routers/tasks.py (UnifiedTaskManager)
Phase 3: Backup and safety
- Add SYSTEM to BackupType enum in models/iam/graph_backup.py
- Filter system backups from the customer-facing list_backups endpoint
Phase 4: Client and Dagster
- Add migration_export(), migration_import(), migration_status() to the Graph API client
- dagster/jobs/migration.py — export + import jobs
- Register migration jobs in dagster/definitions.py
Phase 5: Validation
Dependencies: None — all building blocks exist (connection pool, backup service, task manager, DynamoDB registry)
Testing
- MigrationService export/import logic
- .pre-migration rollback
Rollout
Environments: Development → Staging → Production
Rollout strategy:
- Same-version validation in dev (export + import with identical version)
- Cross-version migration in staging
- Production: SEC shared repos first → large/xlarge (fewer customers) → standard (most customers, smallest DBs)
Rollback plan:
- .pre-migration files on local EBS for immediate rollback
- System backups in S3 for full recovery
- Old Docker image in ECR — rollback is just changing the image tag
- Dagster import job has skip_instances config for partial fleet recovery
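Immediate rollback amounts to restoring the .pre-migration copies over the (possibly half-imported) new files. A stdlib-only sketch; the path layout and helper name are assumptions:

```python
from pathlib import Path

def rollback_database(db_path: Path) -> bool:
    """Restore <name>.lbug from <name>.lbug.pre-migration if the copy exists."""
    pre = db_path.with_suffix(db_path.suffix + ".pre-migration")
    if not pre.exists():
        return False
    if db_path.exists():
        db_path.unlink()   # discard the failed import
    pre.rename(db_path)    # rename is atomic on the same filesystem
    return True
```

Keeping the safety copy on the same EBS volume is what makes this a cheap rename rather than a restore from S3.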
Success Criteria
- Rollback from .pre-migration files works correctly on import failure
Open Questions
- Does EXPORT DATABASE work with an active read connection? If not, we need to drain all connections before exporting, causing brief unavailability during export.
- Can the creating LadybugDB version be read from the .lbug file header? That would enable a version guard without the manifest.
References
- Spec: local/docs/specs/ladybug-version-migration.md
- Kuzu EXPORT/IMPORT docs: https://docs.kuzudb.com/export-import/
- Existing building blocks: graph_api/core/backup_service.py, graph_api/core/task_manager.py, graph_api/routers/health.py
- Related: Graph deprovisioning (feature/graph-deprovisioning branch)