perf: Use UNLOGGED tables during bulk import #29

Merged
jakebromberg merged 1 commit into main from feat/unlogged-tables
Mar 10, 2026

@jakebromberg (Member)

Summary

  • Use UNLOGGED tables during the COPY-intensive import phases to skip WAL writes, roughly halving I/O
  • Convert tables back to LOGGED after vacuum so consumers get durable storage
  • FK-ordered two-phase execution: children first for UNLOGGED, parent first for LOGGED
  • Pipeline state v3 with backward-compatible v2 migration
  • Localized entirely in run_pipeline.py; schema file and standalone scripts unchanged
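
The FK ordering in the third bullet might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in run_pipeline.py: the table names and the `persistence_statements` helper are hypothetical, and the list is assumed to name referenced (parent) tables before the tables that reference them. The ordering matters because PostgreSQL does not allow a LOGGED table to hold a foreign key to an UNLOGGED one, so at every intermediate step the child must be "no more durable" than its parent.

```python
# Hypothetical table list, ordered parent first. The real PIPELINE_TABLES
# constant in run_pipeline.py may contain different names.
EXAMPLE_TABLES = ["parent", "child_a", "child_b"]


def persistence_statements(tables, logged):
    """Return ALTER TABLE statements in an FK-safe order.

    `tables` must list referenced (parent) tables before the tables
    that reference them. Names are assumed to be trusted constants,
    so they are interpolated without quoting.
    """
    keyword = "LOGGED" if logged else "UNLOGGED"
    # To LOGGED: parents first, so a LOGGED child never points at an
    # UNLOGGED parent. To UNLOGGED: the reverse order, children first.
    ordered = list(tables) if logged else list(reversed(tables))
    return [f"ALTER TABLE {name} SET {keyword}" for name in ordered]
```

Each returned statement would then be executed in sequence on the pipeline's database connection. Generating the statements separately from executing them keeps the ordering logic testable without a database.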

Closes #28

Test plan

  • ruff format --check . && ruff check . passes
  • pytest tests/unit/ -v -- 317 passed
  • pytest -m postgres -v -- 119 passed (includes new UNLOGGED/LOGGED integration tests)
  • pytest -m e2e -v -- 27 passed (includes new test_tables_are_logged and set_logged resume check)
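
For readers unfamiliar with how a check like test_tables_are_logged can be expressed: PostgreSQL records each relation's persistence in `pg_class.relpersistence` (`p` for permanent, `u` for unlogged, `t` for temporary). The following is a minimal sketch of such a check, not the test from this PR. The query constant and the `tables_not_logged` helper are hypothetical, the `%s` placeholder assumes a psycopg-style driver, and the query ignores schemas for brevity.

```python
# Hypothetical query: fetch persistence for a list of ordinary tables.
# The %s placeholder expects a list of table names (psycopg-style binding).
RELPERSISTENCE_QUERY = """
SELECT relname, relpersistence
FROM pg_class
WHERE relkind = 'r' AND relname = ANY(%s)
"""


def tables_not_logged(rows):
    """Given (relname, relpersistence) rows, return the names of tables
    that are not permanent (LOGGED), sorted for stable output."""
    return sorted(name for name, persistence in rows if persistence != "p")
```

After a completed pipeline run, feeding the query results to `tables_not_logged` should yield an empty list.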

Tables are set UNLOGGED immediately after schema creation (skipping WAL writes during the COPY-intensive import, index, dedup, and prune phases) and converted back to LOGGED after vacuum so consumers get durable tables. The optimization is localized entirely in run_pipeline.py; the canonical schema and standalone scripts remain unchanged.

- Add PIPELINE_TABLES constant shared by run_vacuum, set_tables_unlogged, set_tables_logged
- Add set_tables_unlogged/set_tables_logged with FK-ordered two-phase execution (children first for UNLOGGED, parent first for LOGGED)
- Wire into both _run_database_build (CSV mode) and _run_database_build_post_import (direct-PG mode)
- Add "set_logged" to pipeline state (v3) with v2 migration support
- Add integration tests verifying pg_class.relpersistence transitions
- Add E2E test verifying all tables are LOGGED after pipeline completion
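
The v2 to v3 state migration could be sketched as follows. The state layout here (a dict with a `version` key and a `steps` mapping of step name to completion flag) and the `migrate_state` function are assumptions made for illustration; the actual format used by run_pipeline.py is not shown in this PR description.

```python
import copy


def migrate_state(state):
    """Upgrade a hypothetical v2 state dict to v3 by adding set_logged.

    Returns v3 states unchanged and leaves the input dict untouched.
    """
    version = state.get("version")
    if version == 3:
        return state
    if version != 2:
        raise ValueError(f"unsupported pipeline state version: {version!r}")
    upgraded = copy.deepcopy(state)
    upgraded["version"] = 3
    # Assumption: a step absent from an old state file counts as not yet
    # run, so a resumed pipeline will still execute set_logged.
    upgraded["steps"].setdefault("set_logged", False)
    return upgraded
```

Treating the new step as pending is a conservative choice in this sketch: `ALTER TABLE ... SET LOGGED` on a table that is already LOGGED is a no-op, so re-running it on resume should be harmless.
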

@jakebromberg merged commit 0e7fed2 into main on Mar 10, 2026
3 checks passed
@jakebromberg deleted the feat/unlogged-tables branch on March 10, 2026 04:05