perf: Use UNLOGGED tables during bulk import #29
Merged
jakebromberg merged 1 commit into main on Mar 10, 2026
Tables are set UNLOGGED immediately after schema creation (skipping WAL writes during the COPY-intensive import, index, dedup, and prune phases) and converted back to LOGGED after vacuum so consumers get durable tables. The optimization is localized entirely in run_pipeline.py; the canonical schema and standalone scripts remain unchanged.

- Add PIPELINE_TABLES constant shared by run_vacuum, set_tables_unlogged, set_tables_logged
- Add set_tables_unlogged/set_tables_logged with FK-ordered two-phase execution (children first for UNLOGGED, parent first for LOGGED)
- Wire into both _run_database_build (CSV mode) and _run_database_build_post_import (direct-PG mode)
- Add "set_logged" to pipeline state (v3) with v2 migration support
- Add integration tests verifying pg_class.relpersistence transitions
- Add E2E test verifying all tables are LOGGED after pipeline completion
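The FK ordering matters because PostgreSQL does not allow a permanent (LOGGED) table to reference an unlogged one, so the referencing tables have to change first on the way down and last on the way back up. A minimal sketch of that ordering follows; it is not the code from run_pipeline.py. The table names and the helper `persistence_statements` are hypothetical, and the sketch assumes PIPELINE_TABLES is listed parent first.

```python
from typing import Sequence

# Hypothetical table names, listed parent first. The real PIPELINE_TABLES
# constant lives in run_pipeline.py and is not shown in this PR description.
PIPELINE_TABLES: tuple[str, ...] = ("parent_table", "child_a", "child_b")


def persistence_statements(tables: Sequence[str], logged: bool) -> list[str]:
    """Return ALTER TABLE statements in an FK-safe order.

    Assumes `tables` is ordered parent first and contains only trusted,
    hard-coded identifiers (no user input is interpolated).
    """
    mode = "LOGGED" if logged else "UNLOGGED"
    # LOGGED: parents first, so a child never becomes permanent while
    # still referencing an unlogged parent.
    # UNLOGGED: children first, so a parent never becomes unlogged while
    # a permanent child still references it.
    ordered = list(tables) if logged else list(reversed(tables))
    return [f"ALTER TABLE {name} SET {mode}" for name in ordered]
```

Each returned statement would then be executed on the pipeline's database connection, for example `persistence_statements(PIPELINE_TABLES, logged=False)` right after schema creation and `persistence_statements(PIPELINE_TABLES, logged=True)` after vacuum.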
Summary
- Changes are confined to run_pipeline.py; schema file and standalone scripts unchanged

Closes #28
Test plan
- ruff format --check . && ruff check . passes
- pytest tests/unit/ -v -- 317 passed
- pytest -m postgres -v -- 119 passed (includes new UNLOGGED/LOGGED integration tests)
- pytest -m e2e -v -- 27 passed (includes new test_tables_are_logged and set_logged resume check)
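For readers who want to check persistence by hand, the kind of assertion the new tests make can be sketched as below. This is an illustration, not the project's test code: `RELPERSISTENCE_QUERY` and `tables_not_logged` are hypothetical names, and the `%s` placeholder assumes a psycopg-style driver that adapts a Python list to a PostgreSQL array. In pg_class, relpersistence is 'p' for permanent (LOGGED) tables, 'u' for unlogged tables, and 't' for temporary tables.

```python
from typing import Iterable

# Catalog query returning (relname, relpersistence) for ordinary tables.
# Assumes the driver binds a list of table names to the %s placeholder.
# It does not filter by schema, so same-named tables elsewhere also match.
RELPERSISTENCE_QUERY = (
    "SELECT relname, relpersistence FROM pg_class "
    "WHERE relkind = 'r' AND relname = ANY(%s)"
)


def tables_not_logged(rows: Iterable[tuple[str, str]]) -> list[str]:
    """Return the sorted names of tables whose relpersistence is not 'p'."""
    return sorted(name for name, persistence in rows if persistence != "p")
```

After a completed pipeline run, feeding the query's rows to `tables_not_logged` should give an empty list; during the import phases it should list every pipeline table.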