Skip to content

feat(universaldb): postgres leader-resolver driver overhaul#5329

Open
MasterPtato wants to merge 1 commit into
stack/slop-claude-opus-4-8-feat-pegboard-envoy-rate-limit-envoy-ws-ingress-and-cap-get_pages-trim-unused-metrics-tnzyzqrlfrom
stack/slop-claude-opus-4-8-high-feat-universaldb-postgres-leader-resolver-driver-overhaul-xrrmomwz
Open

feat(universaldb): postgres leader-resolver driver overhaul#5329
MasterPtato wants to merge 1 commit into
stack/slop-claude-opus-4-8-feat-pegboard-envoy-rate-limit-envoy-ws-ingress-and-cap-get_pages-trim-unused-metrics-tnzyzqrlfrom
stack/slop-claude-opus-4-8-high-feat-universaldb-postgres-leader-resolver-driver-overhaul-xrrmomwz

Conversation

@MasterPtato

@MasterPtato MasterPtato commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

No description provided.

@MasterPtato

MasterPtato commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Stack for rivet-dev/rivet

Current stack:

Dependencies:

Get stack: forklift get 5329
Push local edits: forklift submit
Merge when ready: forklift merge 5329

@claude

claude Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

PR Review: Postgres Leader-Resolver Driver Overhaul

This PR replaces the old per-transaction conflict_ranges exclusion-constraint approach (btree_gist, GIST-indexed ranges, Postgres advisory locking) with a centralized leader-resolver pattern: one elected node drains a udb_commit_requests queue, resolves conflicts in-memory via the shared TransactionConflictTracker, applies writes in batched Postgres transactions, and replies to followers via NOTIFY. The architecture is much cleaner and avoids the distributed deadlock / GiST contention issues of the old approach.

Four confirmed findings below, ranked by severity.


1. Fresh-database cold-window spuriously rejects all writes for 5 seconds

engine/packages/universaldb/src/driver/postgres/resolver/mod.rs:141

recovery_floor reads last_value from udb_version_seq to determine the highest version a prior leader could have committed. On a brand-new database where no nextval has ever been called, PostgreSQL initialises last_value = 1 (the sequence start value) with is_called = false. recovery_floor therefore returns max(durable=0, seq_high=1) = 1, setting recovery_version = 1.

During the 5-second cold window (TXN_TIMEOUT), any commit with start_version < recovery_version is rejected. Every first-boot follower opens a snapshot with read_version = durable_version = 0, so start_version = 0 < 1 = recovery_version → all writes fail as conflicts and retry. All write transactions stall for the full 5-second cold window on a fresh install.

Fix: Account for is_called:

SELECT CASE WHEN is_called THEN last_value ELSE 0 END FROM udb_version_seq

2. LISTEN race during reconnect silently drops notifications for new subscribers

engine/packages/universaldb/src/driver/postgres/listener.rs:176

In connect_and_run, the reconnect sequence is:

  1. Connect
  2. Snapshot registered channels via iter_async
  3. Issue LISTEN for each snapshotted channel
  4. Set *client.lock() = Some(new_client)client set last

If a concurrent listen("new_channel") call arrives between steps 2 and 4, it finds client = None and skips the LISTEN. The channel IS inserted into scc::HashMap, so it won't be picked up by the re-LISTEN loop on the current reconnect — that loop already ran. No LISTEN is ever issued for this channel until the next connection drop.

The inline comment says "the lifecycle task re-LISTENs on reconnect" — but for channels added in this window, that reconnect already happened.

Fix: Set *client = Some(new_client) before the re-LISTEN loop. New listen() calls that arrive after that point will immediately issue their own LISTEN on the active client.


3. wait_for_leader polls with sleep instead of using the existing watch::Receiver

engine/packages/universaldb/src/driver/postgres/commit.rs:97

async fn wait_for_leader(shared: &Arc<PostgresShared>) -> Result<LeaseInfo> {
    let deadline = Instant::now() + LEADER_WAIT_TIMEOUT;
    loop {
        if let Some(lease) = shared.current_lease() { return Ok(lease); }
        if Instant::now() >= deadline { return Err(DatabaseError::NotCommitted.into()); }
        tokio::time::sleep(RESULT_POLL_INTERVAL).await;  // ← banned pattern
    }
}

This is the loop { check; sleep } pattern that CLAUDE.md prohibits: "If it is inside a loop { check; sleep } body, it is polling and should be event-driven instead." PostgresShared already has a watch::Sender<Option<LeaseInfo>> / watch::Receiver pair (lease_tx / lease_rx) that fires whenever set_lease is called — the infrastructure for event-driven notification is in place but not exposed to the call site.

Fix: Add a method to PostgresShared to subscribe to the lease watch channel, then rewrite as:

async fn wait_for_leader(shared: &Arc<PostgresShared>) -> Result<LeaseInfo> {
    let mut lease_rx = shared.subscribe_lease();
    let deadline = tokio::time::Instant::now() + LEADER_WAIT_TIMEOUT;
    loop {
        if let Some(lease) = shared.current_lease() { return Ok(lease); }
        tokio::select! {
            res = lease_rx.changed() => { if res.is_err() { break; } }
            _ = tokio::time::sleep_until(deadline) => {
                return Err(DatabaseError::NotCommitted.into());
            }
        }
    }
    Err(DatabaseError::NotCommitted.into())
}

4. GC query deletes pending rows — SQL does not match the stated invariant

engine/packages/universaldb/src/driver/postgres/database.rs:219–221

"DELETE FROM udb_commit_requests
 WHERE created_at < now() - ($1::bigint * interval '1 second')"

The constant is documented as: "Terminal and orphaned commit-request rows older than this are garbage collected." But the SQL has no status filter — it deletes every row, including pending ones.

A pending row stuck for >60 seconds (e.g., pool exhaustion, leader stall) is silently deleted. The waiting follower in await_result receives Ok(None) and returns NotCommitted, causing a retry — which is safe today — but this diverges from the documented invariant and would silently break any future assertion or metric that expects live pending rows to always exist.

Fix: Either add AND status != 'pending' to the DELETE, or update the constant's doc comment to reflect the actual behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant