bulkmerge: gate distributed merge on instance availability #162616

Draft
spilchen wants to merge 1 commit into cockroachdb:master from spilchen:gh-161877/260206/1358/merge/restart-behaviour-no-persist

@spilchen spilchen commented Feb 6, 2026

Previously, a distributed merge would fail immediately if any SQL instance owning required SST files was unavailable (for example, when a node was down). If the instance remained unavailable, the job would retry indefinitely and never make progress.

This change adds a proactive availability check at the start of Merge(). Before planning the merge, we now verify that every SQL instance owning required SST files is available. If any are unavailable, the job retries the check with exponential backoff. If the instances are still unavailable once the backoff is exhausted, the job fails with a permanent error so that it does not restart endlessly.

The logic lives in the bulkmerge package so that this behavior applies consistently to IMPORT and backfill.

Epic: CRDB-48845
Closes #161877

Release note: None

@spilchen spilchen self-assigned this Feb 6, 2026

trunk-io bot commented Feb 6, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@cockroach-teamcity

This change is Reviewable

Development

Successfully merging this pull request may close these issues.

sql: resume of index backfill with distributed merge can hit terminal error