bulkmerge: gate distributed merge on instance availability by spilchen · Pull Request #162616 · cockroachdb/cockroach

spilchen · 2026-02-06T22:08:14Z

Previously, a distributed merge would fail immediately if any SQL instance owning required SST files was unavailable (for example, when a node was down). If the instance remained unavailable, the job would retry indefinitely and never make progress.

This change adds a proactive availability check at the start of Merge(). Before planning the merge, we now verify that all required SQL instances are up. If any are unavailable, the job waits and rechecks with exponential backoff. If the instances are still unavailable once the backoff is exhausted, the job fails with a permanent error so that it does not restart endlessly.

The logic lives in the bulkmerge package so that this behavior applies consistently to IMPORT and backfill.

Epic: CRDB-48845
Closes #161877

Release note: None


trunk-io · 2026-02-06T22:08:18Z

Merging to master in this repository is managed by Trunk.

To merge this pull request, check the box to the left or comment /trunk merge below.

cockroach-teamcity · 2026-02-06T22:08:28Z

This change is

spilchen self-assigned this Feb 6, 2026

spilchen wants to merge 1 commit into cockroachdb:master from spilchen:gh-161877/260206/1358/merge/restart-behaviour-no-persist
