Skip to content

direct: Don't open the state WAL when a deploy's plan fails#5607

Open
denik wants to merge 3 commits into
mainfrom
wal-bug2
Open

direct: Don't open the state WAL when a deploy's plan fails#5607
denik wants to merge 3 commits into
mainfrom
wal-bug2

Conversation

@denik

@denik denik commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Changes

When planning failed, the deploy phase still called UpgradeToWrite (which
writes the WAL header) and only then returned, never reaching the Finalize
that would commit or discard it — leaving a header-only WAL on disk.

Bail out before UpgradeToWrite when planning has already logged an error, so
a failed plan never opens the WAL. Also re-check diagnostics after
InitForApply: it receives ctx and could log a diagnostic without returning
an error, so the call site can't prove it won't (UpgradeToWrite takes no
ctx and thus can't log, so the earlier check covers it).

Why

Related to #5557. The dstate fix in #5606 makes recovery of a leftover WAL
safe; this stops a failed plan from creating one in the first place.

Tests

  • bundle/deploy/wal/two-failed-deploys: a deploy creates the job, then an
    injected fault on the plan-stage refresh GET makes the next two deploys fail
    while planning — no WAL is left and the final deploy recovers. Confirmed to
    fail when the fix is reverted.

This pull request and its description were written by Isaac.

@eng-dev-ecosystem-bot

Copy link
Copy Markdown
Collaborator

Integration test report

Commit: 8235d22

Run: 27550847836

Env 🟨​KNOWN 🔄​flaky 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
💚​ aws linux 7 15 264 979 6:33
🟨​ aws windows 7 15 266 977 21:06
💚​ aws-ucws linux 7 15 360 893 15:47
💚​ aws-ucws windows 7 15 362 891 10:14
🔄​ azure linux 2 1 17 265 977 8:27
💚​ azure windows 1 17 269 975 9:22
💚​ azure-ucws linux 1 17 365 889 12:26
💚​ azure-ucws windows 1 17 367 887 11:22
💚​ gcp linux 1 17 263 980 7:42
💚​ gcp windows 1 17 265 978 9:25
24 interesting tests: 15 SKIP, 7 KNOWN, 2 flaky
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 💚​R 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 💚​R 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 💚​R 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestFetchRepositoryInfoAPI_FromRepo/root ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p
🔄​ TestFetchRepositoryInfoAPI_FromRepo/subdir ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p
Top 30 slowest tests (at least 2 minutes):
duration env testname
6:15 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:55 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:36 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:23 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:11 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:41 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:36 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:25 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:24 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:15 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:07 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:06 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:05 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:00 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:58 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:57 gcp windows TestAccept
2:55 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:55 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:55 azure-ucws windows TestAccept
2:54 aws-ucws linux TestAccept/bundle/resources/dashboards/generate_inplace/DATABRICKS_BUNDLE_ENGINE=direct
2:53 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:48 azure windows TestAccept
2:47 azure-ucws linux TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=terraform
2:39 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:36 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:27 aws-ucws linux TestAccept/bundle/deploy/files/no-snapshot-sync/DATABRICKS_BUNDLE_ENGINE=direct
2:21 aws-ucws linux TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=terraform
2:21 aws-ucws windows TestAccept
2:14 aws-ucws linux TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=terraform
2:01 azure-ucws linux TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=terraform

@denik denik added this pull request to the merge queue Jun 15, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Jun 15, 2026
denik added 2 commits June 15, 2026 08:23
…an fails

A failed plan still ran UpgradeToWrite and then returned without Finalize,
leaving a header-only WAL on disk. Bail out before UpgradeToWrite when planning
has already errored. Also re-check diagnostics after InitForApply, which
receives ctx and could log a diagnostic without returning an error.

Co-authored-by: Isaac
Name it after the guarantee it checks: a failed plan leaves no WAL. Use a
generic test-bundle name.

Co-authored-by: Isaac
@denik denik enabled auto-merge June 15, 2026 15:24
@denik denik temporarily deployed to test-trigger-is June 15, 2026 15:24 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 15, 2026 15:24 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 15, 2026 15:37 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 15, 2026 15:37 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants