3 changes: 2 additions & 1 deletion NEXT_CHANGELOG.md
@@ -14,7 +14,8 @@
* Bundle variable references now accept Unicode letters in path segments (e.g. `${var.变量}`). ([#5532](https://github.com/databricks/cli/pull/5532))
* Ignore remote changes for vector search direct_access_index_spec.schema_json to prevent drift when the backend normalizes the schema ([#5481](https://github.com/databricks/cli/pull/5481)).
* Remove hidden, never-functional `--existing-dashboard-id`, `--existing-dashboard-path`, `--existing-alert-id`, and `--existing-genie-space-id` alias flags from `bundle generate`; use the documented `--existing-id` / `--existing-path` flags instead ([#5591](https://github.com/databricks/cli/pull/5591)).
-* engine/direct: Fix WAL corruption after two consecutive failed deploys ([#5557](https://github.com/databricks/cli/issues/5557)).
+* engine/direct: Fix WAL corruption after two consecutive failed deploys ([#5606](https://github.com/databricks/cli/pull/5606)).
+* engine/direct: Don't open the deployment state WAL when a deploy's plan fails ([#5607](https://github.com/databricks/cli/pull/5607)).

### Dependency updates

14 changes: 14 additions & 0 deletions acceptance/bundle/deploy/wal/failed-plan-no-wal/databricks.yml
@@ -0,0 +1,14 @@
bundle:
  name: test-bundle

resources:
  jobs:
    test_job:
      name: "test-job"
      tasks:
        - task_key: "test-task"
          spark_python_task:
            python_file: ./test.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
3 changes: 3 additions & 0 deletions acceptance/bundle/deploy/wal/failed-plan-no-wal/out.test.toml
48 changes: 48 additions & 0 deletions acceptance/bundle/deploy/wal/failed-plan-no-wal/output.txt
@@ -0,0 +1,48 @@

=== Deploy 1 (normal: creates the job and the committed state)
>>> [CLI] bundle deploy
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/test-bundle/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!

=== Deploy 2 (planning fails, must not leave a WAL)
>>> errcode [CLI] bundle deploy
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/test-bundle/default/files...
Error: cannot plan resources.jobs.test_job: reading id="[NUMID]": Fault injected by test. (403 INJECTED)

Endpoint: GET [DATABRICKS_URL]/api/2.2/jobs/get?job_id=[NUMID]
HTTP Status: 403 Forbidden
API error_code: INJECTED
API message: Fault injected by test.

Error: planning failed


Exit code: 1

>>> assert_not_exists.py .databricks/bundle/default/resources.json.wal

=== Deploy 3 (planning fails again, must not leave a WAL)
>>> errcode [CLI] bundle deploy
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/test-bundle/default/files...
Error: cannot plan resources.jobs.test_job: reading id="[NUMID]": Fault injected by test. (403 INJECTED)

Endpoint: GET [DATABRICKS_URL]/api/2.2/jobs/get?job_id=[NUMID]
HTTP Status: 403 Forbidden
API error_code: INJECTED
API message: Fault injected by test.

Error: planning failed


Exit code: 1

>>> assert_not_exists.py .databricks/bundle/default/resources.json.wal

=== Deploy 4 (fault expired: recovers and succeeds)
>>> [CLI] bundle deploy
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/test-bundle/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!
29 changes: 29 additions & 0 deletions acceptance/bundle/deploy/wal/failed-plan-no-wal/script
@@ -0,0 +1,29 @@
# A failed plan must not leave a write-ahead log behind, so repeated planning
# failures never block a later, healthy deploy. Previously a failed plan still
# opened the WAL for write (UpgradeToWrite) and returned without finalizing,
# leaving a header-only WAL; after two failures the WAL serial drifted two ahead
# of the committed serial and every later command failed WAL recovery until the
# WAL was deleted by hand.
#
# A first deploy creates the job normally. An injected fault then makes the next
# two deploys fail while planning (planning refreshes the existing job via
# jobs/get). The final deploy, with the fault expired, must recover and succeed.
# A non-retried 403 is used so the failure is immediate; a 5xx would be retried
# with backoff.

title "Deploy 1 (normal: creates the job and the committed state)"
trace $CLI bundle deploy

# Fail the plan-stage refresh GET for the next two deploys only.
fault.py "GET /api/2.2/jobs/get" 403 0 2

title "Deploy 2 (planning fails, must not leave a WAL)"
trace errcode $CLI bundle deploy
trace assert_not_exists.py .databricks/bundle/default/resources.json.wal

title "Deploy 3 (planning fails again, must not leave a WAL)"
trace errcode $CLI bundle deploy
trace assert_not_exists.py .databricks/bundle/default/resources.json.wal

title "Deploy 4 (fault expired: recovers and succeeds)"
trace $CLI bundle deploy
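
The expiring fault used by this script can be illustrated with a small Go sketch. This is not how `fault.py` or the acceptance test server is implemented, and it does not model the exact meaning of the `403 0 2` arguments; `faultyHandler` and `statusesFor` are hypothetical names. It only shows the behaviour the script relies on: the first two matching requests get an immediate 403, and later requests reach the real handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// faultyHandler (hypothetical) answers the first `count` requests with
// `status` and passes every later request through to next.
func faultyHandler(next http.Handler, status int, count int64) http.Handler {
	var remaining atomic.Int64
	remaining.Store(count)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if remaining.Add(-1) >= 0 {
			http.Error(w, "Fault injected by test.", status)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusesFor sends n in-memory GET requests through a fault that expires
// after `faults` hits and returns the status codes in order.
func statusesFor(faults int64, n int) []int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"job_id": 1}`)
	})
	h := faultyHandler(ok, http.StatusForbidden, faults)

	var codes []int
	for i := 0; i < n; i++ {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodGet, "/api/2.2/jobs/get", nil)
		h.ServeHTTP(rec, req)
		codes = append(codes, rec.Code)
	}
	return codes
}

func main() {
	fmt.Println(statusesFor(2, 3)) // [403 403 200]
}
```

Counting down per request rather than per deploy is a simplification; it matches the script only under the assumption that each failing deploy issues a single `jobs/get` call.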
1 change: 1 addition & 0 deletions acceptance/bundle/deploy/wal/failed-plan-no-wal/test.py
@@ -0,0 +1 @@
print("test")
10 changes: 10 additions & 0 deletions bundle/phases/deploy.go
@@ -170,6 +170,13 @@ func Deploy(ctx context.Context, b *bundle.Bundle, outputHandler sync.OutputHand
		plan = RunPlan(ctx, b, engine)
	}

	// Stop before opening the WAL for write if planning failed. UpgradeToWrite
	// writes a WAL header that only deployCore's Finalize commits or discards;
	// returning past it without finalizing leaves a header-only WAL behind.
	if logdiag.HasError(ctx) {
		return
	}

	if engine.IsDirect() {
		// Upgrade from read (opened by process.go) to write mode
		if err := b.DeploymentBundle.StateDB.UpgradeToWrite(); err != nil {
@@ -187,6 +194,9 @@ func Deploy(ctx context.Context, b *bundle.Bundle, outputHandler sync.OutputHand
		}
	}

	// InitForApply receives ctx and could log a diagnostic without returning an
	// error, so re-check before deploying. (UpgradeToWrite above takes no ctx and
	// thus cannot log, so the earlier check is enough to guard the WAL open.)
	if logdiag.HasError(ctx) {
		return
	}
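
The ordering this change enforces (check for logged errors before opening the WAL) can be sketched in isolation. The following is a minimal, self-contained model, not the CLI's code: `diagnostics` stands in for `logdiag`, `stateDB` for the real state store, and `deploy`, `walExists`, and `runScenario` are hypothetical helpers. The only behaviour taken from the PR is that `UpgradeToWrite` writes a WAL header and that only a finalize step commits or discards it.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// diagnostics is a hypothetical stand-in for logdiag: steps can record an
// error here without returning it to the caller.
type diagnostics struct{ errs []error }

func (d *diagnostics) LogError(err error) { d.errs = append(d.errs, err) }
func (d *diagnostics) HasError() bool     { return len(d.errs) > 0 }

// stateDB is a hypothetical stand-in for the deployment state store.
type stateDB struct{ walPath string }

// UpgradeToWrite creates a header-only WAL file.
func (s *stateDB) UpgradeToWrite() error {
	return os.WriteFile(s.walPath, []byte("header\n"), 0o600)
}

// Finalize removes the WAL; a real store would commit or discard it.
func (s *stateDB) Finalize() error { return os.Remove(s.walPath) }

// deploy plans first and opens the WAL only if planning logged no error.
func deploy(d *diagnostics, db *stateDB, plan func(*diagnostics)) error {
	plan(d)
	if d.HasError() {
		return errors.New("planning failed") // return before the WAL is opened
	}
	if err := db.UpgradeToWrite(); err != nil {
		return err
	}
	// Resources would be applied here.
	return db.Finalize()
}

// walExists reports whether the WAL file is present on disk.
func walExists(db *stateDB) bool {
	_, err := os.Stat(db.walPath)
	return err == nil
}

// runScenario deploys once into a fresh temp dir and reports whether a WAL
// file was left behind, together with the deploy error.
func runScenario(planFails bool) (bool, error) {
	dir, err := os.MkdirTemp("", "wal-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	db := &stateDB{walPath: filepath.Join(dir, "resources.json.wal")}
	plan := func(d *diagnostics) {
		if planFails {
			d.LogError(errors.New("403 INJECTED"))
		}
	}
	err = deploy(&diagnostics{}, db, plan)
	return walExists(db), err
}

func main() {
	walLeft, err := runScenario(true)
	fmt.Println(err, walLeft) // planning failed false
	walLeft, err = runScenario(false)
	fmt.Println(err, walLeft) // <nil> false
}
```

Moving the `HasError` check below `UpgradeToWrite` in this sketch reproduces the bug described in the test script: the failing scenario would return with the header-only WAL still on disk.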