direct: store serialized_dashboard/serialized_space in state as content hashes#5609

Draft
shreyas-goenka wants to merge 1 commit into
databricks:mainfrom
shreyas-goenka:shreyas-goenka/dashboards-sha-state

Conversation

@shreyas-goenka

Contributor

Changes

The direct deploy engine persists the full planned config per resource in resources.json. For dashboards and genie spaces that includes the inlined serialized_dashboard / serialized_space contents, which routinely run from hundreds of KB to several MB (and roughly double once JSON-escaped into state). That content is never read back from state:

  • drift is detected via etag (both fields are ignore_remote_changes, etag_based), so the remote serialized value is never meaningfully compared;
  • a deploy always sends the contents to the API from the plan's new_state, never from saved state;
  • nothing resolves references out of these fields.

So the saved value is used only for a local "did the content change since last deploy?" equality check, and a content hash serves that purpose equally well.

This PR adds an optional CompactState(state *T) (*T, error) adapter method (same idiom as RemapState / OverrideChangeDesc) that replaces such equality-only fields with a sha256:<hex> placeholder. The framework applies it both before persisting state and to every value entering the diff (saved state, the local config copy, and the remapped remote), so stored and compared values share one canonical form: unchanged content still yields an equal-hash skip, changed content yields a different hash, exactly as before.
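As a rough illustration of the idea, here is a minimal sketch of what a hashing helper and a CompactState implementation could look like. Only the CompactState(state *T) (*T, error) shape, the hashStateValue name, and the sha256:<hex> form come from this description; the DashboardState struct, the dashboardAdapter type, the string-based signature of hashStateValue, and the isHashed helper are assumptions made for the example and may not match the actual code in the PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const hashPrefix = "sha256:"

// isHashed reports whether s already looks like a "sha256:<hex>" placeholder.
// Hypothetical helper: real serialized content is JSON, so it is assumed
// never to collide with this shape.
func isHashed(s string) bool {
	if !strings.HasPrefix(s, hashPrefix) {
		return false
	}
	rest := s[len(hashPrefix):]
	if len(rest) != 2*sha256.Size {
		return false
	}
	_, err := hex.DecodeString(rest)
	return err == nil
}

// hashStateValue replaces content with a "sha256:<hex>" placeholder.
// Empty values stay empty and already-hashed values are returned unchanged,
// so applying it twice gives the same result as applying it once.
func hashStateValue(content string) string {
	if content == "" || isHashed(content) {
		return content
	}
	sum := sha256.Sum256([]byte(content))
	return hashPrefix + hex.EncodeToString(sum[:])
}

// DashboardState is a hypothetical, trimmed-down stand-in for the real state type.
type DashboardState struct {
	DisplayName         string
	SerializedDashboard string
}

// dashboardAdapter is a hypothetical adapter used only in this sketch.
type dashboardAdapter struct{}

// CompactState returns a copy of state with the equality-only field hashed.
// The input is not mutated, so the caller keeps the full content for the API call.
func (dashboardAdapter) CompactState(state *DashboardState) (*DashboardState, error) {
	if state == nil {
		return nil, nil
	}
	compact := *state
	compact.SerializedDashboard = hashStateValue(state.SerializedDashboard)
	return &compact, nil
}

func main() {
	full := &DashboardState{DisplayName: "sales", SerializedDashboard: `{"pages":[]}`}
	compact, err := dashboardAdapter{}.CompactState(full)
	if err != nil {
		panic(err)
	}
	fmt.Println(compact.SerializedDashboard) // a sha256:<hex> placeholder
	fmt.Println(full.SerializedDashboard)    // the original content, untouched
}
```

Returning a copy rather than mutating in place reflects the requirement that the plan's new_state keeps its full content while the persisted and compared values use the compact form.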

dashboards.serialized_dashboard and genie_spaces.serialized_space implement it. The plan's new_state (sent to the API on apply) and the raw top-level remote_state snapshot keep their full content.

Compatibility

No state version bump. Legacy full-content state is hashed on read for the comparison and rewritten compactly on the next save (lazy migration). An older CLI reading new state sees a hash, plans one redundant update, and rewrites the full content, which is safe. Bumping the version would instead hard-fail older CLIs, a worse failure mode for mixed-version CI/teams.

User-visible effect

bundle plan reports these fields as sha256:... in the changes section rather than embedding the (potentially multi-MB) serialized blob. resources.json shrinks correspondingly, as does the per-deploy state upload.

Field selection

A field qualifies only if it is ignore_remote_changes, is never read back from state, and is large enough to matter. Surveying all direct-engine resource types, only these two fields qualify today; the mechanism is declarative-by-method so a future file-inlined blob can opt in by implementing CompactState. A unit test guards the core invariant (the field must be ignore_remote_changes).
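The invariant mentioned above can be pictured with a small check like the following. This is a hypothetical sketch: fieldBehavior, checkCompactedFields, and the map layout are invented for illustration, and the PR's real unit test is likely wired to the engine's own resource configuration instead.

```go
package main

import "fmt"

// fieldBehavior is a hypothetical stand-in for a field's lifecycle settings.
type fieldBehavior struct {
	IgnoreRemoteChanges bool
}

// checkCompactedFields returns an error if any compacted field is not marked
// ignore_remote_changes. Hashing a field that is compared against remote
// content would hide real drift, so such a field must not be compacted.
func checkCompactedFields(compacted []string, behaviors map[string]fieldBehavior) error {
	for _, field := range compacted {
		b, ok := behaviors[field]
		if !ok || !b.IgnoreRemoteChanges {
			return fmt.Errorf("field %q is compacted but not ignore_remote_changes", field)
		}
	}
	return nil
}

func main() {
	behaviors := map[string]fieldBehavior{
		"serialized_dashboard": {IgnoreRemoteChanges: true},
		"display_name":         {IgnoreRemoteChanges: false},
	}
	fmt.Println(checkCompactedFields([]string{"serialized_dashboard"}, behaviors)) // <nil>
	fmt.Println(checkCompactedFields([]string{"display_name"}, behaviors))         // reports the violation
}
```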

Tests

  • Unit: hashStateValue (determinism, idempotence, nil/empty), CompactState for both resources, and the ignore_remote_changes invariant guards.
  • Acceptance: regenerated affected direct-engine plan/state outputs (dashboards simple/detect-change/unpublish-out-of-band, genie inline, bind, migrate). The genie_spaces/version_migration script previously parsed the schema version out of the plan's serialized content; it now asserts local_remote_differ + the etag_based skip, which is the behavior it was really demonstrating.

This pull request and its description were written by Isaac.

…nt hashes

The direct deploy engine persists the full planned config per resource in
resources.json. For dashboards and genie spaces, that includes the inlined
serialized_dashboard / serialized_space contents, which routinely run from
hundreds of KB to several MB. These fields are never read back from state: drift
is detected via etag (they are ignore_remote_changes), and a deploy always sends
the contents from the plan's new_state, never from saved state.

Add an optional CompactState adapter method that replaces such equality-only
fields with a "sha256:<hex>" content hash. The framework applies it both before
persisting state and to every value entering the diff, so stored and compared
values share one form and unchanged content still produces an equal-hash skip.
The plan's new_state (sent to the API) and the raw remote_state snapshot keep
full content.

No state version bump: legacy full-content state is hashed on read for the
comparison and rewritten compactly on the next save.

Co-authored-by: Isaac
@github-actions

Contributor

An authorized user can trigger integration tests manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 5609
  • Commit SHA: 4066dd65f0bec721d206a934347c134a5a55d4dd

Checks will be approved automatically on success.

