Skip to content

Track compressed resource state sizes in deploy telemetry (direct engine)#5608

Draft
shreyas-goenka wants to merge 1 commit into
mainfrom
shreyas-goenka/telemetry-compressed-resource-sizes
Draft

Track compressed resource state sizes in deploy telemetry (direct engine)#5608
shreyas-goenka wants to merge 1 commit into
mainfrom
shreyas-goenka/telemetry-compressed-resource-sizes

Conversation

@shreyas-goenka

Copy link
Copy Markdown
Contributor

What

Bundle deploy telemetry already reports per-resource-type raw state-size statistics (state_size_{max,mean,median}_bytes in ResourceMetadata). The same per-resource state is stored compressed downstream, so this adds the compressed-size counterparts to gauge how much resource state shrinks under compression, not just the raw sizes:

  • state_compressed_size_max_bytes
  • state_compressed_size_mean_bytes
  • state_compressed_size_median_bytes

How

  • compressedStateSize (in bundle/direct/dstate) compresses each resource's serialized state once at export time, alongside the existing len(state) raw measurement.
  • It uses the standard library's compress/flate rather than a dedicated zstd package: a deliberately rough proxy for the server side (which uses zstd), chosen to keep the dependency/supply-chain surface small while still giving useful signal on compressibility. No new dependency is added.
  • ResourceState carries the new StateCompressedSizeBytes; resourceMetadataFromState aggregates max/mean/median per resource type, mirroring the raw stats.
  • Only the direct engine is measured (matching the existing raw-size behavior); terraform deploys leave it zero/absent.

Performance

flate runs at state-export time over individually small resource states (each well under the server's per-resource limit), not in a tight loop, so even large bundles compress in a few milliseconds — negligible next to a deploy's network I/O. No background goroutine is warranted.

Tests

  • Unit: TestCompressedStateSize (empty → 0, compressible blob shrinks) plus updated resources_metadata aggregation tests.
  • Acceptance: golden updated with the actual compressed sizes. These are deterministic for the fixed test config and stdlib flate, so they are asserted directly (no redaction).

Notes

  • The matching telemetry schema field is added separately.

This pull request and its description were written by Isaac.

…ine)

Deploy telemetry already reports per-resource-type raw state-size statistics
(state_size_{max,mean,median}_bytes). The deployment metadata service stores
that same per-resource state compressed, so this adds compressed-size
counterparts to gauge how much resource state shrinks under compression rather
than just the raw sizes:

  - state_compressed_size_max_bytes
  - state_compressed_size_mean_bytes
  - state_compressed_size_median_bytes

The compressed length is computed once per resource at state-export time
(alongside the existing raw length) using the standard library's compress/flate.
This is a deliberately rough proxy for the server side (which uses zstd): flate
keeps the dependency/supply-chain surface small while still giving useful signal
on compressibility. Only the direct engine is measured, matching the existing
raw-size behavior.

Co-authored-by: Isaac
@eng-dev-ecosystem-bot

Copy link
Copy Markdown
Collaborator

Integration test report

Commit: ece87d5

Run: 27550846260

Env 🟨​KNOWN 🔄​flaky 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
💚​ aws linux 7 15 264 978 6:56
🟨​ aws windows 7 15 266 976 13:59
💚​ aws-ucws linux 7 15 360 892 16:22
💚​ aws-ucws windows 7 15 362 890 11:10
💚​ azure linux 1 17 267 976 6:55
🔄​ azure windows 2 1 17 267 974 9:50
🔄​ azure-ucws linux 2 1 17 363 888 14:10
💚​ azure-ucws windows 1 17 367 886 8:42
💚​ gcp linux 1 17 263 979 7:14
💚​ gcp windows 1 17 265 977 7:36
24 interesting tests: 15 SKIP, 7 KNOWN, 2 flaky
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 💚​R 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 💚​R 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 💚​R 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestFetchRepositoryInfoAPI_FromRepo/root ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f 🔄​f ✅​p ✅​p ✅​p
🔄​ TestFetchRepositoryInfoAPI_FromRepo/subdir ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f 🔄​f ✅​p ✅​p ✅​p
Top 30 slowest tests (at least 2 minutes):
duration env testname
4:58 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:22 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:17 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:15 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:09 aws-ucws linux TestAccept/bundle/deploy/files/no-snapshot-sync/DATABRICKS_BUNDLE_ENGINE=direct
4:03 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:01 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:52 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:52 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:28 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:26 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:25 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:13 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:08 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:59 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:52 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:48 aws-ucws linux TestAccept/bundle/run_as/job_default/DATABRICKS_BUNDLE_ENGINE=direct
2:43 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:37 aws-ucws linux TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=direct
2:37 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:37 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:36 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:30 azure windows TestAccept
2:30 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:29 aws-ucws windows TestAccept
2:29 azure-ucws windows TestAccept
2:28 gcp windows TestAccept
2:17 aws-ucws linux TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=direct
2:07 aws-ucws linux TestAccept/bundle/deployment/bind/job/generate-and-bind/DATABRICKS_BUNDLE_ENGINE=terraform
2:01 azure-ucws linux TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=terraform

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants