chore(tesseract): run pre-aggregation integration tests against live CubeStore #11018
waralexrom wants to merge 8 commits into
… on live CubeStore

Add an integration-cubestore feature that runs Tesseract queries fully covered by external pre-aggregations against a locally built cubestored:
- cubestore_service.rs spawns a shared cubestored (free ports, temp data dir) and gives each test an isolated schema over mysql_async
- TestContext::try_execute_cubestore builds pre-agg tables in Postgres, uploads them into CubeStore (CREATE TABLE + INSERT), then runs the outer query rendered in the CubeStore dialect
- MockBaseTools carries external driver tools; cubestore_templates mirrors CubeStoreQuery.sqlTemplates() so external-covered queries render CubeStore SQL
- pre-agg YAML indexes are parsed into the mock and emitted as INDEX / AGGREGATE INDEX + AGGREGATIONS on the CubeStore table
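The "free ports" and "isolated schema" parts of the harness can be sketched roughly as follows. This is an illustration only, not the code in cubestore_service.rs: `pick_free_port` and `isolated_schema_name` are hypothetical names, and the schema naming scheme is an assumption.

```rust
use std::net::TcpListener;

/// Ask the OS for a currently unused loopback TCP port by binding to port 0.
/// The listener is dropped on return, so the port is free again for the
/// child process to bind (a small race window remains; fine for a sketch).
fn pick_free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    Ok(listener.local_addr()?.port())
}

/// Hypothetical helper: derive a per-test schema name so tests that share one
/// server process never see each other's tables.
fn isolated_schema_name(test_name: &str, seq: u32) -> String {
    let sanitized: String = test_name
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_lowercase()
            } else {
                '_'
            }
        })
        .collect();
    format!("test_{}_{}", sanitized, seq)
}
```

Binding to port 0 lets the OS choose the port, which avoids hard-coded port numbers colliding between parallel test runs.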
Replace CREATE TABLE + INSERT with writing a local CSV and loading it through CREATE TABLE ... LOCATION, the same import pipeline production rollups use. Values are serialized to match the JS postgres type parsers (timestamp -> YYYY-MM-DDTHH:mm:ss.SSS, bool true/false, NULL as empty field), which removes the timestamptz trim workaround. Also harden the schema-qualifying regex with a trailing token boundary to avoid prefix-name collisions, and remove the cubestored temp data dir on exit.
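A minimal sketch of the value serialization described above, assuming a hypothetical `Cell` type for values read back from Postgres. The real harness types and quoting rules are not shown in this commit message, so the CSV quoting here is ordinary double-quote escaping and is an assumption.

```rust
/// Hypothetical cell type standing in for a value read back from Postgres.
enum Cell {
    Null,
    Bool(bool),
    Int(i64),
    Text(String),
    /// Calendar fields already split out; time zone handling is out of scope.
    Timestamp {
        year: i32,
        month: u32,
        day: u32,
        hour: u32,
        minute: u32,
        second: u32,
        millis: u32,
    },
}

/// Render one cell: NULL as an empty field, bool as true/false,
/// timestamp as YYYY-MM-DDTHH:mm:ss.SSS.
fn csv_field(cell: &Cell) -> String {
    match cell {
        Cell::Null => String::new(),
        Cell::Bool(b) => b.to_string(),
        Cell::Int(i) => i.to_string(),
        Cell::Text(s) => quote_if_needed(s),
        Cell::Timestamp { year, month, day, hour, minute, second, millis } => format!(
            "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}.{:03}",
            year, month, day, hour, minute, second, millis
        ),
    }
}

/// Assumed quoting rule: wrap in double quotes and double any embedded quote
/// when the text contains a delimiter, a quote, or a line break.
fn quote_if_needed(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Join the rendered cells of one row with commas.
fn csv_row(cells: &[Cell]) -> String {
    cells.iter().map(csv_field).collect::<Vec<_>>().join(",")
}
```

Note that this sketch does not distinguish an empty string from NULL; how the harness handles that case is not stated here.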
Make pre-aggregation integration tests follow production routing: rollup and rollupJoin default to external (CUBEJS_EXTERNAL_DEFAULT=true) in the YAML mock, and TestContext::try_execute runs external-covered queries on CubeStore (skipped without the integration-cubestore feature) and everything else on Postgres. All pre_aggregations tests migrated. Mirror the CubeStore dialect's native DriverTools methods in the mock (date_bin -> DATE_BIN, time_stamp_cast -> CAST as TIMESTAMP, date_time_cast -> to_timestamp, convert_tz -> CONVERT_TZ); previously the external mock carried CubeStore templates but Postgres method bodies, so it ran Postgres SQL against CubeStore. Sort result rows for deterministic snapshots.
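The routing rule in this commit can be summarized in a small sketch. `Target`, `is_external`, and `route` are hypothetical names, and the feature flag is passed as a plain bool here rather than read from `cfg!`, so the logic is easy to exercise in isolation.

```rust
/// Where a test query ends up (hypothetical enum for illustration).
#[derive(Debug, PartialEq)]
enum Target {
    CubeStore,
    Postgres,
    /// External-covered query, but the CubeStore feature is not enabled.
    Skipped,
}

/// rollup and rollupJoin default to external unless the YAML says otherwise;
/// other pre-aggregation types default to non-external.
fn is_external(pre_agg_type: &str, explicit: Option<bool>) -> bool {
    explicit.unwrap_or(matches!(pre_agg_type, "rollup" | "rollupJoin"))
}

/// External-covered queries go to CubeStore when the feature is on and are
/// skipped when it is off; everything else runs on Postgres.
fn route(external_covered: bool, cubestore_enabled: bool) -> Target {
    match (external_covered, cubestore_enabled) {
        (true, true) => Target::CubeStore,
        (true, false) => Target::Skipped,
        (false, _) => Target::Postgres,
    }
}
```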
…build

Resolve cubestored from the downloaded @cubejs-backend/cubestore release binary by default (suits Tesseract development without a 10-minute CubeStore build). Set CUBESTORED_LOCAL_BUILD=1 to use a local cargo build from rust/cubestore/target (release preferred; debug stack-overflows on deep multi-stage plans). CUBESTORED_BIN_PATH still overrides everything.
Resolve cubestored as: local release build (where CI drops the build-cubestore artifact) -> downloaded @cubejs-backend/cubestore release. A debug build is ignored by default (it stack-overflows on deep multi-stage plans) but warns when present and unused; CUBESTORED_ALLOW_DEBUG=1 opts into it for fast local iteration on features that don't hit the overflow. CUBESTORED_BIN_PATH still overrides everything.
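The resolution order can be sketched as a pure function over already-probed candidates. All type and field names below are hypothetical. Two points are assumptions rather than facts from this commit: where an opted-in debug build sits in the order (placed after the local release build here), and that the warning fires whenever a debug build exists but a different binary was chosen.

```rust
use std::path::PathBuf;

/// Inputs the resolver would gather from the environment and the filesystem.
#[derive(Default)]
struct Candidates {
    bin_path_override: Option<PathBuf>, // value of CUBESTORED_BIN_PATH, if set
    local_release: Option<PathBuf>,     // local release build, if it exists
    local_debug: Option<PathBuf>,       // local debug build, if it exists
    downloaded: Option<PathBuf>,        // downloaded release binary, if it exists
    allow_debug: bool,                  // CUBESTORED_ALLOW_DEBUG=1
}

#[derive(Debug, PartialEq)]
struct Resolution {
    binary: Option<PathBuf>,
    /// True when a debug build is present but was not the binary chosen.
    warn_unused_debug: bool,
}

fn resolve(c: &Candidates) -> Resolution {
    // A debug build only participates when explicitly allowed.
    let debug_if_allowed = if c.allow_debug { c.local_debug.clone() } else { None };
    let binary = c
        .bin_path_override
        .clone()
        .or_else(|| c.local_release.clone())
        .or(debug_if_allowed)
        .or_else(|| c.downloaded.clone());
    let warn_unused_debug = c.local_debug.is_some() && binary != c.local_debug;
    Resolution { binary, warn_unused_debug }
}
```

Keeping the probing (environment reads, file existence checks) outside the function makes the precedence itself trivial to unit-test.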
unit-core now consumes the build-cubestore release artifact (dropped into rust/cubestore/target/release, where the resolver looks first) and runs the cube workspace with cubesqlplanner/integration-cubestore instead of integration-postgres, so pre-aggregation tests execute on CubeStore. The artifact is the current source tree's CubeStore, so the same job catches both Tesseract and CubeStore regressions.
Remove the post-execution row sort in the CubeStore result formatter and give every executing pre-agg test a total `order:` over its group-by keys, so row order is exercised by the query (CubeStore's auto ORDER BY is only partial) instead of being silently masked. NULLs now sort last (ASC), as CubeStore orders them. Also make AGGREGATIONS rendering for an aggregate index fail loudly (unimplemented!) on unsupported measure types (e.g. countDistinctApprox/HLL) instead of silently dropping them while still emitting AGGREGATE INDEX.
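For reasoning about expected snapshot rows, "ascending with NULLs last" over several group-by keys corresponds to a comparator like the one below. This is only an illustration of the ordering; the harness no longer sorts rows itself, and the function names are hypothetical.

```rust
use std::cmp::Ordering;

/// Ascending comparison where a missing value (NULL) sorts after any value.
fn cmp_nulls_last<T: Ord>(a: &Option<T>, b: &Option<T>) -> Ordering {
    match (a, b) {
        (Some(x), Some(y)) => x.cmp(y),
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    }
}

/// Compare two rows key by key; the first non-equal key decides. Listing
/// every group-by key is what makes the resulting order total.
fn cmp_rows(a: &[Option<String>], b: &[Option<String>]) -> Ordering {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| cmp_nulls_last(x, y))
        .find(|o| *o != Ordering::Equal)
        .unwrap_or(Ordering::Equal)
}
```

A partial order (fewer keys than the group-by) leaves ties whose relative position can change between runs, which is why each test states the full key list in `order:`.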
Claude finished @waralexrom's task in 2m 10s
Claude PR Review: LGTM with minor observations (0 blockers, 0 high, 0 medium, 4 low/nits)
Summary
Solid, carefully written change. The new harness puts the pre-aggregation suite onto the real production path (Postgres builds the rollup → CSV → …
Correctness & design: looks good
Nits / small follow-ups
Testing/CI
Security
Nothing concerning. The MySQL connection is loopback-only on a free port; the CSV path is process+schema+table scoped under …
…ature

MockPreAggregationIndex and the indexes field/accessor are only read by the integration-cubestore-gated CubeStore table builder; allow dead_code only when that feature is off so clippy --all-targets is clean in every config.
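The conditional lint allowance described here is typically written with `cfg_attr`. The sketch below uses hypothetical stand-in types (`MockIndexSketch`, `MockPreAggSketch`), not the actual mock definitions.

```rust
/// Stand-in for an index definition parsed from pre-agg YAML.
/// dead_code is allowed only when the feature that reads it is disabled.
#[cfg_attr(not(feature = "integration-cubestore"), allow(dead_code))]
#[derive(Debug, Clone, PartialEq)]
struct MockIndexSketch {
    name: String,
    columns: Vec<String>,
}

/// Stand-in for the mock pre-aggregation that carries the indexes.
struct MockPreAggSketch {
    #[cfg_attr(not(feature = "integration-cubestore"), allow(dead_code))]
    indexes: Vec<MockIndexSketch>,
}

impl MockPreAggSketch {
    /// Accessor that only the feature-gated table builder would call.
    #[cfg_attr(not(feature = "integration-cubestore"), allow(dead_code))]
    fn indexes(&self) -> &[MockIndexSketch] {
        &self.indexes
    }
}
```

Compared with an unconditional `#[allow(dead_code)]`, this keeps the lint active in the configuration where the items are supposed to be used, so a real regression (the builder no longer reading them) would still be reported.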
Claude finished @waralexrom's task in 1m 7s
Claude PR Review: LGTM with minor observations (0 blockers, 0 high, 0 medium, 4 low/nits)
Re-review after sync: findings unchanged
Summary
The 8 commits in this push are the same set already covered by the prior review at 11:45; re-verified against the current tree, all findings still hold. The harness puts the pre-aggregation suite onto the real production path (Postgres builds the rollup → CSV → …
Correctness & design: looks good
Nits / small follow-ups (unchanged from prior review)
Testing/CI
Security
Nothing concerning. MySQL connection is loopback-only on a free port; CSV path is process+schema+table scoped under …
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #11018 +/- ##
==========================================
- Coverage 83.32% 78.82% -4.50%
==========================================
Files 255 471 +216
Lines 77177 94442 +17265
Branches 0 3502 +3502
==========================================
+ Hits 64304 74446 +10142
- Misses 12873 19488 +6615
- Partials 0 508 +508
Summary
Tesseract's pre-aggregation integration tests previously executed their rollup queries against Postgres, which doesn't match production: in the real system, external pre-aggregations live in CubeStore and are served by the CubeStore query engine. This PR makes those tests run against a live `cubestored`, so the suite exercises the actual production path and catches CubeStore/Tesseract consistency regressions.

Changes
- `integration-cubestore` feature + harness: spawns a shared local `cubestored` (free ports, temp data dir, per-test schema, `mysql_async`), builds each pre-aggregation in Postgres, uploads it to CubeStore as CSV via `CREATE TABLE ... LOCATION` (the same import pipeline production rollups use), then runs the outer query rendered in the CubeStore dialect.
- `rollup` and `rollupJoin` default to `external` (mirroring `CUBEJS_EXTERNAL_DEFAULT=true`); `try_execute` sends external-covered queries to CubeStore (skipped without the feature) and everything else to Postgres. All `pre_aggregations/` tests migrated.
- The mock mirrors the CubeStore dialect's native `DriverTools` methods (`date_bin` → `DATE_BIN`, `time_stamp_cast` → `CAST as TIMESTAMP`, `date_time_cast` → `to_timestamp`, `convert_tz` → `CONVERT_TZ`) plus the `CubeStoreQuery.sqlTemplates()` overrides, so the external path renders real CubeStore SQL rather than Postgres SQL.
- Pre-agg YAML indexes are parsed into the mock and emitted as `INDEX` / `AGGREGATE INDEX` + `AGGREGATIONS` on the CubeStore table.
- Every executing pre-agg test has a total `order:` over its group-by keys (no post-sort masking), so `ORDER BY` rendering stays under test.
- `cubestored` resolves to the local release build first (where CI drops the `build-cubestore` artifact) and falls back to the downloaded `@cubejs-backend/cubestore` release; debug builds warn (they stack-overflow on deep multi-stage plans) and require `CUBESTORED_ALLOW_DEBUG=1` to opt in.
- `unit-core` consumes the existing `build-cubestore` release artifact and runs the cube workspace with `cubesqlplanner/integration-cubestore`. The artifact is always the current source tree's CubeStore, so one job covers both Tesseract and CubeStore regressions.

Testing
`cargo test --features integration-cubestore` (release `cubestored`): `pre_aggregations` 47 passed, `cubestore` 1 passed; `cargo test --lib` (no feature) 1001 passed. Verified that migrated snapshot values match the prior Postgres results (modulo CubeStore decimal/timestamp formatting and the now query-driven order).

🤖 Generated with Claude Code