
feat: Support pre-aggregation-specific data source credentials (CORE-…#10587

Open
ovr wants to merge 14 commits into master from feat/separate-pre-aggregations-driver


@ovr commented Mar 31, 2026

No description provided.

@github-actions bot added labels on Mar 31, 2026: driver:mongodb, driver:redshift, driver:mysql, driver:bigquery, driver:snowflake, driver:clickhouse, driver:athena, driver:mssql, driver:oracle, driver:prestodb, driver:postgres, driver:hive, driver:mysql-aurora-serverless, driver:sqlite, driver:druid, driver:elasticsearch, driver:dremio, driver:questdb, javascript, driver:firebolt, data source driver, driver:databricks, driver:duckdb, driver:pinot, cubejs-jdbc-driver
@ovr force-pushed the feat/separate-pre-aggregations-driver branch from 9ab1520 to 141cdad on March 31, 2026 14:37
@ovr force-pushed the feat/separate-pre-aggregations-driver branch from 141cdad to 05bc016 on March 31, 2026 15:30
@ovr marked this pull request as ready for review on April 9, 2026 13:27
@ovr requested review from a team as code owners on April 9, 2026 13:27

claude bot commented Apr 9, 2026

Claude finished @ovr's task in 6m 12s


Code Review: Pre-aggregation-specific data source credentials

  • Gather context and understand PR changes
  • Review core changes (env.ts refactoring)
  • Review driver updates
  • Review server-core changes
  • Review tests
  • Post comprehensive review

Summary

This PR adds support for separate database credentials for pre-aggregation workloads, allowing users to configure a different user/password (or other connection parameters) for pre-aggregation building vs. regular queries. The implementation:

  1. Extends keyByDataSource() to generate CUBEJS_PRE_AGGREGATIONS_* env var names
  2. Adds hasPreAggregationsEnvVars() to detect when separate credentials are configured
  3. Threads preAggregations boolean through all getEnv() calls in every driver
  4. Creates a separate driver instance keyed by dataSource@pre_agg in server-core
  5. Refactors env var reads from raw process.env[...] to use env-var's get() helper (good cleanup!)
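Item 4 can be pictured with a minimal sketch of a driver cache whose pre-aggregation entries live under a `dataSource@pre_agg` key. The names `createDriverCache`, `Driver`, and `DriverFactory` are hypothetical and are not taken from server-core; only the key suffix comes from the summary above.

```typescript
// Hypothetical sketch: these names are illustrative, not the server-core API.
type Driver = { readonly label: string };
type DriverFactory = (dataSource: string, preAggregations: boolean) => Driver;

function createDriverCache(factory: DriverFactory) {
  const drivers = new Map<string, Driver>();

  return (dataSource: string, preAggregations = false): Driver => {
    // Pre-aggregation drivers get their own slot via the "@pre_agg" suffix.
    const cacheKey = preAggregations ? `${dataSource}@pre_agg` : dataSource;
    let driver = drivers.get(cacheKey);
    if (!driver) {
      driver = factory(dataSource, preAggregations);
      drivers.set(cacheKey, driver);
    }
    return driver;
  };
}

const getDriver = createDriverCache((dataSource, preAggregations) => ({
  label: `${dataSource}:${preAggregations ? 'pre-agg' : 'query'}`,
}));

const queryDriver = getDriver('default');
const preAggDriver = getDriver('default', true);
```

Under this assumption, the same data source resolves to two independent driver instances, and repeated lookups with the same flag return the cached one.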

The overall architecture is sound — the approach of reusing the existing driver factory with a boolean flag is clean and backward-compatible. However, I found several bugs and one design concern that should be addressed before merge.


Bugs Found

1. dbType test contradicts implementation (inline comment)
The test at db_env_pre_aggregations.test.ts:49-57 expects getEnv('dbType', { preAggregations: true }) to return the PRE_AGGREGATIONS variant ('duckdb'), but the dbType implementation in env.ts:357-360 hardcodes preAggregations to false. This test should be failing.

2. Stale key variable in error messages (three inline comments)
In dbPollMaxInterval, dbPollTimeout, and dbQueryTimeout, the key variable (used for error messages) is computed without preAggregations, but the actual get() call uses preAggregations. Error messages will reference the wrong env var name. Suggestions with fixes are in the inline comments.
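One way to avoid this class of bug is to compute the key once and reuse it for both the lookup and the error message. The following is a rough sketch, not the code in env.ts: `keyFor` is a simplified stand-in for `keyByDataSource()` that handles only the single data source case, the shape of `DataSourceOpts` is assumed, and the timeout is parsed as plain seconds for brevity.

```typescript
type Env = Record<string, string | undefined>;
// Assumed shape; the PR's DataSourceOpts alias may differ.
type DataSourceOpts = { dataSource: string; preAggregations?: boolean };

// Simplified stand-in for keyByDataSource(): single data source only.
function keyFor(base: string, opts: DataSourceOpts): string {
  return opts.preAggregations
    ? base.replace(/^CUBEJS_/, 'CUBEJS_PRE_AGGREGATIONS_')
    : base;
}

function dbQueryTimeoutSeconds(env: Env, opts: DataSourceOpts): number | undefined {
  // Computed once, so the lookup and the error message cannot drift apart.
  const key = keyFor('CUBEJS_DB_QUERY_TIMEOUT', opts);
  const raw = env[key];
  if (raw === undefined) {
    return undefined;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new TypeError(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}
```

With `preAggregations: true` and an invalid value, the thrown message names `CUBEJS_PRE_AGGREGATIONS_DB_QUERY_TIMEOUT` rather than the base variable.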

3. SSL cert env vars not updated for pre-aggregations (could not attach inline - BaseDriver.ts:273-297)
In getSslOptions(), the SSL certificate env vars (CUBEJS_DB_SSL_CA, CUBEJS_DB_SSL_CERT, CUBEJS_DB_SSL_KEY, etc.) call keyByDataSource() without the preAggregations parameter. If a pre-aggregation connection needs different SSL certificates, they won't be read from the PRE_AGGREGATIONS variant. The preAggregations parameter should be threaded through to these keyByDataSource calls as well.
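As a rough illustration of threading the flag through, the sketch below reads the three variables named above from either the base or the PRE_AGGREGATIONS variant. It is not the BaseDriver implementation: `readSslOptions` and `sslKey` are hypothetical names, only the single data source case is handled, and the real `getSslOptions()` may read more variables and post-process their values.

```typescript
type Env = Record<string, string | undefined>;
type SslOptions = { ca?: string; cert?: string; key?: string };

// Hypothetical helper: single data source only, unlike keyByDataSource().
function sslKey(base: string, preAggregations: boolean): string {
  return preAggregations
    ? base.replace(/^CUBEJS_/, 'CUBEJS_PRE_AGGREGATIONS_')
    : base;
}

function readSslOptions(env: Env, preAggregations = false): SslOptions {
  // Every lookup receives the same flag, so no variable is left on the base name.
  return {
    ca: env[sslKey('CUBEJS_DB_SSL_CA', preAggregations)],
    cert: env[sslKey('CUBEJS_DB_SSL_CERT', preAggregations)],
    key: env[sslKey('CUBEJS_DB_SSL_KEY', preAggregations)],
  };
}

const sslEnv: Env = {
  CUBEJS_DB_SSL_CA: 'base-ca',
  CUBEJS_PRE_AGGREGATIONS_DB_SSL_CA: 'preagg-ca',
};
```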


Design Concern

No fallback to base env vars (inline comment)

When preAggregations: true is set, keyByDataSource always returns the PRE_AGGREGATIONS variant with no fallback. If a user sets CUBEJS_PRE_AGGREGATIONS_DB_USER but not CUBEJS_PRE_AGGREGATIONS_DB_HOST, the host will be undefined rather than falling back to CUBEJS_DB_HOST.

This forces users to duplicate all connection env vars for pre-aggregations, not just the ones they want to override. For a "separate credentials" feature, a fallback pattern would be much more ergonomic:

CUBEJS_DB_HOST=shared-host           # used by both
CUBEJS_DB_USER=regular-user
CUBEJS_PRE_AGGREGATIONS_DB_USER=preagg-user  # only override what differs

If this no-fallback behavior is intentional, it should be clearly documented.
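A minimal sketch of the fallback lookup, assuming a plain key/value view of the environment. `readWithFallback` is a hypothetical helper, not part of env.ts, and it covers only the single data source naming scheme.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: prefer the PRE_AGGREGATIONS variant, else the base key.
function readWithFallback(
  env: Env,
  baseKey: string,
  preAggregations: boolean,
): string | undefined {
  if (preAggregations) {
    const overrideKey = baseKey.replace(/^CUBEJS_/, 'CUBEJS_PRE_AGGREGATIONS_');
    const override = env[overrideKey];
    if (override !== undefined) {
      return override;
    }
  }
  return env[baseKey];
}

// The configuration from the example above.
const exampleEnv: Env = {
  CUBEJS_DB_HOST: 'shared-host',
  CUBEJS_DB_USER: 'regular-user',
  CUBEJS_PRE_AGGREGATIONS_DB_USER: 'preagg-user',
};

const preAggHost = readWithFallback(exampleEnv, 'CUBEJS_DB_HOST', true);
const preAggUser = readWithFallback(exampleEnv, 'CUBEJS_DB_USER', true);
const regularUser = readWithFallback(exampleEnv, 'CUBEJS_DB_USER', false);
```

Here the pre-aggregation connection would reuse `shared-host` and override only the user. One trade-off of this design is that a typo in an override name silently falls back to the base value, which may be a reason to keep the stricter behavior.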


Positive Observations

  • Good refactoring of raw process.env reads to use the env-var get() helper — this adds type safety and cleaner defaults
  • Clean DataSourceOpts type alias reduces parameter boilerplate across ~60+ env var declarations
  • Proper conflict detection in server.ts when both driverFactory and PRE_AGGREGATIONS env vars are set
  • Solid test coverage for the core env var generation logic (single + multi datasource + hasPreAggregationsEnvVars)
  • Consistent driver updates across all 20+ drivers following the same pattern
  • Integration test with postgres using read-only pre-agg user is a good real-world validation

Minor Items

  • dbMinPoolSize has a duplicate get() call and wrong env var name in error message (see inline on hasPreAggregationsEnvVars)
  • Logging in server.ts ('Initializing data source connection') fires for every driver creation, not just pre-agg — consider gating or using debug level
  • The conflict warning log could fire multiple times per dataSource; consider deduplicating
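For the last item, deduplication could be as simple as remembering which data sources have already been warned about. This is a sketch under that assumption; `createOnceLogger` is a hypothetical helper and the message text is illustrative, not the wording used in server.ts.

```typescript
// Hypothetical helper: emit a given warning at most once per data source.
function createOnceLogger(log: (message: string) => void) {
  const seen = new Set<string>();

  return (dataSource: string, message: string): void => {
    if (seen.has(dataSource)) {
      return;
    }
    seen.add(dataSource);
    log(`[${dataSource}] ${message}`);
  };
}

const lines: string[] = [];
const warnOnce = createOnceLogger((message) => {
  lines.push(message);
});

warnOnce('default', 'driverFactory and PRE_AGGREGATIONS env vars are both set');
warnOnce('default', 'driverFactory and PRE_AGGREGATIONS env vars are both set');
warnOnce('analytics', 'driverFactory and PRE_AGGREGATIONS env vars are both set');
```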

Comment on lines +127 to +136
if (preAggregations) {
  const dsMatch = key.match(/^(CUBEJS_DS_[A-Z0-9_]+?_)(DB_|JDBC_|AWS_|DATABASE|FIREBOLT_)(.*)/);
  if (dsMatch) {
    return `${dsMatch[1]}PRE_AGGREGATIONS_${dsMatch[2]}${dsMatch[3]}`;
  }

  if (key.startsWith('CUBEJS_')) {
    return key.replace(/^CUBEJS_/, 'CUBEJS_PRE_AGGREGATIONS_');
  }
}

Suggestion: Consider adding a unit test for edge cases in the regex. The regex ^(CUBEJS_DS_[A-Z0-9_]+?_)(DB_|JDBC_|AWS_|DATABASE|FIREBOLT_)(.*) uses a non-greedy match for the DS name which could behave unexpectedly with certain datasource names.

For example, a datasource named db would produce the key CUBEJS_DS_DB_DB_HOST. The non-greedy [A-Z0-9_]+? first tries to match only D, but the underscore that must follow it fails against B, so the match extends to DB. Group 1 then becomes CUBEJS_DS_DB_, and the remaining DB_HOST matches the DB_ alternative. This works, but edge cases like datasource names containing DB, AWS, JDBC, etc. as substrings should be tested.

Also, DATABASE in the alternation doesn't have a trailing underscore. For CUBEJS_DS_FOO_DATABASE_SECRET_ARN, group 2 captures DATABASE and group 3 captures _SECRET_ARN (including the leading underscore). This produces CUBEJS_DS_FOO_PRE_AGGREGATIONS_DATABASE_SECRET_ARN which is correct, but the asymmetry with other alternatives (which have trailing _) is subtle.
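The cases above can be pinned down with a small table-driven check. The sketch copies the regex and replacement from the snippet under review into a hypothetical `toPreAggregationsKey` helper; returning the key unchanged when nothing matches is an assumption, since the surrounding function is not shown. The last case assumes a data source name with an underscore (such as my_db) is allowed and is upper-cased into the key.

```typescript
// Same regex and replacement as the reviewed snippet, wrapped in a
// hypothetical helper so the edge cases can be asserted in isolation.
function toPreAggregationsKey(key: string): string {
  const dsMatch = key.match(/^(CUBEJS_DS_[A-Z0-9_]+?_)(DB_|JDBC_|AWS_|DATABASE|FIREBOLT_)(.*)/);
  if (dsMatch) {
    return `${dsMatch[1]}PRE_AGGREGATIONS_${dsMatch[2]}${dsMatch[3]}`;
  }
  if (key.startsWith('CUBEJS_')) {
    return key.replace(/^CUBEJS_/, 'CUBEJS_PRE_AGGREGATIONS_');
  }
  // Assumption: keys without the CUBEJS_ prefix are returned unchanged.
  return key;
}

// Each pair is [input key, key produced by the regex as written].
const cases: Array<[string, string]> = [
  ['CUBEJS_DB_HOST', 'CUBEJS_PRE_AGGREGATIONS_DB_HOST'],
  ['CUBEJS_DS_DB_DB_HOST', 'CUBEJS_DS_DB_PRE_AGGREGATIONS_DB_HOST'],
  ['CUBEJS_DS_FOO_DATABASE_SECRET_ARN', 'CUBEJS_DS_FOO_PRE_AGGREGATIONS_DATABASE_SECRET_ARN'],
  // Data source name ending in _DB: the lazy match stops after "MY_".
  ['CUBEJS_DS_MY_DB_DB_HOST', 'CUBEJS_DS_MY_PRE_AGGREGATIONS_DB_DB_HOST'],
];

const mismatches = cases.filter(([input, expected]) => toPreAggregationsKey(input) !== expected);
```

The last row records what the regex yields today for a name ending in _DB: the marker is inserted inside the data source name rather than after it. Whether that split is acceptable is a question for the author, and a unit test would at least make the behavior explicit.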


codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 68.20513% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.93%. Comparing base (4a9453b) to head (d5e3ae5).
⚠️ Report is 20 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/cubejs-backend-shared/src/env.ts | 68.38% | 43 Missing ⚠️ |
| ...y-orchestrator/src/orchestrator/PreAggregations.ts | 40.00% | 6 Missing ⚠️ |
| ...s/cubejs-clickhouse-driver/src/ClickHouseDriver.ts | 63.63% | 0 Missing and 4 partials ⚠️ |
| packages/cubejs-druid-driver/src/DruidDriver.ts | 20.00% | 3 Missing and 1 partial ⚠️ |
| packages/cubejs-base-driver/src/BaseDriver.ts | 0.00% | 2 Missing ⚠️ |
| ...kages/cubejs-bigquery-driver/src/BigQueryDriver.ts | 33.33% | 0 Missing and 2 partials ⚠️ |
| packages/cubejs-server-core/src/core/server.ts | 94.11% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10587      +/-   ##
==========================================
+ Coverage   57.84%   57.93%   +0.09%     
==========================================
  Files         225      215      -10     
  Lines       17637    16635    -1002     
  Branches     3634     3343     -291     
==========================================
- Hits        10202     9638     -564     
+ Misses       6890     6508     -382     
+ Partials      545      489      -56     
Flag Coverage Δ
cube-backend 57.93% <68.20%> (+0.09%) ⬆️

