feat(bigquery-jdbc): add `EnableProjectDiscovery` connection property for metadata methods by keshavdandeva · Pull Request #13344 · googleapis/google-cloud-java

keshavdandeva · 2026-06-03T14:14:59Z

b/499078725

This PR introduces the EnableProjectDiscovery connection property (default false), which allows JDBC metadata methods to discover and query across all accessible Google Cloud projects, rather than being strictly limited to the project specified in the connection URL.

Key Changes:

Connection Property: Added EnableProjectDiscovery parameter parsing in BigQueryJdbcUrlUtility and DataSource.
Project Discovery: Implemented BigQueryConnection.getDiscoveredProjects() to fetch all accessible GCP projects using the underlying low-level Bigquery HTTP client.
Caching: Added connection-scoped caching for the discovered project list to prevent redundant and expensive API calls.
Metadata Integration: Updated BigQueryDatabaseMetaData.getCatalogs() and getSchemas() to return information across all discovered projects when the flag is enabled.
Concurrency: Parallelized dataset/schema fetching in getSchemas() using a fixed thread pool to significantly improve performance when scanning across multiple discovered projects.
Testing: Added unit tests verifying the property parsing, connection caching, and parallel metadata fetching logic.
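For readers unfamiliar with how a boolean connection property like this is typically read, here is a minimal sketch. It is not the driver's actual `BigQueryJdbcUrlUtility` code: the class `UrlPropertySketch`, its methods, the semicolon-separated `Key=Value` URL layout, and the case-insensitive key matching are all assumptions made for illustration. The only detail taken from the PR is the property name and its default of false.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not the driver's real URL utility. */
final class UrlPropertySketch {
  private UrlPropertySketch() {}

  /**
   * Parses "prefix;Key=Value;Key2=Value2" into a map of lower-cased keys.
   * The segment before the first ';' is treated as the endpoint and skipped.
   */
  static Map<String, String> parseProperties(String url) {
    Map<String, String> props = new LinkedHashMap<>();
    String[] parts = url.split(";");
    for (int i = 1; i < parts.length; i++) {
      int eq = parts[i].indexOf('=');
      if (eq > 0) {
        String key = parts[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
        props.put(key, parts[i].substring(eq + 1).trim());
      }
    }
    return props;
  }

  /** Returns false when the property is absent, matching the stated default. */
  static boolean isProjectDiscoveryEnabled(String url) {
    // Boolean.parseBoolean(null) is false, so a missing key yields the default.
    return Boolean.parseBoolean(parseProperties(url).get("enableprojectdiscovery"));
  }
}
```

Under these assumptions, a URL ending in `;EnableProjectDiscovery=true` would opt in, and any URL without the property keeps the single-project behaviour.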


gemini-code-assist

Code Review

This pull request introduces an EnableProjectDiscovery configuration property to automatically discover and list all accessible Google Cloud projects as catalogs. To support this, schema fetching in BigQueryDatabaseMetaData has been refactored to run concurrently using an executor service. Feedback on these changes highlights two main areas for improvement: first, replacing the fragile use of reflection to access the low-level BigQuery client with the standard public BigQuery.listProjects() API; second, ensuring that outstanding asynchronous tasks are properly cancelled if the schema fetching loop is interrupted to prevent resource leaks.
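The second point of this review (cancelling outstanding tasks when the fetch loop stops early) can be sketched as follows. This is an illustration under assumptions, not the PR's code: `FanOutSketch`, `fetchAll`, and the `fetcher` function are hypothetical names, and dataset names are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical fan-out helper; names and types are illustrative only. */
final class FanOutSketch {
  private FanOutSketch() {}

  /** Fetches per-project results in parallel and cancels leftovers on any failure. */
  static List<String> fetchAll(
      ExecutorService executor, List<String> projects, Function<String, List<String>> fetcher)
      throws InterruptedException, ExecutionException {
    List<Future<List<String>>> futures = new ArrayList<>();
    boolean completed = false;
    try {
      for (String project : projects) {
        Callable<List<String>> task = () -> fetcher.apply(project);
        futures.add(executor.submit(task));
      }
      List<String> results = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        results.addAll(future.get()); // may throw if interrupted or if the task failed
      }
      completed = true;
      return results;
    } finally {
      if (!completed) {
        // Interrupted or failed part-way: stop the tasks that are still queued or running.
        for (Future<List<String>> future : futures) {
          future.cancel(true);
        }
      }
    }
  }
}
```

Calling `cancel(true)` on a future that has already finished is a no-op, so the cleanup loop does not need to track which tasks completed.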

keshavdandeva · 2026-06-03T18:30:32Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces an automatic project discovery feature to the BigQuery JDBC driver, allowing users to discover all accessible Google Cloud projects via a new EnableProjectDiscovery connection property. It also updates BigQueryDatabaseMetaData to parallelize schema fetching across multiple projects. The review feedback highlights several areas for improvement: reusing a shared ExecutorService instead of creating a new one on every getSchemas() call, avoiding caching empty results on transient exceptions in getDiscoveredProjects(), using GsonFactory.getDefaultInstance() for better resource reuse, and preserving the stack trace when logging execution exceptions.
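One way to read the caching feedback (do not cache an empty result produced by a transient failure) is sketched below. `DiscoveredProjectsCache` and its `loader` are hypothetical; the real `getDiscoveredProjects()` may handle errors differently, for example by rethrowing instead of returning an empty list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical connection-scoped cache; illustrative, not the driver's implementation. */
final class DiscoveredProjectsCache {
  private final Supplier<List<String>> loader;
  private List<String> cached; // guarded by "this"; null means "not loaded yet"

  DiscoveredProjectsCache(Supplier<List<String>> loader) {
    this.loader = loader;
  }

  /** Returns the cached project IDs, loading them once on first successful call. */
  synchronized List<String> get() {
    if (cached != null) {
      return cached;
    }
    try {
      cached = Collections.unmodifiableList(new ArrayList<>(loader.get()));
      return cached;
    } catch (RuntimeException e) {
      // Assumed policy: report no projects for this call, but leave the cache
      // empty so the next call retries instead of pinning the failure.
      return Collections.emptyList();
    }
  }
}
```

The key design choice is that only a successful load assigns `cached`; a failed call leaves it null, so a later call can recover.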

logachev · 2026-06-04T04:09:52Z

+    }
+
+    try {
+      BigQueryOptions options = (BigQueryOptions) getBigQuery().getOptions();


We already have code that provisions the BigQuery client; we should not be creating a new one (e.g. it feels like the HttpTransport is missing the proxy, private endpoint, and other configured properties).

Are there issues with the SDK client?

Yeah, so the SDK does not expose a public listProjects() method. To invoke the BigQuery REST /projects endpoint, we must drop down to the low-level client's projects().list() call.

I initially used reflection to extract the underlying client from BigQueryImpl, but gemini flagged that as fragile: it can throw dynamic access exceptions (e.g. InaccessibleObjectException under strict modular Java 17+ runtimes) or fail when BigQuery is mocked in tests.

Hence the current implementation, which should not bypass proxy, endpoint, or auth configurations:

Proxy: Calling transportOptions.getHttpTransportFactory().create() creates the exact HttpTransport configured in the connection provider (which contains any custom proxy factory settings).

Private Endpoints: options.getResolvedApiaryHost(BIGQUERY_SERVICE_NAME) resolves the correct API host, incorporating user overrides or custom private service endpoints.

Auth/Timeouts/Headers: transportOptions.getHttpRequestInitializer(options) automatically provisions the auth token initialization, user agent headers, and connection timeouts.

logachev · 2026-06-04T04:12:49Z

+  private static final String BIGQUERY_SERVICE_NAME = "bigquery";
+  private static final long MAX_PROJECTS_PER_PAGE = 10000L;
+  private static final String PROJECT_LIST_FIELDS =
+      "projects/projectReference/projectId,nextPageToken";


Better to query projectName rather than projectId, imo. This is used in some UI tools.

In the BigQuery REST API, there is no projectName field (only projectId and friendlyName). Using projectId is the standard way to reference GCP projects. Also, the catalog name returned by getCatalogs() must be the alphanumeric projectId.
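The field mask in the diff above requests `nextPageToken` alongside the project IDs, which implies a token-driven paging loop. A minimal sketch of that loop follows, with the REST call abstracted away: `ProjectPagingSketch`, `Page`, and the `fetchPage` function are hypothetical stand-ins, not types from the BigQuery client libraries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical paging loop; the page fetcher stands in for the real REST call. */
final class ProjectPagingSketch {
  private ProjectPagingSketch() {}

  /** Assumed page shape: project IDs plus the token of the next page (null when done). */
  static final class Page {
    final List<String> projectIds;
    final String nextPageToken;

    Page(List<String> projectIds, String nextPageToken) {
      this.projectIds = projectIds;
      this.nextPageToken = nextPageToken;
    }
  }

  /** Calls fetchPage with a null token first, then follows tokens until none is returned. */
  static List<String> listAllProjectIds(Function<String, Page> fetchPage) {
    List<String> all = new ArrayList<>();
    String token = null;
    do {
      Page page = fetchPage.apply(token);
      all.addAll(page.projectIds);
      token = page.nextPageToken;
    } while (token != null && !token.isEmpty());
    return all;
  }
}
```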

logachev · 2026-06-04T04:15:43Z

+          ExecutorService apiExecutor = null;
+          final List<Future<List<Dataset>>> apiFutures = new ArrayList<>();
          try {
+            apiExecutor = Executors.newFixedThreadPool(API_EXECUTOR_POOL_SIZE);


Instead of creating a thread pool here, we should have one available. We have the MetaDataFetchThreadCount property, but we actually lack a connection-layer thread pool.

Yes, completely agree. I noticed this as well, and it is happening with all major metadata methods (getTables, getColumns, getProcedures, etc.). I created b/520400589 and will work on it in a separate PR.
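A connection-layer pool of the kind discussed here could look roughly like the sketch below. This is only an illustration of the idea tracked in b/520400589, not its eventual design: `ConnectionExecutorSketch` and `metadataExecutor()` are hypothetical names, and the constructor argument is assumed to carry the value of the MetaDataFetchThreadCount property.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical connection-scoped executor holder; illustrative only. */
final class ConnectionExecutorSketch implements AutoCloseable {
  private final int threadCount; // assumed to come from MetaDataFetchThreadCount
  private ExecutorService executor; // created lazily, guarded by "this"
  private boolean closed;

  ConnectionExecutorSketch(int threadCount) {
    if (threadCount < 1) {
      throw new IllegalArgumentException("threadCount must be >= 1");
    }
    this.threadCount = threadCount;
  }

  /** Returns the single pool shared by all metadata calls on this connection. */
  synchronized ExecutorService metadataExecutor() {
    if (closed) {
      throw new IllegalStateException("connection is closed");
    }
    if (executor == null) {
      executor = Executors.newFixedThreadPool(threadCount);
    }
    return executor;
  }

  /** Shuts the pool down together with the connection. */
  @Override
  public synchronized void close() {
    closed = true;
    if (executor != null) {
      executor.shutdownNow();
    }
  }
}
```

Creating the pool lazily means connections that never call a metadata method pay nothing, and tying shutdown to `close()` avoids leaking threads per call.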

logachev · 2026-06-04T04:23:41Z

  }

  @Override
  public ResultSet getSchemas(String catalog, String schemaPattern) {


I'd suggest some refactoring for this method. A few ideas to simplify:

Move the code that fetches the list of schemas in a specific catalog to a separate function.

Keep this method simple: single catalog only, essentially calling into the helper and transforming the result to a jsonResultSet. No need for background threads since it is a single catalog.

Refactor getSchemas() to be the one that fans out multiple requests and assembles the data.

Breaking this up into 2 CLs will probably be easier:

Add a proper threadPool to the connection layer (and remove the static threadPool in statement, it is not used)

Update getSchemas

Also, I'd suggest reusing more code. There are 5 metadata methods that can generate a large number of REST calls:

getCatalogs()

getSchemas()

getTables()

getColumns()

getProcedures()

In a way, they build results on top of each other. Right now we duplicate a lot, e.g. listDatasets() is called in 7 different places.

Also, please don't tackle all of them in a single CL; let's do it incrementally :)
(Otherwise it is a pain to review.)

Yeah, makes sense. I have created 3 bugs:

b/520400589 - Refactor Metadata Thread Pool Management to Reuse Connection-Scoped Executor

b/520407325 - Refactor getSchemas for Catalog-Based Routing (Synchronous & Async Fan-out)

b/520406763 - Deduplicate metadata API calls

feat: add EnableProjectDiscovery connection property for metadata m…

8f75b30

…ethods

gemini-code-assist Bot reviewed Jun 3, 2026


Comment thread java-bigquery-jdbc/src/main/java/com/google/cloud/bigquery/jdbc/BigQueryConnection.java

Comment thread java-bigquery-jdbc/src/main/java/com/google/cloud/bigquery/jdbc/BigQueryDatabaseMetaData.java

address pr feedback

81e80a0

gemini-code-assist Bot reviewed Jun 3, 2026


address pr feedback

86d62b1

keshavdandeva marked this pull request as ready for review June 3, 2026 19:13

keshavdandeva requested review from a team as code owners June 3, 2026 19:14

keshavdandeva requested review from Neenu1995 and logachev June 3, 2026 19:14

logachev reviewed Jun 4, 2026


keshavdandeva requested a review from logachev June 5, 2026 16:12

chore: address pr feedback

f440367
