Driver Version
No response
Driver Type
Go
What feature or improvement would you like to see?
Context
We build a data operations platform that relies on comprehensive metadata discovery to provide users with a rich catalog browsing experience. We currently use the Databricks JDBC driver and supplement it with SHOW commands and custom parsing to extract metadata. We're evaluating the ADBC Databricks driver as a modern Arrow-native alternative, but several metadata categories we depend on are missing.
Below is a prioritized list of the metadata gaps we need addressed, organized by category. For each, we describe what we need, why it matters, and how we currently obtain it via JDBC/SQL.
Priority 1 — Functions & Routines
1.1 User-Defined Function Discovery
What we need: An API or query-based mechanism to enumerate user-defined functions (scalar and table-valued) within a catalog and schema, including:
| Property | Description |
|---|---|
| Function name | Fully qualified name |
| Function type | Scalar vs table-valued |
| Language | SQL, Python, etc. |
| Determinism | Whether the function is deterministic |
| Comment | User-provided description |
| Security type | DEFINER or INVOKER |
Why it matters: Functions are first-class objects in Unity Catalog. Users need to browse, understand, and reference UDFs alongside tables and views.
How we get it today: JDBC DatabaseMetaData.getFunctions() returns functions with metadata encoded in the REMARKS column using prefixed keys (sqlFunction., pythonUDF.). We parse these with a custom parser.
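The custom parsing step above can be sketched as follows. This is a minimal illustration, assuming the REMARKS blob is a delimited list of `key=value` pairs; the exact encoding used by the Databricks JDBC driver may differ, and the function names are ours.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRemarks splits a REMARKS blob into a key/value map.
// Assumes entries are ';'-delimited with '=' separating key and value;
// the real JDBC encoding may differ.
func parseRemarks(remarks string) map[string]string {
	out := make(map[string]string)
	for _, entry := range strings.Split(remarks, ";") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if k, v, ok := strings.Cut(entry, "="); ok {
			out[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return out
}

// isUDF reports whether the parsed metadata carries one of the
// user-defined-function key prefixes (sqlFunction., pythonUDF.).
func isUDF(meta map[string]string) bool {
	for k := range meta {
		if strings.HasPrefix(k, "sqlFunction.") || strings.HasPrefix(k, "pythonUDF.") {
			return true
		}
	}
	return false
}

func main() {
	meta := parseRemarks("sqlFunction.language=SQL; sqlFunction.deterministic=true")
	fmt.Println(isUDF(meta), meta["sqlFunction.language"])
}
```

A native ADBC surface for this metadata would remove the need for this kind of string parsing entirely.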
1.2 Function Parameters
What we need: For each function, the ordered list of input parameters and the return type:
| Property | Description |
|---|---|
| Parameter name | Name of each input parameter |
| Parameter position | Ordinal position (1-based) |
| Parameter data type | Databricks type name |
| Parameter mode | IN, OUT, INOUT |
| Return type | For scalar functions, the return data type |
| Return columns | For table-valued functions, the list of result columns with names and types |
Why it matters: Users need parameter signatures to correctly invoke functions. Our product auto-generates function call templates and provides signature help.
How we get it today: Parsed from JDBC getFunctions() REMARKS field — keys inputParam, returnType, isTableFunc, query, expression, codeLiteral.
1.3 System/Built-in Function Discovery
What we need: Enumeration of Databricks built-in functions with at minimum:
| Property | Description |
|---|---|
| Function name | e.g., abs, concat, date_format |
| Usage/description | What the function does |
Why it matters: We provide autocomplete and documentation for built-in functions. Users expect to see all available functions when writing SQL.
How we get it today: JDBC DatabaseMetaData.getFunctions(null, null, "%") filtered to exclude user-defined function prefixes.
Priority 2 — Constraints
2.1 Primary Key Constraints
What we need: For each table, the primary key constraint (if any):
| Property | Description |
|---|---|
| Constraint name | PK constraint name |
| Column names | Ordered list of columns in the key |
Current ADBC status: The GetObjects schema includes table_constraints with CONSTRAINT_SCHEMA, but it's unclear whether primary keys are actually populated for Databricks tables.
How we get it today: Parsed from the SHOW TABLE EXTENDED output — the "Table Constraints" section contains lines like PRIMARY KEY (col1, col2).
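Extracting the PK from the "Table Constraints" section boils down to a small pattern match. A minimal sketch, assuming the line format shown above (the function name is ours):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pkRe matches lines like "PRIMARY KEY (col1, col2)" from the
// "Table Constraints" section of SHOW TABLE EXTENDED output.
var pkRe = regexp.MustCompile(`(?i)PRIMARY KEY\s*\(([^)]+)\)`)

// primaryKeyColumns returns the ordered PK column list, or nil when
// no primary key line is present.
func primaryKeyColumns(constraints string) []string {
	m := pkRe.FindStringSubmatch(constraints)
	if m == nil {
		return nil
	}
	var cols []string
	for _, c := range strings.Split(m[1], ",") {
		cols = append(cols, strings.TrimSpace(c))
	}
	return cols
}

func main() {
	fmt.Println(primaryKeyColumns("PRIMARY KEY (id, region)"))
}
```

Having GetObjects populate table_constraints directly would make this parsing unnecessary.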
2.2 Foreign Key Constraints
What we need: For each table, foreign key constraints including:
| Property | Description |
|---|---|
| Constraint name | FK constraint name |
| Local column names | Columns in the FK |
| Referenced catalog | Target catalog |
| Referenced schema | Target schema |
| Referenced table | Target table |
| Referenced columns | Target column names |
How we get it today: Parsed from SHOW TABLE EXTENDED — lines like FOREIGN KEY (col) REFERENCES catalog.schema.table(ref_col).
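The FK lines can likewise be matched with a regular expression. A sketch under the assumption that identifiers are unquoted (backtick-quoted names would need extra handling); the type and function names are ours:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ForeignKey is a minimal model of one FK constraint line.
type ForeignKey struct {
	LocalCols []string
	RefTable  string // catalog.schema.table
	RefCols   []string
}

// fkRe matches lines like
//   FOREIGN KEY (col) REFERENCES catalog.schema.table(ref_col)
// Backtick-quoted identifiers are not handled in this sketch.
var fkRe = regexp.MustCompile(`(?i)FOREIGN KEY\s*\(([^)]+)\)\s*REFERENCES\s+([\w.]+)\s*\(([^)]+)\)`)

func splitCols(s string) []string {
	var cols []string
	for _, c := range strings.Split(s, ",") {
		cols = append(cols, strings.TrimSpace(c))
	}
	return cols
}

// parseForeignKeys extracts every FK line in a constraints blob.
func parseForeignKeys(constraints string) []ForeignKey {
	var fks []ForeignKey
	for _, m := range fkRe.FindAllStringSubmatch(constraints, -1) {
		fks = append(fks, ForeignKey{
			LocalCols: splitCols(m[1]),
			RefTable:  m[2],
			RefCols:   splitCols(m[3]),
		})
	}
	return fks
}

func main() {
	fks := parseForeignKeys("FOREIGN KEY (cust_id) REFERENCES main.sales.customers(id)")
	fmt.Printf("%+v\n", fks)
}
```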
2.3 Check Constraints
What we need: Named check constraints with their expression:
| Property | Description |
|---|---|
| Constraint name | Check constraint name |
| Expression | The boolean expression (e.g., col > 0) |
How we get it today: Extracted from SHOW TABLE EXTENDED table properties — entries like delta.constraints.mycheck = (col > 0).
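Since check constraints surface as `delta.constraints.*` entries in the properties map, recovering them is a prefix filter. A minimal sketch (function and constant names are ours):

```go
package main

import (
	"fmt"
	"strings"
)

const checkPrefix = "delta.constraints."

// checkConstraints filters a table-properties map down to named check
// constraints, e.g. "delta.constraints.mycheck" -> "(col > 0)".
func checkConstraints(props map[string]string) map[string]string {
	out := make(map[string]string)
	for k, v := range props {
		if strings.HasPrefix(k, checkPrefix) {
			out[strings.TrimPrefix(k, checkPrefix)] = v
		}
	}
	return out
}

func main() {
	props := map[string]string{
		"delta.constraints.mycheck": "(col > 0)",
		"delta.minReaderVersion":    "1",
	}
	fmt.Println(checkConstraints(props))
}
```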
Priority 3 — View Definitions
3.1 View SQL Text
What we need: The defining SQL query for views:
| Property | Description |
|---|---|
| View name | Fully qualified view name |
| View definition | The SELECT statement that defines the view |
Why it matters: Users inspect, copy, and refactor view definitions. This is essential for understanding data lineage and for migration workflows.
How we get it today: SHOW CREATE TABLE {view_fqn} returns the complete CREATE VIEW statement.
Priority 4 — Partition Metadata
4.1 Partition Columns
What we need: For partitioned tables, the ordered list of partition columns:
| Property | Description |
|---|---|
| Column name | Partition column name |
| Data type | Column data type |
| Ordinal position | Position in partition scheme |
Why it matters: Partition-aware querying is critical for performance on large Databricks tables. Users need to know which columns to filter on.
How we get it today: Parsed from SHOW TABLE EXTENDED output — the "Partition Columns" section contains entries like [col1 string, col2 int].
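Parsing the "Partition Columns" entry is a matter of stripping the brackets and splitting name/type pairs. A sketch assuming the `[col1 string, col2 int]` layout shown above (the type and function names are ours):

```go
package main

import (
	"fmt"
	"strings"
)

// PartitionColumn pairs a partition column with its type and position.
type PartitionColumn struct {
	Name    string
	Type    string
	Ordinal int // 1-based position in the partition scheme
}

// parsePartitionColumns parses an entry like "[col1 string, col2 int]".
func parsePartitionColumns(entry string) []PartitionColumn {
	entry = strings.Trim(strings.TrimSpace(entry), "[]")
	if entry == "" {
		return nil
	}
	var cols []PartitionColumn
	for i, part := range strings.Split(entry, ",") {
		fields := strings.Fields(part)
		if len(fields) < 2 {
			continue
		}
		cols = append(cols, PartitionColumn{Name: fields[0], Type: fields[1], Ordinal: i + 1})
	}
	return cols
}

func main() {
	fmt.Printf("%+v\n", parsePartitionColumns("[col1 string, col2 int]"))
}
```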
4.2 Partition Values
What we need: The distinct partition values for a partitioned table:
| Property | Description |
|---|---|
| Partition spec | Map of column-name to value for each partition |
How we get it today: SHOW PARTITIONS {catalog}.{schema}.{table} returns one row per partition with dynamic columns for each partition key.
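Because the result columns of SHOW PARTITIONS vary with the table's partition keys, we reassemble each row into a column-to-value map. A minimal sketch over string-valued rows (the function name is ours):

```go
package main

import "fmt"

// partitionSpecs zips the dynamic column names of a SHOW PARTITIONS
// result with each row's values, yielding one column->value map
// (a partition spec) per partition.
func partitionSpecs(cols []string, rows [][]string) []map[string]string {
	specs := make([]map[string]string, 0, len(rows))
	for _, row := range rows {
		spec := make(map[string]string, len(cols))
		for i, c := range cols {
			if i < len(row) {
				spec[c] = row[i]
			}
		}
		specs = append(specs, spec)
	}
	return specs
}

func main() {
	specs := partitionSpecs(
		[]string{"year", "month"},
		[][]string{{"2023", "01"}, {"2023", "02"}},
	)
	fmt.Println(specs)
}
```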
Priority 5 — DDL Generation
5.1 Table/View DDL
What we need: The complete DDL statement to recreate a table or view:
| Property | Description |
|---|---|
| Object name | Fully qualified object name |
| DDL text | Complete CREATE TABLE or CREATE VIEW statement |
Why it matters: DDL is used for documentation, migration, version control, and understanding table structure including storage format, location, properties, and clustering.
How we get it today: SHOW CREATE TABLE {fqn} returns the full DDL.
Priority 6 — Extended Table Metadata
6.1 Table Properties & Characteristics
What we need: Additional table-level metadata beyond name and type:
| Property | Description |
|---|---|
| Owner | Table owner |
| Comment/description | Table-level comment |
| Table format | Delta, Parquet, CSV, etc. |
| Storage location | For external tables, the URI |
| Created time | When the table was created |
| Last modified time | Last data modification |
| Table properties | Key-value properties map |
How we get it today: All extracted from SHOW TABLE EXTENDED output, which returns a rich information blob containing these fields.
Current gap: GetObjects returns only table_name and table_type at the table level.
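For reference, our extraction from the SHOW TABLE EXTENDED information blob amounts to line-by-line `key: value` splitting. A rough sketch, assuming single-line entries (multi-line sections such as Table Properties need extra handling, and the function name is ours):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseTableInfo splits the multi-line "information" blob returned by
// SHOW TABLE EXTENDED into a key/value map ("Owner: alice" -> Owner=alice).
// Only the first ':' on a line separates key from value, so URI values
// like "s3://bucket/path" survive intact.
func parseTableInfo(blob string) map[string]string {
	info := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(blob))
	for sc.Scan() {
		if k, v, ok := strings.Cut(sc.Text(), ":"); ok {
			info[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return info
}

func main() {
	blob := "Owner: alice\nProvider: delta\nLocation: s3://bucket/path"
	info := parseTableInfo(blob)
	fmt.Println(info["Owner"], info["Provider"], info["Location"])
}
```

First-class fields on the GetObjects table level would replace this blob parsing with structured access.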
6.2 Table Statistics
What we need: Row counts and size information:
| Property | Description |
|---|---|
| Row count | Number of rows |
| Size in bytes | Table storage size |
| Column-level statistics | Min, max, distinct count, null count per column |
Current ADBC status: GetStatistics() is listed as not supported.
Summary Table
| # | Feature | ADBC Spec Coverage | Current Status | Priority |
|---|---|---|---|---|
| 1.1 | UDF discovery | No standard method | Not implemented | P1 |
| 1.2 | Function parameters | No standard method | Not implemented | P1 |
| 1.3 | Built-in functions | No standard method | Not implemented | P1 |
| 2.1 | Primary keys | GetObjects constraint schema | Schema exists, population unclear | P2 |
| 2.2 | Foreign keys | GetObjects constraint schema | Schema exists, population unclear | P2 |
| 2.3 | Check constraints | GetObjects constraint schema | Schema exists, population unclear | P2 |
| 3.1 | View definitions | No standard field | Not implemented | P3 |
| 4.1 | Partition columns | No standard method | Not implemented | P4 |
| 4.2 | Partition values | No standard method | Not implemented | P4 |
| 5.1 | DDL generation | No standard method | Not implemented | P5 |
| 6.1 | Table properties | No standard fields | Not implemented | P6 |
| 6.2 | Table statistics | GetStatistics API | Not implemented | P6 |