Driver Version
No response
Driver Type
Go
What feature or improvement would you like to see?
Context
We build a data operations platform that relies on comprehensive metadata discovery to provide users with a rich catalog browsing experience. We currently use the Databricks JDBC driver and supplement it with SHOW commands and custom parsing to extract metadata. We're evaluating the ADBC Databricks driver as a modern Arrow-native alternative, but several metadata categories we depend on are missing.
Below is a prioritized list of the metadata gaps we need addressed, organized by category. For each, we describe what we need, why it matters, and how we currently obtain it via JDBC/SQL.
Priority 1 — Functions & Routines
1.1 User-Defined Function Discovery
What we need: An API or query-based mechanism to enumerate user-defined functions (scalar and table-valued) within a catalog and schema, including:
| Property | Description |
|---|---|
| Function name | Fully qualified name |
| Function type | Scalar vs table-valued |
| Language | SQL, Python, etc. |
| Determinism | Whether the function is deterministic |
| Comment | User-provided description |
| Security type | DEFINER or INVOKER |
Why it matters: Functions are first-class objects in Unity Catalog. Users need to browse, understand, and reference UDFs alongside tables and views.
How we get it today: JDBC DatabaseMetaData.getFunctions() returns functions with metadata encoded in the REMARKS column using prefixed keys (sqlFunction., pythonUDF.). We parse these with a custom parser.
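The custom parsing step above can be sketched as follows. This is a minimal illustration, assuming the REMARKS blob is a delimited list of `key=value` pairs; the exact encoding used by the Databricks JDBC driver may differ, and the function names are ours.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRemarks splits a REMARKS blob into a key/value map.
// Assumes entries are ';'-delimited with '=' separating key and value;
// the real JDBC encoding may differ.
func parseRemarks(remarks string) map[string]string {
	out := make(map[string]string)
	for _, entry := range strings.Split(remarks, ";") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if k, v, ok := strings.Cut(entry, "="); ok {
			out[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return out
}

// isUDF reports whether the parsed metadata carries one of the
// user-defined-function key prefixes (sqlFunction., pythonUDF.).
func isUDF(meta map[string]string) bool {
	for k := range meta {
		if strings.HasPrefix(k, "sqlFunction.") || strings.HasPrefix(k, "pythonUDF.") {
			return true
		}
	}
	return false
}

func main() {
	meta := parseRemarks("sqlFunction.language=SQL; sqlFunction.deterministic=true")
	fmt.Println(isUDF(meta), meta["sqlFunction.language"])
}
```

A native ADBC surface for this metadata would remove the need for this kind of string parsing entirely.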
1.2 Function Parameters
What we need: For each function, the ordered list of input parameters and the return type:
| Property | Description |
|---|---|
| Parameter name | Name of each input parameter |
| Parameter position | Ordinal position (1-based) |
| Parameter data type | Databricks type name |
| Parameter mode | IN, OUT, INOUT |
| Return type | For scalar functions, the return data type |
| Return columns | For table-valued functions, the list of result columns with names and types |
Why it matters: Users need parameter signatures to correctly invoke functions. Our product auto-generates function call templates and provides signature help.
How we get it today: Parsed from JDBC getFunctions() REMARKS field — keys inputParam, returnType, isTableFunc, query, expression, codeLiteral.
1.3 System/Built-in Function Discovery
What we need: Enumeration of Databricks built-in functions with at minimum:
| Property | Description |
|---|---|
| Function name | e.g., abs, concat, date_format |
| Usage/description | What the function does |
Why it matters: We provide autocomplete and documentation for built-in functions. Users expect to see all available functions when writing SQL.
How we get it today: JDBC DatabaseMetaData.getFunctions(null, null, "%") filtered to exclude user-defined function prefixes.
Priority 2 — Constraints
2.1 Primary Key Constraints
What we need: For each table, the primary key constraint (if any):
| Property | Description |
|---|---|
| Constraint name | PK constraint name |
| Column names | Ordered list of columns in the key |
Current ADBC status: The GetObjects schema includes table_constraints with CONSTRAINT_SCHEMA, but it's unclear whether primary keys are actually populated for Databricks tables.
How we get it today: Parsed from the SHOW TABLE EXTENDED output — the "Table Constraints" section contains lines like PRIMARY KEY (col1, col2).
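Extracting the PK from the "Table Constraints" section boils down to a small pattern match. A minimal sketch, assuming the line format shown above (the function name is ours):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pkRe matches lines like "PRIMARY KEY (col1, col2)" from the
// "Table Constraints" section of SHOW TABLE EXTENDED output.
var pkRe = regexp.MustCompile(`(?i)PRIMARY KEY\s*\(([^)]+)\)`)

// primaryKeyColumns returns the ordered PK column list, or nil when
// no primary key line is present.
func primaryKeyColumns(constraints string) []string {
	m := pkRe.FindStringSubmatch(constraints)
	if m == nil {
		return nil
	}
	var cols []string
	for _, c := range strings.Split(m[1], ",") {
		cols = append(cols, strings.TrimSpace(c))
	}
	return cols
}

func main() {
	fmt.Println(primaryKeyColumns("PRIMARY KEY (id, region)"))
}
```

Having GetObjects populate table_constraints directly would make this parsing unnecessary.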
2.2 Foreign Key Constraints
What we need: For each table, foreign key constraints including:
| Property | Description |
|---|---|
| Constraint name | FK constraint name |
| Local column names | Columns in the FK |
| Referenced catalog | Target catalog |
| Referenced schema | Target schema |
| Referenced table | Target table |
| Referenced columns | Target column names |
How we get it today: Parsed from SHOW TABLE EXTENDED — lines like FOREIGN KEY (col) REFERENCES catalog.schema.table(ref_col).
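The FK lines can likewise be matched with a regular expression. A sketch under the assumption that identifiers are unquoted (backtick-quoted names would need extra handling); the type and function names are ours:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ForeignKey is a minimal model of one FK constraint line.
type ForeignKey struct {
	LocalCols []string
	RefTable  string // catalog.schema.table
	RefCols   []string
}

// fkRe matches lines like
//   FOREIGN KEY (col) REFERENCES catalog.schema.table(ref_col)
// Backtick-quoted identifiers are not handled in this sketch.
var fkRe = regexp.MustCompile(`(?i)FOREIGN KEY\s*\(([^)]+)\)\s*REFERENCES\s+([\w.]+)\s*\(([^)]+)\)`)

func splitCols(s string) []string {
	var cols []string
	for _, c := range strings.Split(s, ",") {
		cols = append(cols, strings.TrimSpace(c))
	}
	return cols
}

// parseForeignKeys extracts every FK line in a constraints blob.
func parseForeignKeys(constraints string) []ForeignKey {
	var fks []ForeignKey
	for _, m := range fkRe.FindAllStringSubmatch(constraints, -1) {
		fks = append(fks, ForeignKey{
			LocalCols: splitCols(m[1]),
			RefTable:  m[2],
			RefCols:   splitCols(m[3]),
		})
	}
	return fks
}

func main() {
	fks := parseForeignKeys("FOREIGN KEY (cust_id) REFERENCES main.sales.customers(id)")
	fmt.Printf("%+v\n", fks)
}
```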
2.3 Check Constraints
What we need: Named check constraints with their expression:
| Property | Description |
|---|---|
| Constraint name | Check constraint name |
| Expression | The boolean expression (e.g., col > 0) |
How we get it today: Extracted from SHOW TABLE EXTENDED table properties — entries like delta.constraints.mycheck = (col > 0).
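Since check constraints surface as `delta.constraints.*` entries in the properties map, recovering them is a prefix filter. A minimal sketch (function and constant names are ours):

```go
package main

import (
	"fmt"
	"strings"
)

const checkPrefix = "delta.constraints."

// checkConstraints filters a table-properties map down to named check
// constraints, e.g. "delta.constraints.mycheck" -> "(col > 0)".
func checkConstraints(props map[string]string) map[string]string {
	out := make(map[string]string)
	for k, v := range props {
		if strings.HasPrefix(k, checkPrefix) {
			out[strings.TrimPrefix(k, checkPrefix)] = v
		}
	}
	return out
}

func main() {
	props := map[string]string{
		"delta.constraints.mycheck": "(col > 0)",
		"delta.minReaderVersion":    "1",
	}
	fmt.Println(checkConstraints(props))
}
```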
Priority 3 — View Definitions
3.1 View SQL Text
What we need: The defining SQL query for views:
| Property | Description |
|---|---|
| View name | Fully qualified view name |
| View definition | The SELECT statement that defines the view |
Why it matters: Users inspect, copy, and refactor view definitions. This is essential for understanding data lineage and for migration workflows.
How we get it today: SHOW CREATE TABLE {view_fqn} returns the complete CREATE VIEW statement.
Priority 4 — Partition Metadata
4.1 Partition Columns
What we need: For partitioned tables, the ordered list of partition columns:
| Property | Description |
|---|---|
| Column name | Partition column name |
| Data type | Column data type |
| Ordinal position | Position in partition scheme |
Why it matters: Partition-aware querying is critical for performance on large Databricks tables. Users need to know which columns to filter on.
How we get it today: Parsed from SHOW TABLE EXTENDED output — the "Partition Columns" section contains entries like [col1 string, col2 int].
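Parsing the "Partition Columns" entry is a matter of stripping the brackets and splitting name/type pairs. A sketch assuming the `[col1 string, col2 int]` layout shown above (the type and function names are ours):

```go
package main

import (
	"fmt"
	"strings"
)

// PartitionColumn pairs a partition column with its type and position.
type PartitionColumn struct {
	Name    string
	Type    string
	Ordinal int // 1-based position in the partition scheme
}

// parsePartitionColumns parses an entry like "[col1 string, col2 int]".
func parsePartitionColumns(entry string) []PartitionColumn {
	entry = strings.Trim(strings.TrimSpace(entry), "[]")
	if entry == "" {
		return nil
	}
	var cols []PartitionColumn
	for i, part := range strings.Split(entry, ",") {
		fields := strings.Fields(part)
		if len(fields) < 2 {
			continue
		}
		cols = append(cols, PartitionColumn{Name: fields[0], Type: fields[1], Ordinal: i + 1})
	}
	return cols
}

func main() {
	fmt.Printf("%+v\n", parsePartitionColumns("[col1 string, col2 int]"))
}
```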
4.2 Partition Values
What we need: The distinct partition values for a partitioned table:
| Property | Description |
|---|---|
| Partition spec | Map of column-name to value for each partition |
How we get it today: SHOW PARTITIONS {catalog}.{schema}.{table} returns one row per partition with dynamic columns for each partition key.
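Because the result columns of SHOW PARTITIONS vary with the table's partition keys, we reassemble each row into a column-to-value map. A minimal sketch over string-valued rows (the function name is ours):

```go
package main

import "fmt"

// partitionSpecs zips the dynamic column names of a SHOW PARTITIONS
// result with each row's values, yielding one column->value map
// (a partition spec) per partition.
func partitionSpecs(cols []string, rows [][]string) []map[string]string {
	specs := make([]map[string]string, 0, len(rows))
	for _, row := range rows {
		spec := make(map[string]string, len(cols))
		for i, c := range cols {
			if i < len(row) {
				spec[c] = row[i]
			}
		}
		specs = append(specs, spec)
	}
	return specs
}

func main() {
	specs := partitionSpecs(
		[]string{"year", "month"},
		[][]string{{"2023", "01"}, {"2023", "02"}},
	)
	fmt.Println(specs)
}
```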
Priority 5 — DDL Generation
5.1 Table/View DDL
What we need: The complete DDL statement to recreate a table or view:
| Property | Description |
|---|---|
| Object name | Fully qualified object name |
| DDL text | Complete CREATE TABLE or CREATE VIEW statement |
Why it matters: DDL is used for documentation, migration, version control, and understanding table structure including storage format, location, properties, and clustering.
How we get it today: SHOW CREATE TABLE {fqn} returns the full DDL.
Priority 6 — Extended Table Metadata
6.1 Table Properties & Characteristics
What we need: Additional table-level metadata beyond name and type:
| Property | Description |
|---|---|
| Owner | Table owner |
| Comment/description | Table-level comment |
| Table format | Delta, Parquet, CSV, etc. |
| Storage location | For external tables, the URI |
| Created time | When the table was created |
| Last modified time | Last data modification |
| Table properties | Key-value properties map |
How we get it today: All extracted from SHOW TABLE EXTENDED output, which returns a rich information blob containing these fields.
Current gap: GetObjects returns only table_name and table_type at the table level.
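For reference, our extraction from the SHOW TABLE EXTENDED information blob amounts to line-by-line `key: value` splitting. A rough sketch, assuming single-line entries (multi-line sections such as Table Properties need extra handling, and the function name is ours):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseTableInfo splits the multi-line "information" blob returned by
// SHOW TABLE EXTENDED into a key/value map ("Owner: alice" -> Owner=alice).
// Only the first ':' on a line separates key from value, so URI values
// like "s3://bucket/path" survive intact.
func parseTableInfo(blob string) map[string]string {
	info := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(blob))
	for sc.Scan() {
		if k, v, ok := strings.Cut(sc.Text(), ":"); ok {
			info[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return info
}

func main() {
	blob := "Owner: alice\nProvider: delta\nLocation: s3://bucket/path"
	info := parseTableInfo(blob)
	fmt.Println(info["Owner"], info["Provider"], info["Location"])
}
```

First-class fields on the GetObjects table level would replace this blob parsing with structured access.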
6.2 Table Statistics
What we need: Row counts and size information:
| Property | Description |
|---|---|
| Row count | Number of rows |
| Size in bytes | Table storage size |
| Column-level statistics | Min, max, distinct count, null count per column |
Current ADBC status: GetStatistics() is listed as not supported.
Summary Table
| # | Feature | ADBC Spec Coverage | Current Status | Priority |
|---|---|---|---|---|
| 1.1 | UDF discovery | No standard method | Not implemented | P1 |
| 1.2 | Function parameters | No standard method | Not implemented | P1 |
| 1.3 | Built-in functions | No standard method | Not implemented | P1 |
| 2.1 | Primary keys | GetObjects constraint schema | Schema exists, population unclear | P2 |
| 2.2 | Foreign keys | GetObjects constraint schema | Schema exists, population unclear | P2 |
| 2.3 | Check constraints | GetObjects constraint schema | Schema exists, population unclear | P2 |
| 3.1 | View definitions | No standard field | Not implemented | P3 |
| 4.1 | Partition columns | No standard method | Not implemented | P4 |
| 4.2 | Partition values | No standard method | Not implemented | P4 |
| 5.1 | DDL generation | No standard method | Not implemented | P5 |
| 6.1 | Table properties | No standard fields | Not implemented | P6 |
| 6.2 | Table statistics | GetStatistics API | Not implemented | P6 |