Unity Catalog Roadmap

This document outlines the roadmap for the Unity Catalog open source project. As always, features may move in/out of milestones pending available resources and priorities.

0.4 Release priorities

Storage management

By integrating the already released credential and external location APIs more tightly with the rest of the server internals, the next release will allow more fine-grained, dynamic, and online management of storage locations and their credentials. Operators will also be able to delegate some storage management to the catalog through the managed location features for catalogs and schemas.

Catalog managed commits

Catalog managed commits are the basis for many new and powerful client-side (Delta) and server-side features. Supporting the table scan and commit APIs is a key priority for the upcoming release.

End to end OAuth support

OAuth support is important for cloud users and for RBAC. The Unity Catalog client plans to support common OAuth flows for authentication.

Full roadmap

Feature Area v0.1 v0.2 v0.3 v0.4 v0.5+
Core
Catalog API + Server done done done done done
Schema API + Server done done done done done
Managed location in catalog API + Server done done
Managed location in schema API + Server done done
Credential API + Server 🛠️ done done
External Location API + Server 🛠️ done done
Multi-tenancy API + Server done
Identity & Authentication
Local identity management (user) API + Server done done done done
Group management API + Server done
Support for Machine identities (SPs) API + Server done
SCIM to support identity sync from IdP (users and groups) API + Server done done done done
OAuth/OIDC for Users API + Server done done done done
OAuth/OIDC for Services API + Server done
OAuth client-side support Spark integration done done
SAML authentication support API + Server ❓
Access Control & Governance
Support for change of ownership API + Server done
Add permission/privilege support for MODIFY, CREATE_X, BROWSE API + Server done
Add remaining permissions/privileges (MANAGE etc) API + Server done
Permission parity with Databricks UC API + Server done
Temporary credential vending for tables API + Server done done done done
Temporary credential vending for volumes API + Server done done done done
Temporary credential vending for models API + Server done done done done
Basic grants API + Server done done done done
Auditing API + Server done
SQL DCL changes Spark Integration done
RBAC API + Server ❓
Row level filters API + Server ❓
Column level masks API + Server ❓
ABAC API + Server ❓
Lineage API + Server ❓
Server production-readiness (support running as a HMS replacement)
Monitoring and Telemetry API + Server ❓
Database schema upgrades API + Server ❓
Change events API + Server ❓
Tables
External table reads & writes API + Server done done done done done
Spark integration done done done done
Delta integration done done done done
Managed Delta table reads API + Server done done done done done
Delta+Spark integration done done done done done
Managed Delta tables creates+writes with catalog-managed commits API + Server done done
Delta-Spark integration done done done
Delta Kernel integration done done
Delta Uniform tables with read as Iceberg via Iceberg REST API API + Server 🛠️ 🛠️ 🛠️ 🛠️ done
Delta integration done
Iceberg tables with create+read+write API + Server done
Multi-engine data types for column definitions API + Server done
Views
Basic Spark SQL flavor views API + Server done
Multi-dialect views API + Server done
Iceberg view support API + Server done
Materialized views API + Server 🛠️ done
Streaming tables API + Server 🛠️ done
Shallow clones API + Server done
Non-tabular and AI assets
Functions (SQL UDFs, Python UDFs) API + Server done done done done done
ML integrations with advanced python SDK done done done
Spark integration done
Multi-engine functions (SQL) API + Server ❓
Remote functions API + Server ❓
External volumes API + Server done done done done done
Spark integration done
Managed volumes API + Server done
Spark integration done
Models and model versions API + Server done done done done
MLflow integration done done done
Spark integration done
Feature tables API + Server ❓
Data monitors API + Server ❓
Sharing
Delta Sharing integration API + Server ❓
Shares API + Server ❓
Recipients API + Server ❓
Providers API + Server ❓
Federation
Connections API + Server ❓
Foreign objects (catalogs, schemas, tables) API + Server ❓
Support for different data sources: JDBC, Iceberg REST, HMS API + Server ❓
UI (needs to be completed)