Supabricks provides a REST API interface to Databricks Unity Catalog using FastAPI, enabling simple CRUD operations on Delta tables. It's designed to make Databricks data accessible through a clean, RESTful interface similar to Supabase.
While Databricks already provides its own APIs, Supabricks offers several compelling advantages:
- Simplified Developer Experience: Provides a clean, RESTful interface that follows familiar patterns similar to modern API services like Supabase, making it more accessible for developers who may not be familiar with Databricks' specific APIs.
- Standardized Access Pattern: Organizes operations around tables as resources with standard HTTP methods (GET, POST, PUT, DELETE) mapping directly to CRUD operations, creating a more intuitive interface than working with raw SQL endpoints.
- Reduced Integration Complexity: Abstracts away Databricks-specific details like catalog/schema navigation, connection management, and SQL query construction, reducing the learning curve for new developers.
- Controlled Data Access: Provides a secure way to expose specific Databricks functionality to applications without granting full workspace access, which suits scenarios requiring limited, controlled access to data.
- Simplified Authentication: Handles Databricks PAT authentication in a way that's familiar to REST API users while maintaining security.
- External Accessibility: The ClearTunnel feature solves the common challenge of exposing Databricks services externally without complex networking setup, making it ideal for development, demos, or quick sharing.
- RESTful API for Delta Tables: Access and manipulate Delta tables through standard HTTP methods
- PAT Authentication: Secure access using Databricks Personal Access Tokens
- Dynamic Table Discovery: Automatically detects tables in your catalog every 60 seconds
- Complete CRUD Operations: Full support for Create, Read, Update, and Delete operations on both data and tables
- Table Management: Create and drop tables with customizable schemas
- Delta Table Integration: Leverages Delta Lake's MERGE capabilities for efficient updates
- System Catalog Filtering: Improved performance by excluding system catalogs and tables
- OpenAPI Documentation: Interactive API docs available at `/docs` and `/redoc`
- ClearTunnel Integration: Expose your FastAPI app publicly from within Databricks Apps

| Component | Description |
|---|---|
| FastAPI | Modern, high-performance web framework for building APIs |
| Databricks SDK | Handles user verification via Personal Access Tokens |
| PySpark | SQL engine for interacting with Delta tables |
| Auto-Polling | Background thread that detects tables every 60 seconds |
| ClearTunnel | Enables external access to the API from Databricks Apps |
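
The Auto-Polling component can be pictured as a daemon thread that refreshes a cached table list on a fixed interval. The following is a minimal sketch under stated assumptions, not the project's actual code: `TablePoller` and the `list_tables` callable are hypothetical names, and the real discovery query is not shown in this README.

```python
import threading
from typing import Callable, List


class TablePoller:
    """Keeps a cached list of table names fresh using a background thread."""

    def __init__(self, list_tables: Callable[[], List[str]],
                 interval_seconds: float = 60.0) -> None:
        # list_tables is a hypothetical callable that queries the catalog.
        self._list_tables = list_tables
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._tables: List[str] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _refresh(self) -> None:
        try:
            tables = self._list_tables()
        except Exception:
            return  # keep the previous snapshot if discovery fails
        with self._lock:
            self._tables = list(tables)

    def _run(self) -> None:
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._interval):
            self._refresh()

    def start(self) -> None:
        self._refresh()  # populate the cache once before serving requests
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def tables(self) -> List[str]:
        with self._lock:
            return list(self._tables)
```

Waiting on an `Event` rather than calling `time.sleep` lets the thread shut down promptly, and returning a copy from `tables()` keeps request handlers from observing a half-updated list.
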

| Endpoint | Method | Description |
|---|---|---|
| `/tables` | GET | List all available tables (excludes system tables) |
| `/tables/{table}` | GET | Retrieve rows from a table with optional filtering |
| `/tables/{table}` | POST | Insert new rows into a table |
| `/tables/{table}` | PUT | Update rows in a table using MERGE |
| `/tables/{table}` | DELETE | Delete rows from a table |
| `/tables/create` | POST | Create a new table with custom schema |
| `/tables/drop/{table}` | DELETE | Drop an existing table |

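
To make the mapping from HTTP methods to SQL concrete, here is a rough sketch of how a PUT or DELETE body could be turned into a statement. It is an illustration only: the function names are hypothetical, the generated SQL is not taken from the Supabricks source, and it assumes the execution layer accepts named parameter markers such as `:f_id` (for example PySpark 3.4+ via `spark.sql(sql, args=params)`).

```python
import re
from typing import Any, Dict, Tuple

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

Statement = Tuple[str, Dict[str, Any]]  # (SQL text, named parameters)


def _ident(name: str) -> str:
    """Validate an identifier and backtick-quote it."""
    if not _IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return f"`{name}`"


def _table(name: str) -> str:
    """Quote a three-part catalog.schema.table name."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError("expected catalog.schema.table")
    return ".".join(_ident(p) for p in parts)


def merge_update(table: str, match_on: Dict[str, Any],
                 updates: Dict[str, Any]) -> Statement:
    """PUT: update rows matching match_on by merging a one-row source."""
    if not match_on or not updates:
        raise ValueError("both a filter and updates are required")
    src = ", ".join(f":f_{c} AS {_ident(c)}" for c in match_on)
    on = " AND ".join(f"target.{_ident(c)} = source.{_ident(c)}" for c in match_on)
    sets = ", ".join(f"target.{_ident(c)} = :u_{c}" for c in updates)
    sql = (f"MERGE INTO {_table(table)} AS target "
           f"USING (SELECT {src}) AS source ON {on} "
           f"WHEN MATCHED THEN UPDATE SET {sets}")
    params = {f"f_{c}": v for c, v in match_on.items()}
    params.update({f"u_{c}": v for c, v in updates.items()})
    return sql, params


def delete_rows(table: str, match_on: Dict[str, Any]) -> Statement:
    """DELETE: remove rows matching every column in match_on."""
    if not match_on:
        raise ValueError("refusing to delete without a filter")
    where = " AND ".join(f"{_ident(c)} = :f_{c}" for c in match_on)
    sql = f"DELETE FROM {_table(table)} WHERE {where}"
    return sql, {f"f_{c}": v for c, v in match_on.items()}
```

Values travel as parameters instead of being spliced into the SQL text, and identifiers are validated before quoting, so a request body cannot inject arbitrary SQL in this sketch.
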
| Operation | Typical Latency |
|---|---|
| GET /tables | 0.5s–1.5s |
| GET /tables/{table} | 0.5s–2s |
| POST /tables/{table} | 2s–6s |
| PUT /tables/{table} (MERGE) | 3s–10s |
| DELETE /tables/{table} | 2s–6s |
| POST /tables/create | 3s–8s |
| DELETE /tables/drop/{table} | 2s–5s |

- Python 3.8+
- Databricks workspace with Unity Catalog enabled
- Databricks Personal Access Token
- Databricks SQL Warehouse ID
- Clone this repository:

      git clone https://github.com/yourusername/supabricks.git
      cd supabricks

- Create a `.env` file with your Databricks credentials:

      DATABRICKS_HOST=https://your-databricks-instance.cloud.databricks.com/
      # For security, supply your token in API calls via Authorization header
      ENABLE_CLEARTUNNEL=true
      DATABRICKS_WAREHOUSE_ID=your-sql-warehouse-id

- Install dependencies:

      pip install -r requirements.txt

- Run the application:

      uvicorn main:app --host 0.0.0.0 --port 8000
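
For readers wondering how the three `.env` settings might be consumed at startup, the sketch below validates them from a mapping of environment variables. The `Settings` class and `load_settings` function are hypothetical, and the sketch assumes the `.env` file has already been loaded into the process environment (for instance by a tool such as python-dotenv); how Supabricks actually reads its configuration is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    host: str
    warehouse_id: str
    enable_cleartunnel: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read and validate the settings named in the .env example."""
    required = ("DATABRICKS_HOST", "DATABRICKS_WAREHOUSE_ID")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    flag = env.get("ENABLE_CLEARTUNNEL", "false").strip().lower()
    return Settings(
        host=env["DATABRICKS_HOST"].rstrip("/"),  # drop the trailing slash
        warehouse_id=env["DATABRICKS_WAREHOUSE_ID"],
        enable_cleartunnel=flag in ("1", "true", "yes"),
    )
```

Failing fast on a missing host or warehouse ID surfaces configuration mistakes at startup rather than on the first request.
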
- Package the application:

      zip -r supabricks_v3.2.zip .

- Upload the ZIP file to your Databricks workspace under Apps.
- Configure the app using the provided `app.yaml`.
- Run the app and check the logs to find the ClearTunnel URL:
  - Look for a line like `CLEARTUNNEL_URL=Urls(tunnel='https://your-unique-url.trycloudflare.com'...)`
  - This is the public URL you'll use to access your API

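
If you save the app logs to a file, the tunnel URL can be pulled out programmatically rather than by eye. This is a small sketch that assumes the log line has the shape shown above, with the URL in a single-quoted `tunnel='...'` field; `find_tunnel_url` is a hypothetical helper, not part of Supabricks.

```python
import re
from typing import Iterable, Optional

# Matches the single-quoted tunnel URL on a CLEARTUNNEL_URL log line.
_TUNNEL = re.compile(r"CLEARTUNNEL_URL=.*?tunnel='(https://[^']+)'")


def find_tunnel_url(log_lines: Iterable[str]) -> Optional[str]:
    """Return the first tunnel URL found in the log lines, or None."""
    for line in log_lines:
        match = _TUNNEL.search(line)
        if match:
            return match.group(1)
    return None
```
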
All API requests require a Databricks Personal Access Token (PAT) for authentication. This ensures that users can only access data they have permissions for in Databricks.
- Generate a PAT in your Databricks workspace:
  - Go to User Settings > Access Tokens > Generate New Token
- Include the token in all API requests:

      curl -H "Authorization: Bearer your-personal-access-token" https://your-cleartunnel-url/tables

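
The same bearer-token pattern can be used from Python with only the standard library. The sketch below builds an authenticated request and, separately, sends it; `build_request` and `call` are hypothetical helper names, and treating the response body as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.request
from typing import Any, Optional


def build_request(base_url: str, path: str, token: str,
                  method: str = "GET",
                  body: Optional[Any] = None) -> urllib.request.Request:
    """Build a request that carries the PAT in the Authorization header."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call(build_request("https://your-cleartunnel-url", "/tables", token))` would mirror the curl command above. Reading the token from an environment variable instead of hard-coding it keeps it out of source control.
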
Here's how to test each endpoint of the API.

Call the root endpoint:

    curl -H "Authorization: Bearer your-personal-access-token" https://your-cleartunnel-url/

List tables:

    curl -H "Authorization: Bearer your-personal-access-token" https://your-cleartunnel-url/tables

Create a table:

    curl -X POST -H "Authorization: Bearer your-personal-access-token" \
      -H "Content-Type: application/json" \
      -d '{
        "table_name": "catalog.schema.test_table",
        "columns": [
          {"name": "id", "type": "STRING", "nullable": false},
          {"name": "value", "type": "INT"},
          {"name": "description", "type": "STRING"}
        ],
        "comment": "Test table created via API"
      }' \
      https://your-cleartunnel-url/tables/create

Read rows (the URL is quoted so the shell does not interpret the `?`):

    curl -H "Authorization: Bearer your-personal-access-token" \
      "https://your-cleartunnel-url/tables/catalog.schema.test_table?limit=10"

Insert rows:

    curl -X POST -H "Authorization: Bearer your-personal-access-token" \
      -H "Content-Type: application/json" \
      -d '{"data": [{"id": "test1", "value": 100, "description": "Test record"}]}' \
      https://your-cleartunnel-url/tables/catalog.schema.test_table

Update rows:

    curl -X PUT -H "Authorization: Bearer your-personal-access-token" \
      -H "Content-Type: application/json" \
      -d '{"filter": {"id": "test1"}, "updates": {"value": 200}}' \
      https://your-cleartunnel-url/tables/catalog.schema.test_table

Delete rows:

    curl -X DELETE -H "Authorization: Bearer your-personal-access-token" \
      -H "Content-Type: application/json" \
      -d '{"filter": {"id": "test1"}}' \
      https://your-cleartunnel-url/tables/catalog.schema.test_table

Drop the table:

    curl -X DELETE -H "Authorization: Bearer your-personal-access-token" \
      https://your-cleartunnel-url/tables/drop/catalog.schema.test_table

To measure API performance, you can use the `time` command with curl:

    time curl -H "Authorization: Bearer your-personal-access-token" https://your-cleartunnel-url/tables

Or create a shell script that captures timing information for each API call.

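
As an alternative to timing single calls by hand, a few repeated measurements give a better sense of the latency range. The following is a generic sketch, not a Supabricks utility: `time_call` is a hypothetical helper that times any zero-argument callable, which you would point at whatever function issues your API request.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run fn several times and summarize wall-clock latency in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Reporting the median alongside the minimum and maximum helps separate typical latency from one-off delays such as a warehouse warming up.
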
- Admin portals and dashboards
- Data ingestion tools (batch processing)
- BI dashboards via the `/tables` endpoint
- Schema management and table creation
- Embedded applications requiring audit-compliant Delta APIs
- Not designed for high-throughput OLTP workloads
- Not optimized for real-time streaming writes
- Performance depends on underlying Databricks cluster configuration

Supabricks v3.2 includes ClearTunnel support to expose the FastAPI app publicly from within Databricks Apps, which do not allow incoming ports by default. This solves the accessibility issue when using Supabricks inside the Databricks environment.

MIT
Contributions are welcome! Please feel free to submit a Pull Request.
