
Commit 0fe561a

yogesh-dbxclaude and Claude Opus 4.6 authored

fix: update Zerobus Ingest skill for SDK v1.1.0 breaking API changes (#291)

* fix: update Zerobus Ingest skill for SDK v1.1.0 breaking API changes

  SDK v1.1.0 introduces breaking changes from v0.2.x that cause skill-generated code to fail at runtime. All changes verified through E2E testing.

  Key fixes:
  - Constructor: positional args -> keyword args (host=, unity_catalog_url=)
  - Ingest pattern: ingest_record(json.dumps(record)) + flush()
  - SDK version: >=0.2.0 -> >=1.0.0
  - Status: Public Preview -> GA (Feb 2026)
  - Add serverless compute limitation and REST API alternative
  - Add explicit table grants requirement (Error 4024)
  - Fix typos: "speficfied" -> "specified", "Workslfow" -> "Workflow"

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* revert constructor to positional style (matches official docs)

  The ZerobusSdk constructor works fine with positional args. The internal parameter names changed in v1.1.0 but positional passing still works. Reverted to match the official Databricks documentation.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent dd3684e · commit 0fe561a

3 files changed

Lines changed: 21 additions & 12 deletions
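
The ingest-pattern change described in the commit message (`ingest_record(json.dumps(record))` followed by `flush()`) can be sketched with a stand-in stream object. The `FakeStream` class below is a hypothetical mock used only to illustrate the v1.1.0 call pattern; the real SDK stream comes from `ZerobusSdk.create_stream` and requires workspace credentials:

```python
import json

class FakeStream:
    """Hypothetical stand-in for a Zerobus stream, used only to illustrate
    the v1.1.0 call pattern: buffer on ingest_record(), send on flush()."""

    def __init__(self):
        self._buffer = []
        self.sent = []

    def ingest_record(self, payload: str) -> None:
        # v1.1.0+ takes a serialized record (a JSON string here), not a dict.
        self._buffer.append(payload)

    def flush(self) -> None:
        # In the real SDK this blocks until buffered records are sent.
        self.sent.extend(self._buffer)
        self._buffer.clear()

    def close(self) -> None:
        self.flush()

stream = FakeStream()
try:
    record = {"device_name": "sensor-1", "temp": 22, "humidity": 55}
    stream.ingest_record(json.dumps(record))  # serialize before ingesting
    stream.flush()                            # ensure the record was sent
finally:
    stream.close()

print(stream.sent[0])  # → {"device_name": "sensor-1", "temp": 22, "humidity": 55}
```

The same `try`/`finally` shape as the diff's minimal example applies: always `close()` the stream, and call `flush()` before relying on the data being durable.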


databricks-skills/databricks-zerobus-ingest/1-setup-and-authentication.md

Lines changed: 6 additions & 2 deletions
````diff
@@ -76,21 +76,25 @@ GRANT MODIFY, SELECT ON TABLE my_catalog.my_schema.my_events TO `<service-principal>`
 
 **Tip:** For broader access (e.g., writing to multiple tables in a schema), grant `MODIFY` and `SELECT` at the schema level instead.
 
+**Important:** For Zerobus, always grant explicit table-level `MODIFY` and `SELECT` permissions in addition to catalog/schema access. Schema-level inherited grants may not be sufficient for the OAuth `authorization_details` flow used by Zerobus.
+
 ---
 
 ## 4. Install the SDK
 
 ### Python (3.9+)
 
 ```bash
-pip install databricks-zerobus-ingest-sdk
+pip install databricks-zerobus-ingest-sdk>=1.0.0
 ```
 
 Or with a virtual environment:
 ```bash
-uv pip install databricks-zerobus-ingest-sdk
+uv pip install databricks-zerobus-ingest-sdk>=1.0.0
 ```
 
+**Note:** The Zerobus SDK cannot be pip-installed on Databricks serverless compute. Use classic compute clusters, or use the [Zerobus REST API](https://docs.databricks.com/aws/en/ingestion/zerobus-rest-api) (Beta) for notebook-based ingestion without the SDK.
+
 ### Java (8+)
 
 Maven:
````
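
Since the diff raises the version floor to `>=1.0.0`, and v1.1.0 changed the ingest API, a runtime guard can fail fast on an old SDK. A naive sketch (a real project would use `packaging.version` for full PEP 440 semantics; the distribution name below matches the pip package in the diff, and the commented-out guard is a hypothetical usage):

```python
from importlib import metadata

def parse_version(ver: str) -> tuple:
    """Turn 'X.Y.Z' into an int tuple, stopping at the first non-numeric part."""
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def meets_minimum(ver: str, minimum: tuple) -> bool:
    # Tuple comparison handles differing lengths: (1, 1, 0) >= (1, 0) is True.
    return parse_version(ver) >= minimum

# Hypothetical guard before opening a stream:
# installed = metadata.version("databricks-zerobus-ingest-sdk")
# assert meets_minimum(installed, (1, 0)), "upgrade the Zerobus SDK to >=1.0.0"

print(meets_minimum("1.1.0", (1, 0)), meets_minimum("0.2.5", (1, 0)))  # → True False
```

Note that in most shells the `>=` in the install command should be quoted, e.g. `pip install "databricks-zerobus-ingest-sdk>=1.0.0"`, since an unquoted `>` is treated as output redirection.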

databricks-skills/databricks-zerobus-ingest/2-python-client.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -355,4 +355,4 @@ stream.wait_for_offset(offset)
 | `ingest_records_nowait(records)` | None | No | Max batch throughput |
 | `wait_for_offset(offset)` | None | Yes (until ACK) | Durability confirmation |
 | `flush()` | None | Yes (until sent) | Ensure all buffered records are sent |
-| `ingest_record(record)` | RecordAcknowledgment | No | **Deprecated** — use `ingest_record_offset` |
+| `ingest_record(record)` | RecordAcknowledgment | No | Primary method in SDK v1.1.0+; pass `json.dumps(record)` for JSON |
```

databricks-skills/databricks-zerobus-ingest/SKILL.md

Lines changed: 14 additions & 9 deletions
````diff
@@ -7,7 +7,7 @@ description: "Build Zerobus Ingest clients for near real-time data ingestion int
 
 Build clients that ingest data directly into Databricks Delta tables via the Zerobus gRPC API.
 
-**Status:** Public Preview (currently free; Databricks plans to introduce charges in the future)
+**Status:** GA (Generally Available since February 2026; billed under Lakeflow Jobs Serverless SKU)
 
 **Documentation:**
 - [Zerobus Overview](https://docs.databricks.com/aws/en/ingestion/zerobus-overview)
@@ -37,7 +37,7 @@ Zerobus Ingest is a serverless connector that enables direct, record-by-record d
 | Schema generation from UC table | Any | Protobuf | [4-protobuf-schema.md](4-protobuf-schema.md) |
 | Retry / reconnection logic | Any | Any | [5-operations-and-limits.md](5-operations-and-limits.md) |
 
-If not speficfied, default to python.
+If not specified, default to python.
 
 ---
 
@@ -46,7 +46,7 @@ If not speficfied, default to python.
 These libraries are essential for ZeroBus data ingestion:
 
 - **databricks-sdk>=0.85.0**: Databricks workspace client for authentication and metadata
-- **databricks-zerobus-ingest-sdk>=0.2.0**: ZeroBus SDK for high-performance streaming ingestion
+- **databricks-zerobus-ingest-sdk>=1.0.0**: ZeroBus SDK for high-performance streaming ingestion
 - **grpcio-tools**
 These are typically NOT pre-installed on Databricks. Install them using `execute_databricks_command` tool:
 - `code`: "%pip install databricks-sdk>=VERSION databricks-zerobus-ingest-sdk>=VERSION"
@@ -85,6 +85,7 @@ See [1-setup-and-authentication.md](1-setup-and-authentication.md) for complete
 ## Minimal Python Example (JSON)
 
 ```python
+import json
 from zerobus.sdk.sync import ZerobusSdk
 from zerobus.sdk.shared import RecordType, StreamConfigurationOptions, TableProperties
 
@@ -95,8 +96,8 @@ table_props = TableProperties(table_name)
 stream = sdk.create_stream(client_id, client_secret, table_props, options)
 try:
     record = {"device_name": "sensor-1", "temp": 22, "humidity": 55}
-    offset = stream.ingest_record_offset(record)
-    stream.wait_for_offset(offset)
+    stream.ingest_record(json.dumps(record))
+    stream.flush()
 finally:
     stream.close()
 ```
@@ -115,7 +116,7 @@ finally:
 
 ---
 
-You must always follow all the steps in the Workslfow
+You must always follow all the steps in the Workflow
 
 ## Workflow
 0. **Display the plan of your execution**
@@ -129,8 +130,10 @@ You must always follow all the steps in the Workslfow
 ---
 
 ## Important
-- Never install local packages
+- Never install local packages
 - Always validate MCP server requirement before execution
+- **Serverless limitation**: The Zerobus SDK cannot pip-install on serverless compute. Use classic compute clusters, or use the [Zerobus REST API](https://docs.databricks.com/aws/en/ingestion/zerobus-rest-api) (Beta) for notebook-based ingestion without the SDK.
+- **Explicit table grants**: Service principals need explicit `MODIFY` and `SELECT` grants on the target table. Schema-level inherited permissions may not be sufficient for the `authorization_details` OAuth flow.
 
 ---
@@ -173,7 +176,7 @@ When execution fails:
 Databricks provides Spark, pandas, numpy, and common data libraries by default. **Only install a library if you get an import error.**
 
 Use `execute_databricks_command` tool:
-- `code`: "%pip install databricks-zerobus-ingest-sdk>=0.2.0"
+- `code`: "%pip install databricks-zerobus-ingest-sdk>=1.0.0"
 - `cluster_id`: "<cluster_id>"
 - `context_id`: "<context_id>"
 
@@ -193,7 +196,7 @@ The timestamp generation must use microseconds for Databricks.
 - **gRPC + Protobuf**: Zerobus uses gRPC as its transport protocol. Any application that can communicate via gRPC and construct Protobuf messages can produce to Zerobus.
 - **JSON or Protobuf serialization**: JSON for quick starts; Protobuf for type safety, forward compatibility, and performance.
 - **At-least-once delivery**: The connector provides at-least-once guarantees. Design consumers to handle duplicates.
-- **Durability ACKs**: Each ingested record returns an offset. Use `wait_for_offset(offset)` to confirm durable write. ACKs indicate all records up to that offset have been durably written.
+- **Durability ACKs**: Each ingested record returns a `RecordAcknowledgment`. Use `flush()` to ensure all buffered records are durably written, or use `wait_for_offset(offset)` for offset-based tracking.
 - **No table management**: Zerobus does not create or alter tables. You must pre-create your target table and manage schema evolution yourself.
 - **Single-AZ durability**: The service runs in a single availability zone. Plan for potential zone outages.
 
@@ -210,6 +213,8 @@ The timestamp generation must use microseconds for Databricks.
 | **Throughput limits hit** | Max 100 MB/s and 15,000 rows/s per stream. Open multiple streams or contact Databricks. |
 | **Region not supported** | Check supported regions in [5-operations-and-limits.md](5-operations-and-limits.md). |
 | **Table not found** | Ensure table is a managed Delta table in a supported region with correct three-part name. |
+| **SDK install fails on serverless** | The Zerobus SDK cannot be pip-installed on serverless compute. Use classic compute clusters or the REST API (Beta) from notebooks. |
+| **Error 4024 / authorization_details** | Service principal lacks explicit table-level grants. Grant `MODIFY` and `SELECT` directly on the target table — schema-level inherited grants may be insufficient. |
 
 ---
````

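SKILL.md's note that timestamp generation must use microseconds for Databricks can be illustrated with a short sketch. Whether the target column expects an integer epoch value or an ISO-8601 string depends on your table schema, so both variants below are assumptions to adapt:

```python
import time
from datetime import datetime, timezone

# Integer microseconds since the Unix epoch (nanosecond clock, truncated):
ts_us = time.time_ns() // 1_000

# ISO-8601 string with exactly microsecond precision:
ts_iso = datetime.now(timezone.utc).isoformat(timespec="microseconds")

print(ts_us)   # e.g. 1767225600123456
print(ts_iso)  # e.g. 2026-01-01T00:00:00.123456+00:00
```

Using `timespec="microseconds"` pins the fractional part to six digits, which avoids the variable-width fractions that `isoformat()` produces by default.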