A standalone tool for generating Protocol Buffer (proto2) definition files from Unity Catalog table schemas.
The GenerateProto tool fetches table schema information from Unity Catalog and automatically generates a corresponding .proto file with proper type mappings. This is useful when you need to create Protocol Buffer message definitions that match your Delta table schemas for use with the Zerobus SDK.
The tool is packaged within the Zerobus SDK JAR, so users can run it directly after downloading the SDK without needing to clone the repository.
- Fetches table schema directly from Unity Catalog
- Supports the standard Delta data types listed in the type mapping table
- Generates proto2 format files
- Handles complex types (arrays and maps)
- Uses OAuth 2.0 client credentials authentication
- No external dependencies beyond the Java standard library
- Packaged in SDK JAR for easy distribution
- Java 8 or higher
- Zerobus SDK JAR (built with `mvn package`)
- OAuth client ID and client secret with access to Unity Catalog
- Access to a Unity Catalog endpoint
If you have the SDK source repository:
```bash
# First, build the SDK JAR
mvn package

# Then run the tool
./tools/generate_proto.sh \
  --uc-endpoint "https://your-workspace.cloud.databricks.com" \
  --client-id "your-client-id" \
  --client-secret "your-client-secret" \
  --table "catalog.schema.table_name" \
  --output "output.proto" \
  --proto-msg "TableMessage"
```

If you have downloaded the SDK JAR without the source code:
```bash
# Using the shaded JAR (includes all dependencies)
java -cp databricks-zerobus-ingest-sdk-0.1.0-jar-with-dependencies.jar \
  com.databricks.zerobus.tools.GenerateProto \
  --uc-endpoint "https://your-workspace.cloud.databricks.com" \
  --client-id "your-client-id" \
  --client-secret "your-client-secret" \
  --table "catalog.schema.table_name" \
  --output "output.proto" \
  --proto-msg "TableMessage"
```

Because the JAR's manifest declares a `Main-Class` entry, you can also run it with the `-jar` flag:
```bash
# Simpler: run the JAR directly with the -jar flag
java -jar databricks-zerobus-ingest-sdk-0.1.0-jar-with-dependencies.jar \
  --uc-endpoint "https://your-workspace.cloud.databricks.com" \
  --client-id "your-client-id" \
  --client-secret "your-client-secret" \
  --table "catalog.schema.table_name" \
  --output "output.proto" \
  --proto-msg "TableMessage"
```

| Argument | Required | Description |
|---|---|---|
| `--uc-endpoint` | Yes | Unity Catalog endpoint URL (e.g., `https://your-workspace.cloud.databricks.com`) |
| `--client-id` | Yes | OAuth client ID for authentication |
| `--client-secret` | Yes | OAuth client secret for authentication |
| `--table` | Yes | Fully qualified table name in the format `catalog.schema.table_name` |
| `--output` | Yes | Output path for the generated proto file (e.g., `output.proto`) |
| `--proto-msg` | No | Name of the protobuf message (defaults to the table name, e.g., `users` for `my_catalog.my_schema.users`) |
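The argument rules in the table above can be sketched in Java as follows. This is a hypothetical illustration, not the tool's actual parser: the class name `ArgsSketch`, the `parse` method, and the error messages are invented, and the default message name is assumed to be the last segment of `--table`, as the `users` example later in this document suggests.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the GenerateProto argument rules; not the tool's real parser.
public class ArgsSketch {
    // Flags marked "Yes" in the argument table.
    static final List<String> REQUIRED =
        Arrays.asList("--uc-endpoint", "--client-id", "--client-secret", "--table", "--output");

    public static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        // Arguments are expected as "--flag value" pairs.
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--") || i + 1 >= args.length) {
                throw new IllegalArgumentException("Expected '--flag value' at: " + args[i]);
            }
            opts.put(args[i], args[i + 1]);
        }
        for (String flag : REQUIRED) {
            if (!opts.containsKey(flag)) {
                throw new IllegalArgumentException("Missing required argument: " + flag);
            }
        }
        // Assumption: --proto-msg defaults to the last segment of catalog.schema.table_name.
        if (!opts.containsKey("--proto-msg")) {
            String table = opts.get("--table");
            opts.put("--proto-msg", table.substring(table.lastIndexOf('.') + 1));
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse(new String[] {
            "--uc-endpoint", "https://myworkspace.cloud.databricks.com",
            "--client-id", "abc123",
            "--client-secret", "secret123",
            "--table", "my_catalog.my_schema.users",
            "--output", "users.proto"});
        System.out.println(opts.get("--proto-msg")); // prints "users"
    }
}
```

Failing fast on a missing required flag, before any network call, keeps configuration mistakes separate from authentication or lookup errors.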
The tool automatically maps Delta/Unity Catalog types to Protocol Buffer types:
| Delta Type | Proto2 Type |
|---|---|
| `INT`, `SHORT`, `SMALLINT` | `int32` |
| `LONG`, `BIGINT` | `int64` |
| `STRING`, `VARCHAR(n)` | `string` |
| `FLOAT` | `float` |
| `DOUBLE` | `double` |
| `BOOLEAN` | `bool` |
| `BINARY` | `bytes` |
| `DATE` | `int32` |
| `TIMESTAMP` | `int64` |
| `ARRAY<type>` | `repeated type` |
| `MAP<key_type, value_type>` | `map<key_type, value_type>` |
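The mapping in the table above can be sketched as a small Java class that also renders a complete proto2 message. This is a hypothetical illustration, not GenerateProto's source: the class and method names are invented, column types are assumed to arrive as Unity Catalog type strings such as `VARCHAR(255)` or `MAP<STRING, INT>`, and the choice of `required` for NOT NULL columns and `optional` for nullable ones is inferred from the example output further down rather than confirmed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the Delta -> proto2 type mapping; not GenerateProto's actual code.
public class ProtoTypeSketch {
    // Maps a scalar Delta/Unity Catalog type string to a proto2 scalar type.
    public static String scalar(String deltaType) {
        String t = deltaType.trim().toUpperCase(Locale.ROOT);
        if (t.startsWith("VARCHAR(")) return "string";
        switch (t) {
            case "INT": case "SHORT": case "SMALLINT": case "DATE": return "int32";
            case "LONG": case "BIGINT": case "TIMESTAMP": return "int64";
            case "STRING": return "string";
            case "FLOAT": return "float";
            case "DOUBLE": return "double";
            case "BOOLEAN": return "bool";
            case "BINARY": return "bytes";
            default: throw new IllegalArgumentException("Unsupported type: " + deltaType);
        }
    }

    // Renders one field declaration, handling ARRAY<...> and MAP<..., ...> wrappers.
    public static String field(String name, String deltaType, boolean nullable, int number) {
        String t = deltaType.trim();
        String upper = t.toUpperCase(Locale.ROOT);
        if (upper.startsWith("ARRAY<") && t.endsWith(">")) {
            String element = t.substring("ARRAY<".length(), t.length() - 1);
            return "repeated " + scalar(element) + " " + name + " = " + number + ";";
        }
        if (upper.startsWith("MAP<") && t.endsWith(">")) {
            String[] kv = t.substring("MAP<".length(), t.length() - 1).split(",", 2);
            // proto2 map fields take no required/optional label.
            return "map<" + scalar(kv[0]) + ", " + scalar(kv[1]) + "> " + name + " = " + number + ";";
        }
        // Assumption: NOT NULL -> required, nullable -> optional.
        return (nullable ? "optional " : "required ") + scalar(t) + " " + name + " = " + number + ";";
    }

    // Renders a whole proto2 file; field numbers follow column order starting at 1.
    public static String message(String msgName, List<String[]> columns) {
        StringBuilder sb = new StringBuilder("syntax = \"proto2\";\n\nmessage " + msgName + " {\n");
        for (int i = 0; i < columns.size(); i++) {
            String[] c = columns.get(i); // {name, type, "true"/"false" for nullable}
            sb.append("  ").append(field(c[0], c[1], Boolean.parseBoolean(c[2]), i + 1)).append('\n');
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        List<String[]> cols = new ArrayList<>();
        cols.add(new String[] {"product_id", "INT", "false"});
        cols.add(new String[] {"name", "STRING", "false"});
        cols.add(new String[] {"tags", "ARRAY<STRING>", "true"});
        cols.add(new String[] {"attributes", "MAP<STRING, STRING>", "true"});
        System.out.print(message("products", cols));
    }
}
```

Nested types such as `ARRAY<ARRAY<INT>>` fall through to the "Unsupported type" error, since proto2 cannot express a repeated field of repeated values directly. Note also that protobuf only allows integral or string types as map keys; this sketch does not check that.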
Generate a proto file for a simple table:
From the SDK JAR:
```bash
java -jar databricks-zerobus-ingest-sdk-0.1.0-jar-with-dependencies.jar \
  --uc-endpoint "https://myworkspace.cloud.databricks.com" \
  --client-id "abc123" \
  --client-secret "secret123" \
  --table "my_catalog.my_schema.users" \
  --output "users.proto"
```

Or, if you have the source repository:
```bash
./tools/generate_proto.sh \
  --uc-endpoint "https://myworkspace.cloud.databricks.com" \
  --client-id "abc123" \
  --client-secret "secret123" \
  --table "my_catalog.my_schema.users" \
  --output "users.proto"
```

This might generate:
```proto
syntax = "proto2";

message users {
  required int32 user_id = 1;
  required string username = 2;
  optional string email = 3;
  required int64 created_at = 4;
}
```

To use a custom message name, pass `--proto-msg`:
```bash
java -jar databricks-zerobus-ingest-sdk-0.1.0-jar-with-dependencies.jar \
  --uc-endpoint "https://myworkspace.cloud.databricks.com" \
  --client-id "abc123" \
  --client-secret "secret123" \
  --table "my_catalog.my_schema.events" \
  --output "events.proto" \
  --proto-msg "EventRecord"
```

The tool also handles complex types such as arrays and maps:
```bash
java -jar databricks-zerobus-ingest-sdk-0.1.0-jar-with-dependencies.jar \
  --uc-endpoint "https://myworkspace.cloud.databricks.com" \
  --client-id "abc123" \
  --client-secret "secret123" \
  --table "my_catalog.my_schema.products" \
  --output "products.proto"
```

If the table has columns like:

```
tags ARRAY<STRING>
attributes MAP<STRING, STRING>
```
The generated proto will include:
```proto
syntax = "proto2";

message products {
  required int32 product_id = 1;
  required string name = 2;
  repeated string tags = 3;
  map<string, string> attributes = 4;
}
```

The tool uses the OAuth 2.0 client credentials flow to authenticate with Unity Catalog. Unlike the SDK's own token generation, which includes resource and authorization details for specific table privileges, this tool requests a plain client credentials token with minimal scope, which is sufficient to fetch table metadata.
The authentication flow:
- Exchanges the client ID and secret for an OAuth token
- Uses the token to fetch the table schema from the Unity Catalog API
- Uses the token only for metadata retrieval (a read-only operation)
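The two steps of this flow can be sketched with the Java standard library alone. This is a hedged illustration, not the tool's code: the token path `/oidc/v1/token`, the `all-apis` scope, sending the credentials in an HTTP Basic header, and the table endpoint `/api/2.1/unity-catalog/tables/{full_name}` are assumptions based on Databricks' public OAuth machine-to-machine and Unity Catalog REST documentation, and the class and method names are hypothetical. The sketch only builds the requests; sending them (for example with `HttpURLConnection`) and parsing the JSON responses are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the requests in the authentication flow; it builds them but sends nothing.
public class AuthFlowSketch {
    // Step 1 (assumed endpoint): POST the client credentials grant to {endpoint}/oidc/v1/token.
    public static String tokenUrl(String endpoint) {
        return stripSlash(endpoint) + "/oidc/v1/token";
    }

    // The client ID and secret, sent as an HTTP Basic "Authorization" header value.
    public static String basicAuthHeader(String clientId, String clientSecret) {
        String pair = clientId + ":" + clientSecret;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Form-encoded body (Content-Type: application/x-www-form-urlencoded); the scope is an assumption.
    public static String tokenRequestBody() {
        return "grant_type=client_credentials&scope=all-apis";
    }

    // Step 2 (assumed endpoint): GET this URL with "Authorization: Bearer <access_token>".
    // Assumes plain identifiers that need no URL escaping.
    public static String tableUrl(String endpoint, String fullTableName) {
        return stripSlash(endpoint) + "/api/2.1/unity-catalog/tables/" + fullTableName;
    }

    private static String stripSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }

    public static void main(String[] args) {
        String endpoint = "https://myworkspace.cloud.databricks.com";
        System.out.println("POST " + tokenUrl(endpoint) + "  body: " + tokenRequestBody());
        System.out.println("Authorization: " + basicAuthHeader("abc123", "secret123"));
        System.out.println("GET  " + tableUrl(endpoint, "my_catalog.my_schema.users"));
    }
}
```

In the public API, the token response is JSON whose `access_token` field carries the bearer token, and the table response lists the columns (with fields such as `name`, `type_text`, and `nullable`) that would feed the type mapping.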
After generating the `.proto` file:

- Place it in your project's proto directory (e.g., `src/main/proto/`).
- Compile it with the protobuf compiler:

  ```bash
  protoc --proto_path=src/main/proto --java_out=src/main/java src/main/proto/your_proto_file.proto
  ```

- Use the generated Java classes with the Zerobus SDK:

  ```java
  // YourMessage: class generated by protoc; sdk, clientId, clientSecret are set up elsewhere.
  TableProperties<YourMessage> tableProperties = new TableProperties<>("catalog.schema.table", YourMessage.getDefaultInstance());
  ZerobusStream<YourMessage> stream = sdk.createStream(tableProperties, clientId, clientSecret).join();
  ```
If you receive authentication errors:
- Verify your client ID and secret are correct
- Ensure your OAuth client has access to Unity Catalog
- Check that the endpoint URL is correct
If the table cannot be found:
- Verify the table name uses the format `catalog.schema.table`
- Ensure the table exists in Unity Catalog
- Check that your OAuth client has permission to read the table metadata
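A quick local check of the name format can rule out the first cause before any network call. The sketch below is hypothetical (the class and method names are invented, and it is not part of GenerateProto); it only checks that the name has exactly three non-empty, dot-separated parts and does not handle backtick-quoted identifiers that themselves contain dots.

```java
// Hypothetical pre-flight check for the catalog.schema.table format; not part of GenerateProto.
public class TableNameCheck {
    // Returns {catalog, schema, table}, or throws if the name is not three non-empty parts.
    public static String[] split(String fullName) {
        // Limit -1 keeps trailing empty strings, so "a.b." is rejected instead of silently accepted.
        String[] parts = fullName.split("\\.", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected catalog.schema.table, got '" + fullName + "' (" + parts.length + " parts)");
        }
        for (String p : parts) {
            if (p.trim().isEmpty()) {
                throw new IllegalArgumentException("Empty name segment in '" + fullName + "'");
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] ok = split("my_catalog.my_schema.users");
        System.out.println(ok[0] + " / " + ok[1] + " / " + ok[2]);
        try {
            split("my_schema.users"); // missing the catalog
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```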
If you encounter unsupported type errors:
- Check if your table uses custom or complex types not listed in the type mappings
- Consider simplifying the column type or manually editing the generated proto file
The tool is distributed as part of the Zerobus SDK JAR. When you download or build the SDK, the GenerateProto tool is automatically included in the shaded JAR file (databricks-zerobus-ingest-sdk-*-jar-with-dependencies.jar).
Users can run the tool directly from the JAR without needing access to the source code:
```bash
# Download the SDK JAR (or build it with mvn package), then run:
java -jar databricks-zerobus-ingest-sdk-0.1.0-jar-with-dependencies.jar \
  --uc-endpoint "..." \
  --client-id "..." \
  --client-secret "..." \
  --table "..." \
  --output "output.proto"
```

The tool consists of these files:

- `src/main/java/com/databricks/zerobus/tools/GenerateProto.java`: main tool implementation (packaged in the SDK JAR)
- `tools/generate_proto.sh`: helper script for running from the source repository
- `tools/README.md`: this documentation file
This tool is part of the Databricks Zerobus SDK for Java.