This repository was archived by the owner on Mar 6, 2026. It is now read-only.

Commit 1bca2f0

Initial commit
Signed-off-by: teodordelibasic-db <teodor.delibasic@databricks.com>
1 parent f4a9819 commit 1bca2f0

52 files changed

Lines changed: 6471 additions & 2281 deletions


.gitignore

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,9 @@ buildNumber.properties
 .settings/
 .project
 .classpath
+.metals/
+.bsp/
+.bazelbsp/
 
 # OS
 .DS_Store

CHANGELOG.md

Lines changed: 92 additions & 13 deletions
@@ -1,17 +1,96 @@
-# Version changelog
+# Changelog
 
-## Release v0.1.0
+All notable changes to the Databricks Zerobus Java SDK will be documented in this file.
 
-Initial release of the Databricks Zerobus Ingest SDK for Java.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-### API Changes
+## [Unreleased]
 
-- Added `ZerobusSdk` class for creating ingestion streams
-- Added `ZerobusStream` class for managing stateful gRPC streams
-- Added `RecordAcknowledgment` for blocking until record acknowledgment
-- Added `TableProperties` for configuring table schema and name
-- Added `StreamConfigurationOptions` for stream behavior configuration
-- Added `ZerobusException` and `NonRetriableException` for error handling
-- Added `StreamState` enum for tracking stream lifecycle
-- Added utility methods in `ZerobusSdkStubUtils` for gRPC stub management
-- Support for Java 8 and higher
+### Added
+
+#### Native Rust Backend (JNI Migration)
+- **Complete architecture rewrite**: The SDK now uses JNI (Java Native Interface) to call a high-performance Rust implementation instead of pure Java gRPC calls
+- Native library is automatically loaded from the classpath or system library path
+- Improved performance through zero-copy data passing and optimized async handling
+
+#### New APIs
+
+**Offset-Based Ingestion API** - High-throughput alternative to CompletableFuture-based API:
+- `ZerobusStream.ingestRecordOffset(IngestableRecord)` - Returns offset immediately without future allocation
+- `ZerobusStream.ingestRecordOffset(RecordType)` - Convenience overload for Protocol Buffer messages
+- `ZerobusStream.ingestRecordsOffset(Iterable)` - Batch ingestion returning last offset
+- `ZerobusStream.waitForOffset(long)` - Block until specific offset is acknowledged
+
+**JSON Record Support**:
+- `IngestableRecord` interface - Unified interface for all record types
+- `JsonRecord` class - JSON string wrapper implementing IngestableRecord
+- `ProtoRecord<T>` class - Protocol Buffer wrapper implementing IngestableRecord
+- `RecordType` enum - Specifies stream serialization format (`PROTO` or `JSON`)
+- `StreamConfigurationOptions.setRecordType(RecordType)` - Configure stream for JSON or Proto records
+- Both record types work with `ingestRecord()` and `ingestRecordOffset()` methods
+
+**Batch Operations**:
+- `ZerobusStream.ingestRecords(Iterable)` - Ingest multiple records with single acknowledgment
+- `ZerobusStream.getUnackedBatches()` - Get unacknowledged records preserving batch grouping
+- `EncodedBatch` class - Represents a batch of encoded records
+
+**Arrow Flight Support** (Experimental):
+- `ZerobusArrowStream` class - High-performance columnar data ingestion
+- `ArrowTableProperties` class - Table configuration with Arrow schema
+- `ArrowStreamConfigurationOptions` class - Arrow stream configuration
+- `ZerobusSdk.createArrowStream()` - Create Arrow Flight streams
+- `ZerobusSdk.recreateArrowStream()` - Recover failed Arrow streams
+
+**New Callback Interface**:
+- `AckCallback` interface with `onAck(long offsetId)` and `onError(long offsetId, String message)`
+- More detailed error information than the deprecated Consumer-based callback
+
+### Changed
+
+- `ZerobusSdk` now requires native library to be available at runtime
+- `ZerobusStream.close()` now throws `ZerobusException` instead of being silent on errors
+- `ZerobusArrowStream.close()` also throws `ZerobusException`
+- Improved error messages with more context from the Rust backend
+
+### Deprecated
+
+- `ZerobusStream.ingestRecord(RecordType)` - Use `ingestRecordOffset()` instead for better performance. The offset-based API avoids CompletableFuture allocation overhead.
+- `ZerobusStream.ingestRecord(IngestableRecord)` - Use `ingestRecordOffset()` instead.
+- `ZerobusStream.ingestRecords(Iterable)` - Use `ingestRecordsOffset()` instead.
+- `ZerobusStream.getState()` - Stream state is no longer exposed by the native backend. Returns `OPENED` or `CLOSED` only.
+- `ZerobusStream.getUnackedRecords()` - Returns empty iterator. Use `getUnackedBatches()` or `getUnackedRecordsRaw()` instead.
+- `StreamConfigurationOptions.Builder.setAckCallback(Consumer<IngestRecordResponse>)` - Use `setAckCallback(AckCallback)` instead.
+- `ZerobusSdk.setStubFactory()` - gRPC stub factory is no longer used with native backend. Throws `UnsupportedOperationException`.
+
+### Removed
+
+- Direct gRPC implementation - All network communication now goes through the Rust native library
+- Internal gRPC stub management classes (still present but unused)
+
+### Fixed
+
+- Memory leaks in long-running streams through proper native resource management
+- Race conditions in concurrent record ingestion
+- Improved backpressure handling through native implementation
+
+### Performance
+
+- **Reduced latency**: Direct native calls avoid gRPC Java overhead
+- **Lower memory footprint**: No CompletableFuture allocation for offset-based API
+- **Better throughput**: Optimized Rust async runtime handles network I/O
+- **Reduced GC pressure**: Fewer intermediate object allocations
+
+## [0.1.0] - Initial Release
+
+### Added
+
+- Initial SDK release with Protocol Buffer record ingestion
+- `ZerobusSdk` - Main entry point for creating streams
+- `ZerobusStream` - Stream for ingesting Protocol Buffer records
+- `TableProperties` - Table configuration
+- `StreamConfigurationOptions` - Stream configuration with builder pattern
+- `ZerobusException` and `NonRetriableException` - Exception hierarchy
+- `GenerateProto` tool - Generate Protocol Buffer schemas from Unity Catalog
+- Blocking and non-blocking ingestion examples
+- Comprehensive README documentation
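
The offset-based ingestion pattern described in the changelog (ingest returns an offset at once, the caller later blocks on a specific offset, and acknowledgments arrive through a callback) can be illustrated with a small self-contained sketch. This does not use the SDK itself: `InMemoryStream` and `simulateAckThrough` are hypothetical stand-ins invented for illustration, records are plain strings, and only the method names `ingestRecordOffset`, `ingestRecordsOffset`, `waitForOffset`, `onAck`, and `onError` are taken from the changelog. The real signatures, checked exceptions, and threading behaviour may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetIngestSketch {

    /** Callback shape named in the changelog: onAck(long) and onError(long, String). */
    interface AckCallback {
        void onAck(long offsetId);
        void onError(long offsetId, String message);
    }

    /** Hypothetical in-memory stand-in for illustration; NOT the SDK's ZerobusStream. */
    static final class InMemoryStream {
        private final AckCallback callback; // may be null
        private long nextOffset = 0;        // offset assigned to the next record
        private long ackedThrough = -1;     // highest acknowledged offset, -1 if none

        InMemoryStream(AckCallback callback) {
            this.callback = callback;
        }

        /** Returns the record's offset immediately; no future is allocated. */
        synchronized long ingestRecordOffset(String record) {
            return nextOffset++;
        }

        /** Batch variant: returns the offset of the last record (-1 for an empty batch here). */
        synchronized long ingestRecordsOffset(Iterable<String> records) {
            long last = -1;
            for (String r : records) {
                last = ingestRecordOffset(r);
            }
            return last;
        }

        /** Assumption: simulates the server acknowledging everything up to this offset. */
        synchronized void simulateAckThrough(long offset) {
            ackedThrough = Math.max(ackedThrough, offset);
            if (callback != null) {
                callback.onAck(offset);
            }
            notifyAll(); // wake any thread blocked in waitForOffset
        }

        /** Blocks until the given offset has been acknowledged. */
        synchronized void waitForOffset(long offset) {
            try {
                while (ackedThrough < offset) {
                    wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for ack", e);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final List<Long> acks = new ArrayList<>();
        AckCallback cb = new AckCallback() {
            @Override public void onAck(long offsetId) { acks.add(offsetId); }
            @Override public void onError(long offsetId, String message) {
                System.err.println("offset " + offsetId + " failed: " + message);
            }
        };
        final InMemoryStream stream = new InMemoryStream(cb);

        long first = stream.ingestRecordOffset("{\"id\": 1}");
        final long last = stream.ingestRecordsOffset(Arrays.asList("{\"id\": 2}", "{\"id\": 3}"));

        // A second thread plays the role of the acknowledgment path.
        Thread acker = new Thread(() -> stream.simulateAckThrough(last));
        acker.start();
        stream.waitForOffset(last); // returns once `last` is acknowledged
        acker.join();

        System.out.println("first=" + first + " last=" + last + " acked=" + acks);
    }
}
```

The design point the sketch tries to capture is that the ingest call hands back only a `long`, so a caller that cares about durability can wait once on the final offset of a run of records rather than holding one future per record.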
