Timeplus natively supports [Apache Iceberg](https://iceberg.apache.org/), an open table format for large-scale analytic datasets, designed for high performance and reliability. Iceberg provides an open, vendor-neutral format that supports multiple engines, making it ideal for a wide range of analytics workloads. The Iceberg ecosystem was initially built around Java, but with the increasing adoption of the REST catalog specification, Timeplus is among the first vendors to integrate with Iceberg purely in **C++**. This allows Timeplus users to **stream data directly to Iceberg** and **query Iceberg tables efficiently**, with high performance, a low memory footprint, and easy installation, without any Java dependencies.
Since Timeplus Proton 1.7 (to be released soon) and [Timeplus Enterprise 2.8](/enterprise-v2.8), Timeplus provides native support for Apache Iceberg as a new database type. This allows you to read and write data using the Apache Iceberg open table format, with support for the Iceberg REST Catalog (IRC). The initial release focuses on writing data to Iceberg, with basic query optimization for reading data from Iceberg.
### Supported Catalogs and Storage

The Iceberg REST Catalog integration works with common cloud and open-source backends. The following have been validated:

- Amazon S3 (and S3-compatible object storage)
- [AWS Glue's Iceberg REST endpoint](https://docs.aws.amazon.com/glue/latest/dg/connect-glu-iceberg-rest.html)
- [The Apache Gravitino Iceberg REST Server](https://gravitino.apache.org/docs/0.8.0-incubating/iceberg-rest-service)

More REST catalog implementations are planned.

## Key Benefits of the Timeplus Iceberg Integration
- Using Timeplus materialized views, you can continuously process and transform streaming data (from Apache Kafka, for example) and write the results to cost-effective object storage in the Apache Iceberg open table format.
- Apache Iceberg's open table format ensures you are never locked into a single vendor or query engine.
- Query your Iceberg tables with multiple engines, including Timeplus, Apache Spark, Apache Flink, ClickHouse, DuckDB, and AWS Athena.
- Future-proof your data architecture with broad industry support and an active open-source community.
The key features of the integration at a glance:
| Feature | Description |
|---------|-------------|
| **Native C++ Integration** | Fully implemented in C++; no Java runtime required. |
| **REST Catalog Support** | Works with any Iceberg REST Catalog implementation. |
| **Stream-to-Iceberg Writes** | Continuously write streaming data into Iceberg tables. |
| **Direct Reads from Iceberg** | Query Iceberg tables natively using Timeplus SQL. |
| **Cloud Ready** | Optimized for S3 and compatible object storage systems. |
:::info
Data compaction is **not yet supported** in the current Timeplus Iceberg integration.
:::
## Create an Iceberg Database

You can create an **Iceberg database** in Timeplus using the `CREATE DATABASE` statement with the `type='iceberg'` setting.

### Syntax
```sql
CREATE DATABASE <database_name>
SETTINGS
    type='iceberg',
    catalog_uri='<catalog_uri>',
    catalog_type='rest',
    warehouse='<warehouse_path>',
    storage_endpoint='<s3_endpoint>',
    rest_catalog_sigv4_enabled=<true|false>,
    rest_catalog_signing_region='<region>',
    rest_catalog_signing_name='<service_name>',
    use_environment_credentials=<true|false>,
    credential='<username:password>',
    catalog_credential='<username:password>',
    storage_credential='<username:password>';
```
### Settings {#settings}
- `type` – Must be set to `'iceberg'` to indicate an Iceberg database.
- `catalog_uri` – The URI of the Iceberg catalog (e.g., AWS Glue, Gravitino, or another REST catalog endpoint).
- `catalog_type` – Specifies the catalog type. Currently, only `'rest'` is supported in Timeplus.
- `warehouse` – The path or identifier of the Iceberg warehouse where table data is stored (e.g., an S3 path).
- `storage_endpoint` – The S3-compatible endpoint where data files are stored. For AWS S3, use `https://<bucket>.s3.<region>.amazonaws.com`.
- `rest_catalog_sigv4_enabled` – Enables [AWS SigV4](https://docs.aws.amazon.com/general/latest/gr/signing_aws_api_requests.html) authentication for secure catalog communication.
- `rest_catalog_signing_region` – The AWS region used for SigV4 signing (e.g., `us-west-2`).
- `rest_catalog_signing_name` – The service name used in SigV4 signing (typically `glue` or `s3`).
- `use_environment_credentials` – Defaults to `true`. When enabled, Timeplus uses environment-based credentials, such as an IAM role assigned to an EC2 instance or the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. Set this to `false` when using local MinIO or a public S3 bucket.
- `credential` – A unified credential in `username:password` format (for example, AWS access key and secret key). Used for both catalog and storage if they share the same authentication.
- `catalog_credential` – Optional. Use when the catalog requires credentials different from the storage layer.
- `storage_credential` – Optional. Use when the storage backend (e.g., S3 or MinIO) requires separate credentials.
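**Example**:

The following is a minimal sketch of an AWS S3 Tables configuration via the Glue Iceberg REST endpoint, matching the explanation below. The database name, account ID, bucket name, region, and the exact `catalog_uri`/`warehouse` formats are illustrative assumptions; check them against your own AWS setup.

```sql
CREATE DATABASE demo
SETTINGS
    type='iceberg',
    catalog_type='rest',
    -- Glue Iceberg REST endpoint for the us-west-2 region (illustrative)
    catalog_uri='https://glue.us-west-2.amazonaws.com/iceberg',
    -- <account_id> and <bucket-name> are placeholders for your AWS
    -- account and S3 Tables bucket
    warehouse='<account_id>:s3tablescatalog/<bucket-name>',
    rest_catalog_sigv4_enabled=true,
    rest_catalog_signing_region='us-west-2',
    rest_catalog_signing_name='glue';
```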
**Explanation**:

- This example configures an **AWS S3 Tables REST catalog** for Iceberg in Timeplus.
- The `warehouse` setting specifies the Glue catalog and S3 bucket location.
- `rest_catalog_sigv4_enabled=true` enables secure communication with AWS using SigV4 signing.
- To **create new Iceberg tables** directly from Timeplus, you can also set `storage_credential='https://s3tables.us-west-2.amazonaws.com/<bucket-name>'`.
After creating an Iceberg database in Timeplus, you can list existing tables or create new ones directly via SQL, as in the sketch below.
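A minimal sketch, assuming the `demo` database from the example above and a `transformed` table inside it (both names are illustrative):

```sql
-- List the Iceberg tables visible through the REST catalog
SHOW TABLES FROM demo;

-- Inspect the schema of one of them
DESCRIBE demo.transformed;
```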
## Writing to Iceberg {#write_via_mv}

You can run `INSERT INTO` statements to write data to Iceberg tables, or set up a materialized view to continuously write data to Iceberg tables.
**Example**:
```sql
CREATE MATERIALIZED VIEW sink_to_iceberg_mv INTO demo.transformed AS
SELECT
    now() AS timestamp,
    org_id,
    float_value,
    length(array_of_records.a_num) AS array_length,
    array_max(array_of_records.a_num) AS max_num,
    array_min(array_of_records.a_num) AS min_num
FROM msk_stream_read
SETTINGS s3_min_upload_file_size=1024;
```
This example continuously writes transformed data from a streaming source (`msk_stream_read`) into an Iceberg table.
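For ad-hoc writes, a plain `INSERT INTO` works as well. A minimal sketch, assuming the `demo.transformed` table with the columns produced by the materialized view above (the literal values are illustrative):

```sql
-- One-off write into the Iceberg table
INSERT INTO demo.transformed (timestamp, org_id, float_value, array_length, max_num, min_num)
VALUES (now(), 'org-42', 3.14, 3, 9, 1);
```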
## Reading from Iceberg {#query_iceberg}

### Using SQL in Timeplus {#query_timeplus}
You can query Iceberg data in Timeplus using standard SQL syntax:
```sql
SELECT ... FROM <iceberg_database>.<iceberg_stream>;
```
:::info
Iceberg streams in Timeplus behave like static tables — queries return the full result set and then terminate.
For large tables, it's recommended to include a `LIMIT` clause to avoid loading too much data from Iceberg into Timeplus.
In future releases, **continuous streaming query support** for Iceberg streams will be added, allowing real-time incremental reads from Iceberg data.
:::
**Example**:
```sql
SELECT count() FROM iceberg_db.table_name;
```
This query is optimized to return the count of rows in the specified Iceberg table with minimal scanning of metadata and data files.
### Using SparkSQL {#query_sparksql}
You can also use **SparkSQL** to validate or analyze Iceberg data created by Timeplus.
Depending on whether you set up the catalog via AWS Glue or Apache Gravitino, you can start a SparkSQL session to query or insert data into Iceberg tables, using one of the following configurations: