Commit 51be251

TTL for append-stream (#467)
1 parent 6f94b39 commit 51be251

5 files changed: +124 -8 lines changed

docs/append-stream-tiered-storage.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Tiered Storage
+# Tier Storage
 
 Since Timeplus Enterprise 2.8, we have introduced a new feature called Tiered Storage. This feature allows users to store data in a mix of local and remote storage. For example, users can store hot data in a local high-performance storage(e.g. NVMe SSD) for quick access and move the data to object storage(e.g. S3) for long-term retention.
```

docs/append-stream-ttl.md

Lines changed: 96 additions & 0 deletions

@@ -0,0 +1,96 @@

# TTL

A TTL expression on an Append Stream specifies what happens to data once it expires: the expired data can be moved automatically between disks and volumes, recompressed once every row in a part has expired, or simply discarded.

Timeplus evaluates TTL against the data in the historical store asynchronously, as part of the background data merge. How often this happens can be controlled by the **`merge_with_ttl_timeout`** setting.
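
For illustration, here is a minimal sketch of lowering that timeout when creating a stream. The stream name, columns, and the one-hour value are placeholders, not part of this commit:

```sql
-- Hypothetical append stream: expire rows after 7 days and let background
-- merges re-evaluate TTL roughly every hour (the default is 14400 seconds).
CREATE STREAM events
(
    ts datetime,
    payload string
)
ORDER BY ts
TTL ts + INTERVAL 7 DAY
SETTINGS merge_with_ttl_timeout = 3600;
```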

## TTL Expression

```sql
TTL expr
[DELETE | RECOMPRESS codec_name1 | TO DISK 'xxx' | TO VOLUME 'xxx'][, DELETE | RECOMPRESS codec_name2 | TO DISK 'aaa' | TO VOLUME 'bbb'] ...
[WHERE conditions]
```

Each TTL expression may be followed by a rule type, which determines the action taken once the expression is satisfied (i.e., its time reaches the current time):

- **`DELETE`** - delete the expired rows (default action);
- **`RECOMPRESS codec_name`** - recompress the data part with `codec_name`;
- **`TO DISK 'aaa'`** - move the part to the disk `aaa`;
- **`TO VOLUME 'bbb'`** - move the part to the volume `bbb`.

The **`DELETE`** action can be used together with a **`WHERE`** clause to delete only some of the expired rows, based on a filtering condition:

```sql
TTL time_column + INTERVAL 1 MONTH DELETE WHERE column = 'value'
```

## Alter TTL

```sql
ALTER STREAM <db.stream> MODIFY TTL <expr>;
```

**Example:**

```sql
ALTER STREAM test MODIFY TTL d + INTERVAL 1 DAY;
```

## Trigger Data Delete with Merge

Expired data is removed only during the merge process, which runs asynchronously in the background.

To accelerate this cleanup, you can manually trigger a merge by running the `OPTIMIZE` command. This attempts to start an unscheduled merge of data parts for a stream:

```sql
OPTIMIZE STREAM <db.stream-name>;
```

## Examples

### Delete After Expiration

Create a stream where rows expire after one month. Among the expired rows, only those whose date falls on a Monday are deleted:

```sql
CREATE STREAM stream_with_where
(
    d datetime,
    a int
)
PARTITION BY to_YYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH DELETE WHERE to_day_of_week(d) = 1;
```

### Recompress After Expiration

Create a stream where expired rows are recompressed:

```sql
CREATE STREAM stream_for_recompression
(
    d datetime,
    key uint64,
    value string
)
ORDER BY tuple()
PARTITION BY key
TTL d + INTERVAL 1 MONTH RECOMPRESS CODEC(ZSTD(17)), d + INTERVAL 1 YEAR RECOMPRESS CODEC(LZ4HC(10))
SETTINGS
    min_rows_for_wide_part = 0,
    min_bytes_for_wide_part = 0;
```

### Move After Expiration

Create a stream where data parts are moved to the volume `aaa` after one week, moved to the disk `bbb` after two weeks, and deleted after one month:

```sql
CREATE STREAM stream_for_move
(
    d datetime,
    a int
)
PARTITION BY to_YYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH DELETE,
    d + INTERVAL 1 WEEK TO VOLUME 'aaa',
    d + INTERVAL 2 WEEK TO DISK 'bbb';
```

docs/append-stream.md

Lines changed: 18 additions & 1 deletion

````diff
@@ -40,7 +40,8 @@ SETTINGS
 fetch_threads=<remote-fetch-threads>,
 flush_threshold_count=<batch-flush-rows>,
 flush_threshold_ms=<batch-flush-timeout>,
-flush_threshold_bytes=<batch-flush-size>;
+flush_threshold_bytes=<batch-flush-size>,
+merge_with_ttl_timeout=<timeout-in-seconds>;
 ```
 
 ### Storage Architecture
````

```diff
@@ -164,6 +165,10 @@ In most cases, you don't need a partition key, and if you do need to partition,
 
 For partitioning by month, use the `to_YYYYMM(date_column)` expression, where `date_column` is a column with a date of the type `date`. The partition names here have the `YYYYMM` format.
 
+### `TTL expr`
+
+See [TTL](/append-stream-ttl) for details.
+
 ### Settings
 
 #### `shards`
```

```diff
@@ -292,14 +297,26 @@ Controls the parallelism when fetching data from remote shared storage.
 
 Flushes data to the backend columnar store when this row threshold is reached.
 
+**Default**: `100000`
+
 #### `flush_threshold_ms`
 
 Flushes data to the backend columnar store when this time threshold is reached.
 
+**Default**: `2000`
+
 #### `flush_threshold_bytes`
 
 Flushes data to the backend columnar store when this bytes threshold is reached.
 
+**Default**: `16777216`
+
+#### `merge_with_ttl_timeout`
+
+Minimum delay in seconds before repeating a merge with delete TTL.
+
+**Default**: `14400`
+
 ## Enable Zero-Replication WAL
 
 You can store WAL (NativeLog) data in S3-compatible cloud storage. To enable this, configure a disk and then create an append stream using that disk.
```
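
For illustration, a sketch of how the flush and TTL-merge settings documented above might be tuned together on an append stream. The stream name, columns, and chosen values are assumptions for the example, not defaults from this commit:

```sql
-- Hypothetical tuning: flush to the columnar store after 50,000 rows,
-- 1 second, or 8 MiB (whichever comes first), and re-check TTL expiration
-- during background merges every hour instead of every 14400 seconds.
CREATE STREAM metrics
(
    ts datetime,
    value float64
)
ORDER BY ts
TTL ts + INTERVAL 30 DAY
SETTINGS
    flush_threshold_count = 50000,
    flush_threshold_ms = 1000,
    flush_threshold_bytes = 8388608,
    merge_with_ttl_timeout = 3600;
```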

docs/mutable-stream.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -220,10 +220,14 @@ Controls the parallelism when fetching data from remote shared storage.
 
 Flushes data to the backend key-value store (RocksDB) when this row threshold is reached.
 
+**Default**: `100000`
+
 #### `flush_ms`
 
 Flushes data to the backend key-value store (RocksDB) when this time threshold is reached.
 
+**Default**: `30000`
+
 #### `log_kvstore`
 
 If `true`, logs internal RocksDB activity for debugging.
```
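
A corresponding sketch for a mutable stream, assuming the `CREATE MUTABLE STREAM ... PRIMARY KEY` syntax from the mutable stream docs; the stream name, columns, and values are illustrative only:

```sql
-- Hypothetical mutable stream: flush the RocksDB write batch after
-- 50,000 rows or 10 seconds, whichever comes first.
CREATE MUTABLE STREAM device_state
(
    device_id string,
    state string,
    updated_at datetime
)
PRIMARY KEY (device_id)
SETTINGS
    flush_threshold_count = 50000,
    flush_ms = 10000;
```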

docs/working-with-streams.md

Lines changed: 5 additions & 6 deletions

```diff
@@ -9,29 +9,28 @@ Timeplus `streams` are conceptually similar to `tables` in traditional SQL databases
 
 To support a variety of use cases efficiently, Timeplus offers multiple types of streams:
 
-1. [Append Stream](/append-stream)
+- [Append Stream](/append-stream)
 
 The default stream type in Timeplus. It uses columnar encoding and is optimized for **range scans** via a **sorting key**. It suits workloads with infrequent data mutations (e.g., `UPDATE` or `DELETE`).
 
-2. [Mutable Stream](/mutable-stream)
+- [Mutable Stream](/mutable-stream)
 
 Row-encoded and similar in behavior to a **MySQL table**, where each primary key corresponds to a single row. It is optimized for **frequent data mutations** (`UPDATE`, `UPSERT`, `DELETE`) and supports **point and range queries** via **primary or secondary indexes**.
 
-3. [Versioned Key-Value Stream](/versioned-stream)
+- [Versioned Key-Value Stream](/versioned-stream)
 
 Similar to the mutable stream but uses **columnar encoding**. It offers better **compression** but lower performance for updates and point queries, especially when cardinality is high. Best suited for scenarios where **data mutations are less frequent**.
 
-4. [Changelog Key-Value Stream](/changelog-stream)
+- [Changelog Key-Value Stream](/changelog-stream)
 
 Designed to model **change data capture (CDC) events**, with **columnar encoding** for efficient downstream processing.
 
-5. [External Stream](/external-stream)
+- [External Stream](/external-stream)
 
 As the name implies, the data resides outside of Timeplus. Timeplus can reference external sources (e.g., a **Kafka topic**) and execute **streaming SQL** queries against them in real time.
 
 > Note: Timeplus also supports [External Tables](/sql-create-external-table), which allow **historical queries and inserts** only (e.g., against ClickHouse, MySQL, PostgreSQL, MongoDB, etc.).
 
-
 ## Stream Internals
 
 When users [create a stream](/sql-create-stream) in Timeplus, they can specify the number of **shards** for the stream. Each shard consists of two core components at the storage layer:
```
