
Commit 99080e3

Skipping indexes for append stream (#463)
1 parent 8f29119 commit 99080e3

6 files changed: 283 additions, 18 deletions

docs/append-stream-indexes.md

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
# Indexes

Append Streams support both **primary** and **skipping** indexes to accelerate historical queries.

## Primary Index

The primary key of an Append Stream determines the physical order of rows in the historical columnar store. The **primary index** is automatically built on top of it. In an Append Stream, the primary index is sparse.

Choosing an effective primary key can greatly improve query performance, especially when `WHERE` predicates align with the primary index.

**Example**:

Take the **(counter_id, date)** sorting key as an example. In this case, the sorting and index can be illustrated as follows:

```
Whole data:     [---------------------------------------------]
CounterID:      [aaaaaaaaaaaaaaaaaabbbbcdeeeeeeeeeeeeefgggggggghhhhhhhhhiiiiiiiiikllllllll]
Date:           [1111111222222233331233211111222222333211111112122222223111112223311122333]
Marks:           |      |      |      |      |      |      |      |      |      |      |
                a,1    a,2    a,3    b,3    e,2    e,3    g,1    h,2    i,1    i,3    l,3
Marks numbers:   0      1      2      3      4      5      6      7      8      9     10
```

If the data query specifies:

- **counter_id IN ('a', 'h')**, the server reads the data in the ranges of marks [0, 3) and [6, 8).
- **counter_id IN ('a', 'h') AND date = 3**, the server reads the data in the ranges of marks [1, 3) and [7, 8).
- **date = 3**, the server reads the data in the range of marks [1, 10].

The examples above show that it is always more effective to use an index than a full scan.

A sparse index allows extra data to be read. When reading a single range of the primary key, up to `index_granularity * 2` extra rows in each data block can be read.

Sparse indexes allow you to work with a very large number of table rows, because in most cases such indexes fit in the computer's RAM.

:::info
Once an Append Stream is created, its sorting key **cannot be changed**.
:::
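
For reference, here is a minimal sketch of how the sorting key from the illustration above could be declared; the stream and column names are hypothetical:

```sql
CREATE STREAM hits
(
    counter_id string,
    event_date date,
    hits uint64
)
ORDER BY (counter_id, event_date);

-- Predicates on the leading key columns can use the primary index:
SELECT count() FROM table(hits) WHERE counter_id IN ('a', 'h');
```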

## Skipping Indexes

You can create different skipping indexes on an Append Stream to accelerate queries when the primary key index alone is insufficient.

### Create Skipping Indexes

The index declaration goes in the columns section of the `CREATE` query.

```sql
INDEX index_name <expr> TYPE type(...) [GRANULARITY granularity_value]
```

These indexes aggregate some information about the specified expression over blocks, which consist of `granularity_value` granules (the size of a granule is specified by the `index_granularity` setting of the stream). These aggregates are then used in historical queries to reduce the amount of data read from disk by skipping big blocks of data where the `WHERE` condition cannot be satisfied.

The `GRANULARITY` clause can be omitted; the default value of `granularity_value` is **1**.

```sql
CREATE STREAM test
(
    u64 uint64,
    i32 int32,
    s string,
    ...
    INDEX idx1 u64 TYPE bloom_filter GRANULARITY 3,
    INDEX idx2 u64 * i32 TYPE minmax GRANULARITY 3,
    INDEX idx3 u64 * length(s) TYPE set(1000) GRANULARITY 4
)
...
```

The indexes from the example can be used by the query engine to reduce the amount of data read from disk in the following queries:

```sql
SELECT count() FROM table(test) WHERE u64 == 10;
SELECT count() FROM table(test) WHERE u64 * i32 >= 1234;
SELECT count() FROM table(test) WHERE u64 * length(s) == 1234;
```

Data skipping indexes can also be created on composite columns:

```sql
-- on columns of type map:
INDEX map_key_index map_keys(map_column) TYPE bloom_filter
INDEX map_value_index map_values(map_column) TYPE bloom_filter

-- on columns of type tuple:
INDEX tuple_1_index tuple_column.1 TYPE bloom_filter
INDEX tuple_2_index tuple_column.2 TYPE bloom_filter

-- on columns of type nested:
INDEX nested_1_index col.nested_col1 TYPE bloom_filter
INDEX nested_2_index col.nested_col2 TYPE bloom_filter
```
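
As a concrete illustration, here is a minimal sketch of a stream with a `map` column and a key index, plus a query that filters on map keys; the stream and column names are hypothetical:

```sql
CREATE STREAM events
(
    id uint64,
    attrs map(string, string),
    INDEX attrs_key_idx map_keys(attrs) TYPE bloom_filter GRANULARITY 3
)
ORDER BY id;

-- Queries that filter on map keys may skip granules via the index:
SELECT count() FROM table(events) WHERE has(map_keys(attrs), 'region');
```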

### Skip Index Types

Append Streams support the following types of skip indexes:

- **minmax** index
- **set** index
- **bloom_filter** index
- **ngrambf_v1** index
- **tokenbf_v1** index

#### MinMax

For each index granule, the minimum and maximum values of an expression are stored. (If the expression is of type **tuple**, it stores the minimum and maximum for each tuple element.)

```sql
TYPE minmax
```

**Example:**

```sql
INDEX idx2 u64 TYPE minmax GRANULARITY 3
```
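
A sketch of a minmax index used to help range predicates; the stream and column names are illustrative:

```sql
CREATE STREAM sensor_readings
(
    sensor_id string,
    ts datetime64(3),
    value float64,
    INDEX value_minmax value TYPE minmax GRANULARITY 4
)
ORDER BY (sensor_id, ts);

-- Range predicates on the indexed expression may skip granules:
SELECT count() FROM table(sensor_readings) WHERE value > 100;
```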

#### Set

For each index granule, at most **max_rows** unique values of the specified expression are stored. **max_rows = 0** means "store all unique values".

```sql
TYPE set(max_rows)
```

**Example:**

```sql
INDEX idx3 u64 TYPE set(1000) GRANULARITY 4
```
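
A sketch of a set index on a lower-cardinality column, which can help equality and `IN` predicates; the stream and column names are illustrative:

```sql
CREATE STREAM requests
(
    ts datetime64(3),
    status_code uint16,
    url string,
    INDEX status_set status_code TYPE set(100) GRANULARITY 4
)
ORDER BY ts;

-- Equality and IN predicates on the indexed column may skip granules:
SELECT count() FROM table(requests) WHERE status_code IN (500, 502, 503);
```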

#### Bloom filter

For each index granule, a bloom filter for the specified columns is stored.

```sql
TYPE bloom_filter([false_positive_rate])
```

**Example:**

```sql
INDEX idx1 u64 TYPE bloom_filter GRANULARITY 3
```

The **`false_positive_rate`** parameter can take a value between **0** and **1** (default: **0.025**) and specifies the probability of a false positive (which increases the amount of data to be read).
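
For example, a sketch of a bloom filter index that sets an explicit false-positive rate; the column name is illustrative:

```sql
INDEX idx_user_id user_id TYPE bloom_filter(0.01) GRANULARITY 3
```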

The following data types are supported:

- **`(u)int*`**
- **`float*`**
- **`enum`**
- **`date`**
- **`date_time`**
- **`string`**
- **`fixed_string`**
- **`array`**
- **`low_cardinality`**
- **`nullable`**
- **`uuid`**
- **`map`**

:::info
**`map`** data type: specifying index creation with keys or values

For the **`map`** data type, the client can specify if the index should be created for keys or for values using the **`map_keys`** or **`map_values`** functions.
:::

#### N-gram bloom filter

For each index granule, a **bloom filter** for the **n-grams** of the specified columns is stored.

```sql
TYPE ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)
```

Parameters:

- **`n`**: n-gram size.
- **`size_of_bloom_filter_in_bytes`**: Bloom filter size in bytes. You can use a large value here, for example 256 or 512, because it compresses well.
- **`number_of_hash_functions`**: The number of hash functions used in the bloom filter.
- **`random_seed`**: Seed for the bloom filter hash functions.

This index only works with the following data types:

- **`string`**
- **`fixed_string`**
- **`map`**
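
A minimal sketch of how an n-gram bloom filter index might be declared to help substring searches; the stream name, column names, and parameter values are illustrative:

```sql
CREATE STREAM logs
(
    ts datetime64(3),
    message string,
    INDEX message_ngram_idx message TYPE ngrambf_v1(4, 1024, 3, 0) GRANULARITY 4
)
ORDER BY ts;

-- Substring predicates such as LIKE may skip granules via the index:
SELECT count() FROM table(logs) WHERE message LIKE '%timeout%';
```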

#### Token bloom filter

The token bloom filter is the same as `ngrambf_v1`, but stores tokens (sequences separated by non-alphanumeric characters) instead of n-grams.

```sql
TYPE tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)
```

docs/append-stream-skipping-indexes.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/append-stream.md

Lines changed: 54 additions & 5 deletions
@@ -1,19 +1,22 @@
  # Append Stream

- An **Append Stream** in Timeplus is best understood as a **streaming ClickHouse / Snowflake** table that uses a columnar format, designed and optimized for streaming analytics workloads where frequent data mutations are uncommon.
+ An **Append Stream** in Timeplus is best understood as a **streaming ClickHouse / Snowflake** table that uses a columnar format, designed for high data ingest rates and huge data volumes, and optimized for streaming analytics workloads where frequent data mutations are uncommon.

  ## Create Append Stream

  ```sql
  CREATE STREAM [IF NOT EXISTS] <db.stream-name>
  (
      name1 [type1] [DEFAULT | ALIAS expr1] [COMMENT 'column-comment'] [compression_codec],
-     name2 [type2] [DEFAULT | ALIAS expr1] [COMMENT 'column-comment'] [compression_codec],
+     name2 [type2] [DEFAULT | ALIAS expr2] [COMMENT 'column-comment'] [compression_codec],
+     ...
+     INDEX index-name1 expr1 TYPE type1(...) [GRANULARITY value1],
+     INDEX index-name2 expr2 TYPE type2(...) [GRANULARITY value2],
      ...
  )
- ORDER BY (column, ...)
+ ORDER BY <expression>
  [PARTITION BY <expression>]
- [PRIMARY KEY (column, ...)]
+ [PRIMARY KEY <expression>]
  [TTL expr
      [DELETE | TO DISK 'xxx' | TO VOLUME 'xxx' [, ...] ]
      [WHERE conditions]
@@ -47,6 +50,8 @@ Each shard in a Append Stream has [dural storage](/architecture#dural-storage),
  - Write-Ahead Log (WAL), powered by NativeLog. Enabling incremental processing.
  - Historical store, powered by high performant columnar data store.

+ Data is first ingested into the WAL, and then asynchronously committed to the historical columnar store in large batches.
+
  The Append Stream settings allow fine-tuning of both storage layers to balance performance, durability, and efficiency.

  ### Default Values
@@ -115,6 +120,50 @@ SELECT id, size_bytes, size FROM test;

  See [column compression codecs](/append-stream-codecs) for details.

+ ### `ORDER BY expr`
+
+ **ORDER BY** — Defines the sorting key. **Required.**
+
+ You can specify a tuple of column names or arbitrary expressions.
+
+ Example:
+ ```sql
+ ORDER BY (counter_id + 1, event_date)
+ ```
+
+ The sorting key determines the physical order of rows in the historical store. This not only improves query performance but can also enhance data compression. Internally, data in the historical store is always sorted by this key.
+
+ ### `PRIMARY KEY expr`
+
+ **PRIMARY KEY** — Defines the primary index.
+
+ If not explicitly declared, the primary key defaults to the same expression as **ORDER BY**. If specified, the primary key expression must be a prefix of the **ORDER BY** expression.
+
+ Example:
+ ```sql
+ CREATE STREAM append
+ (
+     p string,
+     p1 string,
+     i int
+ )
+ ORDER BY (p, p1, i)
+ PRIMARY KEY (p, p1); -- Primary key expression '(p, p1)' is a prefix of sorting expression '(p, p1, i)'
+ ```
+
+ In Append Streams, the primary key does not need to be unique: multiple rows can have the same primary key, unlike a [Mutable Stream](/mutable-stream#primary-key).
+
+ Choosing an effective primary key can significantly speed up historical queries when **WHERE** predicates can leverage the primary index.
+
+ ### `PARTITION BY expr`
+
+ **PARTITION BY** — Defines the partitioning key. **Optional.**
+
+ In most cases you don't need a partition key, and when you do, you generally do not need one more granular than by month. Never use overly granular partitioning, and don't partition your data by client identifiers or names (instead, make the client identifier or name the first column in the **ORDER BY** expression).
+
+ For partitioning by month, use the `to_YYYYMM(date_column)` expression, where `date_column` is a column of type `date`. The partition names here have the `YYYYMM` format.

  ### Settings

  #### `shards`
@@ -281,7 +330,7 @@ The following example creates an append-stream with:
  - zstd compression for WAL data

  ```sql
- CREATE MUTABLE STREAM elastic_serving_stream
+ CREATE STREAM elastic_serving_stream
  (
      p string,
      id uint64,
docs/mutable-stream-indexes.md

Lines changed: 15 additions & 1 deletion
@@ -1,4 +1,18 @@
- # Secondary Indexes
+ # Indexes
+
+ Mutable Streams support both **primary** and **secondary** indexes, similar to MySQL or Postgres tables, to accelerate historical queries.
+
+ ## Primary Index
+
+ The primary key of a Mutable Stream determines the physical order of rows in the historical key/value store. The **primary index** is automatically built on top of it.
+
+ Choosing an effective primary key can greatly improve query performance, especially when `WHERE` predicates align with the primary index.
+
+ :::info
+ Once a Mutable Stream is created, its primary key **cannot be changed**.
+ :::
+
+ ## Secondary Indexes

  You can create secondary indexes on a Mutable Stream, similar to MySQL tables, to accelerate queries when the primary key index alone is insufficient.

docs/mutable-stream.md

Lines changed: 13 additions & 4 deletions
@@ -65,8 +65,21 @@ Each shard in a Mutable Stream has [dural storage](/architecture#dural-storage),
  - Write-Ahead Log (WAL), powered by NativeLog. Enabling incremental processing.
  - Historical key-value store, powered by RocksDB.

+ Data is first ingested into the WAL, and then asynchronously committed to the row store in large batches.
+
  The Mutable Stream settings allow fine-tuning of both storage layers to balance performance, durability, and efficiency.

+ ### Secondary Indexes
+
+ See the [Secondary Indexes](/mutable-stream-secondary-index) documentation for details.
+
+ ### PRIMARY KEY
+
+ **PRIMARY KEY** — Defines the uniqueness of a row in a Mutable Stream. **Required.**
+
+ Rows are organized and sorted based on the primary key, and the primary index is built on top of it.
+ See [Mutable Stream Indexes](/mutable-stream-indexes) for more details.

  ### Settings

  #### `shards`
@@ -263,10 +276,6 @@ CREATE MUTABLE STREAM auto_incr
  PRIMARY KEY (p);
  ```

- ## Secondary Indexes
-
- See the [Secondary Indexes](/mutable-stream-secondary-index) documentation for details.
-
  ## Column Families

  A **column family** is a way to group related columns together with these grouping rules.

sidebars.js

Lines changed: 7 additions & 7 deletions
@@ -276,11 +276,11 @@ const sidebars = {
        id: "append-stream",
      },
      items: [
-       // {
-       //   type: "doc",
-       //   id: "append-stream-skipping-indexes",
-       //   label: "Skipping Indexes",
-       // },
+       {
+         type: "doc",
+         id: "append-stream-indexes",
+         label: "Indexes",
+       },
        {
          type: "doc",
          id: "append-stream-codecs",
@@ -310,8 +310,8 @@ const sidebars = {
      items: [
        {
          type: "doc",
-         id: "mutable-stream-secondary-index",
-         label: "Secondary Indexes",
+         id: "mutable-stream-indexes",
+         label: "Indexes",
        },
        {
          type: "doc",

0 commit comments
