You can set key-value pairs in `checkpoint_settings`.

-### type
+### `type`

**Definition**: Defines which checkpoint type to use.

@@ -21,18 +21,18 @@ You can set key-value pairs in `checkpoint_settings`.
- `auto` **(default)** - Automatically determine whether to use `file` or `rocks` checkpoint based on the query’s state type.
- `file` - Native file format. You can explicitly use the local file system for checkpoint storage, though for some materialized views, using rocksdb is recommended.
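
For illustration, a minimal sketch of pinning the checkpoint type explicitly. The `key=value` string form, view name, and source stream are assumptions, not taken from this page:

```sql
-- Hypothetical sketch: force the rocks checkpoint instead of auto.
-- The view and source stream names are illustrative.
CREATE MATERIALIZED VIEW mv_page_counts AS
SELECT page, count() AS views
FROM page_views
GROUP BY page
SETTINGS checkpoint_settings='type=rocks';
```
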

-### storage_type
+### `replication_type`


-**Definition**: Specifies where checkpoints will be stored.
+**Definition**: Specifies how checkpoints will be replicated.

**Possible Values**:

- `auto` **(default)** - Automatically determine whether to store in `local_file_system` or `nativelog`
- `local_file_system` - Stored in local file system for a single instance environment
-- `nativelog` - Stored in nativelog, and ensures cluster synchronization through raft **(Only valid in clusters)**
-- `s3` - Stored in S3; it must be bound to `disk_name`
+- `nativelog` - Replicated via nativelog, which ensures cluster synchronization through raft **(Only valid in cluster setups)**
+- `shared` - Replicated via shared storage such as S3; it must be bound to `shared_disk` **(Valid for both single instance and cluster setups)**
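
A sketch of the new `shared` replication path. The semicolon-separated `key=value` string is an assumption, and `my_s3_disk` is a made-up disk name that would need to be created with `CREATE DISK` beforehand (see `shared_disk` below):

```sql
-- Hypothetical sketch: replicate checkpoints via shared S3-backed storage.
-- 'my_s3_disk' is illustrative and must already exist.
CREATE MATERIALIZED VIEW mv_device_stats AS
SELECT device, max(temperature) AS max_temp
FROM device_utils
GROUP BY device
SETTINGS checkpoint_settings='replication_type=shared;shared_disk=my_s3_disk';
```
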

-### async
+### `async`

**Definition**: Determines whether checkpoints are created asynchronously.

@@ -41,7 +41,7 @@ You can set key-value pairs in `checkpoint_settings`.
**Definition**: Indicates whether checkpoints are incremental (saving only changes since the last checkpoint) or full.

@@ -50,7 +50,7 @@ You can set key-value pairs in `checkpoint_settings`.
- `false` **(default)**
- `true` - Only enabled when using a hybrid hash table (Recommended for large states with low update frequency)
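
A hedged sketch combining the booleans on this page; whether `async` and `incremental` can be set together in one string is an assumption:

```sql
-- Hypothetical sketch: asynchronous, incremental checkpoints for a large,
-- rarely-updated aggregation state.
CREATE MATERIALIZED VIEW mv_user_totals AS
SELECT user_id, sum(amount) AS total
FROM payments
GROUP BY user_id
SETTINGS checkpoint_settings='async=true;incremental=true';
```
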

-### interval
+### `interval`

**Definition**: Specifies the time interval in seconds between checkpoint operations.
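
For example (the value `300` is arbitrary, and the `key=value` string form is assumed as above):

```sql
-- Hypothetical sketch: checkpoint roughly every five minutes.
CREATE MATERIALIZED VIEW mv_click_counts AS
SELECT count() AS clicks
FROM click_events
SETTINGS checkpoint_settings='interval=300';
```
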

@@ -68,11 +68,11 @@ query_state_checkpoint:
...
```

-### disk_name
+### `shared_disk`

**Definition**: Specifies a disk name, which can be created through SQL (`create disk {disk_name} ...`) and is used with a shared checkpoint storage (e.g. `S3`)

-## checkpoint_interval
+## `checkpoint_interval`

In some cases, you may want to adjust the checkpoint interval after the materialized view is created. You can do this by modifying the `checkpoint_settings` parameter in the `ALTER VIEW` statement.
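
A sketch of such an adjustment. The `MODIFY SETTING` form mirrors common SQL dialects and is an assumption; the exact `ALTER VIEW` syntax is not shown in this diff:

```sql
-- Hypothetical sketch: widen the checkpoint interval on an existing view.
-- The MODIFY SETTING clause is assumed, not confirmed by this page.
ALTER VIEW mv_user_totals MODIFY SETTING checkpoint_settings='interval=600';
```
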

docs/shuffle-data.md

@@ -19,7 +19,7 @@ FROM ...
SHUFFLE BY col1, ...
GROUP BY col1, col2, ...
EMIT ...
-SETTINGS num_target_shards=<num-sub-streams>
+SETTINGS substreams=<num-sub-streams>
```

> Note: The columns in the `SHUFFLE BY` clause must be a subset of the `GROUP BY` columns to ensure correct aggregation results.
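
To illustrate the subset rule with the `device_utils` example used later on this page:

```sql
-- Valid: the SHUFFLE BY column {location} is a subset of the
-- GROUP BY columns {location, device}.
SELECT location, device, count() AS cnt
FROM device_utils
SHUFFLE BY location
GROUP BY location, device;
```
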
@@ -62,7 +62,7 @@ The internal query plan for the above example looks like this:

By default, the system automatically determines the number of substreams after a shuffle. This default value may not be optimal, especially on nodes with many CPUs.

-To customize this behavior, you can use the **`num_target_shards`** setting to control the number of target substreams.
+To customize this behavior, you can use the **`substreams`** setting to control the number of target substreams.
- If not specified, the system typically chooses a value equal to the number of CPUs on the node.

**Example: Many-to-Many Data Shuffle**
@@ -84,17 +84,17 @@ FROM device_utils
SHUFFLE BY location
GROUP BY location, device
EMIT ONUPDATE WITH BATCH 1s
-SETTINGS num_target_shards=8;
+SETTINGS substreams=8;
```

-The default system picked number of substreams after shuffle may be not ideal, especially when there are lots of CPUs in the node. You can use setting **`num_target_shards`** to control the number of target substreams. If it is not explicitly specified, the system will pick a value which is usually the number of CPUs of the node.
+The system's default number of substreams after a shuffle may not be ideal, especially on nodes with many CPUs. You can use the **`substreams`** setting to control the number of target substreams. If it is not explicitly specified, the system picks a value that is usually the number of CPUs on the node.

The internal query plan for the above query looks like this:

-The `num_target_shards` value is always rounded **up to the nearest power of 2** for better shuffle performance. For example, specifying `5` will be rounded to `8`.
+The `substreams` value is always rounded **up to the nearest power of 2** for better shuffle performance. For example, specifying `5` will be rounded to `8`.
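
Concretely, per the rounding rule, these two settings behave the same (query shape borrowed from the example above):

```sql
-- substreams=5 is rounded up to the next power of 2, i.e. 8.
SELECT location, count() AS cnt
FROM device_utils
SHUFFLE BY location
GROUP BY location
SETTINGS substreams=5;  -- effectively substreams=8
```
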

static/llms-full.txt

@@ -808,7 +808,7 @@ Possible Values:
- auto (default) - Automatically determine whether to use file or rocks checkpoint based on the query’s state type.
- file - Native file format. You can explicitly use the local file system for checkpoint storage, though for some materialized views, using rocksdb is recommended.

-### storage_type
+### replication_type

Definition: Specifies where checkpoints will be stored.

@@ -817,7 +817,7 @@ Possible Values:
- auto (default) - Automatically determine whether to store in local_file_system or nativelog
- local_file_system - Stored in local file system for a single instance environment
- nativelog - Stored in nativelog, and ensures cluster synchronization through raft (Only valid in clusters)
-- s3 - Stored in S3; it must be bound to disk_name
+- shared_disk - Shared storage like S3; it must be bound

### async

@@ -847,7 +847,7 @@ Possible Values:

This is also configurable via the global configuration file.

-### disk_name
+### shared_disk

Definition: Specifies a disk name, which can be created through SQL (create disk {disk_name} ...) and is used with a shared checkpoint storage (e.g. S3)

@@ -865,22 +865,22 @@ For some scenarios with large states and low update frequency: