Commit c36b2d9 - Renaming settings (#502)

1 parent 36e6c1c

8 files changed, +38 -38 lines

docs/checkpoint-settings.md

Lines changed: 14 additions & 14 deletions

@@ -9,10 +9,10 @@ SETTINGS default_hash_table='hybrid', default_hash_join='hybrid',
 checkpoint_settings = 'incremental=true;interval=5';
 ```
 
-## checkpoint_settings
+## `checkpoint_settings`
 
 You can set key-value pairs in `checkpoint_settings`.
 
-### type
+### `type`
 
 **Definition**: Defines which checkpoint type to use.
 
@@ -21,18 +21,18 @@ You can set key-value pairs in `checkpoint_settings`.
 - `auto` **(default)** - Automatically determine whether to use `file` or `rocks` checkpoint based on the query’s state type.
 - `file` - Native file format. You can explicitly use the local file system for the checkpoint storage, even for some materialized views, using rocksdb is recommended.
 
-### storage_type
+### `replication_type`
 
-**Definition**: Specifies where checkpoints will be stored.
+**Definition**: Specifies how checkpoints will be replicated.
 
 **Possible Values**:
 
 - `auto` **(default)** - Automatically determine whether to store in `local_file_system` or `nativelog`
 - `local_file_system` - Stored in local file system for a single instance environment
-- `nativelog` - Stored in nativelog, and ensure cluster synchronization through raft **(Only valid in clusters)**
-- `s3` - Stored in S3, it must be bound to `disk_name`
+- `nativelog` - Replicated via nativelog, and ensure cluster synchronization through raft **(Only valid in cluster setups)**
+- `shared` - Replicated via shared storage like S3, it must be bound to `shared_disk` **(Valid for both single instance and cluster setups)**
 
-### async
+### `async`
 
 **Definition**: Determines whether checkpoints are created asynchronously.
 
@@ -41,7 +41,7 @@ You can set key-value pairs in `checkpoint_settings`.
 - `true` **(default)** - Asynchronous checkpoint replication
 - `false`
 
-### incremental
+### `incremental`
 
 **Definition**: Indicates whether checkpoints are incremental (saving only changes since the last checkpoint) or full.
 
@@ -50,7 +50,7 @@ You can set key-value pairs in `checkpoint_settings`.
 - `false` **(default)**
 - `true` - Only enabled when using a hybrid hash table (Recommended for large states with low update frequency)
 
-### interval
+### `interval`
 
 **Definition**: Specifies the time interval in seconds between checkpoint operations.
 
@@ -68,11 +68,11 @@ query_state_checkpoint:
 ...
 ```
 
-### disk_name
+### `shared_disk`
 
 **Definition**: Specifies a disk name, which can be created through sql`create disk {disk_name} ...`, which is used with a shared checkpoint storage (i.e. `S3`)
 
-## checkpoint_interval
+## `checkpoint_interval`
 
 In some cases, you may want to adjust the checkpoint interval after the materialized view is created. You can do this by modifying the `checkpoint_settings` parameter in the `ALTER VIEW` statement.
 ```sql
@@ -100,8 +100,8 @@ checkpoint_settings = 'incremental=true;interval=5';
 For some scenarios with S3 checkpoint storage:
 
 ```sql
---- create a S3 plain disk `diskS3`
-CREATE DISK diskS3 disk(
+--- create a S3 plain disk `s3_disk`
+CREATE DISK s3_disk disk(
 type='s3_plain',
 endpoint='http://localhost:11111/test/s3/',
 access_key_id='timeplusd',
@@ -112,5 +112,5 @@ CREATE MATERIALIZED VIEW mv AS
 SELECT key, count() FROM test group by key
 SETTINGS
 default_hash_table='hybrid', default_hash_join='hybrid',
-checkpoint_settings = 'storage_type=S3;disk_name=diskS3;incremental=true;interval=5';
+checkpoint_settings = 'replication_type=shared;shared_disk=s3_disk;incremental=true;interval=5';
 ```
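Taken together, the renames in this file amount to a key-for-key substitution inside the `checkpoint_settings` string. The following sketch is assembled only from the hunks above (it is not an additional change in this commit) and shows the same materialized view before and after the rename:

```sql
-- Before this commit (old key names, disk named diskS3):
CREATE MATERIALIZED VIEW mv AS
SELECT key, count() FROM test GROUP BY key
SETTINGS
default_hash_table='hybrid', default_hash_join='hybrid',
checkpoint_settings = 'storage_type=S3;disk_name=diskS3;incremental=true;interval=5';

-- After this commit (renamed keys, disk renamed to s3_disk):
CREATE MATERIALIZED VIEW mv AS
SELECT key, count() FROM test GROUP BY key
SETTINGS
default_hash_table='hybrid', default_hash_join='hybrid',
checkpoint_settings = 'replication_type=shared;shared_disk=s3_disk;incremental=true;interval=5';
```

Note that `storage_type=S3` maps to `replication_type=shared`, not to a literal `S3` value, since the new key describes replication rather than a storage backend.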

docs/global-aggregation.md

Lines changed: 1 addition & 1 deletion

@@ -77,7 +77,7 @@ SHUFFLE BY location
 GROUP BY bucket_window_start, location, device
 EMIT ON UPDATE WITH BATCH 1s
 SETTINGS
-num_target_shards=8,
+substreams=8,
 default_hash_table='hybrid',
 max_hot_keys=100000,
 aggregate_state_ttl_sec=3600;

docs/materialized-perf-tuning.md

Lines changed: 1 addition & 1 deletion

@@ -144,7 +144,7 @@ Most queries work well with default settings, but advanced workloads may require
 
 ### Data Shuffling
 
-- `num_target_shards`: Used with `SHUFFLE BY`; number of target shards after shuffling.
+- `substreams`: Used with `SHUFFLE BY`; number of substreams after shuffling.
 `0` means the system will automatically pick a number. **Default: 0**
 
 ### Join

docs/materialized-view-checkpoint.md

Lines changed: 3 additions & 3 deletions

@@ -108,7 +108,7 @@ SELECT
 FROM tumble(source, 5s)
 GROUP BY window_start, s
 SETTINGS
-checkpoint_settings='storage_type=shared;shared_disk=s3_plain_disk';
+checkpoint_settings='replication_type=shared;shared_disk=s3_plain_disk';
 ```
 
 ### NativeLog + Shared Storage
@@ -149,7 +149,7 @@ SELECT
 FROM tumble(source, 5s)
 GROUP BY window_start, s
 SETTINGS
-checkpoint_settings = 'storage_type=nativelog;shared_disk=s3_plain_disk'; -- storage_type=nativelog optional
+checkpoint_settings = 'replication_type=nativelog;shared_disk=s3_plain_disk';
 
 -- RocksDB-based incremental checkpoint with shared storage + NativeLog
 CREATE MATERIALIZED VIEW rocks_shared_nlog_ckpt_rep INTO sink
@@ -162,7 +162,7 @@ FROM tumble(source, 5s)
 GROUP BY window_start, s
 SETTINGS
 default_hash_table = 'hybrid', -- Uses RocksDB for incremental checkpoints
-checkpoint_settings = 'storage_type=nativelog;shared_disk=s3_plain_disk'; -- storage_type=nativelog optional
+checkpoint_settings = 'replication_type=nativelog;shared_disk=s3_plain_disk';
 ```
 
 ## Choosing the Right Replication Strategy

docs/materialized-view-high-availability.md

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ In this model, a **centralized scheduler** monitors Materialized Views and resch
 
 This model will be selected when creating a Materialized View with these settings:
 ```sql
-SETTINGS checkpoint_settings='storage_type=shared;shared_disk=...'
+SETTINGS checkpoint_settings='replication_type=shared;shared_disk=...'
 ```
 
 We call this a **Scheduled Materialized View**, since it is governed by the scheduler.
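The `shared_disk=...` placeholder above leaves the disk name elided. As an illustrative sketch only: the view name `scheduled_mv` and the `SELECT` list are hypothetical, while the query shape and the disk name `s3_plain_disk` are borrowed from docs/materialized-view-checkpoint.md in this same commit. A Scheduled Materialized View would then be created like this:

```sql
-- Hypothetical example: shared-storage checkpoint replication selects the
-- scheduler-governed (Scheduled Materialized View) model.
CREATE MATERIALIZED VIEW scheduled_mv INTO sink AS
SELECT window_start, s, count()
FROM tumble(source, 5s)
GROUP BY window_start, s
SETTINGS
checkpoint_settings = 'replication_type=shared;shared_disk=s3_plain_disk';
```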

docs/materialized-view.md

Lines changed: 1 addition & 1 deletion

@@ -112,7 +112,7 @@ The **Materialized View checkpoint interval**, in seconds.
 
 `checkpoint_settings` is a **semicolon-separated key/value string** that controls how query states are checkpointed. It supports the following keys:
 
-- **`storage_type`**
+- **`replication_type`**
   - Defines where checkpoint data is stored.
   - Supported values: `auto` (default), `nativelog`, `shared`, `local_file_system`.
   - Users may fine-tune this for better checkpoint efficiency and performance.

docs/shuffle-data.md

Lines changed: 5 additions & 5 deletions

@@ -19,7 +19,7 @@ FROM ...
 SHUFFLE BY col1, ...
 GROUP BY col1, col2, ...
 EMIT ...
-SETTINGS num_target_shards=<num-sub-streams>
+SETTINGS substreams=<num-sub-streams>
 ```
 
 > Note: The columns in the `SHUFFLE BY` clause must be a subset of the `GROUP BY` columns to ensure correct aggregation results.
@@ -62,7 +62,7 @@ The internal query plan for the above example looks like this:
 
 By default, the system automatically determines the number of substreams after a shuffle. This default value may not be optimal, especially on nodes with many CPUs.
 
-To customize this behavior, you can use the **`num_target_shards`** setting to control the number of target substreams.
+To customize this behavior, you can use the **`substreams`** setting to control the number of target substreams.
 - If not specified, the system typically chooses a value equal to the number of CPUs on the node.
 
 **Example: Many-to-Many Data Shuffle**
@@ -84,17 +84,17 @@ FROM device_utils
 SHUFFLE BY location
 GROUP BY location, device
 EMIT ON UPDATE WITH BATCH 1s
-SETTINGS num_target_shards=8;
+SETTINGS substreams=8;
 ```
 
-The default system picked number of substreams after shuffle may be not ideal, especially when there are lots of CPUs in the node. You can use setting **`num_target_shards`** to control the number of target substreams. If it is not explicitly specified, the system will pick a value which is usually the number of CPUs of the node.
+The number of substreams the system picks after a shuffle by default may not be ideal, especially when the node has many CPUs. You can use the **`substreams`** setting to control the number of target substreams. If it is not explicitly specified, the system usually picks the number of CPUs on the node.
 
 The internal query plan for the above query looks like this:
 
 ![ShufflePipelineMany](/img/shuffle-pipeline-many-to-many.svg)
 
 :::info
-The `num_target_shards` value is always rounded **up to the nearest power of 2** for better shuffle performance. For example, if specifying `5` will be rounded to `8`.
+The `substreams` value is always rounded **up to the nearest power of 2** for better shuffle performance. For example, specifying `5` will round it up to `8`.
 :::
 
 ## Data Already Shuffled in Storage
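The power-of-two rounding described in the info box can be made concrete with the renamed setting. In this sketch, the `SELECT` list is an assumption (the hunk above starts at `FROM device_utils`); the rest of the query comes from the diff:

```sql
-- Hypothetical: requesting 5 substreams; per the info box above, the value
-- is rounded up to the nearest power of 2, so the engine uses 8.
SELECT location, device, count()
FROM device_utils
SHUFFLE BY location
GROUP BY location, device
EMIT ON UPDATE WITH BATCH 1s
SETTINGS substreams=5;
```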

static/llms-full.txt

Lines changed: 12 additions & 12 deletions

@@ -808,7 +808,7 @@ Possible Values:
 - auto (default) - Automatically determine whether to use file or rocks checkpoint based on the query’s state type.
 - file - Native file format. You can explicitly use the local file system for the checkpoint storage, even for some materialized views, using rocksdb is recommended.
 
-### storage_type
+### replication_type
 
 Definition: Specifies where checkpoints will be stored.
 
@@ -817,7 +817,7 @@ Possible Values:
 - auto (default) - Automatically determine whether to store in local_file_system or nativelog
 - local_file_system - Stored in local file system for a single instance environment
 - nativelog - Stored in nativelog, and ensure cluster synchronization through raft (Only valid in clusters)
-- s3 - Stored in S3, it must be bound to disk_name
+- shared_disk - Shared storage like S3, it must be bound
 
 ### async
 
@@ -847,7 +847,7 @@ Possible Values:
 
 This is also configurable via the global configuration file.
 
-### disk_name
+### shared_disk
 
 Definition: Specifies a disk name, which can be created through sqlcreate disk {disk_name} ..., which is used with a shared checkpoint storage (i.e. S3)
 
@@ -865,22 +865,22 @@ For some scenarios with large states and low update frequency:
 
 For some scenarios with S3 checkpoint storage:
 
-- checkpoint_settingstypestorage_typeasyncincrementalintervaldisk_name
+- checkpoint_settingstypereplication_typeasyncincrementalintervalshared_disk
 - type
-- storage_type
+- replication_type
 - async
 - incremental
 - interval
-- disk_name
+- shared_disk
 - checkpoint_interval
 - Examples
 
 - type
-- storage_type
+- replication_type
 - async
 - incremental
 - interval
-- disk_name
+- shared_disk
 
 
 
@@ -13180,7 +13180,7 @@ Default: 120,000
 
 Time in milliseconds to trigger a fsync
 
-#### storage_type
+#### replication_type
 
 This is an advanced setting. Default value is hybrid to use both a streaming storage and a historical storage for the stream.
 
@@ -13204,7 +13204,7 @@ For S3 Tiered Storage, you can also specify when the cold data will be moved to S3
 - Mutable Stream
 - Versioned Stream
 - Changelog Stream
-- SETTINGSmodeshardsreplication_factorversion_columnkeep_versionsevent_time_columnlogstore_codeclogstore_retention_byteslogstore_retention_mslogstore_flush_messageslogstore_flush_msstorage_type
+- SETTINGSmodeshardsreplication_factorversion_columnkeep_versionsevent_time_columnlogstore_codeclogstore_retention_byteslogstore_retention_mslogstore_flush_messageslogstore_flush_msreplication_type
 - mode
 - shards
 - replication_factor
@@ -13216,7 +13216,7 @@ For S3 Tiered Storage, you can also specify when the cold data will be moved to S3
 - logstore_retention_ms
 - logstore_flush_messages
 - logstore_flush_ms
-- storage_type
+- replication_type
 - TTL (Time-To-Live)
 
 - Append Stream
@@ -13235,7 +13235,7 @@ For S3 Tiered Storage, you can also specify when the cold data will be moved to S3
 - logstore_retention_ms
 - logstore_flush_messages
 - logstore_flush_ms
-- storage_type
+- replication_type