Commit c36b2d9 - Renaming settings (#502)

1 parent 36e6c1c

8 files changed, +38 -38 lines

docs/checkpoint-settings.md

Lines changed: 14 additions & 14 deletions

@@ -9,10 +9,10 @@ SETTINGS default_hash_table='hybrid', default_hash_join='hybrid',
 checkpoint_settings = 'incremental=true;interval=5';
 ```
 
-## checkpoint_settings
+## `checkpoint_settings`
 
 You can set key-value pairs in `checkpoint_settings`.
 
-### type
+### `type`
 
 **Definition**: Defines which checkpoint type to use.
 
@@ -21,18 +21,18 @@ You can set key-value pairs in `checkpoint_settings`.
 - `auto` **(default)** - Automatically determine whether to use `file` or `rocks` checkpoint based on the query’s state type.
 - `file` - Native file format. You can explicitly use the local file system for the checkpoint storage, even for some materialized views, using rocksdb is recommended.
 
-### storage_type
+### `replication_type`
 
-**Definition**: Specifies where checkpoints will be stored.
+**Definition**: Specifies how checkpoints will be replicated.
 
 **Possible Values**:
 
 - `auto` **(default)** - Automatically determine whether to store in `local_file_system` or `nativelog`
 - `local_file_system` - Stored in local file system for a single instance environment
-- `nativelog` - Stored in nativelog, and ensure cluster synchronization through raft **(Only valid in clusters)**
-- `s3` - Stored in S3, it must be bound to `disk_name`
+- `nativelog` - Replicated via nativelog, and ensure cluster synchronization through raft **(Only valid in cluster setups)**
+- `shared` - Replicated via shared storage like S3, it must be bound to `shared_disk` **(Valid for both single instance and cluster setups)**
 
-### async
+### `async`
 
 **Definition**: Determines whether checkpoints are created asynchronously.
 
@@ -41,7 +41,7 @@ You can set key-value pairs in `checkpoint_settings`.
 - `true` **(default)** - Asynchronous checkpoint replication
 - `false`
 
-### incremental
+### `incremental`
 
 **Definition**: Indicates whether checkpoints are incremental (saving only changes since the last checkpoint) or full.
 
@@ -50,7 +50,7 @@ You can set key-value pairs in `checkpoint_settings`.
 - `false` **(default)**
 - `true` - Only enabled when using a hybrid hash table (Recommended for large states with low update frequency)
 
-### interval
+### `interval`
 
 **Definition**: Specifies the time interval in seconds between checkpoint operations.
 
@@ -68,11 +68,11 @@ query_state_checkpoint:
 ...
 ```
 
-### disk_name
+### `shared_disk`
 
 **Definition**: Specifies a disk name, which can be created through sql`create disk {disk_name} ...`, which is used with a shared checkpoint storage (i.e. `S3`)
 
-## checkpoint_interval
+## `checkpoint_interval`
 
 In some cases, you may want to adjust the checkpoint interval after the materialized view is created. You can do this by modifying the `checkpoint_settings` parameter in the `ALTER VIEW` statement.
 ```sql
@@ -100,8 +100,8 @@ checkpoint_settings = 'incremental=true;interval=5';
 For some scenarios with S3 checkpoint storage:
 
 ```sql
---- create a S3 plain disk `diskS3`
-CREATE DISK diskS3 disk(
+--- create a S3 plain disk `s3_disk`
+CREATE DISK s3_disk disk(
 type='s3_plain',
 endpoint='http://localhost:11111/test/s3/',
 access_key_id='timeplusd',
@@ -112,5 +112,5 @@ CREATE MATERIALIZED VIEW mv AS
 SELECT key, count() FROM test group by key
 SETTINGS
 default_hash_table='hybrid', default_hash_join='hybrid',
-checkpoint_settings = 'storage_type=S3;disk_name=diskS3;incremental=true;interval=5';
+checkpoint_settings = 'replication_type=shared;shared_disk=s3_disk;incremental=true;interval=5';
 ```
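Taken together, the renames in this file amount to a key-for-key substitution inside the `checkpoint_settings` string. The following sketch is assembled only from the hunks above (it is not an additional change in this commit) and shows the same materialized view before and after the rename:

```sql
-- Before this commit (old key names, disk named diskS3):
CREATE MATERIALIZED VIEW mv AS
SELECT key, count() FROM test GROUP BY key
SETTINGS
default_hash_table='hybrid', default_hash_join='hybrid',
checkpoint_settings = 'storage_type=S3;disk_name=diskS3;incremental=true;interval=5';

-- After this commit (renamed keys, disk renamed to s3_disk):
CREATE MATERIALIZED VIEW mv AS
SELECT key, count() FROM test GROUP BY key
SETTINGS
default_hash_table='hybrid', default_hash_join='hybrid',
checkpoint_settings = 'replication_type=shared;shared_disk=s3_disk;incremental=true;interval=5';
```

Note that `storage_type=S3` maps to `replication_type=shared`, not to a literal `S3` value, since the new key describes replication rather than a storage backend.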

docs/global-aggregation.md

Lines changed: 1 addition & 1 deletion

@@ -77,7 +77,7 @@ SHUFFLE BY location
 GROUP BY bucket_window_start, location, device
 EMIT ON UPDATE WITH BATCH 1s
 SETTINGS
-num_target_shards=8,
+substreams=8,
 default_hash_table='hybrid',
 max_hot_keys=100000,
 aggregate_state_ttl_sec=3600;

docs/materialized-perf-tuning.md

Lines changed: 1 addition & 1 deletion

@@ -144,7 +144,7 @@ Most queries work well with default settings, but advanced workloads may require
 
 ### Data Shuffling
 
-- `num_target_shards`: Used with `SHUFFLE BY`; number of target shards after shuffling.
+- `substreams`: Used with `SHUFFLE BY`; number of substreams after shuffling.
 `0` means the system will automatically pick a number. **Default: 0**
 
 ### Join

docs/materialized-view-checkpoint.md

Lines changed: 3 additions & 3 deletions

@@ -108,7 +108,7 @@ SELECT
 FROM tumble(source, 5s)
 GROUP BY window_start, s
 SETTINGS
-checkpoint_settings='storage_type=shared;shared_disk=s3_plain_disk';
+checkpoint_settings='replication_type=shared;shared_disk=s3_plain_disk';
 ```
 
 ### NativeLog + Shared Storage
@@ -149,7 +149,7 @@ SELECT
 FROM tumble(source, 5s)
 GROUP BY window_start, s
 SETTINGS
-checkpoint_settings = 'storage_type=nativelog;shared_disk=s3_plain_disk'; -- storage_type=nativelog optional
+checkpoint_settings = 'replication_type=nativelog;shared_disk=s3_plain_disk';
 
 -- RocksDB-based incremental checkpoint with shared storage + NativeLog
 CREATE MATERIALIZED VIEW rocks_shared_nlog_ckpt_rep INTO sink
@@ -162,7 +162,7 @@ FROM tumble(source, 5s)
 GROUP BY window_start, s
 SETTINGS
 default_hash_table = 'hybrid', -- Uses RocksDB for incremental checkpoints
-checkpoint_settings = 'storage_type=nativelog;shared_disk=s3_plain_disk'; -- storage_type=nativelog optional
+checkpoint_settings = 'replication_type=nativelog;shared_disk=s3_plain_disk';
 ```
 
 ## Choosing the Right Replication Strategy

docs/materialized-view-high-availability.md

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ In this model, a **centralized scheduler** monitors Materialized Views and resch
 
 This model will be selected when creating a Materialized View with these settings:
 ```sql
-SETTINGS checkpoint_settings='storage_type=shared;shared_disk=...'
+SETTINGS checkpoint_settings='replication_type=shared;shared_disk=...'
 ```
 
 We call this a **Scheduled Materialized View**, since it is governed by the scheduler.
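The `shared_disk=...` placeholder above leaves the disk name elided. As an illustrative sketch only: the view name `scheduled_mv` and the `SELECT` list are hypothetical, while the query shape and the disk name `s3_plain_disk` are borrowed from docs/materialized-view-checkpoint.md in this same commit. A Scheduled Materialized View would then be created like this:

```sql
-- Hypothetical example: shared-storage checkpoint replication selects the
-- scheduler-governed (Scheduled Materialized View) model.
CREATE MATERIALIZED VIEW scheduled_mv INTO sink AS
SELECT window_start, s, count()
FROM tumble(source, 5s)
GROUP BY window_start, s
SETTINGS
checkpoint_settings = 'replication_type=shared;shared_disk=s3_plain_disk';
```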

docs/materialized-view.md

Lines changed: 1 addition & 1 deletion

@@ -112,7 +112,7 @@ The **Materialized View checkpoint interval**, in seconds.
 
 `checkpoint_settings` is a **semicolon-separated key/value string** that controls how query states are checkpointed. It supports the following keys:
 
-- **`storage_type`**
+- **`replication_type`**
   - Defines where checkpoint data is stored.
   - Supported values: `auto` (default), `nativelog`, `shared`, `local_file_system`.
   - Users may fine-tune this for better checkpoint efficiency and performance.

docs/shuffle-data.md

Lines changed: 5 additions & 5 deletions

@@ -19,7 +19,7 @@ FROM ...
 SHUFFLE BY col1, ...
 GROUP BY col1, col2, ...
 EMIT ...
-SETTINGS num_target_shards=<num-sub-streams>
+SETTINGS substreams=<num-sub-streams>
 ```
 
 > Note: The columns in the `SHUFFLE BY` clause must be a subset of the `GROUP BY` columns to ensure correct aggregation results.
@@ -62,7 +62,7 @@ The internal query plan for the above example looks like this:
 
 By default, the system automatically determines the number of substreams after a shuffle. This default value may not be optimal, especially on nodes with many CPUs.
 
-To customize this behavior, you can use the **`num_target_shards`** setting to control the number of target substreams.
+To customize this behavior, you can use the **`substreams`** setting to control the number of target substreams.
 - If not specified, the system typically chooses a value equal to the number of CPUs on the node.
 
 **Example: Many-to-Many Data Shuffle**
@@ -84,17 +84,17 @@ FROM device_utils
 SHUFFLE BY location
 GROUP BY location, device
 EMIT ON UPDATE WITH BATCH 1s
-SETTINGS num_target_shards=8;
+SETTINGS substreams=8;
 ```
 
-The default system picked number of substreams after shuffle may be not ideal, especially when there are lots of CPUs in the node. You can use setting **`num_target_shards`** to control the number of target substreams. If it is not explicitly specified, the system will pick a value which is usually the number of CPUs of the node.
+The number of substreams the system picks after a shuffle by default may not be ideal, especially when the node has many CPUs. You can use the **`substreams`** setting to control the number of target substreams. If it is not explicitly specified, the system usually picks the number of CPUs on the node.
 
 The internal query plan for the above query looks like this:
 
 ![ShufflePipelineMany](/img/shuffle-pipeline-many-to-many.svg)
 
 :::info
-The `num_target_shards` value is always rounded **up to the nearest power of 2** for better shuffle performance. For example, if specifying `5` will be rounded to `8`.
+The `substreams` value is always rounded **up to the nearest power of 2** for better shuffle performance. For example, specifying `5` will round it up to `8`.
 :::
 
 ## Data Already Shuffled in Storage
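The power-of-two rounding described in the info box can be made concrete with the renamed setting. In this sketch, the `SELECT` list is an assumption (the hunk above starts at `FROM device_utils`); the rest of the query comes from the diff:

```sql
-- Hypothetical: requesting 5 substreams; per the info box above, the value
-- is rounded up to the nearest power of 2, so the engine uses 8.
SELECT location, device, count()
FROM device_utils
SHUFFLE BY location
GROUP BY location, device
EMIT ON UPDATE WITH BATCH 1s
SETTINGS substreams=5;
```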

static/llms-full.txt

Lines changed: 12 additions & 12 deletions

@@ -808,7 +808,7 @@ Possible Values:
 - auto (default) - Automatically determine whether to use file or rocks checkpoint based on the query’s state type.
 - file - Native file format. You can explicitly use the local file system for the checkpoint storage, even for some materialized views, using rocksdb is recommended.
 
-### storage_type
+### replication_type
 
 Definition: Specifies where checkpoints will be stored.
 
@@ -817,7 +817,7 @@ Possible Values:
 - auto (default) - Automatically determine whether to store in local_file_system or nativelog
 - local_file_system - Stored in local file system for a single instance environment
 - nativelog - Stored in nativelog, and ensure cluster synchronization through raft (Only valid in clusters)
-- s3 - Stored in S3, it must be bound to disk_name
+- shared_disk - Shared storage like S3, it must be bound
 
 ### async
 
@@ -847,7 +847,7 @@ Possible Values:
 
 This is also configurable via the global configuration file.
 
-### disk_name
+### shared_disk
 
 Definition: Specifies a disk name, which can be created through sqlcreate disk {disk_name} ..., which is used with a shared checkpoint storage (i.e. S3)
 
@@ -865,22 +865,22 @@ For some scenarios with large states and low update frequency:
 
 For some scenarios with S3 checkpoint storage:
 
-- checkpoint_settingstypestorage_typeasyncincrementalintervaldisk_name
+- checkpoint_settingstypereplication_typeasyncincrementalintervalshared_disk
 - type
-- storage_type
+- replication_type
 - async
 - incremental
 - interval
-- disk_name
+- shared_disk
 - checkpoint_interval
 - Examples
 
 - type
-- storage_type
+- replication_type
 - async
 - incremental
 - interval
-- disk_name
+- shared_disk
 
 
 
@@ -13180,7 +13180,7 @@ Default: 120,000
 
 Time in milliseconds to trigger a fsync
 
-#### storage_type
+#### replication_type
 
 This is an advanced setting. Default value is hybrid to use both a streaming storage and a historical storage for the stream.
 
@@ -13204,7 +13204,7 @@ For S3 Tiered Storage, you can also specify when the cold data will be moved to S3
 - Mutable Stream
 - Versioned Stream
 - Changelog Stream
-- SETTINGSmodeshardsreplication_factorversion_columnkeep_versionsevent_time_columnlogstore_codeclogstore_retention_byteslogstore_retention_mslogstore_flush_messageslogstore_flush_msstorage_type
+- SETTINGSmodeshardsreplication_factorversion_columnkeep_versionsevent_time_columnlogstore_codeclogstore_retention_byteslogstore_retention_mslogstore_flush_messageslogstore_flush_msreplication_type
 - mode
 - shards
 - replication_factor
@@ -13216,7 +13216,7 @@ For S3 Tiered Storage, you can also specify when the cold data will be moved to S3
 - logstore_retention_ms
 - logstore_flush_messages
 - logstore_flush_ms
-- storage_type
+- replication_type
 - TTL (Time-To-Live)
 
 - Append Stream
@@ -13235,7 +13235,7 @@ For S3 Tiered Storage, you can also specify when the cold data will be moved to S3
 - logstore_retention_ms
 - logstore_flush_messages
 - logstore_flush_ms
-- storage_type
+- replication_type