
Commit d4f67f4

refine mv perf tuning settings (#471)
1 parent 0cc9425 commit d4f67f4

1 file changed (+33, -33 lines)

docs/materialized-perf-tuning.md

Lines changed: 33 additions & 33 deletions
@@ -123,55 +123,55 @@ Most queries work well with default settings, but advanced workloads may require

### Data Read & Processing

- - `max_threads`: Maximum threads for query execution (soft limit).
- - `max_block_size`: Maximum rows per read block.
- - `input_format_parallel_parsing`: Enable parallel parsing (for supported formats).
- - `fetch_buffer_size`: Remote fetch buffer size per query.
- - `fetch_threads`: Threads for fetching from shared disk.
- - `record_consume_batch_count`: Maximum number of records to consume in one batch.
- - `record_consume_batch_size`: Maximum batch size in bytes.
- - `record_consume_timeout_ms`: Timeout for batch consumption.
+ - `max_threads`: Maximum threads for query execution (soft limit). `0` means the system will automatically pick a value, which is usually the number of CPUs. **Default: 0**
+ - `max_block_size`: Maximum rows per read block. **Default: 65409**
+ - `input_format_parallel_parsing`: Enable parallel parsing (for supported formats). **Default: true**
+ - `fetch_buffer_size`: Remote fetch buffer size per query. **Default: 64 * 1024 * 1024**
+ - `fetch_threads`: Threads for fetching from shared disk. **Default: 1**
+ - `record_consume_batch_count`: Maximum number of records to consume in one batch. **Default: 1000**
+ - `record_consume_batch_size`: Maximum batch size in bytes. **Default: 10 * 1024 * 1024**
+ - `record_consume_timeout_ms`: Timeout for batch consumption. **Default: 100**
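
For illustration, a read-heavy query might apply a few of these settings via a trailing `SETTINGS` clause. This is a sketch only: the stream name `web_events` and the chosen values are hypothetical, not recommendations.

```sql
-- Hypothetical tuning sketch: widen the read path for a heavy scan.
SELECT user_id, count() AS hits
FROM web_events
GROUP BY user_id
SETTINGS max_threads = 8, fetch_threads = 4, fetch_buffer_size = 134217728;
```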

### Data Write

- - `max_insert_threads`: Maximum threads for concurrent inserts (when possible).
- - `min_insert_block_size_rows`: Minimum block size in rows before flushing to the target.
- - `min_insert_block_size_bytes`: Minimum block size in bytes before flushing to the target.
- - `max_insert_block_size`: Maximum block size in rows before forcing a flush (batch write).
- - `max_insert_block_bytes`: Maximum block size in bytes before forcing a flush (batch write).
- - `insert_block_timeout_ms`: Timeout threshold (in ms) before forcing a flush (batch write).
- - `output_format_parallel_formatting`: Enable parallel formatting for certain output formats.
+ - `max_insert_threads`: Maximum threads for concurrent inserts (when possible). `0` means the system will automatically pick a value. **Default: 0**
+ - `min_insert_block_size_rows`: Minimum block size in rows before flushing to the target. **Default: 65409**
+ - `min_insert_block_size_bytes`: Minimum block size in bytes before flushing to the target. **Default: 65409 * 256**
+ - `max_insert_block_size`: Maximum block size in rows before forcing a flush (batch write). **Default: 65409**
+ - `max_insert_block_bytes`: Maximum block size in bytes before forcing a flush (batch write). **Default: 1024 * 1024**
+ - `insert_block_timeout_ms`: Timeout threshold (in ms) before forcing a flush (batch write). **Default: 500**
+ - `output_format_parallel_formatting`: Enable parallel formatting for certain output formats. **Default: true**
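
As an illustrative sketch of the write-side knobs: the view name `mv_orders_summary`, the stream `orders`, and the values are hypothetical, and the trailing `SETTINGS` clause on the inner SELECT is an assumption.

```sql
-- Hypothetical tuning sketch: flush larger, less frequent batches to the target.
CREATE MATERIALIZED VIEW mv_orders_summary AS
SELECT product_id, sum(amount) AS total_amount
FROM orders
GROUP BY product_id
SETTINGS max_insert_threads = 2, min_insert_block_size_rows = 131072, insert_block_timeout_ms = 1000;
```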

### Data Shuffling

- - `num_target_shards`: Used with `SHUFFLE BY`; number of target shards after shuffling.
-   `0` means the system will automatically pick a number.
+ - `num_target_shards`: Used with `SHUFFLE BY`; number of target shards after shuffling.
+   `0` means the system will automatically pick a number. **Default: 0**
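
A sketch of setting `num_target_shards` on a shuffled aggregation; the stream `device_metrics`, the shard count, and the exact placement of `SHUFFLE BY` are assumptions made for illustration.

```sql
-- Hypothetical tuning sketch: fan the shuffled aggregation out to 4 shards.
SELECT device_id, count() AS events
FROM device_metrics
SHUFFLE BY device_id
GROUP BY device_id
SETTINGS num_target_shards = 4;
```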

### Join

- - `max_joined_block_size_rows`: Maximum block size (in rows) for JOIN results. `0` means unlimited.
- - `join_algorithm`: Algorithm for join execution (`parallel_hash`, `hash`, `direct`, etc.).
- - `join_max_buffered_bytes`: Maximum buffered bytes for stream-to-stream joins.
- - `join_buffered_data_block_size`: Block size used when buffering data in memory; merges small blocks into larger ones for efficiency. `0` disables merging.
- - `join_quiesce_threshold_ms`: Maximum wait time (ms) when one side of the join is quiesced.
- - `join_latency_threshold`: Controls when to align and start joining left/right streams. `0` lets the system choose automatically.
- - `default_hash_join`: Controls which hash join implementation is used for streaming joins.
+ - `max_joined_block_size_rows`: Maximum block size (in rows) for JOIN results. `0` means unlimited. **Default: 65409**
+ - `join_algorithm`: Algorithm for join execution (`parallel_hash`, `hash`, `direct`, etc.). **Default: default**
+ - `join_max_buffered_bytes`: Maximum buffered bytes for stream-to-stream joins. **Default: 524288000**
+ - `join_buffered_data_block_size`: Block size used when buffering data in memory; merges small blocks into larger ones for efficiency. `0` disables merging. **Default: 0**
+ - `join_quiesce_threshold_ms`: Maximum wait time (ms) when one side of the join is quiesced. **Default: 1000**
+ - `join_latency_threshold`: Controls when to align and start joining left/right streams. `0` lets the system choose the value automatically. **Default: 0**
+ - `default_hash_join`: Controls which hash join implementation (`memory` or `hybrid`) is used for streaming joins. **Default: memory**
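
For illustration, a stream-to-stream join with two of these settings applied; the stream names and values are hypothetical, and the join condition is deliberately simplified.

```sql
-- Hypothetical tuning sketch: pick the hash algorithm and cap the join buffer.
SELECT o.order_id, o.amount, p.status
FROM orders AS o
JOIN payments AS p ON o.order_id = p.order_id
SETTINGS join_algorithm = 'parallel_hash', join_max_buffered_bytes = 1073741824;
```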

### Aggregation

- - `default_hash_table`: Controls which hash table is used for streaming queries (joins, aggregations).
+ - `default_hash_table`: Controls which hash table (`memory` or `hybrid`) is used for streaming queries (joins, aggregations). **Default: memory**
- Emit strategy is also critical for tuning. See [Streaming Aggregations: Emit Strategy](/streaming-aggregations#emit) for details.
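
A sketch of switching a large-cardinality GROUP BY to the hybrid hash table; the stream `page_views` and the choice of `hybrid` are illustrative, not a recommendation.

```sql
-- Hypothetical tuning sketch: keep large aggregation state in the hybrid hash table.
SELECT user_id, count() AS views
FROM page_views
GROUP BY user_id
SETTINGS default_hash_table = 'hybrid';
```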

### Backfill

- - `enable_backfill_from_historical_store`: Enable backfill from historical data stores.
- - `emit_during_backfill`: Emit intermediate aggregation results while backfilling historical data.
- - `force_backfill_in_order`: Require backfill data to be processed strictly in sequence order.
+ - `enable_backfill_from_historical_store`: Enable backfill from historical data stores. **Default: true**
+ - `emit_during_backfill`: Emit intermediate aggregation results while backfilling historical data. **Default: false**
+ - `force_backfill_in_order`: Controls whether backfilled data must be processed strictly in sequence order; when `true`, the data is sorted by `_tp_sn`. **Default: false**
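
A sketch of a materialized view that backfills from the historical store in `_tp_sn` order without emitting intermediate results; the view name, stream, and window size are hypothetical.

```sql
-- Hypothetical tuning sketch: ordered backfill, no intermediate emits.
CREATE MATERIALIZED VIEW mv_hourly_totals AS
SELECT window_start, sum(amount) AS total_amount
FROM tumble(transactions, 1h)
GROUP BY window_start
SETTINGS enable_backfill_from_historical_store = true, force_backfill_in_order = true, emit_during_backfill = false;
```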

### Miscellaneous

- - `max_memory_usage`: Maximum memory usage per query. `0` means unlimited.
- - `count_distinct_optimization`: Rewrite `COUNT DISTINCT` into a `GROUP BY` subquery for optimization.
- - `javascript_vms`: Number of JavaScript VMs to use in one query (for executing JavaScript UDFs).
- - `use_index`: Apply a specific index when querying mutable streams.
- - `enforce_append_only`: For changelog storage, enforce append-only query mode.
+ - `max_memory_usage`: Maximum memory usage per query. `0` means unlimited. **Default: 0**
+ - `count_distinct_optimization`: Rewrite `COUNT DISTINCT` into a `GROUP BY` subquery for optimization. **Default: false**
+ - `javascript_vms`: Number of JavaScript VMs to use in one query (for executing JavaScript UDFs). **Default: 1**
+ - `use_index`: Apply a specific index when querying mutable streams. **Default: ''**
+ - `enforce_append_only`: For changelog storage, enforce append-only query mode. **Default: false**
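
Finally, an illustrative sketch combining a memory cap with the `COUNT DISTINCT` rewrite; the 10 GiB limit and the stream name are hypothetical.

```sql
-- Hypothetical tuning sketch: cap per-query memory and rewrite COUNT DISTINCT.
SELECT count(distinct user_id) AS unique_users
FROM page_views
SETTINGS max_memory_usage = 10737418240, count_distinct_optimization = true;
```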
