Commit a0cf012: add doc
1 parent 3e013a3

1 file changed: docs/docs/flink-maintenance.md (7 additions, 0 deletions)
```diff
@@ -219,6 +219,7 @@ env.execute("Table Maintenance Job");
 | `maxRewriteBytes(long)` | Maximum bytes to rewrite per execution | Long.MAX_VALUE | long |
 | `filter(Expression)` | Filter expression for selecting files to rewrite | Expressions.alwaysTrue() | Expression |
 | `maxFileGroupInputFiles(long)` | Maximum allowed number of input files within a file group | Long.MAX_VALUE | long |
+| `openParquetMerge(boolean)` | For Parquet tables, `rewriteDataFiles` can use an optimized row-group level merge strategy that is significantly faster than the standard read-rewrite approach. This optimization directly copies row groups without deserialization and re-serialization. | false | boolean |

 #### DeleteOrphanFiles Configuration
```
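To show where the new option fits, here is a hedged sketch of a maintenance job combining the documented builder options. Only `maxRewriteBytes`, `filter`, `maxFileGroupInputFiles`, `openParquetMerge`, and the `env.execute("Table Maintenance Job")` call are taken from this diff; the surrounding wiring (`TableMaintenance.forTable`, `tableLoader`, `lockFactory`, `append()`) is an assumption about the Flink maintenance API, not confirmed here.

```java
// Hypothetical sketch: everything except the four documented builder options
// and env.execute("Table Maintenance Job") is assumed, not taken from this diff.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

TableMaintenance.forTable(env, tableLoader, lockFactory)
    .add(
        RewriteDataFiles.builder()
            .maxRewriteBytes(10L * 1024 * 1024 * 1024)  // cap bytes rewritten per execution
            .filter(Expressions.alwaysTrue())           // select all files as candidates
            .maxFileGroupInputFiles(1_000L)             // bound input files per file group
            .openParquetMerge(true))                    // enable row-group level Parquet merge
    .append();

env.execute("Table Maintenance Job");
```

With `openParquetMerge(true)`, eligible Parquet files are merged by copying row groups directly, skipping the deserialize/re-serialize cycle of the standard read-rewrite path.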
```diff
@@ -398,6 +399,12 @@ These keys are used in SQL (SET or table WITH options) and are applicable when w
 - Enable `partialProgressEnabled` for large rewrite operations
 - Set reasonable `maxRewriteBytes` limits
 - Setting an appropriate `maxFileGroupSizeBytes` can break down large FileGroups into smaller ones, thereby increasing the speed of parallel processing
+- For Parquet tables, `rewriteDataFiles` can enable Parquet merge, an optimized row-group level merge strategy that is significantly faster than the standard read-rewrite approach. This optimization is applied only when all of the following requirements are met:
+  * All files are in Parquet format
+  * Files have compatible schemas
+  * Files are not encrypted
+  * Files do not have associated delete files or delete vectors
+  * The table does not have a sort order (including z-ordered tables)

 ### Troubleshooting
```
