Commit 016b076

QianyongY authored and cxzl25 committed
ORC-2131: Set default of orc.stripe.size.check.ratio and orc.dictionary.max.size.bytes to 0
### What changes were proposed in this pull request?

Set default of `orc.stripe.size.check.ratio` and `orc.dictionary.max.size.bytes` to 0

### Why are the changes needed?

After enabling the optimizations related to orc.stripe.size.check.ratio and orc.dictionary.max.size.bytes, we observed that ORC files written with the current defaults are about 10%–20% larger than before. For example, datasets that were previously ~1.0–1.1 TB grow to ~1.2 TB with the current defaults, causing noticeable storage cost increase.

### How was this patch tested?

Local test

With orc.dictionary.max.size.bytes=16777216 or orc.stripe.size.check.ratio=2.0, the written ORC data grows to 1.2 TB (data inflation).

```shell
1 6665 1300347279057 hdfs://ns/user/hive/warehouse/tmp_sandbox_xxx.db/tmp_test_123_2/d=2026-03-15
```

With orc.dictionary.max.size.bytes=0 and orc.stripe.size.check.ratio=0.0, the data size remains at the expected 1.0 TB.

```shell
1 6665 1143347882367 hdfs://ns/user/hive/warehouse/tmp_sandbox_xxx.db/tmp_test_123_1/d=2026-03-15
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #2580 from QianyongY/features/ORC-2131.

Authored-by: yongqian <yongqian@trip.com>
Signed-off-by: Shaoyun Chen <csy@apache.org>
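The two byte counts in the listings above give the inflation directly. The following is a minimal sketch of that arithmetic, assuming the third column of each listing is the total content size in bytes; the class and method names are illustrative only and are not part of ORC.

```java
public class InflationCheck {
    // Relative growth of `inflated` over `baseline`, as a percentage.
    static double inflationPercent(long baseline, long inflated) {
        return 100.0 * (inflated - baseline) / baseline;
    }

    public static void main(String[] args) {
        // Byte counts copied from the two test listings in the commit message.
        long checksDisabled = 1143347882367L; // both settings at 0
        long oldDefaults = 1300347279057L;    // 16777216 or 2.0 in effect
        System.out.println("inflation percent: "
            + inflationPercent(checksDisabled, oldDefaults));
    }
}
```

For these two partitions the growth works out to roughly 13.7%, which sits inside the 10%–20% range reported above.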
1 parent 1d51a8b commit 016b076

2 files changed

Lines changed: 4 additions & 4 deletions

java/core/src/java/org/apache/orc/OrcConf.java

Lines changed: 2 additions & 2 deletions
@@ -121,7 +121,7 @@ public enum OrcConf {
       "dictionary encoding. Use 1 to always use dictionary encoding."),
   DICTIONARY_MAX_SIZE_IN_BYTES("orc.dictionary.max.size.bytes",
       "orc.dictionary.max.size.bytes",
-      16 * 1024 * 1024,
+      0,
       "If the total size of the dictionary is greater than this\n" +
       ", turn off dictionary encoding. Use 0 to disable this check."),
   ROW_INDEX_STRIDE_DICTIONARY_CHECK("orc.dictionary.early.check",
@@ -190,7 +190,7 @@ public enum OrcConf {
       + " Use orc.stripe.row.count instead if the value larger than orc.stripe.row.count."),
   STRIPE_SIZE_CHECKRATIO("orc.stripe.size.check.ratio",
       "orc.stripe.size.check.ratio",
-      2.0,
+      0.0,
       "Flush stripe if the tree writer size in bytes is larger than (this * orc.stripe.size). " +
       "Use 0 to disable this check."),
   OVERWRITE_OUTPUT_FILE("orc.overwrite.output.file", "orc.overwrite.output.file", false,
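The two description strings in this diff can be read as simple threshold tests. The sketch below models only what those descriptions say, under the assumption that a value of 0 short-circuits each check; it is not ORC's actual writer code, and the class name, method names, and the 64 MiB stripe size used in `main` are hypothetical.

```java
public class WriterChecks {
    // Models orc.stripe.size.check.ratio: flush when the tree writer size
    // exceeds (ratio * stripe size). A ratio of 0 disables this check.
    static boolean shouldFlushStripe(long treeWriterBytes, long stripeSize,
                                     double checkRatio) {
        if (checkRatio <= 0.0) {
            return false; // new default: this extra check never fires
        }
        return treeWriterBytes > checkRatio * stripeSize;
    }

    // Models orc.dictionary.max.size.bytes: turn off dictionary encoding when
    // the dictionary is larger than the limit. A limit of 0 disables this check.
    static boolean shouldDisableDictionary(long dictionaryBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return false; // new default: dictionary size alone never triggers it
        }
        return dictionaryBytes > maxBytes;
    }

    public static void main(String[] args) {
        long stripeSize = 64L * 1024 * 1024; // illustrative value only
        System.out.println(shouldFlushStripe(3 * stripeSize, stripeSize, 2.0));
        System.out.println(shouldFlushStripe(3 * stripeSize, stripeSize, 0.0));
        System.out.println(shouldDisableDictionary(32L * 1024 * 1024, 16L * 1024 * 1024));
        System.out.println(shouldDisableDictionary(32L * 1024 * 1024, 0L));
    }
}
```

The model covers only these two additional checks; any other conditions under which the writer flushes a stripe or abandons dictionary encoding are outside its scope.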

site/_docs/core-java-config.md

Lines changed: 2 additions & 2 deletions
@@ -167,7 +167,7 @@ permalink: /docs/core-java-config.html
 </tr>
 <tr>
 <td><code>orc.dictionary.max.size.bytes</code></td>
-<td>16777216</td>
+<td>0</td>
 <td>
 If the total size of the dictionary is greater than this, turn off dictionary encoding. Use 0 to disable this check.
 </td>
@@ -293,7 +293,7 @@ permalink: /docs/core-java-config.html
 </tr>
 <tr>
 <td><code>orc.stripe.size.check.ratio</code></td>
-<td>2.0</td>
+<td>0.0</td>
 <td>
 Flush stripe if the tree writer size in bytes is larger than (this * orc.stripe.size). Use 0 to disable this check.
 </td>
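Writers that depended on the previous behavior can still set both keys explicitly. The sketch below collects the old values from this diff as plain key/value pairs; the class and method names are hypothetical, and it assumes the pairs are then copied into whatever configuration object the writer reads (for a Hadoop `Configuration`, one `conf.set(key, value)` call per entry).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PreviousOrcDefaults {
    // Returns the two settings with the values they had before this commit.
    static Map<String, String> previousDefaults() {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("orc.dictionary.max.size.bytes",
            String.valueOf(16 * 1024 * 1024)); // 16777216
        overrides.put("orc.stripe.size.check.ratio", "2.0");
        return overrides;
    }

    public static void main(String[] args) {
        // Print the overrides in key=value form.
        previousDefaults().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Given the size increase reported in the commit message, restoring these values is worth doing only where the old checks are known to be needed.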
