Single-pass writeString fast path for short strings in ByteBuffersDataOutput by neoremind · Pull Request #16280 · apache/lucene

neoremind · 2026-06-21T16:14:07Z

Background

In #13863, ByteBuffersDataOutput.writeString() was optimized to encode directly in place instead of allocating a BytesRef and copying its bytes into the destination buffer. However, this requires two passes over the input string's chars: first calcUTF16toUTF8Length to compute the length for the VInt prefix, then UTF16toUTF8 to do the UTF-8 encoding. The opportunity: for short strings, we can skip the first pass.
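
For reference, the two-pass scheme can be sketched roughly as follows. This is a minimal standalone illustration, not Lucene's code: it assumes a plain growable byte[] instead of ByteBuffersDataOutput's block list, and TwoPassStringWriter, utf8Length and encodeUtf8 are hypothetical names standing in for the roles of calcUTF16toUTF8Length and UTF16toUTF8. Unpaired surrogates are mapped to U+FFFD (3 bytes) in both passes so the prefix and the payload always agree.

```java
import java.util.Arrays;

/** Hypothetical two-pass writer (illustration only, not Lucene's ByteBuffersDataOutput). */
final class TwoPassStringWriter {
  private byte[] buf = new byte[16];
  private int pos;

  void writeByte(byte b) {
    if (pos == buf.length) {
      buf = Arrays.copyOf(buf, buf.length * 2); // grow the backing array
    }
    buf[pos++] = b;
  }

  /** VInt: 7 bits per byte, high bit set on every byte except the last. */
  void writeVInt(int v) {
    while ((v & ~0x7F) != 0) {
      writeByte((byte) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    writeByte((byte) v);
  }

  /** Pass 1: count UTF-8 bytes without encoding anything. */
  static int utf8Length(CharSequence s) {
    int len = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        len += 1;
      } else if (c < 0x800) {
        len += 2;
      } else if (Character.isHighSurrogate(c)
          && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        len += 4; // surrogate pair: 2 chars -> 4 bytes
        i++;
      } else {
        len += 3; // other BMP chars; an unpaired surrogate becomes U+FFFD (3 bytes)
      }
    }
    return len;
  }

  /** Pass 2: encode the same chars as UTF-8 straight into buf. */
  private void encodeUtf8(CharSequence s) {
    for (int i = 0; i < s.length(); i++) {
      int cp = s.charAt(i);
      if (Character.isHighSurrogate((char) cp)
          && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        cp = Character.toCodePoint((char) cp, s.charAt(++i));
      } else if (Character.isSurrogate((char) cp)) {
        cp = 0xFFFD; // unpaired surrogate -> replacement character
      }
      if (cp < 0x80) {
        writeByte((byte) cp);
      } else if (cp < 0x800) {
        writeByte((byte) (0xC0 | (cp >> 6)));
        writeByte((byte) (0x80 | (cp & 0x3F)));
      } else if (cp < 0x10000) {
        writeByte((byte) (0xE0 | (cp >> 12)));
        writeByte((byte) (0x80 | ((cp >> 6) & 0x3F)));
        writeByte((byte) (0x80 | (cp & 0x3F)));
      } else {
        writeByte((byte) (0xF0 | (cp >> 18)));
        writeByte((byte) (0x80 | ((cp >> 12) & 0x3F)));
        writeByte((byte) (0x80 | ((cp >> 6) & 0x3F)));
        writeByte((byte) (0x80 | (cp & 0x3F)));
      }
    }
  }

  /** Two passes over the chars: the length for the VInt prefix, then the encoding itself. */
  void writeString(CharSequence s) {
    writeVInt(utf8Length(s));
    encodeUtf8(s);
  }

  byte[] toByteArray() {
    return Arrays.copyOf(buf, pos);
  }
}
```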

What this PR does

This PR adds a single-pass fast path for short strings (charCount <= 42). A UTF-16 char encodes to at most 3 UTF-8 bytes (a surrogate pair takes 4 bytes for 2 chars), so the UTF-8 length is at most 42 * 3 = 126, which always fits in a 1-byte VInt (max 127). We therefore know the VInt prefix size without scanning the string's chars upfront: reserve 1 byte, encode directly into the destination buffer, then backfill the length. Strings that don't qualify for the shortcut fall back to the existing logic.
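
A minimal sketch of the fast path described above, again over a plain growable byte[]. That is an assumption: the real ByteBuffersDataOutput writes into a list of blocks, and this sketch does not model how the PR handles a reserved byte that ends up in an earlier block than the payload. FastPathStringWriter and MAX_CHARS_FOR_1_BYTE_VINT are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical writer illustrating the 1-byte-VInt fast path (not Lucene's code). */
final class FastPathStringWriter {
  /** 127 / 3 = 42 chars: at most 3 UTF-8 bytes per char keeps the length <= 126. */
  static final int MAX_CHARS_FOR_1_BYTE_VINT = 127 / 3;

  private byte[] buf = new byte[16];
  private int pos;

  void writeByte(byte b) {
    if (pos == buf.length) {
      buf = Arrays.copyOf(buf, buf.length * 2); // grow; earlier indices stay valid
    }
    buf[pos++] = b;
  }

  void writeVInt(int v) {
    while ((v & ~0x7F) != 0) {
      writeByte((byte) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    writeByte((byte) v);
  }

  void writeString(CharSequence s) {
    if (s.length() <= MAX_CHARS_FOR_1_BYTE_VINT) {
      int lenPos = pos;
      writeByte((byte) 0); // reserve the 1-byte VInt prefix
      int start = pos;
      encodeUtf8(s); // the only pass over the chars
      buf[lenPos] = (byte) (pos - start); // backfill: <= 126, so the high bit is clear
      return;
    }
    writeVInt(utf8Length(s)); // fallback: two passes, as before
    encodeUtf8(s);
  }

  /** Counts UTF-8 bytes; unpaired surrogates count as U+FFFD (3 bytes). */
  static int utf8Length(CharSequence s) {
    int len = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        len += 1;
      } else if (c < 0x800) {
        len += 2;
      } else if (Character.isHighSurrogate(c)
          && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        len += 4;
        i++;
      } else {
        len += 3;
      }
    }
    return len;
  }

  /** Encodes s as UTF-8 at pos; unpaired surrogates become U+FFFD. */
  private void encodeUtf8(CharSequence s) {
    for (int i = 0; i < s.length(); i++) {
      int cp = s.charAt(i);
      if (Character.isHighSurrogate((char) cp)
          && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        cp = Character.toCodePoint((char) cp, s.charAt(++i));
      } else if (Character.isSurrogate((char) cp)) {
        cp = 0xFFFD;
      }
      if (cp < 0x80) {
        writeByte((byte) cp);
      } else if (cp < 0x800) {
        writeByte((byte) (0xC0 | (cp >> 6)));
        writeByte((byte) (0x80 | (cp & 0x3F)));
      } else if (cp < 0x10000) {
        writeByte((byte) (0xE0 | (cp >> 12)));
        writeByte((byte) (0x80 | ((cp >> 6) & 0x3F)));
        writeByte((byte) (0x80 | (cp & 0x3F)));
      } else {
        writeByte((byte) (0xF0 | (cp >> 18)));
        writeByte((byte) (0x80 | ((cp >> 12) & 0x3F)));
        writeByte((byte) (0x80 | ((cp >> 6) & 0x3F)));
        writeByte((byte) (0x80 | (cp & 0x3F)));
      }
    }
  }

  byte[] toByteArray() {
    return Arrays.copyOf(buf, pos);
  }
}
```

The backfill only works because the prefix size is fixed before encoding starts. Past 42 chars the prefix might need 2 or more bytes, which is why those strings keep the two-pass path.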

As I understand it, this could benefit stored-fields writes of short strings such as business keywords, IDs, and titles, as well as other short strings like field infos, codec metadata, and segment names.

Benchmarks

I added a JMH benchmark comparing the new implementation against the current one across ASCII, CJK, and Latin-extended strings of various lengths; see here for how the current implementation is kept for an apples-to-apples comparison. The target number of bytes written matches stored-fields chunk sizes: 80KB (BEST_SPEED default), 480KB (BEST_COMPRESSION default), and 2MB (a hypothetical customized larger chunk size in the stored fields .fdt file). The benchmark uses a resettable ByteBuffersDataOutput starting with 1KB blocks to mimic a real-world workload.

Results show notable gains on short strings and no regressions on medium, long, and very large strings, which fall back to the unchanged logic (the small differences there look like acceptable jitter).

Throughput is in ops/s; each operation writes the target number of bytes into the buffer. Measured on an EC2 m5.2xlarge.

Detailed JMH results:


Benchmark                                               (stringType)  (targetBytes)   Mode  Cnt      Score     Error  Units
ByteBuffersDataOutputWriteStringBenchmark.newImpl            ascii_1          81920  thrpt   15   1924.154 ±   3.998  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl            ascii_1         491520  thrpt   15    325.054 ±   0.712  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl            ascii_1        2097152  thrpt   15     77.335 ±   0.249  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_10          81920  thrpt   15   5127.397 ± 124.657  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_10         491520  thrpt   15    894.737 ±   4.701  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_10        2097152  thrpt   15    206.414 ±   2.523  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_20          81920  thrpt   15   7907.056 ±  28.022  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_20         491520  thrpt   15   1374.817 ±   4.420  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_20        2097152  thrpt   15    325.101 ±   0.932  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_30          81920  thrpt   15   9654.601 ±  40.498  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_30         491520  thrpt   15   1764.192 ±   6.306  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_30        2097152  thrpt   15    416.434 ±   1.790  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_40          81920  thrpt   15  10563.802 ±  30.043  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_40         491520  thrpt   15   1891.552 ±   4.140  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_40        2097152  thrpt   15    449.588 ±   4.443  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_medium          81920  thrpt   15   9263.776 ±  98.204  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_medium         491520  thrpt   15   1514.433 ±   0.863  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_medium        2097152  thrpt   15    356.831 ±   0.588  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         ascii_long          81920  thrpt   15  12117.442 ± 424.084  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         ascii_long         491520  thrpt   15   2114.019 ±   2.865  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         ascii_long        2097152  thrpt   15    503.861 ±   5.616  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_vlarge          81920  thrpt   15  11603.539 ±  28.604  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_vlarge         491520  thrpt   15   2050.525 ±   1.159  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_vlarge        2097152  thrpt   15    519.435 ±   5.892  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              cjk_1          81920  thrpt   15   3598.613 ±  27.463  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              cjk_1         491520  thrpt   15    589.760 ±   2.930  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              cjk_1        2097152  thrpt   15    142.267 ±   1.822  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_10          81920  thrpt   15   6516.930 ± 155.093  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_10         491520  thrpt   15   1124.501 ±  51.999  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_10        2097152  thrpt   15    268.392 ±  10.699  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_20          81920  thrpt   15   7444.068 ±  28.467  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_20         491520  thrpt   15   1251.821 ±  63.880  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_20        2097152  thrpt   15    316.346 ±   4.879  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_30          81920  thrpt   15   7735.062 ±  33.040  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_30         491520  thrpt   15   1369.589 ±  23.248  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_30        2097152  thrpt   15    310.114 ±  12.392  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_40          81920  thrpt   15   7861.299 ±  44.006  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_40         491520  thrpt   15   1426.798 ±   1.373  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_40        2097152  thrpt   15    328.560 ±   8.392  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_medium          81920  thrpt   15   5302.579 ±  67.898  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_medium         491520  thrpt   15    829.204 ±   5.262  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_medium        2097152  thrpt   15    210.442 ±   0.308  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           cjk_long          81920  thrpt   15   5704.934 ± 119.140  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           cjk_long         491520  thrpt   15    934.739 ±  31.456  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           cjk_long        2097152  thrpt   15    211.968 ±   3.531  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_vlarge          81920  thrpt   15   6736.329 ± 244.534  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_vlarge         491520  thrpt   15    927.611 ±  12.725  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_vlarge        2097152  thrpt   15    231.230 ±   4.009  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl        latin_ext_1          81920  thrpt   15   2330.881 ±  32.202  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl        latin_ext_1         491520  thrpt   15    398.409 ±   5.090  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl        latin_ext_1        2097152  thrpt   15     93.175 ±   1.428  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_10          81920  thrpt   15   4296.039 ±  48.292  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_10         491520  thrpt   15    748.831 ±   5.288  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_10        2097152  thrpt   15    178.731 ±   2.817  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_20          81920  thrpt   15   4953.465 ±  80.963  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_20         491520  thrpt   15    859.932 ±  27.221  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_20        2097152  thrpt   15    206.179 ±   6.109  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_30          81920  thrpt   15   5053.684 ± 232.941  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_30         491520  thrpt   15    878.187 ±  10.097  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_30        2097152  thrpt   15    208.340 ±   1.234  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_40          81920  thrpt   15   4932.669 ±   9.067  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_40         491520  thrpt   15    962.194 ±  57.633  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_40        2097152  thrpt   15    216.052 ±   2.011  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_medium          81920  thrpt   15   3523.366 ±  14.522  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_medium         491520  thrpt   15    593.160 ±   3.174  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_medium        2097152  thrpt   15    138.684 ±   0.154  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl     latin_ext_long          81920  thrpt   15   3652.496 ±  86.858  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl     latin_ext_long         491520  thrpt   15    630.856 ±  23.506  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl     latin_ext_long        2097152  thrpt   15    152.758 ±   5.463  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_vlarge          81920  thrpt   15   4227.879 ±   7.569  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_vlarge         491520  thrpt   15    633.812 ±   1.601  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_vlarge        2097152  thrpt   15    148.096 ±   0.526  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              mixed          81920  thrpt   15   2610.423 ±   8.035  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              mixed         491520  thrpt   15    526.189 ±  11.442  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              mixed        2097152  thrpt   15    117.501 ±   5.147  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl           ascii_1          81920  thrpt   15   1449.904 ±   0.730  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl           ascii_1         491520  thrpt   15    237.547 ±   0.981  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl           ascii_1        2097152  thrpt   15     55.849 ±   0.035  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_10          81920  thrpt   15   3632.715 ±   7.330  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_10         491520  thrpt   15    608.009 ±   1.032  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_10        2097152  thrpt   15    143.089 ±   0.086  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_20          81920  thrpt   15   5513.255 ±  16.047  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_20         491520  thrpt   15    939.471 ±   0.893  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_20        2097152  thrpt   15    221.746 ±   0.437  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_30          81920  thrpt   15   6810.637 ±  33.651  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_30         491520  thrpt   15   1180.119 ±   2.552  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_30        2097152  thrpt   15    276.847 ±   0.688  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_40          81920  thrpt   15   7800.776 ±  14.315  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_40         491520  thrpt   15   1310.465 ±   2.490  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_40        2097152  thrpt   15    311.610 ±   0.348  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_medium          81920  thrpt   15   9042.239 ±  37.124  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_medium         491520  thrpt   15   1470.004 ±   5.105  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_medium        2097152  thrpt   15    346.409 ±   0.763  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        ascii_long          81920  thrpt   15  10884.157 ±  32.714  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        ascii_long         491520  thrpt   15   2047.124 ±   3.786  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        ascii_long        2097152  thrpt   15    485.906 ±   0.356  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_vlarge          81920  thrpt   15  11570.370 ±  10.070  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_vlarge         491520  thrpt   15   2070.484 ±   1.673  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_vlarge        2097152  thrpt   15    506.705 ±  11.358  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             cjk_1          81920  thrpt   15   2732.453 ±  18.110  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             cjk_1         491520  thrpt   15    473.930 ±  11.438  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             cjk_1        2097152  thrpt   15    109.360 ±   2.644  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_10          81920  thrpt   15   4078.860 ± 229.551  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_10         491520  thrpt   15    729.199 ±  42.046  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_10        2097152  thrpt   15    163.849 ±   0.211  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_20          81920  thrpt   15   4728.439 ± 108.248  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_20         491520  thrpt   15    756.027 ±  28.522  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_20        2097152  thrpt   15    180.958 ±  11.565  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_30          81920  thrpt   15   4945.852 ± 123.435  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_30         491520  thrpt   15    853.268 ±   4.967  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_30        2097152  thrpt   15    199.801 ±   0.083  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_40          81920  thrpt   15   5080.684 ± 114.575  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_40         491520  thrpt   15    872.155 ±   0.935  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_40        2097152  thrpt   15    198.099 ±   5.012  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_medium          81920  thrpt   15   5114.304 ±  16.729  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_medium         491520  thrpt   15    836.790 ±   3.880  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_medium        2097152  thrpt   15    193.791 ±  14.359  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          cjk_long          81920  thrpt   15   5636.091 ±  96.048  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          cjk_long         491520  thrpt   15    899.898 ±   4.430  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          cjk_long        2097152  thrpt   15    211.120 ±   0.845  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_vlarge          81920  thrpt   15   6610.988 ± 368.882  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_vlarge         491520  thrpt   15    897.061 ±  15.893  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_vlarge        2097152  thrpt   15    226.848 ±   9.797  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl       latin_ext_1          81920  thrpt   15   1707.395 ±  20.488  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl       latin_ext_1         491520  thrpt   15    290.791 ±   0.661  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl       latin_ext_1        2097152  thrpt   15     68.084 ±   0.438  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_10          81920  thrpt   15   2562.599 ±  27.365  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_10         491520  thrpt   15    437.844 ±   3.480  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_10        2097152  thrpt   15    103.573 ±   0.355  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_20          81920  thrpt   15   2849.567 ±   5.463  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_20         491520  thrpt   15    488.922 ±   4.148  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_20        2097152  thrpt   15    114.500 ±   0.159  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_30          81920  thrpt   15   3112.005 ± 104.903  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_30         491520  thrpt   15    519.170 ±   1.386  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_30        2097152  thrpt   15    125.173 ±   4.172  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_40          81920  thrpt   15   3159.485 ±  13.467  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_40         491520  thrpt   15    545.461 ±  10.699  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_40        2097152  thrpt   15    129.708 ±   4.595  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_medium          81920  thrpt   15   3521.568 ±   4.052  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_medium         491520  thrpt   15    604.327 ±  17.521  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_medium        2097152  thrpt   15    138.913 ±   0.268  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl    latin_ext_long          81920  thrpt   15   3583.787 ±  28.151  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl    latin_ext_long         491520  thrpt   15    619.880 ±   9.109  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl    latin_ext_long        2097152  thrpt   15    156.162 ±   0.251  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_vlarge          81920  thrpt   15   4230.539 ±  11.689  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_vlarge         491520  thrpt   15    636.914 ±   1.179  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_vlarge        2097152  thrpt   15    147.291 ±   0.189  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             mixed          81920  thrpt   15   2569.503 ±  34.528  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             mixed         491520  thrpt   15    471.877 ±  13.853  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             mixed        2097152  thrpt   15    111.679 ±   0.714  ops/s

80KB target (BEST_SPEED chunk size)

String Type	New (ops/s)	Prev (ops/s)	Delta
ascii_1	1924	1450	+33%
ascii_10	5127	3633	+41%
ascii_20	7907	5513	+43%
ascii_30	9655	6811	+42%
ascii_40	10564	7801	+35%
ascii_medium	9264	9042	+2%
ascii_long	12117	10884	+11%
ascii_vlarge	11604	11570	0%
cjk_1	3599	2732	+32%
cjk_10	6517	4079	+60%
cjk_20	7444	4728	+57%
cjk_30	7735	4946	+56%
cjk_40	7861	5081	+55%
cjk_medium	5303	5114	+4%
cjk_long	5705	5636	+1%
cjk_vlarge	6736	6611	+2%
latin_ext_1	2331	1707	+37%
latin_ext_10	4296	2563	+68%
latin_ext_20	4953	2850	+74%
latin_ext_30	5054	3112	+62%
latin_ext_40	4933	3159	+56%
latin_ext_medium	3523	3522	0%
latin_ext_long	3652	3584	+2%
latin_ext_vlarge	4228	4231	0%
mixed	2610	2570	+2%

480KB target (BEST_COMPRESSION chunk size)

String Type	New (ops/s)	Prev (ops/s)	Delta
ascii_1	325	238	+37%
ascii_10	895	608	+47%
ascii_20	1375	939	+46%
ascii_30	1764	1180	+49%
ascii_40	1892	1310	+44%
ascii_medium	1514	1470	+3%
ascii_long	2114	2047	+3%
ascii_vlarge	2051	2070	−1%
cjk_1	590	474	+24%
cjk_10	1125	729	+54%
cjk_20	1252	756	+66%
cjk_30	1370	853	+61%
cjk_40	1427	872	+64%
cjk_medium	829	837	−1%
cjk_long	935	900	+4%
cjk_vlarge	928	897	+3%
latin_ext_1	398	291	+37%
latin_ext_10	749	438	+71%
latin_ext_20	860	489	+76%
latin_ext_30	878	519	+69%
latin_ext_40	962	545	+76%
latin_ext_medium	593	604	−2%
latin_ext_long	631	620	+2%
latin_ext_vlarge	634	637	0%
mixed	526	472	+12%

2MB target (larger workload)

String Type	New (ops/s)	Prev (ops/s)	Delta
ascii_1	77	56	+38%
ascii_10	206	143	+44%
ascii_20	325	222	+47%
ascii_30	416	277	+50%
ascii_40	450	312	+44%
ascii_medium	357	346	+3%
ascii_long	504	486	+4%
ascii_vlarge	519	507	+3%
cjk_1	142	109	+30%
cjk_10	268	164	+64%
cjk_20	316	181	+75%
cjk_30	310	200	+55%
cjk_40	329	198	+66%
cjk_medium	210	194	+9%
cjk_long	212	211	0%
cjk_vlarge	231	227	+2%
latin_ext_1	93	68	+37%
latin_ext_10	179	104	+73%
latin_ext_20	206	115	+80%
latin_ext_30	208	125	+66%
latin_ext_40	216	130	+67%
latin_ext_medium	139	139	0%
latin_ext_long	153	156	−2%
latin_ext_vlarge	148	147	+1%
mixed	118	112	+5%

More thoughts

I initially attempted a more aggressive approach: adding a second fast path for 2-byte VInts (charCount 128–5461) plus a calcVIntSizeForUTF8Length utility method with early-exit scanning for the ambiguous ranges. This showed strong wins in almost all setups, particularly for configurations with larger block sizes or larger target written sizes (more docs per chunk or a larger chunk size). However, with the default settings (80KB chunk / 1024 docs) there was a ~5% regression on ascii_medium, and it introduced extra branches and more complex logic. So I kept it simple: only the 1-byte VInt fast path. The code is straightforward, easy to read, and shows no regressions in any of the benchmarked cases.
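
The char-count ranges above follow from bounding the UTF-8 length: each char contributes between 1 and 3 bytes, so the length lies in [charCount, 3 * charCount] (5461 = 16383 / 3 is the largest charCount whose upper bound still fits in a 2-byte VInt). When both ends of that interval need the same VInt size, no scan is needed; otherwise one can scan and stop as soon as the bounds agree. The sketch below illustrates that idea only; vIntSizeOfUtf8Length is a hypothetical helper, not the calcVIntSizeForUTF8Length from the abandoned attempt, whose code is not shown here.

```java
/** Hypothetical helper illustrating bounded early-exit scanning (not the PR's code). */
final class VIntSizeEstimator {
  /** Number of bytes a VInt needs to encode a non-negative value. */
  static int vIntSize(long v) {
    int size = 1;
    while (v >= 0x80) {
      v >>>= 7;
      size++;
    }
    return size;
  }

  /**
   * VInt size of the UTF-8 length of s. Every char contributes 1..3 bytes (a surrogate
   * pair is 4 bytes for 2 chars), so the length lies in [acc + rest, acc + 3 * rest];
   * scanning stops as soon as both bounds need the same VInt size.
   */
  static int vIntSizeOfUtf8Length(CharSequence s) {
    int n = s.length();
    long acc = 0; // exact UTF-8 bytes for chars [0, i)
    int i = 0;
    while (true) {
      long rest = n - i;
      int lo = vIntSize(acc + rest);
      if (lo == vIntSize(acc + 3 * rest)) {
        return lo; // e.g. n <= 42 -> 1 and 128 <= n <= 5461 -> 2, without reading any char
      }
      char c = s.charAt(i);
      if (c < 0x80) {
        acc += 1;
        i++;
      } else if (c < 0x800) {
        acc += 2;
        i++;
      } else if (Character.isHighSurrogate(c)
          && i + 1 < n
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        acc += 4;
        i += 2;
      } else {
        acc += 3; // other BMP chars, or an unpaired surrogate counted as U+FFFD
        i++;
      }
    }
  }
}
```

In the ambiguous ranges (for example 43 to 127 chars) the helper has to read chars, and that extra branching is the complexity weighed against the gains in the paragraph above.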

neoremind · 2026-06-26T15:35:47Z

@dweiss would you mind taking a look? It's a small change that improves performance. Many thanks!

Commit 15e1615: Apply fast path for short strings (charCount <= 42) writeString in ByteBuffersDataOutput

github-actions added the module:core/store label on Jun 21, 2026

Commit 9655f73: Update CHANGES.txt

github-actions added this to the 10.6.0 milestone on Jun 21, 2026

neoremind wants to merge 2 commits into apache:main from neoremind:bbo_writestring_fast_path_pr
