Uniq: improve performances #13249
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Memory | uniq_with_count[10000] | 141.9 KB | 262.9 KB | -46.03% |
| ❌ | Memory | uniq_case_insensitive[10000] | 141.9 KB | 262.9 KB | -46.03% |
| ❌ | Memory | uniq_check_chars[(10000, "1")] | 142.4 KB | 263.4 KB | -45.94% |
| ❌ | Memory | uniq_check_chars[(10000, "512")] | 142.4 KB | 263.4 KB | -45.94% |
| ❌ | Memory | uniq_heavy_duplicates[10000] | 190.3 KB | 311.3 KB | -38.87% |
| ⚡ | Simulation | uniq_check_chars[(10000, "1")] | 14.5 ms | 2.3 ms | ×6.4 |
| ⚡ | Simulation | uniq_check_chars[(10000, "512")] | 18.4 ms | 6.3 ms | ×2.9 |
Comparing sylvestre:uniq-perf (6dc2c35) with main (51529dc)
Footnotes
- 46 benchmarks were skipped, so the baseline results were used instead.
@@ -274,21 +295,20 @@ impl Uniq {
write_line_terminator!(writer, line_terminator)?;
Is there some special reason line_terminator is written differently here?
Sorry, what do you mean? :)
Here `write_line_terminator!` is used, whereas on line 307 `line_out.push(line_terminator);` replaces it.
is_c_locale() was called on every line inside key_end_index() when -w/--check-chars is set, doing up to 3 std::env::var_os() lookups each time. Locale env vars can't change mid-process, so this was pure per-line overhead, causing uniq -w to be ~5x slower than GNU uniq even for small -w values.

Compute is_c_locale() once at startup and cache it on the Uniq struct instead.

Fixes uutils#13199
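
The caching idea in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `is_c_locale_from`, the `LC_ALL` / `LC_CTYPE` / `LANG` lookup order, the field names on `Uniq`, and the body of `key_end_index` are all hypothetical stand-ins chosen to show where the environment lookups move to.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical stand-in for `is_c_locale()`. The lookup is injected so the
/// logic can be exercised without touching the real environment.
/// Assumption: the first non-empty variable among LC_ALL, LC_CTYPE, LANG wins.
fn is_c_locale_from(lookup: impl Fn(&str) -> Option<OsString>) -> bool {
    for name in ["LC_ALL", "LC_CTYPE", "LANG"] {
        if let Some(value) = lookup(name) {
            if !value.is_empty() {
                return matches!(value.to_str(), Some("C") | Some("POSIX"));
            }
        }
    }
    // No locale variable set: treat as the C locale.
    true
}

/// Simplified, hypothetical version of the `Uniq` struct.
struct Uniq {
    check_chars: usize,
    /// Computed once at startup instead of once per input line.
    c_locale: bool,
}

impl Uniq {
    fn new(check_chars: usize) -> Self {
        Self {
            check_chars,
            // Up to three environment lookups happen here, exactly once.
            c_locale: is_c_locale_from(|name| env::var_os(name)),
        }
    }

    /// Byte index where the comparison key ends. Reads the cached flag, so
    /// the per-line path performs no environment lookups.
    fn key_end_index(&self, line: &[u8]) -> usize {
        if self.c_locale {
            // C locale: one "character" is one byte.
            return line.len().min(self.check_chars);
        }
        match std::str::from_utf8(line) {
            // Other locales (assumed UTF-8 here): count whole characters.
            Ok(s) => s
                .char_indices()
                .nth(self.check_chars)
                .map_or(line.len(), |(i, _)| i),
            // Invalid UTF-8: fall back to byte counting.
            Err(_) => line.len().min(self.check_chars),
        }
    }
}
```

With this shape, `Uniq::new(1)` pays for the environment lookups once, and every later call to `key_end_index` only reads a `bool` field.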
write_line() issued two separate write_all() calls per output line (line content, then the terminator byte), each going through the dynamically-dispatched Box<dyn Write> from open_output_file(). Merge them into a single write via a reused scratch buffer.

Also match the input BufReader's capacity to the existing 128KB output buffer (previously the 8KB std default), for consistency.

Measured on a 20x-repeated /usr/share/dict/words (~80MB, pinned to one CPU core to reduce noise): -w 1 dropped from 429.7ms to 395.5ms (~8%), -w 512 from 578.9ms to 497.8ms (~14%).
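
A minimal sketch of the single-write approach described in this commit, assuming a simplified `write_line` signature. The parameter names, the `BUF_SIZE` constant, and the `buffered_input` helper are hypothetical and only illustrate the technique; the real function in uutils may take different arguments.

```rust
use std::io::{self, BufReader, Read, Write};

/// Assumed to match the 128KB output buffer mentioned in the commit message.
const BUF_SIZE: usize = 128 * 1024;

/// Assemble the line and its terminator in a reused scratch buffer, then
/// issue one `write_all` through the trait object instead of two.
fn write_line(
    writer: &mut dyn Write,
    scratch: &mut Vec<u8>,
    line: &[u8],
    line_terminator: u8,
) -> io::Result<()> {
    scratch.clear(); // keeps the allocation from the previous line
    scratch.extend_from_slice(line);
    scratch.push(line_terminator);
    writer.write_all(scratch.as_slice())
}

/// Wrap an input source in a `BufReader` sized like the output buffer,
/// rather than relying on the standard library's default capacity.
fn buffered_input<R: Read>(input: R) -> BufReader<R> {
    BufReader::with_capacity(BUF_SIZE, input)
}
```

The scratch `Vec<u8>` is cleared rather than reallocated, so after the first few lines the only per-line cost is a copy into the buffer plus one dynamically dispatched call.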
Closes: #13199