Uniq: improve performances #13249

Open
sylvestre wants to merge 2 commits into uutils:main from sylvestre:uniq-perf

@sylvestre

Contributor

Closes: #13199

@github-actions

github-actions Bot commented Jul 1, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/resolution (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/pr/bounded-memory is now passing!

@codspeed-hq

codspeed-hq Bot commented Jul 2, 2026

Merging this PR will not alter performance

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 2 improved benchmarks
❌ 5 regressed benchmarks
✅ 324 untouched benchmarks
⏩ 46 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Memory | uniq_with_count[10000] | 141.9 KB | 262.9 KB | -46.03% |
| Memory | uniq_case_insensitive[10000] | 141.9 KB | 262.9 KB | -46.03% |
| Memory | uniq_check_chars[(10000, "1")] | 142.4 KB | 263.4 KB | -45.94% |
| Memory | uniq_check_chars[(10000, "512")] | 142.4 KB | 263.4 KB | -45.94% |
| Memory | uniq_heavy_duplicates[10000] | 190.3 KB | 311.3 KB | -38.87% |
| Simulation | uniq_check_chars[(10000, "1")] | 14.5 ms | 2.3 ms | ×6.4 |
| Simulation | uniq_check_chars[(10000, "512")] | 18.4 ms | 6.3 ms | ×2.9 |
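
The report does not define the Efficiency column on this page. The memory rows are consistent with (BASE / HEAD - 1) × 100, and the simulation rows look like a plain BASE / HEAD ratio. The sketch below reproduces that arithmetic. The formula and both function names are inferred from the numbers shown, not taken from CodSpeed's documentation.

```rust
/// Relative change in percent, assuming efficiency = (base / head - 1) * 100.
/// Negative values mean HEAD uses more of the resource than BASE.
fn efficiency_percent(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

/// Speedup factor for improved rows, assuming it is simply base / head.
fn speedup(base: f64, head: f64) -> f64 {
    base / head
}
```

With the displayed values, `efficiency_percent(141.9, 262.9)` rounds to -46.03 and `speedup(18.4, 6.3)` rounds to 2.9. For the other simulation row, 14.5 / 2.3 gives about 6.3 rather than the reported ×6.4, presumably because the report computes the ratio from unrounded timings.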

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing sylvestre:uniq-perf (6dc2c35) with main (51529dc)

Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

Comment thread src/uu/uniq/src/uniq.rs
@@ -274,21 +295,20 @@ impl Uniq {
write_line_terminator!(writer, line_terminator)?;

Contributor

Is there some special reason line_terminator is written differently here?

Contributor Author

Sorry, what do you mean? :)

Contributor

Here write_line_terminator! is used, whereas on line 307 line_out.push(line_terminator); is used in its place.

sylvestre added 2 commits July 4, 2026 13:56
is_c_locale() was called on every line inside key_end_index() when
-w/--check-chars is set, doing up to 3 std::env::var_os() lookups
each time. Locale env vars can't change mid-process, so this was
pure per-line overhead, causing uniq -w to be ~5x slower than GNU
uniq even for small -w values.

Compute is_c_locale() once at startup and cache it on the Uniq
struct instead.

Fixes uutils#13199

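
As a rough illustration of the change this commit message describes, the sketch below computes the locale check once in the constructor and reads a cached field on the per-line path. It is not the PR's code: the body of `is_c_locale()`, the variables it inspects, the `c_locale` field name, and the exact shape of `key_end_index()` are all assumptions made for the example.

```rust
use std::env;

/// Hypothetical stand-in for the real `is_c_locale()` helper. It checks
/// up to three environment variables; the real function may differ.
fn is_c_locale() -> bool {
    for name in ["LC_ALL", "LC_CTYPE", "LANG"] {
        if let Some(value) = env::var_os(name) {
            if !value.is_empty() {
                return value == "C" || value == "POSIX";
            }
        }
    }
    true // no locale variable set: treat as the C locale
}

/// Illustrative subset of a `Uniq`-like struct; field names are assumptions.
struct Uniq {
    check_chars: Option<usize>,
    c_locale: bool, // computed once in `new`, reused for every line
}

impl Uniq {
    fn new(check_chars: Option<usize>) -> Self {
        // The environment lookups happen here, once per process.
        Self { check_chars, c_locale: is_c_locale() }
    }

    /// Returns the byte index where the comparison key ends.
    /// Reads the cached flag instead of querying the environment.
    fn key_end_index(&self, line: &[u8]) -> usize {
        match self.check_chars {
            None => line.len(),
            // In the C locale one character is one byte.
            Some(n) if self.c_locale => n.min(line.len()),
            // Otherwise count characters, falling back to bytes on invalid UTF-8.
            Some(n) => match std::str::from_utf8(line) {
                Ok(s) => s.char_indices().nth(n).map_or(line.len(), |(i, _)| i),
                Err(_) => n.min(line.len()),
            },
        }
    }
}
```

The per-line cost drops from several `std::env::var_os()` calls to a single boolean read, which is the effect the commit message attributes to the change.
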
write_line() issued two separate write_all() calls per output line
(line content, then the terminator byte), each going through the
dynamically-dispatched Box<dyn Write> from open_output_file(). Merge
them into a single write via a reused scratch buffer.

Also match the input BufReader's capacity to the existing 128KB
output buffer (previously the 8KB std default), for consistency.

Measured on a 20x-repeated /usr/share/dict/words (~80MB, pinned to
one CPU core to reduce noise): -w 1 dropped from 429.7ms to 395.5ms
(~8%), -w 512 from 578.9ms to 497.8ms (~14%).
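
The single-write idea from this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the signature of `write_line()`, the `uniq_adjacent()` driver, and the `BUF_SIZE` constant are hypothetical names chosen for the example. Only the 128KB figure and the "one write via a reused scratch buffer" approach come from the text above.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// 128KB, the buffer size named in the commit message.
const BUF_SIZE: usize = 128 * 1024;

/// Writes `line` followed by `terminator` with one `write_all` call.
/// `scratch` is owned by the caller and reused across lines, so the
/// dynamically dispatched writer is invoked once per output line.
fn write_line(
    writer: &mut dyn Write,
    scratch: &mut Vec<u8>,
    line: &[u8],
    terminator: u8,
) -> io::Result<()> {
    scratch.clear(); // keeps the allocated capacity
    scratch.extend_from_slice(line);
    scratch.push(terminator);
    writer.write_all(scratch.as_slice())
}

/// Hypothetical driver: copies newline-terminated lines from `input` to
/// `output`, dropping lines equal to the line directly before them.
fn uniq_adjacent(input: impl Read, output: &mut dyn Write) -> io::Result<()> {
    // Input buffer sized like the output buffer instead of the std default.
    let mut reader = BufReader::with_capacity(BUF_SIZE, input);
    let mut scratch = Vec::new();
    let mut prev: Option<Vec<u8>> = None;
    let mut line = Vec::new();
    loop {
        line.clear();
        if reader.read_until(b'\n', &mut line)? == 0 {
            break; // end of input
        }
        if line.last() == Some(&b'\n') {
            line.pop(); // strip the terminator before comparing
        }
        if prev.as_deref() != Some(line.as_slice()) {
            write_line(output, &mut scratch, &line, b'\n')?;
            prev = Some(line.clone());
        }
    }
    output.flush()
}
```

For example, feeding `b"a\na\nb\na\n"` through `uniq_adjacent` into a `Vec<u8>` yields `a`, `b`, `a`, one per line. Pushing the terminator into the scratch buffer is also what the review thread refers to with `line_out.push(line_terminator);`.
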

Development

Successfully merging this pull request may close these issues.

perf(uniq): -w benchmark

2 participants