
@FidelSch
Contributor

@FidelSch FidelSch commented Feb 9, 2026

Based on #10832

Avoids using splice on small files, which may give a small performance improvement.

Threshold set to 16KB, determined by trial and error:

 hyperfine -w 10 -i -L coreutils "target/release/coreutils_16kb","target/release/coreutils_32kb","target/release/coreutils_128kb","target/release/coreutils_base"  "{coreutils} cat /tmp/threshold_test/file_16K*" -N
Benchmark 1: target/release/coreutils_16kb cat /tmp/threshold_test/file_16K*
  Time (mean ± σ):       1.4 ms ±   0.5 ms    [User: 0.3 ms, System: 1.0 ms]
  Range (min … max):     0.8 ms …  11.2 ms    2089 runs
 
  Warning: Ignoring non-zero exit code.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: target/release/coreutils_32kb cat /tmp/threshold_test/file_16K*
  Time (mean ± σ):       1.5 ms ±   0.4 ms    [User: 0.3 ms, System: 1.1 ms]
  Range (min … max):     0.8 ms …   3.3 ms    3211 runs
 
  Warning: Ignoring non-zero exit code.
 
Benchmark 3: target/release/coreutils_128kb cat /tmp/threshold_test/file_16K*
  Time (mean ± σ):       1.4 ms ±   0.4 ms    [User: 0.3 ms, System: 1.0 ms]
  Range (min … max):     0.7 ms …   4.1 ms    2657 runs
 
  Warning: Ignoring non-zero exit code.
 
Benchmark 4: target/release/coreutils_base cat /tmp/threshold_test/file_16K*
  Time (mean ± σ):       1.6 ms ±   0.5 ms    [User: 0.3 ms, System: 1.2 ms]
  Range (min … max):     0.7 ms …   5.1 ms    2113 runs
 
  Warning: Ignoring non-zero exit code.
 
Summary
  target/release/coreutils_16kb cat /tmp/threshold_test/file_16K* ran
    1.02 ± 0.48 times faster than target/release/coreutils_128kb cat /tmp/threshold_test/file_16K*
    1.14 ± 0.50 times faster than target/release/coreutils_32kb cat /tmp/threshold_test/file_16K*
    1.17 ± 0.53 times faster than target/release/coreutils_base cat /tmp/threshold_test/file_16K*
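
For readers following along, the threshold idea described above can be sketched roughly as below. This is only an illustration, not the code in this PR: `SPLICE_THRESHOLD`, `CopyStrategy`, `choose_strategy` and `strategy_for` are hypothetical names, and whether the boundary is `<` or `<=` is an assumption.

```rust
use std::fs::File;
use std::io;

/// Hypothetical constant: the PR description settles on 16 KB by trial and error.
const SPLICE_THRESHOLD: u64 = 16 * 1024;

/// Which copy path to take for one input (illustrative names only).
#[derive(Debug, PartialEq, Eq)]
enum CopyStrategy {
    /// Plain read/write loop through a userspace buffer.
    ReadWrite,
    /// Kernel-side copy using splice(2).
    Splice,
}

/// Pure decision: inputs reported as smaller than the threshold skip splice.
/// Treating a size exactly equal to the threshold as "large" is an assumption.
fn choose_strategy(reported_len: u64, threshold: u64) -> CopyStrategy {
    if reported_len < threshold {
        CopyStrategy::ReadWrite
    } else {
        CopyStrategy::Splice
    }
}

/// Decide for an already-open file, using the size its metadata reports.
/// The decision is only as good as that reported size.
fn strategy_for(file: &File) -> io::Result<CopyStrategy> {
    let len = file.metadata()?.len();
    Ok(choose_strategy(len, SPLICE_THRESHOLD))
}
```

Keeping the comparison in a pure function makes it easy to build several binaries with different thresholds, as was done for the benchmark above.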

@FidelSch
Contributor Author

FidelSch commented Feb 9, 2026

Still slower than GNU

@FidelSch FidelSch marked this pull request as draft February 9, 2026 17:14
@oech3
Contributor

oech3 commented Feb 9, 2026

How about splice threshold + clap bypass?
Also please fix clippy.

@github-actions

github-actions bot commented Feb 9, 2026

GNU testsuite comparison:

Congrats! The gnu test tests/tail/tail-n0f is now passing!

@sylvestre
Contributor

please run the benchmark with /usr/bin/cat too

@FidelSch
Contributor Author

FidelSch commented Feb 9, 2026

Now running hyperfine with --show-output, since otherwise the perf improvement does not make much sense when it comes to writes (every command takes about the same time). The downside is that it seems to introduce significant noise:

hyperfine -w 10  -L coreutils "target/release/cat_16k","target/release/cat_64k","target/release/cat_bypass","target/release/cat_base","target/release/cat_bypass_8k","/usr/bin/cat"  "{coreutils} /tmp/threshold_test/file_16K*" --show-output
Benchmark 1: target/release/cat_16k /tmp/threshold_test/file_16K*
  Time (mean ± σ):     347.5 ms ± 209.5 ms    [User: 1.3 ms, System: 2.0 ms]
  Range (min … max):   123.6 ms … 713.3 ms    10 runs
 
Benchmark 2: target/release/cat_64k /tmp/threshold_test/file_16K*
  Time (mean ± σ):     125.3 ms ± 134.1 ms    [User: 0.3 ms, System: 1.8 ms]
  Range (min … max):    68.3 ms … 704.3 ms    21 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: target/release/cat_bypass /tmp/threshold_test/file_16K*
  Time (mean ± σ):      77.5 ms ±  16.7 ms    [User: 0.4 ms, System: 0.9 ms]
  Range (min … max):    44.6 ms … 101.3 ms    33 runs
 
Benchmark 4: target/release/cat_base /tmp/threshold_test/file_16K*
  Time (mean ± σ):     109.1 ms ± 221.0 ms    [User: 0.3 ms, System: 1.1 ms]
  Range (min … max):    44.7 ms … 1710.4 ms    55 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 5: target/release/cat_bypass_8k /tmp/threshold_test/file_16K*
  Time (mean ± σ):      82.8 ms ±  23.6 ms    [User: 0.3 ms, System: 1.3 ms]
  Range (min … max):    52.9 ms … 154.0 ms    18 runs
 
Benchmark 6: /usr/bin/cat /tmp/threshold_test/file_16K*
  Time (mean ± σ):     142.9 ms ± 212.9 ms    [User: 0.3 ms, System: 1.2 ms]
  Range (min … max):    54.0 ms … 1130.2 ms    27 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  target/release/cat_bypass /tmp/threshold_test/file_16K* ran
    1.07 ± 0.38 times faster than target/release/cat_bypass_8k /tmp/threshold_test/file_16K*
    1.41 ± 2.87 times faster than target/release/cat_base /tmp/threshold_test/file_16K*
    1.62 ± 1.77 times faster than target/release/cat_64k /tmp/threshold_test/file_16K*
    1.84 ± 2.78 times faster than /usr/bin/cat /tmp/threshold_test/file_16K*
    4.48 ± 2.87 times faster than target/release/cat_16k /tmp/threshold_test/file_16K*
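
To read the "±" figures in the summary above, it may help to see how such a ratio and its uncertainty can be derived from the per-benchmark means and standard deviations. The sketch below uses ordinary first-order error propagation for a ratio of two independent measurements; it reproduces the numbers printed above, but that hyperfine computes them exactly this way is an assumption, and `relative_speed` is a hypothetical helper name.

```rust
/// Relative speed of `slow` versus `fast`, with first-order error propagation.
/// Each argument is (mean, standard deviation) in the same unit.
/// Assumes the two measurements are independent.
fn relative_speed(slow: (f64, f64), fast: (f64, f64)) -> (f64, f64) {
    let ratio = slow.0 / fast.0;
    // Relative errors of the two means add in quadrature.
    let rel_err = ((slow.1 / slow.0).powi(2) + (fast.1 / fast.0).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}
```

Called with the cat_base figures (109.1, 221.0) and the cat_bypass figures (77.5, 16.7), this gives roughly 1.41 ± 2.87, matching the summary line. An interval that wide includes 1.0, so this run alone does not separate the two binaries.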

@oech3
Contributor

oech3 commented Feb 10, 2026

Does this fix #9609 too?

@github-actions

GNU testsuite comparison:

GNU test failed: tests/pr/bounded-memory. tests/pr/bounded-memory is passing on 'main'. Maybe you have to rebase?
Note: The gnu test tests/expand/bounded-memory is now being skipped but was previously passing.

@codspeed-hq

codspeed-hq bot commented Feb 10, 2026

Merging this PR will not alter performance

✅ 284 untouched benchmarks
⏩ 38 skipped benchmarks [1]


Comparing FidelSch:cat-splice-perf (4102ac3) with main (ec7e81e)


Footnotes

  1. 38 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@FidelSch FidelSch marked this pull request as ready for review February 10, 2026 15:54
@FidelSch
Contributor Author

> Does this fix #9609 too?

As I understand it, this should fix the specific example presented, but I frankly don't know whether a similar case could arise in the future. All it takes is for stat() to wrongly report a size larger than the defined threshold.

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/tty/tty-eof (fails in this run but passes in the 'main' branch)

