uucore: Remove ARGV cache #13276

Open
AnuthaDev wants to merge 1 commit into uutils:main from AnuthaDev:arghh-alloc

@AnuthaDev AnuthaDev commented Jul 4, 2026

Summary

uucore cached the CLI arguments in a static variable, on the grounds that repeatedly calling env::args_os is expensive. However, env::args_os is called only once over the lifetime of the binary, so storing ARGV used extra memory without any benefit. This PR removes the cache to reduce memory allocation.

Rationale

The cache bought nothing and cost memory:

  • args_os() is called once per process: bin_inner! passes it into uumain, and the multicall coreutils main reads it once to dispatch. Caching a once-used value for the process lifetime is pure overhead.
  • The OS/libc keeps the original argv bytes alive for the whole process anyway. The static Vec<OsString> was a permanent duplicate — one heap allocation per argument plus the vector itself, up to ARG_MAX-sized invocations (e.g. rm fed by xargs).
  • The old flow actually copied argv twice: once into the static, then again element-by-element (.cloned()) when handing the iterator to clap. The new flow makes a single transient copy that is consumed by argument parsing and freed.
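The difference between the two flows can be illustrated with a simplified sketch. This is not the actual uucore code: `CACHED_ARGV`, `args_via_cache`, and `args_direct` are hypothetical names, and plain `std::env::args_os()` stands in for the platform-specific source (the PR mentions `wild::args_os()` on Windows).

```rust
use std::ffi::OsString;
use std::sync::LazyLock;

// Old shape (simplified, hypothetical names): argv is copied once into a
// static that lives for the whole process...
static CACHED_ARGV: LazyLock<Vec<OsString>> =
    LazyLock::new(|| std::env::args_os().collect());

// ...and every element is cloned a second time whenever an owning iterator
// is handed to the argument parser.
pub fn args_via_cache() -> impl Iterator<Item = OsString> {
    CACHED_ARGV.iter().cloned()
}

// New shape (simplified): a single transient copy owned by the caller, which
// is dropped once parsing has consumed it.
pub fn args_direct() -> impl Iterator<Item = OsString> {
    std::env::args_os()
}
```

Both functions yield the same arguments; the first additionally keeps a full copy allocated until the process exits.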

Changes

  • src/uucore/src/lib/lib.rs
    • Removed static ARGV and the LazyLock statics for UTIL_NAME / EXECUTION_PHRASE; replaced with OnceLocks.
    • Added init_util_name() / init_execution_phrase(). Both are first-write-wins (later calls are ignored), so harnesses that invoke uumain repeatedly in one process — the fuzz targets — are safe.
    • util_name() / execution_phrase() use get_or_init with a fallback that reproduces the old lazy derivation for un-wired callers.
    • bin_inner! (every standalone binary's main) peeks argv[0] from its single args_os() read and initializes both values before calling uumain. Deriving from argv[0] preserves the existing symlink/rename behavior (a binary copied to nap still reports nap: in errors).
    • args_os() / args_os_filtered() are now thin wrappers over std::env::args_os() (wild::args_os() on Windows); doc comments updated to say each call copies argv.
  • src/bin/coreutils.rs (multicall): initializes both values in the dispatch arm — from the raw binary path when matched via argv[0] (symlink case), or from <binary> <util> when the utility is the second argument, preserving usage strings like Try './coreutils ls --help' byte-for-byte.
  • src/bin/uudoc.rs: manpage generation initializes the execution phrase in main (where raw argv is available) and the utility name in gen_manpage after clap validates the utility, replacing the old manpage-skip hack for the wired path.
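The first-write-wins initialization with a lazy fallback described above could look roughly like the following. This is a minimal sketch under assumptions: the parameter type of `init_util_name`, the `derive_util_name` helper, and the `"unknown"` placeholder are illustrative and not taken from the PR.

```rust
use std::ffi::OsString;
use std::path::Path;
use std::sync::OnceLock;

static UTIL_NAME: OnceLock<String> = OnceLock::new();

/// First write wins: `OnceLock::set` returns `Err` when a value is already
/// stored, and that error is deliberately ignored so repeated calls are no-ops.
pub fn init_util_name(name: &str) {
    let _ = UTIL_NAME.set(name.to_owned());
}

/// Returns the pinned name, or derives one from argv[0] on first use when no
/// caller has initialized it.
pub fn util_name() -> &'static str {
    UTIL_NAME
        .get_or_init(|| derive_util_name(std::env::args_os().next()))
        .as_str()
}

/// Hypothetical fallback: the file name of argv[0], so a binary copied to
/// `nap` reports `nap`. The "unknown" placeholder is an assumption.
pub fn derive_util_name(argv0: Option<OsString>) -> String {
    argv0
        .as_ref()
        .and_then(|a| Path::new(a).file_name())
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_else(|| "unknown".to_owned())
}
```

With this shape, a harness that calls the entry point repeatedly in one process keeps the value from its first initialization.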

set_utility_is_second_arg() and its getter are retained: the multicall binary and uudoc still set the flag, and the lazy fallback derivation depends on it for un-wired callers.
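As an illustration of why the fallback derivation needs that flag, here is a hedged sketch of an execution-phrase derivation. The function name, its signature, and the exact joining rule are assumptions made for this example; the PR text only states that the multicall case yields phrases such as `./coreutils ls`.

```rust
use std::ffi::OsString;

/// Hypothetical derivation: when the utility is the second argument (multicall
/// invocation), the phrase is "<binary> <util>"; otherwise it is argv[0] alone.
pub fn derive_execution_phrase(argv: &[OsString], utility_is_second_arg: bool) -> String {
    let take = if utility_is_second_arg { 2 } else { 1 };
    argv.iter()
        .take(take)
        .map(|a| a.to_string_lossy().into_owned())
        .collect::<Vec<String>>()
        .join(" ")
}
```

For `./coreutils ls --help` with the flag set, this sketch produces `./coreutils ls`, which matches the usage string quoted in the change list.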

Breaking changes

No API removals — the new init_* functions are additive, and util_name() / execution_phrase() / args_os() keep their signatures. Two semantic changes for downstream consumers of the uucore crate:

  1. uucore::args_os() is no longer cheap to call repeatedly. It previously returned clones from a process-lifetime cache; it now copies all of argv (and re-expands globs via wild on Windows) on every call. Call it once and reuse the result. In-tree callers were already single-call.
  2. util_name() / execution_phrase() can now be pinned by the embedding binary. If a downstream binary calls the new init_* functions, those values win over derivation from argv. Callers that never init see identical behavior to before (same derivation, now computed on first use without the persistent argv copy).
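For downstream code affected by the first point, the "call it once and reuse the result" advice might be followed along these lines. `split_argv` is a hypothetical helper written for this example, and it accepts any iterator of `OsString` so it does not depend on the exact return type of `uucore::args_os()`.

```rust
use std::ffi::OsString;

/// Hypothetical helper: consumes a single argv read and splits it into the
/// program name and the remaining arguments, so later code can reuse the owned
/// values instead of reading argv again.
pub fn split_argv(
    args: impl IntoIterator<Item = OsString>,
) -> (Option<OsString>, Vec<OsString>) {
    let mut iter = args.into_iter();
    let program = iter.next();
    (program, iter.collect())
}
```

A caller would read argv exactly once, for example `let (program, rest) = split_argv(std::env::args_os());`, and pass `program` and `rest` around from there.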

Measurements

  1. Create a test input file: printf "00 %.0s" {1..100000} > args.txt
  2. Run the binary:
valgrind --tool=dhat ./target/release/coreutils echo $(cat args.txt) > /dev/null

main:

Total:     5,286,993 bytes in 200,206 blocks
At t-gmax: 5,000,674 bytes in 100,006 blocks
At t-end:  2,652,146 bytes in 100,083 blocks
Reads:     10,776,549 bytes
Writes:    7,975,882 bytes

This PR:

Total:     2,687,044 bytes in 100,207 blocks
At t-gmax: 2,663,057 bytes in 100,147 blocks
At t-end:  52,103 bytes in 82 blocks
Reads:     10,576,682 bytes
Writes:    5,375,912 bytes

The savings scale with argv: total allocations drop by one full argv copy (bytes and one block per argument), and the duplicate no longer persists for the process lifetime — which previously lasted arbitrarily long for utilities like sleep, tail, or dd and could reach ARG_MAX (~2 MB) for glob- or xargs-fed invocations of ls, rm, cp, etc.
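The deltas behind that statement can be checked directly from the two dhat summaries quoted above. The sketch below only subtracts the reported figures; the struct and function names are made up for this example.

```rust
/// Subset of the dhat summary figures quoted in the measurements.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DhatSummary {
    pub total_bytes: u64,
    pub total_blocks: u64,
    pub end_bytes: u64,
}

/// Figures reported for the main branch.
pub const MAIN: DhatSummary = DhatSummary {
    total_bytes: 5_286_993,
    total_blocks: 200_206,
    end_bytes: 2_652_146,
};

/// Figures reported for this PR.
pub const THIS_PR: DhatSummary = DhatSummary {
    total_bytes: 2_687_044,
    total_blocks: 100_207,
    end_bytes: 52_103,
};

/// Field-wise difference `before - after`.
pub fn savings(before: DhatSummary, after: DhatSummary) -> DhatSummary {
    DhatSummary {
        total_bytes: before.total_bytes - after.total_bytes,
        total_blocks: before.total_blocks - after.total_blocks,
        end_bytes: before.end_bytes - after.end_bytes,
    }
}
```

For the 100,000-argument input this gives 2,599,949 fewer bytes allocated in total, 99,999 fewer blocks (about one per argument), and 2,600,043 fewer bytes still live at exit.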

Testing

  • All existing tests pass

@sylvestre sylvestre left a comment

the AI comment #0 isn't useful.
Please write it for a human as I am not an AI

Comment thread src/bin/coreutils.rs Outdated
@sylvestre

Please update comment 0

github-actions Bot commented Jul 4, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/tail/symlink (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/resolution (passes in this run but fails in the 'main' branch)

codspeed-hq Bot commented Jul 4, 2026

Merging this PR will improve performance by 5.16%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 6 improved benchmarks
✅ 325 untouched benchmarks
⏩ 46 skipped benchmarks [1]

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | hostname_ip_lookup[100000] | 109.1 µs | 100.5 µs | +8.48% |
| Simulation | complex_relative_date | 146.1 µs | 135 µs | +8.22% |
| Simulation | sort_ascii_utf8_locale | 16.1 ms | 15.5 ms | +4.42% |
| Simulation | unexpand_large_file[10] | 291.9 ms | 282.3 ms | +3.41% |
| Simulation | unexpand_many_lines[100000] | 139.3 ms | 134.7 ms | +3.41% |
| Simulation | single_date_now | 85.4 µs | 82.8 µs | +3.18% |

Comparing AnuthaDev:arghh-alloc (a952d5e) with main (51529dc)

Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@AnuthaDev

Updated the Summary section in description (if that's what you meant by comment 0)

@sylvestre

this one:
#13276 (comment)
