Background
dotnet/android carries a fork of LZ4, dotnet/lz4 (a fork of lz4/lz4), and also uses the K4os.Compression.LZ4 NuGet package on the build (host) side. LZ4 is used in two independent places:
- Assembly store compression: assemblies packaged inside the app are LZ4-compressed and decompressed by the runtime at app startup.
- Fast deployment ("fastdev"): during dotnet build -t:Install (Debug), changed assemblies are LZ4-compressed on the host, streamed to the device, and decompressed there.
This issue collects what we measured about both, lays out the variables that matter, and lists possible directions. It is intentionally open-ended — there is no proposed decision here. The two areas have different constraints and different possible paths, so they are split into two blocks below.
Where LZ4 lives today (for reference):
- Assembly store, runtime decompress: src/native/clr/host/assembly-store.cc (CoreCLR) and src/native/mono/monodroid/embedded-assemblies.cc (Mono), via LZ4_decompress_safe from external/lz4, guarded by #if defined(HAVE_LZ4) && defined(RELEASE).
- Assembly store, build-side compress: src/Xamarin.Android.Build.Tasks/Utilities/AssemblyCompression.cs (K4os.Compression.LZ4), gated by the AndroidEnableAssemblyCompression MSBuild property (default true).
- Fastdev, device-side decompress: tools/fastdev/xamarin.sync/main.cc (external/lz4).
- Fastdev, host-side compress: src/Xamarin.Android.Build.Debugging.Tasks/Tasks/FastDeploy.cs (K4os.Compression.LZ4, level L03_HC).
So both external/lz4 and K4os.Compression.LZ4 have two consumers each. Removing the dependency would require addressing both areas.
Measurement environment for everything below: a stock dotnet new maui app, .NET 11 preview 5, net11.0-android, CoreCLR runtime, android-arm64, no AOT/R2R. On-device numbers are from a low-end Samsung Galaxy A16 (SM-A165F). Single device, single app — directional, not definitive.
Block 1 — Assembly store compression
What it is
When AndroidEnableAssemblyCompression is true (the default), each managed assembly is LZ4-compressed (the XALZ block format) and packed into lib/<abi>/libassembly-store.so. At app startup the runtime decompresses each assembly into memory (LZ4_decompress_safe). With compression false, the assemblies are stored raw and can be used/mapped without a decompression step.
Variables that matter
- android:extractNativeLibs: the single most important variable, and the source of a real annoyance (below).
- Where the size is paid: inside the APK/AAB (download), and on-disk after install.
- Startup CPU spent decompressing assemblies on every cold start.
- RAM — LZ4'd assemblies are decompressed into memory; uncompressed assemblies can be demand-paged/mapped.
- Decompression speed of the algorithm (directly affects startup).
The extractNativeLibs / Play Store annoyance
libassembly-store.so is a native library, so it is subject to android:extractNativeLibs:
- With extractNativeLibs=true (dotnet/android's current default), the APK zip itself DEFLATE-compresses the store. So the APK's own compression already shrinks the assemblies, and pre-compressing them with LZ4 first is largely redundant (the zip then DEFLATEs data that is already compressed).
- With extractNativeLibs=false, native libs (including the assembly store) are stored uncompressed and page-aligned in the APK and mapped directly: no extraction, no second on-disk copy. Google Play sets this automatically for app bundles on API 26+. So in what users actually download/install, the APK's zip compression does not help the assembly store at all, and a locally built .apk/.aab size cannot be trusted either, because Play re-packages it.
This is the annoying part: in the real distribution config we cannot lean on the APK's own compression for the assembly store, so if we don't compress it ourselves, it ships uncompressed.
Benchmarks
Size — extractNativeLibs=true (current default; APK zip DEFLATEs the store), android-arm64:
| store | raw .so | in the APK (DEFLATE) |
| --- | --- | --- |
| LZ4 on | 7.64 MB | 6.10 MB |
| LZ4 off | 18.43 MB | 5.99 MB |
With the zip re-compressing, LZ4 is ~neutral (actually ~100 KB/ABI worse, because DEFLATE compresses raw assemblies better than already-LZ4'd bytes).
Size — extractNativeLibs=false (what Play delivers; store stored uncompressed), android-arm64:
| store | in the APK (Stored) | final signed APK |
| --- | --- | --- |
| LZ4 on | 7.64 MB | 28.4 MB |
| LZ4 off | 18.43 MB | 38.9 MB |
Here LZ4 saves ~10.5 MB/ABI of download + on-disk size. That is the opposite conclusion from the extractNativeLibs=true config, which is exactly why the variable matters.
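The two size conclusions can be re-derived from the table values. A minimal sketch, using only the numbers quoted above; the dictionary keys and the function name are illustrative, not anything from the build:

```python
# Sizes in MB, copied from the two size tables above (android-arm64).
SIZES_MB = {
    # extractNativeLibs=true: the store as DEFLATEd inside the APK
    "extract_true": {"lz4_on": 6.10, "lz4_off": 5.99},
    # extractNativeLibs=false: the final signed APK
    "extract_false": {"lz4_on": 28.4, "lz4_off": 38.9},
}


def lz4_saving_mb(config: str) -> float:
    """Positive: LZ4 makes the artifact smaller. Negative: it makes it larger."""
    row = SIZES_MB[config]
    return round(row["lz4_off"] - row["lz4_on"], 2)


for name in SIZES_MB:
    print(f"{name}: LZ4 saves {lz4_saving_mb(name):+.2f} MB")
```

The sign flips between the two configurations: a loss of about 0.11 MB when the zip re-compresses the store, a saving of about 10.5 MB when it does not.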
Startup — A16, CoreCLR, 20x interleaved cold starts (am start -W TotalTime, force-stop between launches):
| configuration | mean | median | stdev |
| --- | --- | --- | --- |
| compression on | 1109 ms | 1090 ms | 44 ms |
| compression off | 1044 ms | 1031 ms | 32 ms |
Compression makes cold startup ~65 ms / ~6% slower (Mann-Whitney z = 4.33, p < 0.001). The per-launch decompression CPU isn't offset by smaller I/O when the page cache is warm.
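For readers who want to re-run the significance check on their own launch timings, here is a minimal sketch of a Mann-Whitney z-score. It assumes the plain normal approximation with average ranks for ties and no tie or continuity correction; the write-up does not say which variant produced z = 4.33, so this may not reproduce that exact figure from the raw data.

```python
from math import sqrt


def mann_whitney_z(a, b):
    """z-score of the Mann-Whitney U statistic for sample a versus sample b.

    Normal approximation, average ranks for ties, no tie/continuity correction.
    Positive z means values in a tend to be larger than values in b.
    """
    pooled = sorted((value, group) for group, sample in ((0, a), (1, b))
                    for value in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of equal values starting at index i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u1 = rank_sum_a - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u1 - mean_u) / sd_u


# Synthetic illustration only, not measured data: two fully separated
# samples of 20 values each.
print(round(mann_whitney_z(range(20, 40), range(0, 20)), 2))
```

Under this formula, two arms of 20 launches each cannot exceed |z| of about 5.41 (the fully separated case), so a value of 4.33 would correspond to strongly, but not completely, separated samples.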
Decompression speed of candidate algorithms (A16, 139 MB of real assemblies, decompress + write):
| codec | time | throughput |
| --- | --- | --- |
| LZ4 | 234 ms | ~593 MB/s |
| zlib/DEFLATE | 654 ms | ~212 MB/s |
LZ4 decompresses ~2.8x faster than zlib. This matters because the assembly store is decompressed at startup.
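To put those throughputs in startup terms, the sketch below recomputes them from the table and scales them to the 18.43 MB raw store from the size tables. This is a rough scale estimate only. It assumes the benchmark throughput (which includes writing the output, on a different 139 MB data set) carries over to in-memory decompression at startup, which the measurements do not confirm.

```python
def throughput_mb_s(size_mb: float, time_ms: float) -> float:
    """Decompression throughput in MB/s from a size and an elapsed time."""
    return size_mb / (time_ms / 1000.0)


def estimated_decompress_ms(raw_size_mb: float, mb_per_s: float) -> float:
    """Time to produce raw_size_mb of output at a given throughput."""
    return raw_size_mb / mb_per_s * 1000.0


BENCH_MB = 139.0      # benchmark payload from the table above
RAW_STORE_MB = 18.43  # uncompressed assembly store, from the size tables

lz4 = throughput_mb_s(BENCH_MB, 234)
zlib = throughput_mb_s(BENCH_MB, 654)

print(f"LZ4  {lz4:.0f} MB/s -> ~{estimated_decompress_ms(RAW_STORE_MB, lz4):.0f} ms per cold start")
print(f"zlib {zlib:.0f} MB/s -> ~{estimated_decompress_ms(RAW_STORE_MB, zlib):.0f} ms per cold start")
print(f"ratio {lz4 / zlib:.1f}x")
```

Under these assumptions the store would cost roughly 31 ms to decompress with LZ4 and roughly 87 ms with zlib. The measured LZ4 startup delta (~65 ms) is larger than the first estimate, so the model should be read as a lower-order guide to the gap between codecs, not as a prediction of startup time.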
Possible directions (open)
- Keep LZ4 store compression. Real-config size win (~10.5 MB/ABI) at a ~6% startup cost and the external/lz4 + K4os dependency.
- Drop store compression. Faster startup, less RAM, but ~10.5 MB/ABI larger downloads in the real (extractNativeLibs=false) config.
- Switch to a different algorithm that doesn't need our LZ4 fork, but only if it's similarly fast to decompress, so startup doesn't regress. System libz (zlib) is dependency-free on Android but ~2.8x slower to decompress (would worsen startup). Is there a fast-decompress option that's either system-provided or small enough to vendor without the fork (e.g. upstream lz4 directly, a minimal vendored decompressor, zstd, ...)?
- Make the default depend on extractNativeLibs/packaging so we don't pay startup cost in configs where the size win doesn't materialize.
Open questions: what decompression throughput is "fast enough" to keep startup flat? Do we actually need the fork (dotnet/lz4), or just the algorithm? How much of the ~10.5 MB/ABI matters once split per-ABI in an AAB?
Block 2 — Fast deployment (fastdev)
What it is
On an incremental dotnet build -t:Install (Debug), only the assemblies the build determined changed are redeployed. For each such file, fastdev currently: LZ4-compresses it on the host (K4os, level L03_HC), then runs run-as <pkg> xamarin.sync <args> and streams the compressed bytes over the shell's stdin; the device-side xamarin.sync tool decompresses (external/lz4) and writes the file into the app's private files/.__override__/<abi>/ directory. Confirmed on net11 CoreCLR via live diagnostics (deploy.tool: xamarin.sync, deploy.supports.fastdev: True).
Importantly, this is one process spawn per file: FastDeploy.cs loops over the changed files and invokes run-as ... xamarin.sync ... once per file (FastDeploy.cs ~lines 730 -> 763 -> 792).
Variables that matter
- File count and file size of the changed set (per-file spawn overhead vs per-byte transfer).
- Per-file run-as spawn overhead (measured ~40 ms each on the A16).
- Transfer channel throughput (the shell-stdin stream fastdev uses vs adb push).
- Whether adb's own transfer compression (adb push -z) is used (algorithms: any/none/brotli/lz4/zstd; assume available on modern adb).
- Host compression algorithm and level.
- Per-file vs batched invocation.
Benchmarks
Per-file spawn overhead (A16, on-device, differential - excludes adb roundtrip):
| command | per spawn |
| --- | --- |
| shell builtin (:) | ~0 ms (no fork) |
| /system/bin/true (fork+exec floor) | ~24 ms |
| run-as <pkg> true | ~43 ms |
| run-as <pkg> <tool> | ~38 ms |
fastdev pays ~40 ms of run-as overhead per file, regardless of codec or payload.
Transfer throughput (A16, 139 MB):
| channel | throughput |
| --- | --- |
| shell-stdin stream (what fastdev uses) | ~22 MB/s |
| adb push -Z (compression off) | ~20 MB/s |
| adb push -z (compression on) | ~60 MB/s |
The shell-stdin channel is uncompressed and performs about like raw adb push (~20-22 MB/s). adb push -z gives ~3x throughput on real assemblies, for free, with no host CPU.
Host-side compress cost (40 MB of real assemblies):
| codec / level | time |
| --- | --- |
| LZ4 fast (L00) | 80 ms |
| LZ4 L03_HC (current fastdev) | ~400 ms |
| DEFLATE level 1 | 365 ms |
| DEFLATE level 6 | 1538 ms |
Compression level matters a lot to end-to-end time; the currently-used L03_HC is on the slow side, and DEFLATE-6 would be slower than not compressing at all on a fast link.
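The "slower than not compressing at all" point can be made concrete with a simple serial model: compress everything on the host, then send the smaller payload. The sketch below is a back-of-the-envelope check, not a measurement. The compression ratio is an assumption (0.41 is borrowed from the LZ4 store ratio in Block 1, 7.64 MB / 18.43 MB, and real DEFLATE ratios will differ), and the model ignores device-side decompression, per-file spawns, and any overlap between compressing and sending.

```python
def compress_throughput_mb_s(size_mb: float, time_ms: float) -> float:
    """Host compression throughput in MB/s."""
    return size_mb / (time_ms / 1000.0)


def min_useful_throughput(link_mb_s: float, ratio: float) -> float:
    """Slowest compressor that still beats sending raw bytes.

    Serial model: size/comp + ratio*size/link < size/link,
    which rearranges to comp > link / (1 - ratio).
    ratio = compressed size / original size (assumed, 0 < ratio < 1).
    """
    return link_mb_s / (1.0 - ratio)


def pays_off(comp_mb_s: float, link_mb_s: float, ratio: float) -> bool:
    return comp_mb_s > min_useful_throughput(link_mb_s, ratio)


PAYLOAD_MB = 40.0      # from the table above
LINK_MB_S = 22.0       # shell-stdin throughput measured on the A16
ASSUMED_RATIO = 0.41   # assumption, see the paragraph above

for name, ms in [("LZ4 L00", 80), ("LZ4 L03_HC", 400),
                 ("DEFLATE 1", 365), ("DEFLATE 6", 1538)]:
    tp = compress_throughput_mb_s(PAYLOAD_MB, ms)
    print(f"{name:<11} {tp:6.0f} MB/s  pays off: {pays_off(tp, LINK_MB_S, ASSUMED_RATIO)}")
```

With these inputs the break-even compressor speed is about 37 MB/s, so DEFLATE level 6 (about 26 MB/s) loses even on the ~22 MB/s channel, while the other three levels clear the bar. A faster link raises the bar proportionally.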
Device tool sizes (stripped):
| tool | size | dependency |
| --- | --- | --- |
| C, LZ4 (bundles lz4.c) | ~65 KB | external/lz4 |
| C, zlib (links system libz.so) | ~6.6 KB | system libz (no bundle) |
| NativeAOT (C#, System.IO.Compression) | ~1.19 MB | none (self-contained) |
Binary size is negligible in absolute terms; we don't think it's a deciding factor. (For completeness: a NativeAOT reimplementation is ~180x the C/zlib binary and the managed-runtime floor alone (~786 KB) exceeds the entire stripped C tool, so NativeAOT doesn't look attractive on size; it also adds process-startup cost.)
End-to-end deploy benchmark (A16, best of 3, md5-verified). This is the one that captures everything we care about: host compress + upload + decompress + move the file into the app's filesystem. Four strategies, framed by external dependency:
- S1 - host LZ4 + run-as stdin stream + xamarin.sync decompress->app fs (dep: external/lz4) - i.e. ~today's design.
- S2 - host DEFLATE + run-as stdin stream + tool inflate via system libz->app fs (dep: system libz).
- S3 - no host compression + adb push -z to a tmp location + one batched run-as cp into the app fs (dep: none).
- S4 - no host compression + adb push -Z (raw) to tmp + batched run-as cp (dep: none).
| changed file set | S1 lz4-stream | S2 zlib-stream | S3 push -z | S4 push -Z |
| --- | --- | --- | --- | --- |
| 1 x 256 KB | 171 ms | 182 ms | 370 ms | 253 ms |
| 20 x 128 KB (2.5 MB) | 2487 ms | 2118 ms | 832 ms | 875 ms |
| 5 x 4 MB (20 MB) | 829 ms | 1276 ms | 669 ms | 1386 ms |
| 20 x 2 MB (40 MB) | 2107 ms | 2832 ms | 1452 ms | 2839 ms |
| 50 x 512 KB (25 MB) | 4525 ms | 4616 ms | 2028 ms | 2848 ms |
Observations from the matrix:
- The streaming strategies (S1/S2) scale poorly with file count because of the per-file run-as spawn (~40 ms each); adb push transfers the whole set in one invocation.
- adb push -z (S3) vs raw adb push -Z (S4) is ~2x on larger payloads - adb's built-in transfer compression roughly halves transfer time, with no host CPU and no tool.
- For a single tiny file, the batch adb push setup overhead makes S3/S4 slower than streaming - but that's a small absolute difference.
- LZ4 vs zlib in the streaming model (S1 vs S2) is close; when file count is high both are dominated by spawn overhead, not the codec.
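The spawn-overhead observation can be checked against the matrix with a one-line model: file count times the ~40 ms run-as cost. The sketch below computes that floor for each S1 row; it uses only numbers quoted above, and the function names are illustrative.

```python
SPAWN_MS = 40  # approximate per-file run-as cost measured on the A16

# (file count, measured S1 end-to-end ms), copied from the matrix above.
S1_ROWS = [(1, 171), (20, 2487), (5, 829), (20, 2107), (50, 4525)]


def spawn_floor_ms(file_count: int, spawn_ms: int = SPAWN_MS) -> int:
    """Minimum time spent purely on per-file run-as spawns."""
    return file_count * spawn_ms


def spawn_share(file_count: int, total_ms: int, spawn_ms: int = SPAWN_MS) -> float:
    """Fraction of the measured total explained by the spawn floor."""
    return spawn_floor_ms(file_count, spawn_ms) / total_ms


for files, total in S1_ROWS:
    print(f"{files:>2} files: floor {spawn_floor_ms(files):>4} ms, "
          f"{spawn_share(files, total):.0%} of {total} ms")
```

On these rows the floor accounts for roughly a quarter to a bit under half of the S1 total. Since the ~40 ms figure excludes the adb roundtrip, the real per-file cost in the streaming path is likely higher than this floor.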
Possible directions (open)
- Rebase fastdev transport on adb push (+ a batched run-as cp/mv) and let adb push -z provide compression. This would remove the custom tool and the external/lz4/K4os fastdev dependency. The matrix suggests it's also faster for multi-file deploys (batching beats per-file spawn), though a single-tiny-file case is slightly slower.
- Keep a streaming tool but batch it (one invocation handling all files) to remove the per-file spawn cost, and/or use system libz instead of bundling lz4.
- Tune the current path cheaply: switch the host LZ4 level from L03_HC to a fast level, independent of any larger change.
- NativeAOT reimplementation of the tool: measured as a size/startup regression here; doesn't look worth it.
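As an illustration of the first direction, the sketch below builds (but does not run) the command lines for a push-then-copy deploy in the shape of strategy S3. Everything named here is hypothetical: the staging directory, the package name, the ABI folder value, and the function itself. It also assumes that run-as starts in the app's data directory and that the app user can read files staged under /data/local/tmp, neither of which was verified for this write-up.

```python
import os
import posixpath

# Hypothetical staging location on the device.
DEFAULT_STAGE = "/data/local/tmp/fastdev-stage"


def build_push_deploy(package, local_files, abi,
                      stage_dir=DEFAULT_STAGE, algorithm="any"):
    """Return the adb command lines for a batched push + single run-as copy.

    algorithm is passed to 'adb push -z' (any/none/brotli/lz4/zstd).
    """
    staged = [posixpath.join(stage_dir, os.path.basename(f)) for f in local_files]
    override_dir = f"files/.__override__/{abi}/"  # relative to the app data dir
    return [
        # 1. Make sure the staging directory exists.
        ["adb", "shell", "mkdir", "-p", stage_dir],
        # 2. One push for the whole changed set, with adb transfer compression.
        ["adb", "push", "-z", algorithm, *local_files, stage_dir + "/"],
        # 3. One run-as invocation copies every staged file into the app fs.
        ["adb", "shell", "run-as", package, "cp", *staged, override_dir],
    ]


for cmd in build_push_deploy("com.example.app",
                             ["obj/App.dll", "obj/Lib.dll"], "arm64-v8a"):
    print(" ".join(cmd))
```

The point of the shape is that the number of process spawns stays constant (three here) regardless of how many files changed, which is what the S3/S4 columns of the matrix reflect.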
Open questions: is adb push -z reliably available across the device/adb versions we must support? How do these numbers look on faster hardware and over USB-3 vs the slower channel here? What does a realistic mixed incremental change (one large dll + a few small ones) look like end-to-end?
Cross-cutting
Both areas pull in external/lz4 (the fork) and K4os.Compression.LZ4. A recurring idea worth exploring for both: is there a compression choice that removes the fork/external-lz4 dependency while staying fast enough not to regress startup (store) or deploy time (fastdev)? System libz/zlib removes the dependency but is ~2.8x slower to decompress; that's likely fine for fastdev (transfer-bound) but a concern for the startup-sensitive assembly store. Other angles: using upstream lz4 directly rather than the fork, vendoring a minimal decompressor, or a different fast algorithm.
Caveats
Low-end single device (A16), CoreCLR only, a trivial app, single-ABI measurements, force-stop (warm-cache) cold starts rather than post-reboot cold-disk, and the fastdev numbers are from a synthetic harness that reproduces the real pipeline (host compress + run-as stream / adb push + decompress + move into the app fs) rather than from instrumenting FastDeploy itself. Numbers are directional. Happy to share the harness and raw data.