Benchmarks

This document describes the benchmark infrastructure and results for InlineCollections.

Benchmark infrastructure

Location

benchmarks/InlineCollections.Benchmarks/
├── InlineList.Benchmark.cs
├── InlineStack.Benchmark.cs
├── InlineQueue.Benchmark.cs
└── Program.cs

Framework

Engine: BenchmarkDotNet
Runtime: .NET 8.0
.csproj: References BenchmarkDotNet 0.15.8

Configuration

[SimpleJob(RuntimeMoniker.Net80, invocationCount: 1000)]
[MemoryDiagnoser]
[MinColumn, MaxColumn, MeanColumn, MedianColumn, StdDevColumn]
public class InlineList32Benchmark { ... }

Invocation count: 1000 invocations per iteration
Memory diagnoser: tracks allocations, Gen0/Gen1/Gen2 collections
Columns: minimum, maximum, mean, median, standard deviation

Running benchmarks

Basic run

cd benchmarks/InlineCollections.Benchmarks
dotnet run -c Release

With specific benchmarks

dotnet run -c Release -- --filter "*Add*"
dotnet run -c Release -- --filter "*InlineList*"

Export to JSON

dotnet run -c Release -- --exporters json

Outputs to BenchmarkDotNet.Artifacts/results/.

Benchmark results

InlineList32 benchmarks

Operation	InlineList32	List<T>	Ratio	Allocations (InlineList32 vs List<T>)
Constructor (Span/Enum)	13.94 ns	102.46 ns	7.4x ⚡	0 vs 1
AddRange (Bulk Add)	21.21 ns	98.72 ns	4.7x ⚡	0 vs 1
Add (Single)	17.29 ns	60.50 ns	3.5x ⚡	0 vs 1
Contains (Search)	9.11 ns	21.69 ns	2.4x ⚡	0 vs 0
Remove (Middle)	83.58 ns	163.86 ns	2.0x ⚡	0 vs 1
Insert (Middle)	45.33 ns	84.89 ns	1.9x ⚡	0 vs 1
Indexer (Access)	9.43 ns	9.97 ns	1.1x	0 vs 0

Key insights:

Construction: 7.4x faster (no heap allocation)
Bulk add: 4.7x faster (no heap allocation)
Single Add, Contains, Remove, Insert: 1.9x to 3.5x faster
Indexer access: near-identical at 1.1x (both use direct memory)
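
The Ratio column appears to be the baseline mean divided by the inline collection's mean (for example, 102.46 / 13.94 rounds to 7.4 for the constructor row). The sketch below recomputes it for a few rows of the table above. `RatioDemo` and `Speedup` are hypothetical names used only for illustration; they are not part of InlineCollections or BenchmarkDotNet.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class RatioDemo
{
    // Speedup = baseline mean / candidate mean, both in the same unit.
    public static double Speedup(double baselineNs, double candidateNs)
        => baselineNs / candidateNs;

    public static void Main()
    {
        // Mean times in nanoseconds, copied from the InlineList32 table.
        var rows = new (string Operation, double InlineNs, double ListNs)[]
        {
            ("Constructor (Span/Enum)", 13.94, 102.46),
            ("AddRange (Bulk Add)", 21.21, 98.72),
            ("Indexer (Access)", 9.43, 9.97),
        };

        foreach (var row in rows)
        {
            double ratio = Speedup(row.ListNs, row.InlineNs);
            Console.WriteLine($"{row.Operation}: {ratio:F1}x");
        }
    }
}
```

A ratio close to 1.0, as in the indexer row, means the two collections are effectively tied for that operation.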

InlineStack32 benchmarks

Operation	InlineStack32	Stack<T>	Ratio	Allocations (InlineStack32 vs Stack<T>)
Creation (New)	8.36 ns	37.67 ns	4.5x ⚡	0 vs 1
Push	17.29 ns	51.03 ns	3.0x ⚡	0 vs 1
Pop/Peek	15.59 ns	19.61 ns	1.3x ⚡	0 vs 0
Foreach Iteration	20.68 ns	155.01 ns	7.5x ⚡	0 vs 0
TryPush/Pop Cycle	12.14 ns	15.60 ns	1.3x ⚡	0 vs 0
Fill and Empty	138.07 ns	187.08 ns	1.4x ⚡	0 vs 1
Clear	15.91 ns	19.63 ns	1.2x ⚡	0 vs 0

Key insights:

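Push: 3.0x faster (allocation eliminated)
Pop/Peek: 1.3x faster even though neither side allocates (Stack<T> pays for array indirection)
Foreach iteration: 7.5x faster, the largest gain in this table
Push/pop cycles: 1.3x to 1.4x faster (TryPush/Pop Cycle, Fill and Empty)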

InlineQueue32 benchmarks

Operation	InlineQueue32	Queue<T>	Ratio	Allocations (InlineQueue32 vs Queue<T>)
Foreach Iteration	2.22 μs	11.99 μs	5.4x ⚡	0 vs 1
Enqueue/Dequeue Cycle	6.80 μs	8.30 μs	1.2x ⚡	0 vs 1
WrapAround Performance	481.0 ns	594.2 ns	1.2x ⚡	0 vs 1
Clear	415.7 ns	684.8 ns	1.6x ⚡	0 vs 1
Try Enqueue/Dequeue (Full)	434.2 ns	N/A	Zero Alloc	0 vs 0

Key insights:

Enqueue/Dequeue: 1.2x faster per cycle (allocation eliminated)
Foreach iteration: 5.4x faster (no allocation)
Circular buffer: O(1) wrap-around with minimal overhead (1.2x faster)
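
To illustrate why wrap-around stays O(1), here is a minimal sketch of a fixed-capacity ring buffer. It is not InlineQueue32's actual implementation: `RingBuffer<T>` is a hypothetical type, and it uses a heap array rather than inline storage. It only shows how head and tail indices wrap with a modulo instead of shifting elements.

```csharp
using System;

// Hypothetical ring buffer for illustration; not InlineQueue32's source.
public sealed class RingBuffer<T>
{
    private readonly T[] _items;
    private int _head;   // index of the oldest element
    private int _count;  // number of stored elements

    public RingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _items = new T[capacity];
    }

    public int Count => _count;

    // O(1): computes the tail slot with a modulo; returns false when full.
    public bool TryEnqueue(T item)
    {
        if (_count == _items.Length) return false;
        int tail = (_head + _count) % _items.Length;
        _items[tail] = item;
        _count++;
        return true;
    }

    // O(1): advances the head with a modulo; returns false when empty.
    public bool TryDequeue(out T item)
    {
        if (_count == 0) { item = default!; return false; }
        item = _items[_head];
        _head = (_head + 1) % _items.Length;
        _count--;
        return true;
    }
}
```

Neither operation moves existing elements, so the cost does not depend on how many items are stored or on whether the indices have wrapped.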

Benchmark examples from source

InlineList32 Add

[Benchmark(OperationsPerInvoke = 100)]
public void InlineList_Add()
{
    for (int i = 0; i < 100; i++)
    {
        var list = new InlineList32<int>();
        list.Add(1); list.Add(2); list.Add(3); list.Add(4);
        list.Add(5); list.Add(6); list.Add(7); list.Add(8);
    }
}

[Benchmark(OperationsPerInvoke = 100)]
public void List_Add()
{
    for (int i = 0; i < 100; i++)
    {
        var list = new List<int>(8);
        list.Add(1); list.Add(2); list.Add(3); list.Add(4);
        list.Add(5); list.Add(6); list.Add(7); list.Add(8);
    }
}

InlineList32 Indexer

[Benchmark(OperationsPerInvoke = 100)]
public int InlineList_Indexer()
{
    var list = new InlineList32<int>(data32);
    int sum = 0;
    for (int i = 0; i < 100; i++)
    {
        sum += list[i % 32];
        list[i % 32] = sum;
    }
    return sum;
}

[Benchmark(OperationsPerInvoke = 100)]
public int List_Indexer()
{
    var list = new List<int>(data32);
    int sum = 0;
    for (int i = 0; i < 100; i++)
    {
        sum += list[i % 32];
        list[i % 32] = sum;
    }
    return sum;
}

Methodology

What we measure

Throughput (ops/sec): How many operations per second
Latency (ns): Time per operation
Allocations: Bytes allocated per operation
Memory: Gen0/Gen1/Gen2 collection counts reported by the memory diagnoser
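
As a rough illustration of how these quantities relate, the sketch below converts a latency into throughput and measures the bytes a single call allocates. `MeasureDemo` and its methods are hypothetical helpers, not BenchmarkDotNet APIs; the memory diagnoser does its own, more careful accounting. The only runtime API assumed is `GC.GetAllocatedBytesForCurrentThread()`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers for illustration only.
public static class MeasureDemo
{
    // Static sink keeps the allocated object reachable so the
    // allocation cannot be optimized away.
    private static object _sink = new object();

    // Converts a per-operation latency in nanoseconds to operations per second.
    public static double ToOpsPerSecond(double latencyNs) => 1e9 / latencyNs;

    // Returns the bytes allocated on the current thread by one call to action.
    public static long AllocatedBytes(Action action)
    {
        action(); // warm-up call so one-time costs are excluded
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        // 17.29 ns is the InlineList32 "Add (Single)" mean from the results table.
        Console.WriteLine($"Throughput: {ToOpsPerSecond(17.29):E2} ops/sec");

        long bytes = AllocatedBytes(() => _sink = new List<int>(8));
        Console.WriteLine($"new List<int>(8) allocated {bytes} bytes");
    }
}
```

The exact byte count depends on the runtime and platform, so treat the printed number as indicative rather than as a reference value.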

Statistical rigor

Invocation count: 1000 per iteration (reduces noise)
Warmup: BenchmarkDotNet includes automatic warmup
GC: Forced before each iteration
Multiple runs: Results averaged across runs
Standard deviation: measures consistency across iterations

Interpretation

Mean: Average time per operation
Median: Middle value (50th percentile)
StdDev: How much variation (lower is better)
Min/Max: Best/worst case
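
The sketch below shows how these summary columns can be derived from a set of per-operation timings. It uses the sample standard deviation (n - 1 denominator), which is an assumption made for this example; `SummaryStats` is a hypothetical helper and the timings in `Main` are made-up values, not measured results.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only.
public static class SummaryStats
{
    public static (double Mean, double Median, double StdDev, double Min, double Max)
        Summarize(double[] samples)
    {
        if (samples == null || samples.Length < 2)
            throw new ArgumentException("Need at least two samples.", nameof(samples));

        double[] sorted = samples.OrderBy(x => x).ToArray();
        int n = sorted.Length;
        double mean = sorted.Average();

        // Median: middle value, or the average of the two middle values.
        double median = n % 2 == 1
            ? sorted[n / 2]
            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

        // Sample variance divides by n - 1.
        double variance = sorted.Sum(x => (x - mean) * (x - mean)) / (n - 1);

        return (mean, median, Math.Sqrt(variance), sorted[0], sorted[n - 1]);
    }

    public static void Main()
    {
        // Made-up timings in nanoseconds.
        double[] timingsNs = { 17.1, 17.4, 17.2, 18.9, 17.3 };
        var s = Summarize(timingsNs);
        Console.WriteLine($"Mean={s.Mean:F2} Median={s.Median:F2} StdDev={s.StdDev:F2} Min={s.Min:F2} Max={s.Max:F2}");
    }
}
```

When one outlier pulls the mean above the median, as the 18.9 ns sample does here, the median is usually the better indicator of typical latency.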

Limitations of benchmarks

Microbenchmarks vs real workloads

Real code interleaves other operations
Cache behavior differs (cold vs hot)
Branch prediction differs
Inlining differs with method size

Allocation cost

Benchmarks measure single allocations
Real GC pressure depends on volume
Gen2 collections and Full GC cost more

Recommendation

Use benchmarks to validate design decisions
Profile real workloads with your data
Measure before and after changes
Don't overfit optimizations to microbenchmarks

Running in your environment

Benchmark your workload

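using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using InlineCollections;

[MemoryDiagnoser]
public class YourBenchmark
{
    // Input data, filled once in Setup before any measurement starts.
    private byte[] _data = Array.Empty<byte>();

    [GlobalSetup]
    public void Setup()
    {
        _data = new byte[1000];
        Random.Shared.NextBytes(_data);
    }

    [Benchmark]
    public int YourHotPath()
    {
        var list = new InlineList32<byte>();
        int added = 0;
        foreach (var b in _data)
        {
            // Stop as soon as TryAdd fails (presumably because the list is full).
            if (!list.TryAdd(b)) break;
            added++;
        }
        // Returning a value keeps the JIT from treating the loop as dead code.
        return added;
    }
}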

Run:

BenchmarkRunner.Run<YourBenchmark>();

CI/CD integration

The GitHub Actions CI pipeline runs benchmarks on:

Linux (ubuntu-latest)
Windows (windows-latest)
macOS (macos-latest)

Benchmark artifacts are uploaded for analysis across platforms.

Future benchmarks

Planned additions:

Span-based SIMD operations
Comparative benchmarks with other libraries
Real-world workload simulations (networking, game engine)
Memory pressure and GC pause analysis
CPU profiler data (cache misses, branch predictions)
