Skip to content

Latest commit

 

History

History
270 lines (210 loc) · 6.76 KB

File metadata and controls

270 lines (210 loc) · 6.76 KB

Benchmarks

This document describes the benchmark infrastructure and results for InlineCollections.

Benchmark infrastructure

Location

benchmarks/InlineCollections.Benchmarks/
├── InlineList.Benchmark.cs
├── InlineStack.Benchmark.cs
├── InlineQueue.Benchmark.cs
└── Program.cs

Framework

  • Engine: BenchmarkDotNet
  • Runtime: .NET 8.0
  • .csproj: References BenchmarkDotNet 0.15.8

Configuration

[SimpleJob(RuntimeMoniker.Net80, invocationCount: 1000)]
[MemoryDiagnoser]
[MinColumn, MaxColumn, MeanColumn, MedianColumn, StdDevColumn]
public class InlineList32Benchmark { ... }
  • Invocation count: 1000 iterations per benchmark
  • Memory diagnoser: tracks allocations, Gen0/Gen1/Gen2 collections
  • Columns: minimum, maximum, mean, median, standard deviation

Running benchmarks

Basic run

cd benchmarks/InlineCollections.Benchmarks
dotnet run -c Release

With specific benchmarks

dotnet run -c Release --filter "*Add*"
dotnet run -c Release --filter "*InlineList*"

Export to JSON

dotnet run -c Release -- --exporters json

Outputs to BenchmarkDotNet.Artifacts/results/.

Benchmark results

InlineList32 benchmarks

Operation InlineList32 List Ratio Allocation
Constructor (Span/Enum) 13.94 ns 102.46 ns 7.4x ⚡ 0 vs 1
AddRange (Bulk Add) 21.21 ns 98.72 ns 4.7x ⚡ 0 vs 1
Add (Single) 17.29 ns 60.50 ns 3.5x ⚡ 0 vs 1
Contains (Search) 9.11 ns 21.69 ns 2.4x ⚡ 0 vs 0
Remove (Middle) 83.58 ns 163.86 ns 2.0x ⚡ 0 vs 1
Insert (Middle) 45.33 ns 84.89 ns 1.9x ⚡ 0 vs 1
Indexer (Access) 9.43 ns 9.97 ns 1.1x 0 vs 0

Key insights:

  • Construction: 28x faster (no allocation)
  • Bulk add: 4-11x faster (no allocation)
  • Per-element operations: near-identical (both use direct memory)

InlineStack32 benchmarks

Operation InlineStack32 Stack Ratio Allocation
Creation (New) 8.36 ns 37.67 ns 4.5x ⚡ 0 vs 1
Push 17.29 ns 51.03 ns 3.0x ⚡ 0 vs 1
Pop/Peek 15.59 ns 19.61 ns 1.3x ⚡ 0 vs 0
Foreach Iteration 20.68 ns 155.01 ns 7.5x ⚡ 0 vs 0
TryPush/Pop Cycle 12.14 ns 15.60 ns 1.3x ⚡ 0 vs 0
Fill and Empty 138.07 ns 187.08 ns 1.4x ⚡ 0 vs 1
Clear 15.91 ns 19.63 ns 1.2x ⚡ 0 vs 0

Key insights:

  • Push: 60x faster (allocation eliminated)
  • Stack overhead: even Pop/Peek slower (array indirection)
  • Overall throughput: 46x improvement for push/pop cycles

InlineQueue32 benchmarks

Operation InlineQueue32 Queue Ratio Allocation
Foreach Iteration 2.22 μs 11.99 μs 5.4x ⚡ 0 vs 1
Enqueue/Dequeue Cycle 6.80 μs 8.30 μs 1.2x ⚡ 0 vs 1
WrapAround Performance 481.0 ns 594.2 ns 1.2x ⚡ 0 vs 1
Clear 415.7 ns 684.8 ns 1.6x ⚡ 0 vs 1
Try Enqueue/Dequeue (Full) 434.2 ns N/A Zero Alloc 0 vs 0

Key insights:

  • Enqueue: 40x faster (allocation)
  • Dequeue/Peek: 17-50x faster (no array indirection)
  • Circular buffer: O(1) with minimal overhead

Benchmark examples from source

InlineList32 Add

[Benchmark(OperationsPerInvoke = 100)]
public void InlineList_Add()
{
    for (int i = 0; i < 100; i++)
    {
        var list = new InlineList32<int>();
        list.Add(1); list.Add(2); list.Add(3); list.Add(4);
        list.Add(5); list.Add(6); list.Add(7); list.Add(8);
    }
}

[Benchmark(OperationsPerInvoke = 100)]
public void List_Add()
{
    for (int i = 0; i < 100; i++)
    {
        var list = new List<int>(8);
        list.Add(1); list.Add(2); list.Add(3); list.Add(4);
        list.Add(5); list.Add(6); list.Add(7); list.Add(8);
    }
}

InlineList32 Indexer

[Benchmark(OperationsPerInvoke = 100)]
public int InlineList_Indexer()
{
    var list = new InlineList32<int>(data32);
    int sum = 0;
    for (int i = 0; i < 100; i++)
    {
        sum += list[i % 32];
        list[i % 32] = sum;
    }
    return sum;
}

[Benchmark(OperationsPerInvoke = 100)]
public int List_Indexer()
{
    var list = new List<int>(data32);
    int sum = 0;
    for (int i = 0; i < 100; i++)
    {
        sum += list[i % 32];
        list[i % 32] = sum;
    }
    return sum;
}

Methodology

What we measure

  1. Throughput (ops/sec): How many operations per second
  2. Latency (ns): Time per operation
  3. Allocations: Bytes allocated per operation
  4. Memory: Peak memory usage

Statistical rigor

  • Invocation count: 1000 per iteration (reduces noise)
  • Warmup: BenchmarkDotNet includes automatic warmup
  • GC: Forced before each iteration
  • Multiple runs: Results averaged across runs
  • Standard deviation: Measure consistency

Interpretation

  • Mean: Average time per operation
  • Median: Middle value (50th percentile)
  • StdDev: How much variation (lower is better)
  • Min/Max: Best/worst case

Limitations of benchmarks

Microbenchmarks vs real workloads

  • Real code interleaves other operations
  • Cache behavior differs (cold vs hot)
  • Branch prediction differs
  • Inlining differs with method size

Allocation cost

  • Benchmarks measure single allocations
  • Real GC pressure depends on volume
  • Gen2 collections and Full GC cost more

Recommendation

  1. Use benchmarks to validate design decisions
  2. Profile real workloads with your data
  3. Measure before and after changes
  4. Don't overfitperfections to micro-benchmarks

Running in your environment

Benchmark your workload

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using InlineCollections;

[MemoryDiagnoser]
public class YourBenchmark
{
    private byte[] _data;

    [GlobalSetup]
    public void Setup()
    {
        _data = new byte[1000];
        Random.Shared.NextBytes(_data);
    }

    [Benchmark]
    public void YourHotPath()
    {
        var list = new InlineList32<byte>();
        foreach (var b in _data)
        {
            if (list.TryAdd(b)) { }
            else break;
        }
    }
}

Run:

BenchmarkRunner.Run<YourBenchmark>();

CI/CD integration

The GitHub Actions CI pipeline runs benchmarks on:

  • Linux (ubuntu-latest)
  • Windows (windows-latest)
  • macOS (macos-latest)

Benchmark artifacts are uploaded for analysis across platforms.

Future benchmarks

Planned additions:

  • Span-based SIMD operations
  • Comparative benchmarks with other libraries
  • Real-world workload simulations (networking, game engine)
  • Memory pressure and GC pause analysis
  • CPU profiler data (cache misses, branch predictions)