bench: foreach vs ReadGroups callback decode benchmark (#156) #161

Merged
pedrosakuma merged 1 commit into main from bench/foreach-vs-callback-156 on Apr 25, 2026
Conversation

@pedrosakuma (Owner)

Context

Follow-up to #156 / PR #158 (v1.5.0): we shipped the foreach-style group enumerator with the claim that it's faster and zero-alloc compared to the existing ReadGroups callback API. This PR backs that claim with numbers.

What

Adds GroupForeachVsCallbackBenchmarks comparing three decode strategies on MarketDataData (two simple top-level groups, qualifies for foreach), parameterized over GroupSize ∈ {10, 50, 100}:

  1. CallbackReadGroups with capturing lambdas (mirrors the most common user pattern: closure allocation per call)
  2. Foreach — v1.5.0 enumerator
  3. Foreach + early break — demonstrates the skip-cost of accessing groups without iterating them

Workload (identical across variants): sum Price.Value + Quantity.Value over every entry, returned to defeat dead-code elimination.
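The three strategies can be sketched roughly as follows. This is illustrative pseudocode only: the decoder and member names (`MarketDataData`, `Entries`, `ReadGroups`, `Price`, `Quantity`) follow this PR's description, but the exact generated signatures are assumptions, not the library's verbatim API.

```csharp
// 1. Callback: the capturing lambda forces a closure allocation per call.
long SumWithCallback(MarketDataData msg)
{
    long sum = 0; // captured local -> compiler emits a display class + delegate
    msg.ReadGroups(entry => sum += entry.Price.Value + entry.Quantity.Value);
    return sum;
}

// 2. Foreach: the v1.5.0 enumerator, zero allocations.
long SumWithForeach(MarketDataData msg)
{
    long sum = 0;
    foreach (var entry in msg.Entries)
        sum += entry.Price.Value + entry.Quantity.Value;
    return sum;
}

// 3. Foreach + early break: touch the group, then bail out immediately.
bool HasEntries(MarketDataData msg)
{
    foreach (var entry in msg.Entries)
        return true; // remaining entries are skipped, not decoded
    return false;
}
```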

Results

AMD EPYC 7763, .NET 9.0.14, BenchmarkDotNet v0.15.4:

| Method          | GroupSize | Mean   | Ratio | Allocated |
|-----------------|-----------|--------|-------|-----------|
| Callback        | 10        | 124 ns | 1.00  | 152 B     |
| Foreach         | 10        | 22 ns  | 0.17  | 0 B       |
| Foreach + break | 10        | 3 ns   | 0.02  | 0 B       |
| Callback        | 50        | 524 ns | 1.00  | 152 B     |
| Foreach         | 50        | 211 ns | 0.40  | 0 B       |
| Foreach + break | 50        | 3 ns   | 0.006 | 0 B       |
| Callback        | 100       | 999 ns | 1.00  | 152 B     |
| Foreach         | 100       | 184 ns | 0.18  | 0 B       |
| Foreach + break | 100       | 3 ns   | 0.003 | 0 B       |

Takeaways

  • 2.5–5.6× faster on full iteration across the measured group sizes (~5× at GroupSize 10 and 100).
  • Eliminates the 152 B / call closure allocation (delegate + display class for the captured sum).
  • Early break is essentially free because each group property does an O(1) skip; with callbacks you always pay for every entry.
  • Validates the docs/perf-tuning-guide recommendation to prefer foreach for simple groups.
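The 152 B/call figure is consistent with what the C# compiler lowers a capturing lambda into: a heap-allocated display class holding the captured local, plus a delegate wrapping its method. A rough sketch of the lowering (names and the exact byte breakdown are illustrative, not taken from the generated IL):

```csharp
// What `entry => sum += entry.Price.Value + entry.Quantity.Value`
// approximately lowers to when `sum` is captured:
sealed class DisplayClass            // heap allocation #1, once per call
{
    public long sum;
    public void Invoke(Entry entry)
        => sum += entry.Price.Value + entry.Quantity.Value;
}
// plus an Action<Entry> delegate over DisplayClass.Invoke — heap
// allocation #2. Together these account for the per-call allocation
// the foreach enumerator avoids entirely.
```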

Notes

  • Existing RepeatingGroupBenchmarks.DecodeWithGroups is left untouched — it does encode+decode in the hot path and isn't directly comparable. The new file is focused: decode-only, against a pre-encoded buffer.
  • Schema uses <sbe:message name="MarketData"> which has only simple top-level groups, so it qualifies for the foreach emit path.
  • README under benchmarks/ updated with the new benchmark description and reference numbers.

Adds GroupForeachVsCallbackBenchmarks comparing the v1.5.0 foreach-style
group enumerator against the original ReadGroups callback API on
MarketDataData (two simple top-level groups), parameterized over
GroupSize ∈ {10, 50, 100}.

Results on AMD EPYC 7763 / .NET 9 (GroupSize=100):

  Callback         999 ns   152 B   1.00x
  Foreach          184 ns     0 B   0.18x
  Foreach + break    3 ns     0 B   0.003x

Foreach is ~5x faster on full iteration and eliminates the 152 B
per-call closure allocation. Early break is essentially free because
each group property does an O(1) skip rather than running every entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@pedrosakuma pedrosakuma merged commit a5c16bd into main Apr 25, 2026
1 check passed
@pedrosakuma pedrosakuma deleted the bench/foreach-vs-callback-156 branch April 25, 2026 19:21