bench: foreach vs ReadGroups callback decode benchmark (#156) #161

Merged
pedrosakuma merged 1 commit into main from bench/foreach-vs-callback-156 on Apr 25, 2026
Conversation

@pedrosakuma (Owner)

Context

Follow-up to #156 / PR #158 (v1.5.0): we shipped the foreach-style group enumerator with the claim that it's faster and zero-alloc compared to the existing ReadGroups callback API. This PR backs that claim with numbers.

What

Adds GroupForeachVsCallbackBenchmarks comparing three decode strategies on MarketDataData (two simple top-level groups, qualifies for foreach), parameterized over GroupSize ∈ {10, 50, 100}:

  1. CallbackReadGroups with capturing lambdas (mirrors the most common user pattern: closure allocation per call)
  2. Foreach — v1.5.0 enumerator
  3. Foreach + early break — demonstrates the skip-cost of accessing groups without iterating them

Workload (identical across variants): sum Price.Value + Quantity.Value over every entry, returned to defeat dead-code elimination.
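The three strategies can be sketched roughly as follows. This is illustrative pseudocode only: the decoder and member names (`MarketDataData`, `Entries`, `ReadGroups`, `Price`, `Quantity`) follow this PR's description, but the exact generated signatures are assumptions, not the library's verbatim API.

```csharp
// 1. Callback: the capturing lambda forces a closure allocation per call.
long SumWithCallback(MarketDataData msg)
{
    long sum = 0; // captured local -> compiler emits a display class + delegate
    msg.ReadGroups(entry => sum += entry.Price.Value + entry.Quantity.Value);
    return sum;
}

// 2. Foreach: the v1.5.0 enumerator, zero allocations.
long SumWithForeach(MarketDataData msg)
{
    long sum = 0;
    foreach (var entry in msg.Entries)
        sum += entry.Price.Value + entry.Quantity.Value;
    return sum;
}

// 3. Foreach + early break: touch the group, then bail out immediately.
bool HasEntries(MarketDataData msg)
{
    foreach (var entry in msg.Entries)
        return true; // remaining entries are skipped, not decoded
    return false;
}
```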

Results

AMD EPYC 7763, .NET 9.0.14, BenchmarkDotNet v0.15.4:

| Method          | GroupSize | Mean   | Ratio | Allocated |
|-----------------|-----------|--------|-------|-----------|
| Callback        | 10        | 124 ns | 1.00  | 152 B     |
| Foreach         | 10        | 22 ns  | 0.17  | 0 B       |
| Foreach + break | 10        | 3 ns   | 0.02  | 0 B       |
| Callback        | 50        | 524 ns | 1.00  | 152 B     |
| Foreach         | 50        | 211 ns | 0.40  | 0 B       |
| Foreach + break | 50        | 3 ns   | 0.006 | 0 B       |
| Callback        | 100       | 999 ns | 1.00  | 152 B     |
| Foreach         | 100       | 184 ns | 0.18  | 0 B       |
| Foreach + break | 100       | 3 ns   | 0.003 | 0 B       |

Takeaways

  • 2.5–5.6× faster on full iteration across the measured group sizes (~5× at GroupSize 10 and 100).
  • Eliminates the 152 B / call closure allocation (delegate + display class for the captured sum).
  • Early break is essentially free because each group property does an O(1) skip; with callbacks you always pay for every entry.
  • Validates the docs/perf-tuning-guide recommendation to prefer foreach for simple groups.
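The 152 B/call figure is consistent with what the C# compiler lowers a capturing lambda into: a heap-allocated display class holding the captured local, plus a delegate wrapping its method. A rough sketch of the lowering (names and the exact byte breakdown are illustrative, not taken from the generated IL):

```csharp
// What `entry => sum += entry.Price.Value + entry.Quantity.Value`
// approximately lowers to when `sum` is captured:
sealed class DisplayClass            // heap allocation #1, once per call
{
    public long sum;
    public void Invoke(Entry entry)
        => sum += entry.Price.Value + entry.Quantity.Value;
}
// plus an Action<Entry> delegate over DisplayClass.Invoke — heap
// allocation #2. Together these account for the per-call allocation
// the foreach enumerator avoids entirely.
```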

Notes

  • Existing RepeatingGroupBenchmarks.DecodeWithGroups is left untouched — it does encode+decode in the hot path and isn't directly comparable. The new file is focused: decode-only, against a pre-encoded buffer.
  • Schema uses <sbe:message name="MarketData"> which has only simple top-level groups, so it qualifies for the foreach emit path.
  • README under benchmarks/ updated with the new benchmark description and reference numbers.

Adds GroupForeachVsCallbackBenchmarks comparing the v1.5.0 foreach-style
group enumerator against the original ReadGroups callback API on
MarketDataData (two simple top-level groups), parameterized over
GroupSize ∈ {10, 50, 100}.

Results on AMD EPYC 7763 / .NET 9 (GroupSize=100):

  Callback         999 ns   152 B   1.00x
  Foreach          184 ns     0 B   0.18x
  Foreach + break    3 ns     0 B   0.003x

Foreach is ~5x faster on full iteration and eliminates the 152 B
per-call closure allocation. Early break is essentially free because
each group property does an O(1) skip rather than running every entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@pedrosakuma pedrosakuma merged commit a5c16bd into main Apr 25, 2026
1 check passed
@pedrosakuma pedrosakuma deleted the bench/foreach-vs-callback-156 branch April 25, 2026 19:21