comparison for load/store vs loadu/storeu #21

mperikov · 2025-12-02T18:09:27Z

The following instructions were compared:

_mm256_loadu_si256 vs _mm256_load_si256
_mm256_storeu_si256 vs _mm256_store_si256

No obvious differences in execution time were observed.

The AVX-512 analogs still need to be compared.

Malkovsky

Missing core functionality

Malkovsky · 2025-12-09T08:38:56Z

src/alignment_comparison.cpp

+
+#ifdef PIXIE_AVX512_SUPPORT
+
+static void BM_Loadu512(benchmark::State& state) {


We discussed that we want to compare:
-- Aligned store/load
-- Aligned storeu/loadu
-- Unaligned storeu/loadu within a single 64-byte block (for <=256-bit registers)
-- Unaligned storeu/loadu crossing a 64-byte block border
-- (optional) Test that aligned store/load on an unaligned address crashes
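
To make the cases in the list above concrete, here is a minimal sketch of how an access could be classified by its byte offset from a 64-byte-aligned base. The names `kCacheLine`, `CrossesCacheLine` and `IsAligned` are hypothetical helpers, not part of the PR, and the 64-byte line size is an assumption about the target CPU.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache-line size in bytes (hypothetical constant, not from the PR).
constexpr std::size_t kCacheLine = 64;

// True if an access of `width` bytes starting at byte `offset` (measured from
// a 64-byte-aligned base) touches more than one 64-byte block.
inline bool CrossesCacheLine(std::size_t offset, std::size_t width) {
  assert(width > 0);
  return offset / kCacheLine != (offset + width - 1) / kCacheLine;
}

// True if `offset` meets the alignment that the aligned load/store
// intrinsics require for a register of `width` bytes.
inline bool IsAligned(std::size_t offset, std::size_t width) {
  assert(width > 0);
  return offset % width == 0;
}
```

With these helpers, a 32-byte access at offset 1 is unaligned but stays inside one block, the same access at offset 33 crosses the border, and a 64-byte access at any non-zero offset within a block always crosses. This is why the "within a single 64-byte block" case only applies to registers of 256 bits or fewer.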

Also, I think the benchmarks would be better organized so that each store/load targets a pseudo-random address within arrays of different sizes, so that we also see the impact of cache misses. In particular, we definitely expect some degradation for a store/load at an address that crosses a 64-byte block border.

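For example, the unaligned storeu/loadu case within a 64-byte block should be something like:

alignas(64) uint8_t data[64 * n];
for (auto _ : state) {
  // Offset 1 inside a random 64-byte block: unaligned, but the 32 loaded
  // bytes (1..32) stay within that block. A stride of 32 would put every
  // other load at offset 33, which crosses the block border.
  const __m256i* ptr = reinterpret_cast<const __m256i*>(data + 1 + 64 * (rng() % (n - 1)));
  benchmark::DoNotOptimize(_mm256_loadu_si256(ptr));
}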

Note that the rng() call and the % operation might be expensive relative to the load being measured in this context.
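
One possible way to keep the generator and the modulo out of the timed loop is to precompute a table of offsets before the benchmark starts. The sketch below is an assumption about how that could be done, not what the PR implements; `MakeOffsets` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Builds `count` byte offsets of the form 64 * block + shift, where `block`
// is drawn from [0, n - 1). `n` is the number of 64-byte blocks in the array
// and `shift` is the byte offset inside the chosen block.
inline std::vector<std::size_t> MakeOffsets(std::size_t n, std::size_t shift,
                                            std::size_t count,
                                            std::uint64_t seed = 42) {
  assert(n >= 2);  // at least one valid starting block
  std::mt19937_64 rng(seed);
  std::vector<std::size_t> offsets(count);
  for (auto& off : offsets) {
    off = 64 * static_cast<std::size_t>(rng() % (n - 1)) + shift;
  }
  return offsets;
}
```

The timed loop would then only index into the table. The trade-off is that the table lookup is itself a memory access and the table takes up cache space, so it changes what is measured rather than making the overhead disappear.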

Malkovsky · 2025-12-15T13:10:25Z

src/alignment_comparison.cpp

+  std::mt19937_64 rng(42);
+
+  for (auto _ : state) {
+    size_t idx = 64 * (rng() % (n - 1));


It is probably better to make n = 2^k + 1 and use rng() & ((1 << k) - 1), so that the modulo is replaced by a bitwise AND.
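
A small sketch of that suggestion, assuming n = 2^k + 1 blocks so that there are exactly 2^k valid starting blocks. `BlockOffset` is a hypothetical helper name, and a 64-bit mask is used here instead of the `int` expression `(1 << k)` so that larger k does not overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Maps a raw random value `r` to the byte offset of one of 2^k blocks of
// 64 bytes each. Equivalent to 64 * (r % (n - 1)) when n = 2^k + 1.
inline std::size_t BlockOffset(std::uint64_t r, unsigned k) {
  assert(k <= 57);  // keeps 64 * (2^k - 1) within 64 bits
  const std::uint64_t mask = (std::uint64_t{1} << k) - 1;
  return static_cast<std::size_t>(64 * (r & mask));
}
```

In the benchmark loop this would replace `64 * (rng() % (n - 1))` with `BlockOffset(rng(), k)`.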

mperikov added 4 commits December 2, 2025 21:01

comparison for load/store vs loadu/storeu

b5a743f

Format fix

110ff2f

Format fix

d999c89

typo correction

770244a

Malkovsky approved these changes Dec 6, 2025

Malkovsky requested changes Dec 9, 2025

mperikov and others added 8 commits December 9, 2025 16:00

Benhmarks update

1e4c1b4

Random pointers

5109a81

Merge branch 'main' into alignment-comparison

e525544

Format fix

1f77b2b

Merge branch 'alignment-comparison' of https://github.com/Malkovsky/p…

059c3bd

…ixie into alignment-comparison

Random benchmarks fix

a917da8

4 types of tests

caf8ea3

Format

6c897d5

Malkovsky reviewed Dec 15, 2025

mperikov added 2 commits December 15, 2025 19:25

array alignas and size fix

e922dc7

Format fix

fb88e0b

Malkovsky approved these changes Dec 18, 2025

Malkovsky merged commit 96e8966 into main Dec 18, 2025
5 checks passed
