fix misaligned pointer use in hash code#229
Conversation
|
My biggest concern is performance. MurmurHash was considered for its speed. This patch:
Sounds like we should analyze the feasibility of feeding aligned-ptrs-only instead:
I tried using inline bool isAligned( const uint8_t *x )
{
return (reinterpret_cast<size_t>(x) & 0xFu) == 0u;
}and it did not trigger; but a simple "works on my machine" (Linux x64) isn't a scientific measurement. If it's unfeasible to feed aligned ptrs, then we have no other choice but to bite the bullet. Alternative:Brainstorming here: Having 2 versions of the routine and dispatch based on ptr alignment may be an interesting approach; ideally only used on platforms where unaligned reads are actually unsupported (though that's UB) |
|
To give a specific example, I'm getting odd addresses passed to the hash function by |
3904aa1 to
da6dbf9
Compare
76d820e to
03fbfb6
Compare
2ee1c83 to
8b3ddcc
Compare
49c451d to
84c07a0
Compare
a3c7671 to
67f9fdd
Compare
|
That is a real bug in MurmurHash3 code. It is hidden by the fact that x86/x64/arm64 allows unaligned reads, but should explode on arm32 and maybe other platforms. Other problem is with aliasing - casting (uint8*) to (uint32*) is already UB, as it allows compiler to assume things that are not true and then optimize basing on those assumptions. Moreover, similar problem was found with storing results in (uint32_t)out = h1; Others found this issue too, https://discourse.julialang.org/t/problem-with-issue-murmurhash3-has-undefined-behavior/9807 and I highly recommend to read that thread with justification of using memcpy compiler intrinsic to solve both issues - alignment and aliasing. Here is their fix https://github.com/JuliaLang/julia/pull/30220/files Unfortunately, problems are not fixed in upstream https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp |
|
Using a different hash algorithm is not out of the question. There have been newer hashing algos since Murmur came out. |
The MurmurHash code was using misaligned pointers, which is undefined behavior according to the C++ standard and probably unsafe on multiple platforms. See Is it well-defined to hold a misaligned pointer, as long as you don't ever dereference it? on Stack Overflow.