Skip to content

Consider further optimisation on mod_function. #50

@sh1boot

Description

@sh1boot

Supposing your hash values are uniformly distributed "random" values in the range of 0..2**64-1, you can skip the modulo and instead use random-number scaling techniques to distribute the result (the bucket number) evenly throughout that space. I don't think there's any reason to stick with modulo, is there?

The quickest-and-dirtiest of these would be:

size_t bucket(uint64_t hash) {
    return uint64_t(uint128_t(hash) * K >> 64);
}

But if the incoming hashes are not already uniform then try this one weird trick:

size_t bucket(uint64_t hash) {
     crc32(hash) * K >> 32;
}

(which only works when K is less than 2**32), and you'll have to find the right intrinsic for the target to get the hardware-accelerated CRC instruction)

This one is more thorough:

static inline uint64_t murmurmix64(uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;

    return h;
}

size_t bucket(uint64_t hash) {
    hash = murmurmix64(hash);
    return uint64_t(uint128_t(hash) * K >> 64);
}

And this one is probably fine, covers up to 64-bit ranges, and is probably still faster than mod by a constant:

size_t bucket(uint64_t hash) {
    hash *= 0xc4ceb9fe1a85ec53ULL;
    return uint64_t(uint128_t(hash) * K >> 64);
}

Plus it probably makes no difference if K is not constant, and so you can avoid calling via a function pointer and just load the table size into a register and perform the multiply by that inline.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions