After a couple of measurements on a different (10 years newer) machine, I can measure a difference: this PR makes mapping-only mode about 2% faster. (This comes at the expense of one fewer bit being available for the hash, but this has very little impact.)
Great! Don't we have B top bits available anyway to store other things because of our prefix vector? Of course, this depends on the bit being added after the sorted vector has been produced.
Right, good point! I have the impression the filter bit would better fit in those upper bits anyway. Let me update the PR later. |
Profiling suggested that the `Index::is_filtered()` call is a bit slow. It checks whether a randstrobe occurs more often than `filter_cutoff` by accessing `randstrobes[i]` and `randstrobes[i + filter_cutoff]` and comparing the hashes.

The slowness could come from two cache misses because two quite far apart memory locations are read. To get rid of the second access, the idea is to use one bit within `RefRandstrobe` to store whether the item is filtered.

Somewhat unexpectedly, this does not improve speed. It does reduce cache misses according to `perf stat -d`, but this does not translate to a shorter runtime.
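
To make the idea concrete, here is a minimal sketch of the two variants. It is not the code from this PR: `Entry` is a hypothetical, simplified stand-in for `RefRandstrobe`, the flag is assumed to live in the lowest bit of a 64-bit word, and `make_entry`, `is_filtered_two_reads` and `mark_filtered` are illustrative names. The sketch also adds a bounds check that the real index may handle differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for RefRandstrobe (assumed layout, not the real one):
// bit 0 holds the "filtered" flag, the upper 63 bits hold the hash.
struct Entry {
    uint64_t hash_and_flag;

    uint64_t hash() const { return hash_and_flag >> 1; }
    bool filtered() const { return (hash_and_flag & 1u) != 0; }
};

// Build an entry with the flag cleared.
Entry make_entry(uint64_t hash) {
    return Entry{hash << 1};
}

// Variant 1: decide at query time by reading two entries that are
// filter_cutoff positions apart (two potentially distant memory locations).
bool is_filtered_two_reads(const std::vector<Entry>& v, std::size_t i,
                           std::size_t filter_cutoff) {
    if (i + filter_cutoff >= v.size()) {
        return false;
    }
    return v[i].hash() == v[i + filter_cutoff].hash();
}

// Variant 2: run once after the vector has been sorted by hash and store the
// result of the same comparison in the flag bit, so that a later query only
// needs to read v[i]. Assumes all flags are cleared on entry.
void mark_filtered(std::vector<Entry>& v, std::size_t filter_cutoff) {
    for (std::size_t i = 0; i + filter_cutoff < v.size(); ++i) {
        if (v[i].hash() == v[i + filter_cutoff].hash()) {
            v[i].hash_and_flag |= 1u;
        }
    }
}
```

After `mark_filtered(v, filter_cutoff)` has run, `v[i].filtered()` gives the same answer as `is_filtered_two_reads(v, i, filter_cutoff)` for every `i`, at the cost of one hash bit. Placing the flag in the upper bits instead, as suggested in the conversation, would only change the shift and mask in `Entry`.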