Although the code and paper suggest that 64-bit hashes are being used, Java's Object.hashCode() returns only a 32-bit int. The good news is that the bug in #19 has no effect, since the upper 16 bits are always 0 (or perhaps all 1s, depending on sign-extension effects).
The bad news is that because bits 32-47 are all zero (or perhaps split evenly between all zeros and all ones), I suspect that all (or at least half) of the documents will end up clustered together, making for a very expensive O(n^2) comparison.
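To make the sign-extension point concrete, here is a minimal sketch. It assumes the 32-bit hashCode() value is widened to a long by plain implicit conversion, which I have not confirmed is what the project actually does. The class and method names (HashWidthDemo, widen, band) are hypothetical and exist only for illustration.

```java
class HashWidthDemo {
    // Assumption: the int hash is widened implicitly. Java sign-extends here,
    // so bits 32-63 of the result are all copies of the int's sign bit.
    static long widen(int hash32) {
        return hash32;
    }

    // Extract the 16-bit band that starts at bit `shift` (0, 16, 32 or 48).
    static int band(long hash64, int shift) {
        return (int) ((hash64 >>> shift) & 0xFFFFL);
    }

    public static void main(String[] args) {
        String[] samples = {"alpha", "beta", "gamma", "delta"};
        for (String s : samples) {
            long h = widen(s.hashCode());
            // Bands at bits 32-47 and 48-63 carry no information beyond the sign.
            System.out.printf("%-6s %016x  bits32-47=%04x  bits48-63=%04x%n",
                    s, h, band(h, 32), band(h, 48));
        }
    }
}
```

Under that assumption, the bands at bits 32-47 and 48-63 can only ever be 0x0000 or 0xffff, so any bucketing keyed on those bands puts documents into at most two groups.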
You can probably ignore PR #20 for now. It'll get subsumed into the larger rework necessary.