Open
Description
Due to a security vulnerability in Intel processors up to the Ice Lake generation, the gather instruction was patched via microcode and is now extremely slow on affected CPUs (Intel advisory: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00828.html). JVector uses gather instructions in multiple places that are worth looking into:
‣ `float assemble_and_sum_f32_512(const float* data, int dataBase, const unsigned char* baseOffsets, int baseOffsetsOffset, int baseOffsetsLength)`
‣ `float pq_decoded_cosine_similarity_f32_512(const unsigned char* baseOffsets, int baseOffsetsOffset, int baseOffsetsLength, int clusterCount, const float* partialSums, const float* aMagnitude, float bMagnitude)`
‣ Ref: other libraries (e.g., NumPy's x86-simd-sort) improved performance by replacing gather with scalar loads: numpy/x86-simd-sort#65
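
For discussion, here is a rough sketch of what a scalar-load replacement for the first function could look like. It is not the existing JVector code: the function name `assemble_and_sum_scalar_loads` is hypothetical, and the indexing scheme `data[dataBase * i + baseOffsets[baseOffsetsOffset + i]]` is an assumption based on the signature. The idea is to fill a small temporary block with plain scalar loads and accumulate lane-wise, so the compiler can still vectorize the additions without emitting a gather.

```c
#include <stddef.h>

/* Hypothetical scalar-load variant (not the existing JVector implementation).
 * ASSUMPTION: element i of the result is read from
 *   data[dataBase * i + baseOffsets[baseOffsetsOffset + i]].
 */
float assemble_and_sum_scalar_loads(const float* data, int dataBase,
                                    const unsigned char* baseOffsets,
                                    int baseOffsetsOffset, int baseOffsetsLength) {
    enum { LANES = 16 };            /* matches the width of a 512-bit float vector */
    float acc[LANES] = {0.0f};      /* one partial sum per lane */
    int i = 0;

    /* Main loop: process LANES elements per iteration. */
    for (; i + LANES <= baseOffsetsLength; i += LANES) {
        float tmp[LANES];
        /* Plain scalar loads instead of a hardware gather. */
        for (int j = 0; j < LANES; j++) {
            size_t idx = (size_t)dataBase * (size_t)(i + j)
                       + baseOffsets[baseOffsetsOffset + i + j];
            tmp[j] = data[idx];
        }
        /* Lane-wise accumulation; contiguous, so it can be auto-vectorized. */
        for (int j = 0; j < LANES; j++) {
            acc[j] += tmp[j];
        }
    }

    /* Horizontal reduction of the partial sums. */
    float sum = 0.0f;
    for (int j = 0; j < LANES; j++) {
        sum += acc[j];
    }

    /* Tail: remaining elements that do not fill a whole block. */
    for (; i < baseOffsetsLength; i++) {
        size_t idx = (size_t)dataBase * (size_t)i
                   + baseOffsets[baseOffsetsOffset + i];
        sum += data[idx];
    }
    return sum;
}
```

Whether this actually beats the patched gather would need to be benchmarked on an affected CPU. Note that the summation order differs from a strictly sequential loop, so results may differ in the last bits for general floating-point inputs.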