Assess performance bottlenecks of the gather instruction in JVector #632

@r-devulap

Description

Due to security vulnerabilities in Intel processors up to the Ice Lake generation, the gather instruction was patched in microcode and is now extremely slow. Intel advisory: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00828.html. JVector uses gather instructions in several places that are worth looking into:

float assemble_and_sum_f32_512(const float* data, int dataBase, const unsigned char* baseOffsets, int baseOffsetsOffset, int baseOffsetsLength) {

float pq_decoded_cosine_similarity_f32_512(const unsigned char* baseOffsets, int baseOffsetsOffset, int baseOffsetsLength, int clusterCount, const float* partialSums, const float* aMagnitude, float bMagnitude) {

‣ Ref: other libraries (e.g., NumPy’s x86 simd sort) improved performance by replacing gather with scalar loads: numpy/x86-simd-sort#65
