Skip to content

Use AVX512F and AVX512IFMA to speed up integer multiplication #2696

@user202729

Description

@user202729

There was a paper "Accelerating Big Integer Arithmetic Using Intel IFMA Extensions" (see below) that uses AVX512F and AVX512IFMA to multiply small integers. I (asked AI to) benchmark it against flint and get:

  flint_mpn_mul_n_1024      95.94 ns  +/-   0.44 ns
  mul1024_avx512           100.64 ns  +/-   0.82 ns
  mul1024_vpmadd            45.60 ns  +/-   0.43 ns

  flint_mpn_mul_n_2048     294.76 ns  +/-   4.59 ns
  mul2048_avx512           275.32 ns  +/-   3.53 ns
  mul2048_vpmadd           116.30 ns  +/-   1.96 ns

  flint_mpn_mul_n_3072     649.84 ns  +/-  11.00 ns
  mul3072_avx512           595.43 ns  +/-   7.82 ns
  mul3072_vpmadd           192.20 ns  +/-   2.27 ns

  flint_mpn_mul_n_4096     962.09 ns  +/-  14.97 ns
  mul4096_avx512           899.81 ns  +/-  11.22 ns
  mul4096_vpmadd           607.16 ns  +/-  10.44 ns

Looks quite competitive. Note that the suffix is the number of bits, so for example 1024 means 16 limbs input, 32 limbs output.

The paper only provide some fixed sizes, however. More work is needed to adapt it to all sizes.

Note that the repo use GPL v3, but flint is LGPL v3. Might have issue here.

There's a sharp spike at mul4096_vpmadd. No doubt that can be improved.

References:

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions