Skip to content

Idea for slightly faster fft_small #2687

@user202729

Description

@user202729

Recently I was looking at the fft_small code, and notice that in nmod_poly_mul, at most 4 primes are needed.

This is because the modulus is M < 2^64, so for sequences of length, say, n > 2^22, then n ⋅ M^2 may exceed 2^150. Each CRT prime gives you 50 bits of precision, so 3 primes may be insufficient in this case.

One idea to do slightly better is the following. If it works, it may be 25% faster than the current algorithm in the largest cases where 4 primes are used.

Suppose we want to compute the convolution of (a_0, ..., a_{n-1}) and (b_0, ..., b_{m-1}) modulo M. For each i, let a'_i be randomly either a_i or a_i - M, the probability is such that the expected value of a'_i is zero. Similar for b'_i. Compute the convolution of a' and b' using CRT as before. Then, each resulting coefficient c'_k is the sum of at most n independent random variables, each has expected value 0 and constant variance, so you can apply Chernoff bound (need to work through the detail) to show that ℙ[|c'_k| ≥ C ⋅ √n ⋅ M^2] ∼ exp(-C ⋅ constant). Pick large enough C so that the failure probability is vanishingly small, then take the union bound over all k.

We expect to do better in this case if C ≪ √n. Would be nice if there is a similar deterministic algorithm though.

With a similar idea, one can gain one bit in the CRT product. Note that [-M/2, M/2] × [-M/2, M/2] = [-M^2/4, M^2/4] which has width M^2/2, while [0, M) × [0, M) = [0, (M-1)^2] which has width (M-1)^2.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions