
Simpqs-2.0 full#48

Open
hvds wants to merge 24 commits into danaj:master from hvds:simpqs-full

Conversation

hvds commented Apr 4, 2024

TL;DR: the quadratic sieve gets a 50% speedup for large and medium inputs and a smaller speedup for small inputs; memory use is 4x smaller for large inputs and roughly unchanged for medium and small inputs.

Incorporate block Lanczos, partial relations and sieving speedups from simpqs-2.0. There are some additional minor fixes/optimizations along the way, and comments marked '(hv)' where I think there may be additional bugs or infelicities that I was not able to resolve. Results from timing tests below show user CPU time and maxresident from /usr/bin/time on an unloaded machine.

Build A: MPUG master @2389dcbc44, fixed to build standalone
- fails to find factors for ds=88 (asterisked) due to bug fixed in 955bb25
Build B: MPUG with simpqs-2.0, without last commit (midprime sieving)
Build C: MPUG with simpqs-2.0, with last commit

CPU   46    61     70      80      88
A:  0.12  4.03  68.26  542.25 3405.12*
B:  0.09  2.73  37.44  289.59 1730.13
C:  0.09  2.33  30.70  248.16 1482.69

RAM(k) 46     61     70      80      88
A:   5904  17672  37596  587976 1324160*
B:   5600  19720  38036  183036  333624
C:   5672  19684  38140  182900  333560

Test cases:
ds=46: 7520587276150280509023659781989524950620229823
= 342594914346335019487 * 21951835713905522424868129
ds=61: 1000000000000000000000000000045999999999999999999999999999373
= 999999999999999999999999999989 * 1000000000000000000000000000057
ds=70: 9999999999999999999999999999890000000120999999999999999999999999998669
= 999999999999999999999999999989 * 10000000000000000000000000000000000000121
ds=80: 99999999999999999999999999999999999941006299999999999999999999999999999999996283
= 99999999999999999999999999999999999941 * 1000000000000000000000000000000000000000063
ds=88: 9180435894215652020112626451519741807924491872283194331211065824189266475562769247511369
= 1059405511716795159376523757718030895723651 * 8665648604506982829490012915377592720602462019

hvds added 17 commits March 25, 2024 14:52
Add a balancing _destroy_factor() to free memory grabbed by _init_factor().
To be reversed after edits are done.
_GMP_init/destroy; use MPUG prime iterators; use MPUG verbosity; remove
REPORT (since it is handled by verbosity).

The standalone program now takes the number to factor and optional
verbosity as command-line arguments.
QS factorization failure has been observed due to primecount[]
overflowing a 16-bit unsigned short.

We end up setting primecount[] values to a count of relations, of which
there are (of the order of) the number of primes in the factorbase,
so it should be of the same type as primesNo[].
Contents will be mpz_init()ed
Possible bugs or opportunities for improvement.
1 complaint for 7520587276150280509023659781989524950620229823
from l = 1936
7 complaints for 1000000000000000000000000000045999999999999999999999999999373
from l in (5937..5943)
mpz_t q, r are used only to hold M /% CACHEBLOCKSIZE, with M <= 192000.
Replace them with unsigned long Mq, Mr.
This wasn't important before, since it is used as a cut-off for partials.
Matrix stored in la_col_t, full and partial relations stored in rel_t,
collected in dynamic array arel_t.

Block Lanczos code imported almost unchanged from simpqs-2.0, just minor
changes to fix memory leaks; relations structure added by hv to replace
temp files used in simpqs.
Another change incorporated from simpqs-2.0.
Whitespace and minor comment fixups only.
hvds commented Apr 6, 2024

I note that this occasionally shows:

lanczos error: not all columns used
linear algebra failed; retrying...

This does not seem to cause a problem, and one or two retries always seem to be enough to succeed; however, the noise to stdout is a bit irritating and should probably be hidden behind a verbosity level. I'll try to get some stats on how often this happens, and whether it depends on the size of the matrix.

(For a factor base of size numPrimes, simpqs looks for numPrimes + 64 relations whereas we require only numPrimes relations. It is conceivable that having some extra relations acts as a buffer in some way that avoids this error; I'll have to experiment with that too.)

It appears to be normal for the algorithm to fail occasionally: about
4% of the time in tests (and more often on a given matrix after it has
failed once).

Allow for this by a) retrying and b) reporting only if verbosity > 3.
Introduce a hard limit of 100 retries, just in case.
hvds commented Apr 8, 2024

It appears that "not all columns used" is relatively common, and "Failed, no rows found (mask == 0)" is rarer but also possible (and should not be fatal). I've added a commit to cope with those more cleanly. I assume the remaining check "lanczos error: dependencies don't work" would represent an actual bug, and have left that as a fatal error.

I ran my code using this for a while with extra instrumentation. There were 1700 numbers that reached _GMP_simpqs, of size 41 to 50 digits. There were 70 that failed at least once (about 4%); the number of targets that failed exactly n times, for n = 1..7, was (56, 12, 1, 0, 0, 0, 1), implying that the chance of subsequent failures on a given matrix is higher; to cope with that I've also added a hard limit of 100 retries - that could be made run-time configurable if desired.

Since the evaluation time is typically about 15-20% of the total (slightly larger for the smallest inputs), this implies runtimes about 1% longer on average than with no failures.

There was no obvious correlation with size of target, and seeking numPrimes + 64 relations was no different from seeking numPrimes relations. Number of targets for a given number of digits in the test run, from 41 digits, was (1, 1, 2, 45, 241, 733, 586, 60, 29, 2).

hvds commented Apr 9, 2024

Sigh, I've now found a case where - for a given state of the (QS, not block_lanczos) rng - it fails even after 10000 iterations. Analyzing the input matrix, it appears to have an unluckily high number of linearly-dependent entries: it needs numPrimes + 130 relations before it gets enough linear independence to succeed. So I'm now considering making just 1 or 2 attempts (instead of 100), then finding more relations before trying again.

For the record, the failure occurs when the QS rng starts with randval = 1935087715 when trying to factorize n = 52125806551349591 * 3654328997351298614303.

(I am tempted to reset the rngs for each new factorization to aid reproducibility - I was lucky that this failure occurred just 30 minutes into a run - but I think it is unwise. I'll probably instead combine the two simplistic rngs into one, and capture the initial seed to output in the case of eventual failure instead.)

hvds added 5 commits April 15, 2024 13:56
Rename to simple_random(), save the initial seed for diagnostics, and
when standalone allow the seed to be set with '-r'.

Running with "./qs -r1935087715 190484846390921103486305577532825800073"
demonstrates a seed-specific failure.
We don't care about the sign, and this ensures we also remove duplicates
that differ only by sign. This is probably a bugfix, since enough of such
duplicates can cause the algorithm to fail.

This is enough to fix the example failure mentioned in 3c6cc9311c; it is
not yet known if it fixes all such failures.
We expect maybe 10% of the primes to have a count of zero, so it's worth
checking for.
Generate relations combined from partials directly into existing list
rather than keeping them separate. This also ensures we check for
duplicates between the natural and combined relations.
hvds commented Apr 15, 2024

Ah, now fixed: the source value (X) for relations was stored signed, and the failure case had an unusually high number of (X, -X) duplicates (119 in total) leaving the matrix underspecified. Now changed to store abs(X), which has fixed the problem.

I note that this makes the two largest testcases about 10% slower. I assume they also had a bunch of such duplicates - not enough to leave the matrix underspecified, but enough that removing them causes significant extra work to make up the shortfall. In the future it may be worth experimenting to see if we can systematically demand fewer relations without risking underspecification.

I'll continue running with this (updated) code to keep an eye out for any other problems.

hvds commented Apr 16, 2024

FWIW, here's an updated version of the tables above, adding build D (MPUG with latest commits up to c7c5a4e) and for comparison build E (latest msieve run with -q -t 1 to limit to a single thread). This shows there is still lots of opportunity for further improvement in both speed and memory use.

CPU   46    61     70      80      88
A:  0.12  4.03  68.26  542.25 3405.12*
B:  0.09  2.73  37.44  289.59 1730.13
C:  0.09  2.33  30.70  248.16 1482.69
D:  0.09  2.31  30.58  281.93 1631.79
E:  0.11  2.19  25.89  191.85  764.24

RAM(k) 46     61     70      80      88
A:   5904  17672  37596  587976 1324160*
B:   5600  19720  38036  183036  333624
C:   5672  19684  38140  182900  333560
D:   5692  19832  37988  183260  336012
E:  20516  22008  27600   48924   76348

The former is a BSDism not universally available (e.g. on Cygwin), and
we already use the latter elsewhere.
hvds commented Dec 10, 2025

I've been using this implementation without problems for the last 18 months, so I have high confidence in it. If it is of interest to anybody, I can update it for the latest master; it needs only minor adaptations to rebase.
