Study of a new algorithm for solving systems of linear equations modulo p
Team
Person in Charge
Objectives
- Implementing the new algorithm
- Assessing its efficiency
- Comparing it with the state of the art
- Designing and implementing a parallel version of it (MPI or GPU)
- Contributing the implementation to open-source scientific libraries
- Isolating the key ideas of the new algorithm, and adapting them to other contexts
Keywords
High-Performance Scientific Computing, Parallelization, Algorithms, Linear Algebra
Context
Amongst all the possible types of equations scientists routinely have
to solve, linear systems are probably the easiest to deal
with. Gaussian elimination is taught, in a basic form, to high school
students. In today's science, the ability to solve linear systems is
critical, because they have countless applications. Many engineering
problems can only be attacked through numerical simulations, which in
turn often require (sometimes large) linear systems to be solved. In
this case, computations are performed using floating-point numbers,
and thus with a limited accuracy. The need of scientists and engineers
for solutions of large linear systems led to the development of
several high-quality open-source libraries for numerical linear
algebra: ATLAS, GotoBLAS, LAPACK, UMFPACK,
MUMPS, LINPACK, the GNU Scientific Library,
GPLU, SuperLU, SciPy, etc. These libraries are widely used, mostly through commercial software such as MATLAB, Mathematica, and Maple, or via open-source alternatives such as SAGE, Scilab, and
Octave. In any case, efficiency is crucial, because users always want to solve larger and larger systems.
Solving linear systems exactly is somewhat different, and often
computationally more difficult. It usually has completely different
applicative domains such as, for instance, data security (cryptology
and error-correcting codes). In this setting, the coefficients of
linear systems must lie in a domain over which it is possible to
compute exactly, without accuracy loss. Usual such domains include the
integers, the rational numbers, integers modulo a prime number, etc.
Software tools are less readily available in the exact case than in
the numerical setting, but a few open-source libraries can be used, amongst which
FFLAS-FFPACK, LinBox, LELA, M4RI(E), NTL, PARI and FLINT.
Performance also matters in the exact case. To give only one example,
consider the factoring of large integers (i.e., the decomposition
of a number into smaller non-trivial divisors which, when multiplied
together, equal the original integer). The most advanced factoring
algorithms produce extremely large systems of linear equations that
must be solved modulo 2 in order to recover the prime factors. The
largest (public) factorization to date was accomplished in 2009: a
232-digit number was factored using more than 2000 CPU-years [5].
Along the way, a system of more than 200 million linear
equations in as many unknowns had to be solved. Just storing the
equations requires more than 100 GB. An evil-minded individual could
forge fake ``VIGIK'' tags, used by postmen to open the entrance
hall of most buildings in France, if only (s)he could factor numbers
of this size. At the time of this writing, only one open-source
package, CADO-NFS, is capable of dealing with such large
instances.
One could thus believe that solving systems of linear equations would be among the most well-studied problems in computer algebra; that all the algorithms and techniques one could possibly think of would be known, and already implemented in off-the-shelf scientific software.
And yet, last summer, Prasad Raghavendra, an American researcher, described a new and amazingly simple probabilistic algorithm to solve linear systems modulo p [1]. In its present form, Raghavendra's algorithm probably cannot improve on classical techniques [2] in the most generic case, because its complexity should be comparable to that of Gaussian elimination. However, it possesses an interesting feature: it can automatically take advantage of the sparsity of the linear system (a system of equations is sparse if each equation only contains a small number of terms). Many interesting applications produce sparse systems, and our example application (integer factoring) is no exception: each equation of the large system mentioned above contains an average of only 150 terms out of 200 million! In the sparse case, Raghavendra's algorithm should (asymptotically) compete with the best known exact sparse techniques [3] (although it should require more memory).
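To give a flavour of how simple the algorithm is, here is a minimal Python sketch of its core idea as we understand it from [1]: maintain a pool of random candidate vectors, process the equations one at a time, keep the candidates satisfying the current equation, and refill the pool with random affine combinations of the survivors (which preserve every equation processed so far, since solution sets are affine subspaces). The function name, interface, and pool size below are purely illustrative, not part of the original description.

```python
import random

def solve_mod_p(equations, n, p, pool_size=100):
    """Sketch of a Raghavendra-style randomized solver over F_p.

    equations: list of (coeffs, rhs) pairs, coeffs a length-n list.
    Returns one vector satisfying every equation, or None on an
    unlucky run (the caller may simply retry).
    """
    # Start from a pool of uniformly random candidate vectors.
    pool = [[random.randrange(p) for _ in range(n)]
            for _ in range(pool_size)]
    for coeffs, rhs in equations:
        # Keep the candidates that already satisfy this equation.
        survivors = [v for v in pool
                     if sum(c * x for c, x in zip(coeffs, v)) % p == rhs % p]
        if not survivors:
            return None
        # Refill the pool with random affine combinations of survivors.
        # t*u + (1-t)*v satisfies this equation and all previous ones,
        # because the set of common solutions is an affine subspace.
        pool = list(survivors)
        while len(pool) < pool_size:
            u, v = random.choice(survivors), random.choice(survivors)
            t = random.randrange(p)
            pool.append([(t * a + (1 - t) * b) % p for a, b in zip(u, v)])
    return pool[0]
```

Note how the per-equation cost depends only on the number of nonzero terms actually touched, which hints at why the algorithm can exploit sparsity.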
Work To Do
The first objective of the internship consists in implementing Raghavendra's algorithm in any reasonable programming language. This implementation will be the basis of all subsequent work. It will be used to assess the efficiency of the algorithm, and to compare it with the classical techniques in both the dense case (Gaussian elimination) and the sparse case (iterative methods). The implementation will then be used to investigate ways to extend the algorithm. For instance, the student could try to:
- Parallelize the algorithm. This does not look obvious, as the algorithm in its present form is inherently sequential.
- Isolate the key idea from the algorithm, and build a whole linear algebra library on top of it. Some algorithms should be easy to design (sum of two linear subspaces), while some others should be not-so-obvious (intersection of two linear subspaces), and a few seem to require a non-trivial research effort (dimension of a subspace).
- Adapt the algorithm to the Learning Parity with Noise (LPN) and Learning With Errors (LWE) problems. These two problems, believed to be computationally hard, are used to build secure cryptographic schemes (so that faster algorithms to solve them could break cryptographic constructions). Both consist in solving systems of linear equations modulo p, but with "perturbed" right-hand sides.
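For the dense-case comparison mentioned above, the natural baseline is textbook Gaussian elimination over F_p. A minimal sketch could look as follows (the function name and interface are our own; pivot inversion uses Fermat's little theorem, a^(p-2) mod p):

```python
def gauss_mod_p(A, b, p):
    """Baseline dense Gaussian elimination over F_p (p prime).

    A is an n x n list of lists, b a length-n list; returns x with
    A x = b (mod p), assuming A is invertible modulo p.
    """
    n = len(A)
    # Work on an augmented copy so the caller's data is untouched.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if M[r][col] % p != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Normalize the pivot row (modular inverse via Fermat).
        inv = pow(M[col][col], p - 2, p)
        M[col] = [x * inv % p for x in M[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]
```

Timing this O(n^3) routine against the new algorithm on random dense and sparse instances is a straightforward first experiment.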
This internship may take several directions, depending on the
profile of the student. If (s)he enjoys programming and succeeds
in producing a high-enough quality implementation, then it should be
possible to contribute some code to existing open-source software
such as SAGE [4], to which the team has already contributed. If the student prefers more theoretical problems, then it is possible to work on extending the algorithm. Ideally, both the applied and the theoretical aspects of the question could be tackled: indeed, parallelizing the algorithm
raises both practical and theoretical issues.
Finally, this internship will allow the student to gently get acquainted with the world of symbolic and exact computation. It is also an exciting opportunity to work on an actual computer science research problem, one which is simultaneously ``hot'' and fundamental, simultaneously simple and deep.
The student will be compensated by the Computer Algebra team.
Follow-up
This research internship may lead to a PhD thesis within the Computer Algebra team.
References
[1] Prasad Raghavendra. A randomized algorithm for linear equations over prime fields. Unpublished note, August 2012. http://www.eecs.berkeley.edu/~prasad/linsystems.pdf.
[2] Jean-Guillaume Dumas, Pascal Giorgi, and Clément Pernet. Dense Linear Algebra over Finite Fields. ACM Transactions on Mathematical Software, 35(3), November 2008.
[3] Douglas H. Wiedemann. Solving sparse linear equations over finite fields. IEEE Transactions on Information Theory, 32(1):54–62, 1986.
[4] William A. Stein et al. Sage Mathematics Software. The Sage Development Team. http://www.sagemath.org.
[5] Thorsten Kleinjung et al. Factorization of a 768-bit RSA modulus. In Advances in Cryptology – CRYPTO 2010, volume 6223 of LNCS, pages 333–350. Springer, 2010.

