
Proposal for a Master's Research Internship

Study of a new algorithm for solving systems of linear equations modulo p

Team

Computer Algebra

Person in Charge

François Boulier (HDR) and Charles Bouillaguet.

Objectives

Keywords

High-Performance Scientific Computing, Parallelization, Algorithms, Linear Algebra

Context

Amongst all the types of equations scientists routinely have to solve, linear systems are probably the easiest to deal with: Gaussian elimination is taught, in a basic form, to high-school students. In today's science, the ability to solve linear systems is critical, because they have countless applications. Many engineering problems can only be attacked through numerical simulations, which in turn often require (sometimes large) linear systems to be solved. In this case, computations are performed using floating-point numbers, and thus with limited accuracy. The need of scientists and engineers for solutions of large linear systems has led to the development of several high-quality open-source libraries for numerical linear algebra: ATLAS, GotoBLAS, LAPACK, UMFPACK, MUMPS, LINPACK, the GNU Scientific Library, GPLU, SuperLU, SciPy, etc. These libraries are widely used, mostly through commercial software such as MATLAB, Mathematica, and Maple, or via open-source alternatives such as SAGE, SciLab and Octave. In any case, efficiency is crucial, because users always want to solve larger and larger systems.

Solving linear systems exactly is somewhat different, and often computationally harder. It usually arises in completely different application domains such as, for instance, data security (cryptology and error-correcting codes). In this setting, the coefficients of linear systems must lie in a domain over which it is possible to compute exactly, without loss of accuracy. Typical such domains include the integers, the rational numbers, the integers modulo a prime number, etc. Software tools are less readily available in the exact case than in the numerical setting, but a few open-source libraries can be used, amongst which FFLAS-FFPACK, LinBox, LELA, M4RI(E), NTL, PARI and FLINT.

Performance also matters in the exact case. To give only one example, consider integer factoring (i.e. the decomposition of a number into smaller non-trivial divisors which, when multiplied together, equal the original integer). The most advanced factoring algorithms produce extremely large systems of linear equations that must be solved modulo 2 in order to recover the prime factors. The largest (public) factorization effort was accomplished in 2009: a 232-digit number was factored using more than 2000 CPU-years [5]. Along the way, a system of more than 200 million linear equations in as many unknowns had to be solved. Just storing the equations requires more than 100 GB. An evil-minded individual could forge the fake ``VIGIK'' tags used by postmen to open the entrance halls of most buildings in France, if only (s)he could factor numbers of this size. At the time of this writing, only one open-source package, CADO-NFS, is capable of dealing with such large instances.

One could thus believe that solving systems of linear equations would be the most well-studied problem in computer algebra; that all the algorithms and techniques one could possibly think of would be known, and already implemented in off-the-shelf scientific software.

And yet, last summer, Prasad Raghavendra, an American researcher, described a new and amazingly simple probabilistic algorithm to solve linear systems modulo p [1]. In its present form, Raghavendra's algorithm probably cannot improve on classical techniques [2] in the generic case, because its complexity should be comparable to that of Gaussian elimination. However, it possesses an interesting feature: it automatically takes advantage of the sparsity of the linear system (a system of equations is sparse if each equation only contains a small number of terms). Many interesting applications produce sparse systems, and our example application (integer factoring) is no exception: each equation of the large system mentioned above contains an average of only 150 terms out of 200 million! In the sparse case, Raghavendra's algorithm should (asymptotically) compete with the best known exact sparse techniques [3] (although it should require more memory).
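To make the idea concrete, here is a minimal sketch of the core recombination step as we understand it from [1] (the pool size, the retry limit and all names are illustrative choices, not part of the original note): the algorithm maintains a pool of candidate vectors and, for each equation, replaces the pool by affine combinations of pairs of candidates that satisfy the new equation; since the combination coefficients sum to 1, previously satisfied equations remain satisfied.

```python
import random

def solve_mod_p(A, b, p, pool_size=64, max_tries=20000):
    """Sketch of a Raghavendra-style randomized solver for A x = b over F_p.

    A is a list of equation rows, b the right-hand side.  Returns one
    candidate solution, or None if the pool dies out (e.g. inconsistent
    system).  Parameters are illustrative, not tuned.
    """
    n = len(A[0])
    # Start from a pool of uniformly random vectors of F_p^n.
    pool = [[random.randrange(p) for _ in range(n)] for _ in range(pool_size)]
    for a, rhs in zip(A, b):
        new_pool, tries = [], 0
        while len(new_pool) < pool_size and tries < max_tries:
            tries += 1
            u, v = random.choice(pool), random.choice(pool)
            au = sum(ai * ui for ai, ui in zip(a, u)) % p
            av = sum(ai * vi for ai, vi in zip(a, v)) % p
            if au == av:
                if au == rhs:          # u already satisfies the equation
                    new_pool.append(u)
                continue
            # Choose t with t*au + (1-t)*av = rhs (mod p); the affine
            # combination t*u + (1-t)*v then satisfies the new equation
            # and, because t + (1-t) = 1, every previous one as well.
            t = (rhs - av) * pow(au - av, -1, p) % p
            new_pool.append([(t * ui + (1 - t) * vi) % p
                             for ui, vi in zip(u, v)])
        if not new_pool:
            return None
        pool = new_pool
    return pool[0]
```

Note that the cost of processing one equation is dominated by the dot products, which only touch the nonzero terms of that equation; this is where the automatic exploitation of sparsity comes from.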

Work To Do

The first objective of the internship is to implement Raghavendra's algorithm in any reasonable programming language. This implementation will be the basis of all subsequent work. It will be used to assess the efficiency of the algorithm, and to compare it with the classical techniques in both the dense case (Gaussian elimination) and the sparse case (iterative methods). The implementation will then be used to investigate ways to extend the algorithm. For instance, the student could try to:

  1. Parallelize the algorithm. This does not look obvious, as the algorithm in its present form is inherently sequential.
  2. Isolate the key idea of the algorithm, and build a whole linear algebra library on top of it. Some algorithms should be easy to design (sum of two linear subspaces), while others should be not-so-obvious (intersection of two linear subspaces), and a few seem to require a non-trivial research effort (dimension of a subspace).
  3. Adapt the algorithm to the Learning Parity with Noise and Learning With Errors problems. These two hard problems are used to build secure cryptographic schemes (so that faster algorithms to solve them could break cryptographic constructions). They consist in solving systems of linear equations modulo p, but with "perturbed" right-hand sides.
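As a baseline for the comparison with classical dense techniques, a textbook Gaussian elimination over F_p fits in a few lines (a sketch; the reduced-row-echelon approach and the convention of setting free variables to zero are our choices, not prescribed by the proposal):

```python
def gauss_mod_p(A, b, p):
    """Solve A x = b over F_p by dense Gaussian elimination.

    Returns one solution (free variables set to 0), or None if the
    system is inconsistent.  Intended as a simple reference baseline.
    """
    m, n = len(A), len(A[0])
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    pivots, row = [], 0
    for col in range(n):
        # Find a row with a nonzero entry in this column.
        piv = next((r for r in range(row, m) if M[r][col] % p), None)
        if piv is None:
            continue                                # free column
        M[row], M[piv] = M[piv], M[row]
        inv = pow(M[row][col], -1, p)               # pivot is invertible mod p
        M[row] = [x * inv % p for x in M[row]]
        for r in range(m):                          # eliminate above and below
            if r != row and M[r][col] % p:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[row])]
        pivots.append(col)
        row += 1
    # A zero row with a nonzero right-hand side means no solution.
    for r in range(row, m):
        if M[r][n] % p:
            return None
    x = [0] * n
    for r, col in enumerate(pivots):
        x[col] = M[r][n]
    return x
```

An iterative baseline for the sparse case (e.g. Wiedemann's algorithm [3]) would be structured quite differently, around repeated sparse matrix-vector products.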

This internship may take several directions, depending on the profile of the student. If (s)he enjoys programming and produces an implementation of high enough quality, then it should be possible to contribute code to existing open-source software such as SAGE [4], to which the team has already contributed. If the student prefers more theoretical problems, then it is possible to work on extending the algorithm. It is possible to work on either the more applied or the more theoretical aspects of the question, or (ideally) both. Indeed, parallelizing the algorithm raises both practical and theoretical issues.

Finally, this internship will allow the student to gently get acquainted with the world of symbolic and exact computation. It is also an exciting opportunity to work on an actual computer science problem which is simultaneously ``hot'' and fundamental, simultaneously simple and deep.

The student will be compensated by the Computer Algebra team.

Follow-up

This research internship may lead to a PhD thesis within the Computer Algebra team.

References

[1] Prasad Raghavendra. A randomized algorithm for linear equations over prime fields. Unpublished note, August 2012. http://www.eecs.berkeley.edu/~prasad/linsystems.pdf.

[2] Jean-Guillaume Dumas, Pascal Giorgi and Clément Pernet. Dense Linear Algebra over Finite Fields. ACM Transactions on Mathematical Software, 35(3), November 2008.

[3] Douglas H. Wiedemann. Solving sparse linear equations over finite fields. IEEE Transactions on Information Theory, 32(1):54–62, 1986.

[4] http://www.sagemath.org

[5] Thorsten Kleinjung et al. Factorization of a 768-bit RSA modulus. In Advances in Cryptology – CRYPTO 2010.
