Mathieu Giraud, Bioinfo - Sequoia team
in collaboration with Philippe Marquet, West - Dart team
LIFL / INRIA Futurs
giraud[AT]lifl[DOT]fr, marquet[AT]lifl[DOT]fr
Subjects: computer architecture and bioinformatics
Prerequisites: knowledge of digital circuit design
In bioinformatics, pattern matching helps to compare known models with new sequences in large databanks, such as a collection of genomes. The simplest pattern matching problem is to look for exact occurrences of a word (ATTACTCT). One can also look for words with substitution or indel errors, for automata, weighted patterns, or variable patterns (V-ATTCCT-V with |V| = 5). Generally speaking, a pattern is defined by a language giving all the words that should be recognized [2].
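As an illustration of the first two cases (exact occurrences, and occurrences with substitution errors), here is a minimal software sketch in Python. The function name `find_with_substitutions`, the parameter `max_subs`, and the sample sequence are assumptions made for this example; indels, automata, weighted and variable patterns are not covered.

```python
def find_with_substitutions(text, pattern, max_subs=0):
    """Return the start positions where `pattern` occurs in `text`
    with at most `max_subs` substituted letters (no indels)."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_subs:
                    break  # too many errors, give up on this window
        else:
            # The inner loop ended without break: the window is accepted.
            hits.append(i)
    return hits


# Hypothetical sample sequence, chosen for the example only.
sequence = "GGATTACTCTAAATTAGTCTCC"
exact = find_with_substitutions(sequence, "ATTACTCT")        # [2]
approx = find_with_substitutions(sequence, "ATTACTCT", 1)    # [2, 12]
```

With `max_subs=0` only the exact occurrence at position 2 is reported; allowing one substitution also accepts ATTAGTCT at position 12.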
Pattern matching can be done with a systolic architecture. A systolic architecture is a network of similar cells through which data flows synchronously [3]. Programmable systolic arrays have been proposed, in which the computation in each cell and the data transmission can be statically or dynamically reconfigured [1]. Those studies have not been applied to bioinformatics problems.
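To make the idea of a systolic computation concrete, here is a small cycle-by-cycle simulation in Python of one possible linear array for exact matching. It is a sketch under stated assumptions, not the architecture of [1] or [3]: cell j stores pattern letter j, the text moves one cell per clock cycle, and a partial-result token moves one cell every two cycles, so that the token for the window starting at position i meets letter i + j of the text in cell j. The name `systolic_match` and the register names are hypothetical, and the start position carried in each token is there for readability only (hardware could derive it from the clock count).

```python
def systolic_match(text, pattern):
    """Simulate a linear systolic array; return exact match positions."""
    m, n = len(pattern), len(text)
    txt_out = [None] * m    # text register at each cell output
    res_delay = [None] * m  # first result register of each cell
    res_out = [None] * m    # second result register (half-speed flow)
    hits = []
    for t in range(n + 2 * m):  # enough cycles to drain the array
        new_txt = [None] * m
        new_delay = [None] * m
        new_out = [None] * m
        for j in range(m):
            if j == 0:
                # External inputs: one text letter and one fresh token.
                char_in = text[t] if t < n else None
                token_in = (t, True) if t < n else None
            else:
                # Neighbour outputs latched at the previous cycle.
                char_in = txt_out[j - 1]
                token_in = res_out[j - 1]
            token = None
            if token_in is not None:
                start, ok = token_in
                token = (start, ok and char_in == pattern[j])
            new_txt[j] = char_in        # text: one cell per cycle
            new_delay[j] = token        # result: enters the delay stage
            new_out[j] = res_delay[j]   # result: leaves one cycle later
            if j == m - 1 and token is not None and token[1]:
                hits.append(token[0])   # full match leaves the last cell
        # Synchronous update: all registers change at the clock edge.
        txt_out, res_delay, res_out = new_txt, new_delay, new_out
    return hits


positions = systolic_match("GGATTACTCTAAATTAGTCTCC", "ATTACTCT")  # [2]
```

Every cell performs the same comparison and only talks to its neighbours, which is the property that lets the array scale with the pattern length. Replacing the boolean in the token by a mismatch counter would be one way to tolerate substitution errors.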
The goal of the internship is to take ideas from programmable systolic architectures to contribute to a genome-on-chip architecture. Today, several thousand elementary cells can be simulated on an FPGA. Ideally, a complete genome is stored on the chip, and a systolic computation executes pattern matching in a few clock cycles. In practice, to explore genomes with billions of bases, an external memory has to be used.
Depending on the candidate's profile, the internship will
Perspectives include an actual implementation on prototype boards, in collaboration with the Symbiose team (IRISA, Rennes).