Subject of Master Thesis - University of Lille 1 - LIFL Master 2


Equipe: Mostrare – INRIA Lille Nord Europe -  http://mostrare.lille.inria.fr/


Contacts: Angela Bonifati, Anne-Cecile Caron, Sophie Tison


Keywords: RDF databases, Semantic Web, Dynamic Programming, Algorithms


EFFICIENT IN-MEMORY EVALUATION OF GRAPH-BASED QUERIES 

The advent of new data formats in the Web 2.0, the so called Semantic Web, is changing the way people formulate their queries. In particular, whereas tree data formats, such as XML have been fully understood and the efficient evaluation of queries for such formats is indeed a reality, this is not the case for the RDF[3] format. An RDF triple is composed of <subject, predicate, object> and can be encoded as a connected component of a graph, in which the subject and the object are graph nodes and the predicate is an oriented edge. The RDF graph needs to be inspected to answer queries in a RDF-based language, that is SparQL. Answering SPARQL queries can be thought as finding subgraphs in the RDF data that match a given graph pattern. The following example allows to find the names of strikers for FC Barcelona.

SELECT ?name 

WHERE { ?player type footballer  .

               ?player name  ?name     . 

               ?player position striker   . 

               ?player playsFor FC_Barcelona . }


Single-node processing of SparQL queries has brought to the development of efficient systems for the management of RDF data, such as Jena[4], Sesame, 3store, and RDF-3X [2]. However, all such systems rely on an efficient storage for RDF triples and exploit streamlined architectures to avoid in-memory evaluation of queries. Indeed, the latter may turn to be inefficient and may not be suitable for the efficient management of RDF data. 

On the other hand, efficient in-memory evaluation of tree-structured queries, such as XML queries, has been thoroughly studied in the past [1]. The proposed algorithms improve the basic algorithm that uses dynamic programming for XPath evaluation, and show two important optimizations, bottom-up and top-down, respectively.  Such optimizations have also be proved to be memory-efficient.

The goal of this Master thesis is doublefold: (1) designing an algorithm based on dynamic programming for the in-memory evaluation of RDF graphs; (2) studying possible optimizations, inspired by previous work on XPath optimizations. An implementation of the above algorithms is also foreseen. 

Remark: We foresee a follow-up as a PhD thesis on the same subject.

Bibliography

[1] Georg Gottlob, Christoph Koch, Reinhard Pichler: Efficient Algorithms for Processing XPath Queries. VLDB 2002: 95-106

[2]  Thomas Neumann Gerhard Weikum: RDF-3X: a RISC-style engine for RDF. PVLDB I(1): 647-659 (2008)

[3]   http://www.w3.org/RDF/

[4]   http://jena.sourceforge.net/