
Querying probabilistic XML

Abstract : Probabilistic XML is a probabilistic model for uncertain tree-structured data, with applications to data integration, information extraction, and uncertain version control. In this dissertation, we explore efficient algorithms for evaluating tree-pattern queries with joins over probabilistic XML or, more specifically, for approximating the probability of each item of a query result. The approach relies on, first, extracting the query lineage over the probabilistic XML document and, second, looking for an optimal strategy to approximate the probability of the propositional lineage formula. ProApproX is the query manager for probabilistic XML presented in this thesis. The system allows users to query uncertain tree-structured data in the form of probabilistic XML documents. It integrates a query engine that searches for an optimal strategy to evaluate the probability of the query lineage. ProApproX takes an approach similar to that of a query optimizer: it explores different evaluation plans for different parts of the formula and predicts the cost of each plan, using a cost model for the various evaluation algorithms. We demonstrate the efficiency of this approach on datasets used in several of the most popular earlier works on querying probabilistic XML, as well as on synthetic data. An early version of the system was demonstrated at the ACM SIGMOD 2011 conference. First steps towards the new query solution were discussed in an EDBT/ICDT PhD Workshop paper (2011). A fully redesigned version, which implements the techniques and studies shared in the present thesis, was published as a demonstration at CIKM 2012. Our contributions are also part of an IEEE ICDE
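To make the second step of the approach concrete, the sketch below estimates the probability of a small propositional lineage formula in two ways: exactly, by enumerating all possible worlds, and approximately, by naive Monte Carlo sampling. It is only an illustration under stated assumptions, not the ProApproX implementation. The formula, the event names (`e1`, `e2`, `e3`), their probabilities, and all function names are hypothetical, the events are assumed independent, and the cost model and plan selection described in the abstract are not reproduced here.

```python
import itertools
import random

# Hypothetical lineage in disjunctive normal form: (e1 AND e2) OR e3.
# Each clause maps an event name to the truth value it requires.
# Event names and probabilities are invented for illustration.
PROBS = {"e1": 0.5, "e2": 0.5, "e3": 0.5}
LINEAGE = [{"e1": True, "e2": True}, {"e3": True}]


def satisfies(world, lineage):
    """Return True if the truth assignment `world` satisfies at least one clause."""
    return any(
        all(world[event] == value for event, value in clause.items())
        for clause in lineage
    )


def exact_probability(lineage, probs):
    """Sum the weights of all satisfying worlds (exponential in the number of events)."""
    events = sorted(probs)
    total = 0.0
    for values in itertools.product([False, True], repeat=len(events)):
        world = dict(zip(events, values))
        if satisfies(world, lineage):
            weight = 1.0
            for event in events:
                weight *= probs[event] if world[event] else 1.0 - probs[event]
            total += weight
    return total


def monte_carlo_probability(lineage, probs, samples, seed=0):
    """Estimate the probability as the fraction of sampled worlds that satisfy the lineage."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw each event independently according to its probability.
        world = {event: rng.random() < p for event, p in probs.items()}
        if satisfies(world, lineage):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    print("exact:      ", exact_probability(LINEAGE, PROBS))
    print("monte carlo:", monte_carlo_probability(LINEAGE, PROBS, samples=20000))
```

Exact enumeration is only feasible for a handful of events, whereas sampling trades accuracy for time. A trade-off of this kind is what motivates choosing among several evaluation algorithms for different parts of a formula.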
Contributor : ABES STAR
Submitted on : Tuesday, October 28, 2014 - 5:41:46 PM
Last modification on : Friday, July 31, 2020 - 10:44:07 AM
Long-term archiving on : Friday, April 14, 2017 - 5:01:10 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01078361, version 1



Asma Souihli. Querying probabilistic XML. Other [cs.OH]. Télécom ParisTech, 2012. English. ⟨NNT : 2012ENST0046⟩. ⟨tel-01078361⟩


