A fundamental problem related to RDF query processing is selectivity estimation, which is crucial to query optimization for determining a join order of RDF triple patterns. In this paper we focus research on selectivity estimation for SPARQL graph patterns. The previous work takes the join uniformity assumption when estimating the joined triple patterns. This assumption would lead to highly inaccurate estimations in the cases where properties in SPARQL graph patterns are correlated. We take into account the dependencies among properties in SPARQL graph patterns and propose a more accurate estimation model. Since star and chain query patterns are common in SPARQL graph patterns, we first focus on these two basic patterns and propose to use Bayesian network and chain histogram respectively for estimating the selectivity of them. Then, for estimating the selectivity of an arbitrary SPARQL graph pattern, we design algorithms for maximally using the precomputed statistics of the star paths and chain paths. The experiments show that our method outperforms existing approaches in accuracy.
Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM'11), Glasgow, Scotland, 24-28 October 2011 / Bettina Berendt, Arjen de Vries, Wenfei Fan, Craig Macdonald, Iadh Ounis and Ian Ruthven (eds.),