Please use this identifier to cite or link to this item: http://hdl.handle.net/1959.3/192988
|Download PDF (Accepted manuscript) (Adobe Acrobat PDF, 466 KB)|
- A hybrid algorithm for finding top-k twig answers in probabilistic XML
- Ning, Bo; Liu, Chengfei
- Uncertainty is inherently ubiquitous in data of real applications, and those uncertain data can be naturally represented by the XML. Matching twig pattern against XML data is a core problem, and on the background of probabilistic XML, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic values are useless to the users, and the users only want to get the answers with the largest k probabilistic values. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. To cope with this problem, we propose a hybrid algorithm which takes both the probability value constraint and structural relationship constraint into account. The main idea of the algorithm is that the element with larger path probability value will more likely contribute to the twig answers with larger twig probability values, and at the same time lots of useless answers that do not satisfy the structural constraint can be filtered. Therefore the proposed algorithm can avoid lots of intermediate results, and find the top-k answers quickly. Experiments have been conducted to study the performance of the algorithm.
- Publication type
- Conference paper
- Research centre
- Swinburne University of Technology
- Lecture notes in computer science: Proceedings of the 16th International Conference on Database Systems for Advanced Applications (DASFAA 2011), Hong Kong, China, 22-25 April 2011 / Jeffrey Xu Yu, Myoung Ho Kim and Rainer Unland (eds.), Vol. 6587, pp. 528-542
- Publication year
- FOR Code(s)
- 0804 Data Format
- Probability value constraint; Structural relationship constraint; Twig answers; Twig queries; Hybrid algorithms; XML; XML documents
- 0302-9743 (series ISSN)
- 9783642201486, 3642201482
- Publisher URL
- Copyright © Springer-Verlag Berlin Heidelberg 2011. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher. The definitive version of the publication is available at www.springer.com.
- Research Projects
- Additional information
- The authors acknowledge support from the Fundamental Research Funds for the Central Universities of China (Grant No. 2009QN030).
- Full text
- Peer reviewed