Traditionally, searching XML data with a structured query identifies the exact matches for the query and returns them as the qualified results. However, the structural heterogeneity of large numbers of XML data sources makes it difficult to answer structured queries exactly. Query relaxation is therefore necessary when no exact results exist or when there are too few of them. Previous work on XML query relaxation suffers from the unnecessary computation of a large number of unqualified relaxed queries. This thesis addresses several fundamental issues: ranking a set of heterogeneous XML data sources, designing adaptive relaxation rules, and efficiently processing the relaxed queries over the data sources. Relaxing the specified queries loosens the constraints that express the users' original preferences, which can significantly increase the number of relevant results. Therefore, in this work users' queries are answered with a ranked list of the best-matched results, e.g., in the top-k setting only the k most relevant results are of interest to users. In addition, at each step we choose to evaluate the relaxed queries over the data source that is most relevant to the specified queries, and we incrementally output the retrieved results that are guaranteed to have higher relevance to the queries than the candidates in the other data sources, thus minimizing query processing time. However, query relaxation becomes time-consuming and ineffective in some cases, for example when the data structure is too complex for users to write structured queries, or when users do not know a structured query language. To address this problem, we allow users to issue keyword queries.
Unlike previous keyword search methods, we first construct structured query templates from the given keyword query and the underlying source schemas, and then evaluate these templates to answer the original keyword query using our proposed ranking model.
Copyright © 2009 Jianxin Li.
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Swinburne University of Technology, 2009.