Title: Unified structure and content search for personal information management systems
Name: Wang, Wei (author); Marian, Amélie (chair); Nguyen, Thu D (internal member); Martin, Richard (internal member); Srivastava, Divesh (outside member); Rutgers University, Graduate School - New Brunswick
Personal information management,
Electronic information resource searching,
Querying (Computer science)
Description: The ability to quickly retrieve files in personal information systems is becoming increasingly important as users store and collect ever larger amounts of personal data. This explosion of information is driving a critical need for sophisticated search tools that can access often very heterogeneous data in a simple and efficient manner. Numerous search tools have been developed to locate personal information stored in file systems. They often allow for some ranking on the textual part of the query, but only consider structure (e.g., file directory) and metadata (e.g., date, file type) as filtering conditions. However, simple keyword-only searches are often insufficient. First, files that are very relevant to the keywords but that do not satisfy the exact filtering conditions are not considered valid answers. Second, search tools often strictly separate directory information from content information, so relevant answers that do not adhere to this strict separation are missed. Last, current tools do not consider relationships between structure and content, possibly across multiple directories and/or files, even when the additional information would help narrow search results and improve search accuracy. Because of the heterogeneity of data in personal information systems, we believe it is critical to support approximate matches on both the structure and content components of queries and to allow query conditions to be evaluated across file boundaries. In this thesis, we developed a suite of techniques that allow search tools to effectively and efficiently evaluate flexible query conditions against directory structure and structure contained within files. Our work started by considering how to score fuzzy conditions in each query dimension (i.e., structure, content, metadata) and meaningfully combine individual dimension scores.
We then proceeded to consider how to unify structure and content, such that users can specify a single query that contains both structure and content components and that can be evaluated at once across file boundaries. Finally, we have designed algorithms and data structures to support efficient processing of the multi-dimensional as well as unified queries. We evaluated our techniques and our results show that the system has the potential to significantly improve ranking accuracy and is practical for everyday usage.
Note: Includes bibliographical references
Note: by Wei Wang
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.