Collaborative Research: Support Efficient Similarity Searches for Multidimensional Non-ordered Discrete Data Spaces
Supported by the National Science Foundation
Research Grants
and The University of Michigan
Research Grants (OVPR and
UMD)
Principal Investigator at UM-D
Dr. Qiang Zhu
Department of Computer and Information Science,
The University of Michigan, Dearborn, MI, USA
qzhu@umich.edu
(NSF Grant for UM-D: IIS-0414594)
Principal Investigator at MSU
Dr. Sakti Pramanik
Department of Computer Science and Engineering,
Michigan State University, East Lansing, MI, USA
pramanik@msu.edu
(NSF Grant for MSU: IIS-0414576)
Graduate Students
Changqing Chen
Dashiell Matthews Kolbe
Hyun-Jeong Seok
Alok Watve
Qiang Xue
Undergraduate Students
Chad Klochko
Rachel Radziszewski
High School Student
David Kong
Alumni
Gang Qian
Project Overview
This collaborative research project conducted jointly by the Michigan State University and the University of Michigan at Dearborn is to investigate the issues and techniques for supporting efficient similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS). Similarity searches in NDDSs are becoming increasingly important for application areas such as genome sequence databases. Efficient similarity searches require robust indexing techniques. Unfortunately, existing indexing methods are either not suitable for an NDDS (e.g., the R*-tree) or too generic to provide good performance for an NDDS (e.g., metric trees). The main goal of this project is to study the fundamental properties of NDDSs and develop indexing methods exploiting these properties to support efficient similarity searches in NDDSs. We introduce a set of essential geometric concepts for an NDDS based on some similar concepts for a traditional (ordered) Continuous Data Spaces (CDS). Using them, we explore and compare promising data-partitioning (DP) based and space-partitioning (SP) based indexing techniques for NDDSs, including index tree structures, building strategies, search algorithms and performance models. We also study other related issues including supporting various types of queries, adopting different distance measures, indexing hybrid data spaces with mixed ordered and non-ordered dimensions, developing efficient bulk loading techniques, and utilizing effective compression schemes. This research will provide contemporary database indexing techniques to solve relevant issues in the emerging fields such as bioinformatics, biometrics and E-commerce.
Project References
-
Dashiell Kolbe, Qiang Zhu and Sakti Pramanik, "Efficient k-Nearest Neighbor Searching in Non-ordered Discrete Data Spaces", ACM Transactions on Information Systems (TOIS), 2009 (to appear).
-
Hyun-Jeong Seok, Gang Qian, Qiang Zhu, Alexander Oswald and Sakti Pramanik,
"Bulk-Loading the ND-Tree in Non-ordered Discrete Data Spaces", Proc. of 13th International Conference on Database Systems for Advanced Applications (DASFAA'08), pp 156-171, New Delhi, India, March 19 - 22, 2008.
-
Qiang Xue, James Cole and Sakti Pramanik, "Sequence Homology Search Based on Database Indexing Using the Profile Hidden Markov Model", Proc. of IEEE International Conference on Bioinformatics and Bioengineering (BIBE'06), pp 135-140, Washington, DC, Oct. 2006.
Project Web Sites
Web site
at MSU
Web site
at UM-D
Other Related Links
Database Research Group
at UM-D
Number of visitors to this page is:
since 05/01/2005