Proceedings of the 16th IEEE International Conference on Data Engineering, San Diego, CA, Feb. 29 - March 3, 2000
-------------------------------------------------------------------------

Developing Cost Models with Qualitative Variables for Dynamic Multidatabase Environments

Qiang Zhu, Yu Sun, S. Motheramgari
Department of Computer and Information Science
The University of Michigan - Dearborn
Dearborn, MI 48128, U.S.A.
{qzhu,yusun,motheram}@umich.edu

Abstract

A major challenge for global query optimization in a multidatabase system (MDBS) is the lack of local cost information at the global level due to local autonomy. A number of methods to derive local cost models have recently been suggested in the literature. However, these methods are only suitable for a static multidatabase environment. In this paper, we propose a new multi-states query sampling method to develop local cost models for a dynamic environment. The system contention level at a dynamic local site is divided into a number of discrete contention states based on the costs of a probing query. To determine an appropriate set of contention states for a dynamic environment, two algorithms, based on iterative uniform partition and data clustering, respectively, are introduced. A qualitative variable is used to indicate the contention states for the dynamic environment. The techniques from our previous (static) query sampling method, including query sampling, automatic variable selection, regression analysis, and model validation, are extended to develop a cost model that incorporates the qualitative variable for a dynamic environment. Experimental results demonstrate that this new multi-states query sampling method is quite promising for developing useful cost models in a dynamic multidatabase environment.

Keywords: multidatabase, global query optimization, cost model, regression analysis, data clustering, dynamic environment
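
As a rough illustration of the idea summarized in the abstract, the following sketch maps probing-query costs to discrete contention states with a single equal-width partition and then fits a regression cost model in which a qualitative (indicator) variable selects a per-state intercept and slope. It is a minimal sketch under stated assumptions, not the method as specified in the paper: the function names (uniform_states, fit_cost_model, predict), the single explanatory variable x, and the choice to let the state affect both intercept and slope are all hypothetical, and the iterative refinement of the partition and the clustering-based alternative are not shown.

```python
import numpy as np

def uniform_states(probe_costs, num_states):
    """Map each probing-query cost to one of num_states equal-width intervals.

    Assumption: one uniform split of [min, max]; no iterative refinement.
    """
    probe_costs = np.asarray(probe_costs, dtype=float)
    lo, hi = probe_costs.min(), probe_costs.max()
    # Interior boundaries of the equal-width partition of [lo, hi].
    edges = np.linspace(lo, hi, num_states + 1)[1:-1]
    # digitize returns state indices 0 .. num_states - 1.
    return np.digitize(probe_costs, edges)

def design_matrix(x, states, num_states):
    """Build regression columns from indicator variables for the states."""
    x = np.asarray(x, dtype=float)
    ind = np.eye(num_states)[states]          # one 0/1 column per state
    # First block: per-state intercepts; second block: per-state slopes.
    return np.hstack([ind, ind * x[:, None]])

def fit_cost_model(x, states, costs, num_states):
    """Least-squares fit; returns (intercepts, slopes), one entry per state."""
    A = design_matrix(x, states, num_states)
    coef, *_ = np.linalg.lstsq(A, np.asarray(costs, dtype=float), rcond=None)
    return coef[:num_states], coef[num_states:]

def predict(x, state, intercepts, slopes):
    """Estimated cost of a query with explanatory value x in the given state."""
    return intercepts[state] + slopes[state] * x
```

In use, each sample query would be paired with the probing cost observed when it ran; uniform_states converts those probing costs to state labels, fit_cost_model estimates the coefficients from the sample, and predict is then called with the state observed at optimization time.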