Title: Optimization models and algorithms for sample-preserved classification and clustering
Name: Fan, Ya-Ju (author); Chaovalitwongse, Wanpracha Art (chair); Albin, Susan (internal member); Elsayed, Elsayed A (internal member); Pham, Hoang (internal member); Boros, Endre (outside member); Rutgers University, Graduate School - New Brunswick
Subject: Industrial and Systems Engineering
Description: This dissertation presents new optimization models and algorithms for sample-preserved classification and clustering. A sample-preserved method keeps some or all of the existing samples when training a classification or clustering rule, and continues to use them in the testing or prediction phase. Sample-preserved methods can analyze time series data because they can use the similarity measures widely applied to time series. A proposed sample-preserved classification technique, called Support Feature Machine (SFM), finds an optimal combination of features that gives the best classification under the nearest neighbor rule. It keeps all baseline samples of the selected features in the prediction phase. Variations of the SFM model are also presented. In addition, the bilinear program sample-preserved k-median (BPSPKM) clustering algorithm is introduced. Although the original k-median problem can be solved by a simple and efficient bilinear program algorithm, that algorithm does not have the sample-preserved property and works only with the 1-norm distance. The sample-preserved k-median (SPKM) clustering method is formulated as an integer programming problem, which is very hard to solve. A bilinear program algorithm is proposed to obtain locally optimal solutions of the SPKM problem, along with a new sequential search algorithm that solves SPKM clustering more efficiently. Finally, a novel feature space sample-preserved k-median (FSSPKM) clustering algorithm is proposed, together with feature selection methods tailor-made for this clustering technique. The experimental results show that the original k-median clustering fails to classify time series data because it lacks the sample-preserved property and does not use time series similarity measures.
Sample-preserved medians avoid invalid values in some application domains and can be used to represent the samples in their clusters. The BPSPKM clustering algorithm with the Euclidean distance is suggested for clustering attribute (non-time series), univariate time series, and multivariate time series data sets. Furthermore, the proposed feature selection methods consider both the distances between cluster centers and the cluster densities. The results show that the proposed algorithms outperform other feature selection techniques used with the original k-median methods.
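The sample-preserved property described in the abstract can be illustrated with a small sketch. This is not the dissertation's BPSPKM or sequential search algorithm. It is a plain alternating heuristic in the style of k-medoids, assumed here only to show what it means for every median to be one of the original samples. The function names (`euclidean`, `assign`, `sample_preserved_k_median`), the toy data, and the initial medians are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(samples, medians, dist):
    """Label each sample with the position of its nearest median."""
    return [
        min(range(len(medians)), key=lambda c: dist(s, samples[medians[c]]))
        for s in samples
    ]


def sample_preserved_k_median(samples, init_medians, dist=euclidean, max_iter=100):
    """Alternating heuristic (an assumption, not the dissertation's method).

    Medians are stored as indices into `samples`, so each median is always
    an existing sample. Any distance function can be passed as `dist`.
    """
    medians = list(init_medians)
    for _ in range(max_iter):
        labels = assign(samples, medians, dist)
        updated = []
        for c in range(len(medians)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old median if its cluster became empty.
                updated.append(medians[c])
                continue
            # New median: the member with the smallest total distance
            # to the other members of its cluster.
            updated.append(
                min(members, key=lambda m: sum(dist(samples[m], samples[j]) for j in members))
            )
        if updated == medians:
            break
        medians = updated
    return medians, assign(samples, medians, dist)


# Toy data: two well-separated groups of points (hypothetical).
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medians, labels = sample_preserved_k_median(samples, init_medians=[0, 1])
```

On this toy data the heuristic settles on the samples (0, 0) and (10, 10) as medians. Because `dist` is a parameter, a time series similarity measure could be substituted for the Euclidean distance without changing the rest of the sketch, which is the practical benefit the abstract attributes to keeping medians among the samples.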
Note: Includes bibliographical references
Note: by Ya-Ju Fan
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.