TitleBoolean matrix decomposition and extension with applications
NameLu, Haibing (author), Atluri, Vijayalakshmi (chair), Vaidya, Jaideep (co-chair), Adam, Nabil (internal member), Xiong, Hui (internal member), Samarati, Pierangela (outside member), Rutgers University, Graduate School - Newark,
DescriptionBoolean matrix decomposition (BMD) refers to decomposing of an input Boolean matrix into a product of two Boolean matrices, where the first matrix represents a set of meaningful concepts, and the second describes how the observed data can be expressed as combinations of those concepts. As opposed to standard matrix factorization, BMD focuses on Boolean data and employs Boolean matrix product instead of standard matrix product. The key advantage of BMD is that BMD solutions provide much more interpretability, which enable BMD to have wide applications in multiple domains, including role mining, text mining, discrete pattern mining, and many others. There are three main challenges in the research of BMD. First, real applications carry varying expectations and constraints on BMD solutions, which make the task of searching for a good BMD solution nontrivial. Second, BMD by itself has the issue of insufficiency in modeling some real data semantics, as only the set union operation is employed in combination. Third, BMD variants are generally NP-hard in nature, which makes practitioners reluctant to apply the BMD model to large scale data analysis. All of the three challenges are addressed in this dissertation. First, a unified framework, which is based on integer linear programming, is presented to encompass all BMD variants. Such a framework allows us to directly adopt fruitful research results in the optimization field to solve our own problems. It also provides researchers across different domains with a new perspective to view their problems and enables them to share their research results. Second, a novel extended Boolean matrix decomposition (EBMD) model is proposed. It allows describing an observed record as an inclusion of some concepts with an exclusion of some other concepts. Thus EBMD is effective to meet the needs of modeling some complex data semantics. Third, rank-one BMD is studied. Rank-one BMD is to decompose a Boolean matrix into the product of two Boolean vectors, which can be interpreted as a dominant pattern vector and a presence vector. By recursively applying rank-one BMD, a Boolean matrix is partitioned into clusters and discrete patterns of the observed data are thus discovered. Rank-one BMD can serve many functions of regular BMD, while rank-one BMD is relatively easy to solve compared to regular BMD. In addition, efficient 2-approximation algorithms are found for some special cases of rank-one BMD.
NoteIncludes bibliographical references
Noteby Haibing Lu
CollectionGraduate School - Newark Electronic Theses and Dissertations
Organization NameRutgers, The State University of New Jersey
RightsThe author owns the copyright to this work.