Title: Classification and multiple testing for microarray data
Name: Cherkas, Yauheniya (author), Cabrera, Javier (chair), Strawderman, William (internal member), Tyler, David (internal member), Amaratunga, Dhammika (outside member), Rutgers University, Graduate School - New Brunswick
Subject: Statistics and Biostatistics; Statistical hypothesis testing
Description: This thesis aims to provide solutions to classification and hypothesis testing problems for microarray data, and to create a tool that performs clustering, hypothesis testing, and classification tasks automatically through a simple menu-driven interface. Since microarrays first appeared in 1995, they have become a technique used worldwide for large-scale gene expression screening. The quantity of data generated by microarray experiments is enormous and requires new, careful methods for analyzing such high-dimensional data. One of the problems encountered with this type of data is overfitting, which occurs when the selected information is related to the condition of interest only by chance. This thesis consists of four major parts. The first part gives an overview of microarray methodology and of current techniques for analyzing gene expression data. The second part uses an idea based on partial least squares to develop an algorithm that controls the false discovery rate (FDR) when extracting differentially expressed genes from gene expression data. This procedure can be used on its own, or as part of a scheme in which it provides weights to be combined with another selection method or used within an ensemble. The third part deals with the problem of comparing several treatments to a control. In the setting where one wants to find a ‘bump’ in the measurements of several groups, a test statistic based on the maximum and minimum of the group mean differences is considered. The derived distribution of the proposed test statistic can then be used to make inferences. The fourth part describes the software developed to provide a menu-driven computing environment for data manipulation and analysis. It includes methods for comparing the expression profiles of genes, methods for gene clustering, and various tools for visualization and exploration.
Note: Includes bibliographical references
Note: by Yauheniya Cherkas
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.