METHODS IN LARGE SCALE MULTIPLE TESTING: MIXTURE NULL, SMALL SAMPLE REPLICATES, AND POWER BOOSTING

Author/Creator ORCID

Date

2021-01-01

Department

Mathematics and Statistics

Program

Statistics

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Subjects

Abstract

In this dissertation, we study some methods in multiple testing. In the first topic, we consider the setting of gene expression experiments that use logfold change statistics where the null distribution is assumed to be a mixture of two normal distributions. An important issue in this setting is choosing the optimal interval of statistic values with which to estimate the null distribution. A modified cumulative sum changepoint detection criterion is constructed for this purpose and incorporated in three different methods for estimating local false discovery rate. In simulation studies, it is shown that two of those three methods successfully control false discovery rate (FDR), and that both of these methods achieve better power than a baseline method. In the second topic, the problem of small sample replicates in logfold change-based experiments is addressed. A two-stage method is constructed that addresses the magnitude of the signal and the variability of the signal separately. It is shown that the method controls FDR, and that it performs competitively compared to a baseline method when there is considerable variability in the weighted counts of replicates coming from the alternative distribution. In the third topic, a new decision rule is proposed under some structural assumptions. When it can be assumed that the p-values of true nulls are uncorrelated, it is shown that this decision rule controls family-wise error rate (FWER) in the weak sense. Furthermore, simulation studies are presented to show that, under some conditions, it controls FDR in the strong sense. Most importantly, it is demonstrated using genome-wide association studies data how this method can be used as an "add-on" to existing FDR controlling methods in order to "boost" overall power.
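
For readers unfamiliar with what an "existing FDR controlling method" looks like in practice, the following is a minimal Python sketch of the Benjamini-Hochberg step-up procedure. The abstract does not name this procedure, so its use here is an assumption made purely for illustration; it is not the dissertation's baseline method or its proposed decision rule. The function name `benjamini_hochberg` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Illustrative Benjamini-Hochberg step-up procedure (assumed example).

    Returns a boolean array, in the original order of p_values, marking
    which hypotheses are rejected at nominal FDR level alpha.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return np.zeros(0, dtype=bool)

    # Sort the p-values and compare the i-th smallest with alpha * i / m.
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: find the largest rank that passes its threshold
        # and reject every hypothesis up to and including that rank.
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject


example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With the example p-values above, only the two smallest fall below their thresholds, so only those two hypotheses are rejected. A power-boosting "add-on" in the sense of the abstract would operate alongside a procedure of this kind; how it does so is specified in the dissertation, not in this sketch.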