A GROUP SEQUENTIAL MULTIPLE TESTING METHOD AND ITS APPLICATION TO GENOMIC DATA

Date

2022-01-01

Department

Mathematics and Statistics

Program

Statistics

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Abstract

In this dissertation, we consider the simultaneous testing of groups and of hypotheses within the groups, a problem that arises in many scientific applications. A group is commonly judged to be significant if at least one hypothesis within it is significant, which is implemented via a global test of the complete null hypothesis. However, this null hypothesis for group significance is strict, so all groups tend to be rejected, especially when the number of hypotheses within a group is large. To avoid such trivial testing results, we introduce the concept of a margin to multiple testing problems so that the level of significance required of a group can be adjusted. Based on this idea, we propose a group sequential multiple testing method that controls the false discovery rate (FDR) and incorporates the margin for group significance. In the first real data application, we apply the proposed method to functional groups of single nucleotide polymorphisms (SNPs) and select significantly associated pairs of summary statistics from genome-wide association studies (GWAS) and linkage disequilibrium (LD) scores. We further investigate additional local associations within haplotype blocks, whereas existing methods such as LD score regression (LDSC) use all SNPs. Our findings offer a different explanation of the associations between the summary statistics and LD scores, for example in terms of Simpson's paradox. In the second real data application, we consider non-coding GWAS SNPs in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). By partitioning the GWAS SNPs for type 2 diabetes into DHS groups, we apply the proposed method to detect DHS groups that are statistically associated with type 2 diabetes. Each of the 32 DHS groups represents a unique organ; the group related to the pancreas is detected as significant even with a large margin, and this finding is consistent with intuition and published articles.
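To make the contrast in the abstract concrete, here is a minimal sketch of the conventional baseline it describes, not of the proposed method. Each group's complete null hypothesis is tested with a Simes global test, groups are selected by the Benjamini-Hochberg (BH) procedure at FDR level q, and hypotheses inside the selected groups are then tested by BH at the reduced level q * R / G (R selected groups out of G). The Simes test, BH, and the q * R / G within-group level are assumptions chosen for illustration, the function names are hypothetical, and the dissertation's margin for group significance is not implemented here.

```python
from typing import Dict, List


def simes_pvalue(pvals: List[float]) -> float:
    """Simes global p-value for the complete null (every hypothesis in the group is null)."""
    if not pvals:
        raise ValueError("a group must contain at least one p-value")
    m = len(pvals)
    ordered = sorted(pvals)
    # Simes: minimum over ranks i of m * p_(i) / i, capped at 1.
    return min(1.0, min(m * p / rank for rank, p in enumerate(ordered, start=1)))


def benjamini_hochberg(pvals: List[float], q: float) -> List[bool]:
    """BH step-up procedure at FDR level q; returns one rejection flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k with p_(k) <= k * q / m; step-up rejects every rank up to k.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max
    return rejected


def two_stage_group_test(groups: Dict[str, List[float]], q: float = 0.05) -> Dict[str, List[bool]]:
    """Hypothetical baseline: select groups by BH on Simes p-values, then BH within selected groups."""
    names = list(groups)
    group_p = [simes_pvalue(groups[name]) for name in names]
    selected = benjamini_hochberg(group_p, q)
    n_selected = sum(selected)
    # Within-group level shrunk by the fraction of selected groups (an illustrative assumption).
    within_level = q * n_selected / len(names) if names else q
    return {
        name: benjamini_hochberg(groups[name], within_level)
        for name, keep in zip(names, selected)
        if keep
    }
```

For example, with groups {"A": [0.001, 0.2, 0.5], "B": [0.4, 0.6, 0.9]} and q = 0.05, only group A is selected, and within it only the first hypothesis is rejected. Under this baseline a group is selected as soon as the evidence against its complete null is strong enough, regardless of how many of its hypotheses are non-null; the margin proposed in the dissertation addresses this by letting the required level of group significance be adjusted.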