Multiple Testing Procedures controlling False Discovery Rate with applications to genomic data

dc.contributor.advisor: Park, Junyong
dc.contributor.author: Gauran, Iris Mirales
dc.contributor.department: Mathematics and Statistics
dc.contributor.program: Statistics
dc.date.accessioned: 2021-01-29T18:13:49Z
dc.date.available: 2021-01-29T18:13:49Z
dc.date.issued: 2018-01-01
dc.description.abstract: In recent mutation studies, analyses based on protein domain positions are gaining popularity over traditional gene-centric approaches because the latter are limited in how well they account for the functional context that the position of a mutation provides. This presents a large-scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. The overarching objective of this thesis is to propose multiple testing procedures that address the problems posed by discrete genomic data. Specifically, we are interested in identifying significant mutation counts while controlling Type I error at a given level via False Discovery Rate (FDR) procedures. One main assumption is that the mutation counts follow a zero-inflated model, which accounts for both the true zeros of the count model and the excess zeros. The class of models considered is the Zero-inflated Generalized Poisson (ZIGP) distribution. In the first study, we developed an Empirical Bayes procedure. We assumed that there exists a cut-off value such that counts smaller than this value are generated from the null distribution. We present several data-dependent methods to determine the cut-off value. We also consider a two-stage procedure based on a screening process, in which mutation counts exceeding a certain value are considered significant. Simulated and protein domain data sets are used to illustrate this procedure, with the empirical null estimated using a mixture of discrete distributions. Overall, while maintaining control of the FDR, the proposed cut-off method combined with the two-stage testing procedure has superior empirical power. In the second study, we developed full Bayesian procedures.
We addressed the limitations of the Empirical Bayes procedure by proposing methods that can handle both a weakened assumption on the null distribution and the sparsity condition, which is apparent among protein domains with a small number of positions. Based on the simulation studies, the full Bayesian methods can control the FDR when the Empirical Bayes method fails. We also studied several cases to assess whether we need to implement the zero assumption on the null distribution. The results showed that implementing this key assumption still yields good FDR control and high empirical power. In general, simulation results suggest that a smaller number of rejections is preferable. The number of identified hotspots in the real data analysis is consistent with the simulation studies.
dc.format: application:pdf
dc.genre: dissertations
dc.identifier: doi:10.13016/m2uhkg-jubg
dc.identifier.other: 11820
dc.identifier.uri: http://hdl.handle.net/11603/20911
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Mathematics and Statistics Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Gauran_umbc_0434D_11820.pdf
dc.subject: Empirical Null estimation
dc.subject: False Discovery Rate
dc.subject: Multiple Testing
dc.subject: Protein domain data
dc.subject: Zero Inflated Generalized Poisson
dc.title: Multiple Testing Procedures controlling False Discovery Rate with applications to genomic data
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Files

Original bundle
Name: Gauran_umbc_0434D_11820.pdf
Size: 4.53 MB
Format: Adobe Portable Document Format
License bundle
Name: GauranIMultiple_Open.pdf
Size: 44.84 KB
Format: Adobe Portable Document Format