Statistical Inference on High Dimensional Normal Mean Under Linear Inequality Constraints and Efficient Integration of Data in Meta-Analysis

Author/Creator

Author/Creator ORCID

Date

2022-01-01

Department

Mathematics and Statistics

Program

Statistics

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Abstract

In this dissertation, we provide a framework for incorporating linear inequality parameter constraints in estimation and hypothesis testing involving high-dimensional normal means. Modern statistical problems often involve such linear inequality constraints on model parameters, and ignoring natural parameter constraints usually results in less efficient statistical procedures. To this end, we define a notion of 'sparsity' for such restricted sets using lower-dimensional features (Chapter 2). Our framework is flexible enough that the number of restrictions may exceed the number of parameters. We show that the proposed notion of sparsity agrees with the usual notion of sparsity in the unrestricted case, and we prove the validity of the proposed definition as a measure of sparsity. The proposed sparsity measure also allows us to generalize popular priors for sparse vector estimation to the constrained case. We also explore the properties of some of these priors under non-negativity restrictions (Chapter 1). Along with Bayesian estimation of the constrained mean, we consider the classical one-sided normal mean testing problem, in which the null hypothesis of a zero mean vector is tested against the alternative that all components are non-negative and at least one is positive (Chapter 3). A single test is unlikely to perform equally well for both dense and sparse parameter configurations in high dimensions. We develop a computationally efficient omnibus test with reasonable power across the entire spectrum of alternatives. Finally, we propose a meta-analysis approach for combining treatment effects across aggregate data (AD) and individual patient data (IPD) under a generalized linear model structure (Chapter 4). Often, for some studies with AD, the associated IPD may be available, albeit at some extra effort or cost to the analyst.
For many models, design constraints are known under which the AD estimators coincide with the IPD estimators and are hence fully efficient. For such models, we advocate a selection procedure that chooses AD studies over IPD studies so as to force the least departure from the design constraints using the proposed combination method, thereby ensuring an efficient combined AD and IPD estimator.
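
As a minimal illustration of how an AD estimator can coincide with the IPD estimator, consider the simplest case of a common normal mean with a known variance shared across studies. This sketch is not the dissertation's combination method or selection procedure; the simulated data, the study sizes, the variance `sigma2`, and all variable names are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three studies measuring the same treatment effect,
# with a common, known outcome variance (an assumption of this sketch).
sigma2 = 4.0
studies = [
    rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=n) for n in (30, 50, 120)
]

# IPD estimator: pool every individual observation and take the overall mean.
ipd_estimate = np.concatenate(studies).mean()

# AD estimator: each study reports only its mean and sample size.
# Combine the study means with inverse-variance weights, where the
# variance of a study mean is sigma2 / n.
means = np.array([s.mean() for s in studies])
sizes = np.array([s.size for s in studies])
weights = sizes / sigma2
ad_estimate = np.sum(weights * means) / np.sum(weights)

print(ipd_estimate, ad_estimate)
```

In this toy setting the two estimates agree up to floating-point error, because the inverse-variance weights are proportional to the sample sizes. In richer models, such agreement generally requires conditions on the study designs, which is the role the design constraints play in the abstract.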