Generation and Analysis of Synthetic Data for Privacy Protection Under the Multivariate Linear Regression Model

Author/Creator ORCID

Date

2018-01-01

Department

Mathematics and Statistics

Program

Statistics

Citation of Original Publication

Rights

Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

In this dissertation, the author derives likelihood-based exact inference for multiply imputed synthetic data under the multiple (p>1) univariate linear regression model and for singly and multiply imputed data under the multivariate linear regression model. In the former, the synthetic data are generated under plug-in sampling, where unknown parameters in the model are set equal to observed values of point estimators. In the latter, synthetic data are also generated under posterior predictive sampling, where they are drawn from a posterior predictive distribution. Simulations are presented to confirm that the methodology performs as the theory predicts and to evaluate privacy protection. Robustness studies are also given. In the final chapter, a new privacy protection method similar to bottom- and top-coding is proposed and its inferential properties are explored.
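As an illustration of plug-in sampling, the following is a minimal Python sketch under stated assumptions, not the dissertation's own implementation. It assumes a multivariate linear regression with normal errors, uses the least-squares estimate of the coefficient matrix, and uses the residual covariance with divisor n - p as the point estimator of the error covariance; the dissertation may use different estimators. The function name `plug_in_synthetic` and its parameters are hypothetical.

```python
import numpy as np

def plug_in_synthetic(X, Y, m=1, seed=0):
    """Generate m synthetic response matrices by plug-in sampling.

    Assumed model: Y = X B + E, rows of E i.i.d. N(0, Sigma).
    The unknown B and Sigma are replaced by point estimates
    computed from the original data (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Y.shape[1]

    # Least-squares estimate of the p x q coefficient matrix.
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

    # Residual-based estimate of the q x q error covariance
    # (divisor n - p is an assumption made for this sketch).
    resid = Y - X @ B_hat
    Sigma_hat = resid.T @ resid / (n - p)

    # Each synthetic dataset: fitted mean plus fresh normal errors
    # drawn with the estimated covariance plugged in.
    mean = X @ B_hat
    return [
        mean + rng.multivariate_normal(np.zeros(q), Sigma_hat, size=n)
        for _ in range(m)
    ]
```

Calling `plug_in_synthetic(X, Y, m=3)` returns a list of three synthetic response matrices, each with the same shape as `Y`. Posterior predictive sampling would differ in that the parameters themselves are first drawn from their posterior distribution before the synthetic responses are generated.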