Bayesian Analysis of Synthetic Data under Multiple Linear Regression, Multivariate Normal and Multivariate Regression Models

dc.contributor.advisor: Sinha, Bimal Prof.
dc.contributor.advisor: Roy, Anindya Prof.
dc.contributor.author: Guin, Abhishek
dc.contributor.department: Mathematics and Statistics
dc.contributor.program: Statistics
dc.date.accessioned: 2022-02-09T15:52:53Z
dc.date.available: 2022-02-09T15:52:53Z
dc.date.issued: 2020-01-01
dc.description.abstract: Statistical Disclosure Control (SDC) methods are used to preserve the confidentiality of publicly released microdata without compromising its fundamental structure, so that adequate and accurate statistical analysis of the data remains possible. The synthetic data approach is a popular form of SDC methodology in which (all or part of) the real data are not released, but are instead used to create synthetic data, which are released. In this dissertation, we develop Bayesian inference based on singly or multiply imputed synthetic data when the original data are derived from the following models: multiple linear regression, multivariate normal and multivariate regression. We assume that the synthetic data are generated by one of two methods: plug-in sampling, where unknown parameters in the data model are set equal to the observed values of their point estimators based on the original data, and synthetic data are drawn from this estimated version of the model; and posterior predictive sampling, where an imputed posterior distribution of the unknown parameters is used to generate a posterior draw, which in turn is plugged into the original model to produce synthetic data. In the single imputation case, the procedures developed here fill a gap in the existing literature, where inferential methods are available only for multiple imputation; because they are based on exact distributions, they can be applied even when the sample size is small. Simulation results are presented to demonstrate how the proposed methodology performs compared to the theoretical predictions. We also outline some ways to extend the proposed methodology to certain scenarios where the required set of conditions does not hold.
dc.format: application:pdf
dc.genre: dissertations
dc.identifier: doi:10.13016/m2y3o5-rb2f
dc.identifier.other: 12433
dc.identifier.uri: http://hdl.handle.net/11603/24210
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Mathematics and Statistics Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Guin_umbc_0434D_12433.pdf
dc.subject: Partially synthetic data
dc.subject: Pivotal quantity
dc.subject: Plug-in sampling
dc.subject: Posterior predictive sampling
dc.title: Bayesian Analysis of Synthetic Data under Multiple Linear Regression, Multivariate Normal and Multivariate Regression Models
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle
Name: Guin_umbc_0434D_12433.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format
License bundle
Name: Guin-Abhishek_843428.pdf
Size: 233.2 KB
Format: Adobe Portable Document Format