High dimensional, robust, unsupervised record linkage

dc.contributor.author: Bera, Sabyasachi
dc.contributor.author: Chatterjee, Snigdhansu
dc.date.accessioned: 2025-03-11T14:43:06Z
dc.date.available: 2025-03-11T14:43:06Z
dc.date.issued: 2020
dc.description.abstract: We develop a technique for record linkage on high dimensional data, where the two datasets may not have any common variable, and there may be no training set available. Our methodology is based on sparse, high dimensional principal components. Since large and high dimensional datasets are often prone to outliers and aberrant observations, we propose a technique for estimating robust, high dimensional principal components. We present theoretical results validating the robust, high dimensional principal component estimation steps, and justifying their use for record linkage. Some numeric results and remarks are also presented.
dc.description.sponsorship: This research is partially supported by the US National Science Foundation (NSF) under grants # DMS-1622483, # DMS-1737918, # OAC-1939916 and #DMR-1939956.
dc.description.uri: https://sit.stat.gov.pl/Article/182
dc.format.extent: 22 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2jqys-uura
dc.identifier.citation: Bera, Sabyasachi, and Snigdhansu Chatterjee. “High Dimensional, Robust, Unsupervised Record Linkage.” Statistics in Transition New Series 21, no. 4 (2020): 123–43. https://doi.org/10.21307/stattrans-2020-034.
dc.identifier.uri: https://doi.org/10.21307/stattrans-2020-034
dc.identifier.uri: http://hdl.handle.net/11603/37803
dc.language.iso: en_US
dc.publisher: Statistics Poland
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Mathematics and Statistics Department
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International CC BY-NC-ND 4.0 Deed
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: High dimensional, robust, unsupervised record linkage
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-7986-0470

Files

Original bundle

Name: High_Dimensional.pdf
Size: 359.9 KB
Format: Adobe Portable Document Format