Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness

dc.contributor.author: Huang, Yuchuan
dc.contributor.author: Mokbel, Mohamed F.
dc.date.accessioned: 2025-10-29T19:14:57Z
dc.date.issued: 2024-05-01
dc.description.abstract: Though data cleaning systems have achieved great success and widespread adoption in both academia and industry, they fall short when trying to clean spatial data. The main reason is that state-of-the-art data cleaning systems mainly rely on functional dependency rules, where there is sufficient co-occurrence of value pairs to learn that a certain value of one attribute leads to a corresponding value of another attribute. However, for spatial attributes that represent locations, there is very little chance that two records would have the exact same coordinates, and hence co-occurrence is unlikely to exist. This paper presents Sparcle (SPatially-AwaRe CLEaning), a novel framework that injects spatial awareness into the core engine of rule-based data cleaning systems through two main concepts: (1) Spatial Neighborhood, where co-occurrence is relaxed to be within a certain spatial proximity rather than the exact same value, and (2) Distance Weighting, where records are given different weights, based on their relative distance, when determining whether they satisfy a dependency rule. Experimental results using a real deployment of Sparcle inside a state-of-the-art data cleaning system, and real and synthetic datasets, show that Sparcle significantly boosts the accuracy of data cleaning systems when dealing with spatial data.
dc.description.sponsorship: This work is supported by NSF under grants IIS-2203553 and OAC-2118285.
dc.description.uri: https://dl.acm.org/doi/10.14778/3665844.3665862
dc.format.extent: 14 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2mq7a-j31h
dc.identifier.citation: Huang, Yuchuan, and Mohamed F Mokbel. “Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness.” Proceedings of the VLDB Endowment 17, no. 9 (2024). https://doi.org/10.14778/3665844.3665862.
dc.identifier.uri: https://doi.org/10.14778/3665844.3665862
dc.identifier.uri: http://hdl.handle.net/11603/40693
dc.language.iso: en
dc.publisher: ACM
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: iHARP NSF HDR Institute for Harnessing Data and Model Revolution in the Polar Regions
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en
dc.title: Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness
dc.type: Text
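
The two concepts named in the abstract, Spatial Neighborhood and Distance Weighting, can be illustrated with a small sketch. This is not the Sparcle implementation and is not taken from the paper; it is a toy example under stated assumptions. The function names (`haversine_km`, `weighted_vote`, `suggest_value`), the record layout with `lat`/`lon` keys, the 1 km radius, and the linear decay weight are all hypothetical choices made here for illustration.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def weighted_vote(target, records, attr, radius_km=1.0):
    """Score candidate values of `attr` for `target` using nearby records.

    Spatial neighborhood (assumed form): only records within `radius_km`
    of the target count as co-occurring with it.
    Distance weighting (assumed form): a neighbor's vote decays linearly
    from 1 at distance 0 to 0 at the edge of the neighborhood.
    """
    scores = defaultdict(float)
    for rec in records:
        if rec is target or rec.get(attr) is None:
            continue
        d = haversine_km(target["lat"], target["lon"], rec["lat"], rec["lon"])
        if d <= radius_km:
            scores[rec[attr]] += 1.0 - d / radius_km
    return dict(scores)


def suggest_value(target, records, attr, radius_km=1.0):
    """Return the highest-scoring neighbor value, or None if no neighbors exist."""
    scores = weighted_vote(target, records, attr, radius_km)
    if not scores:
        return None
    return max(scores, key=scores.get)


# Toy, made-up records: no two share the exact same coordinates, so an
# exact-match co-occurrence count would find nothing to learn from.
records = [
    {"lat": 44.9740, "lon": -93.2277, "zip": "Z1"},
    {"lat": 44.9745, "lon": -93.2280, "zip": "Z1"},
    {"lat": 44.9750, "lon": -93.2270, "zip": "Z2"},
    {"lat": 45.5000, "lon": -93.0000, "zip": "Z3"},  # far outside the radius
]
target = {"lat": 44.9742, "lon": -93.2278, "zip": None}
suggestion = suggest_value(target, records, "zip")
```

In this toy data the two closest neighbors agree on the same value, so their combined weight outvotes the single, more distant neighbor, while the far-away record contributes nothing. A rule-based cleaner would need a more principled choice of radius and weighting function than the fixed values assumed here.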

Files

Original bundle

Name: 3665844.3665862.pdf
Size: 2.77 MB
Format: Adobe Portable Document Format