Scoping: Towards Streamlined Entity Collections for Multi-Sourced Entity Resolution with Self-Supervised Agents

dc.contributor.author: Traeger, Leonard
dc.contributor.author: Behrend, Andreas
dc.contributor.author: Karabatis, George
dc.date.accessioned: 2024-07-12T14:57:12Z
dc.date.available: 2024-07-12T14:57:12Z
dc.date.issued: 2024
dc.description: 26th International Conference on Enterprise Information Systems, 2024
dc.description.abstract: Linking multiple entities to a real-world object is a time-consuming and error-prone task. Entity Resolution (ER) includes techniques for vectorizing entities (signature), grouping similar entities into partitions (blocking), and matching entity pairs based on specified similarity thresholds (filtering). This paper introduces scoping as a new and integral phase in multi-sourced ER with potentially increased heterogeneity and more unlinkable entities. Scoping reduces the space of candidate entity pairs by ranking, detecting, and removing unlinkable entities through outlier algorithms and reusable self-supervised autoencoders, leaving intact the set of true linkages. Evaluations on multi-sourced schemas show that autoencoders perform best in schemas relevant to each other, where they reduce entity collections to 77% and still contain all linkages.
dc.description.uri: https://www.scitepress.org/Link.aspx?doi=10.5220/0012607500003690
dc.format.extent: 9 pages
dc.genre: conference papers and proceedings
dc.identifier: doi:10.13016/m2geet-tgsz
dc.identifier.citation: Traeger, Leonard, Andreas Behrend, and George Karabatis. “Scoping: Towards Streamlined Entity Collections for Multi-Sourced Entity Resolution with Self-Supervised Agents,” In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS, SciTePress, 107–15, 2024. https://www.scitepress.org/Link.aspx?doi=10.5220/0012607500003690.
dc.identifier.uri: https://doi.org/10.5220/0012607500003690
dc.identifier.uri: http://hdl.handle.net/11603/34855
dc.language.iso: en_US
dc.publisher: SciTePress
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Information Systems Department
dc.rights: ATTRIBUTION-NONCOMMERCIAL-NODERIVS 4.0 INTERNATIONAL
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Scoping: Towards Streamlined Entity Collections for Multi-Sourced Entity Resolution with Self-Supervised Agents
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-2208-0801
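
Illustrative sketch (not taken from the paper): the abstract describes scoping as ranking entities by an outlier score and removing those unlikely to have a linkage. The snippet below shows that general idea with a deliberately simple stand-in score, the distance from each entity's signature vector to its nearest neighbor in any other source. The paper itself uses outlier algorithms and self-supervised autoencoders; the function name `scope`, the `keep_fraction` parameter, and the toy vectors are all assumptions made for illustration.

```python
import math


def scope(entities, keep_fraction):
    """Return ids of the entities kept after a toy scoping step.

    entities: list of (entity_id, source, vector) tuples (hypothetical layout).
    keep_fraction: share of the collection to keep, between 0 and 1.
    """
    scored = []
    for entity_id, source, vector in entities:
        # Stand-in outlier score: distance to the closest entity from a
        # different source. A large value suggests an unlinkable entity.
        cross = [math.dist(vector, v) for _, s, v in entities if s != source]
        score = min(cross) if cross else float("inf")
        scored.append((score, entity_id))

    # Rank by score and keep the least outlying share of the collection.
    scored.sort()
    n_keep = math.ceil(keep_fraction * len(scored))
    return [entity_id for _, entity_id in scored[:n_keep]]


# Toy signatures from two sources, A and B; "a3" has no close counterpart.
entities = [
    ("a1", "A", (0.0, 0.0)),
    ("b1", "B", (0.1, 0.0)),
    ("a2", "A", (5.0, 5.0)),
    ("b2", "B", (5.1, 5.0)),
    ("a3", "A", (20.0, 20.0)),
]
kept = scope(entities, keep_fraction=0.8)
```

In this toy input the isolated entity `a3` is ranked last and dropped, while both linkable pairs survive. A real pipeline would pick the threshold from the data rather than from a fixed fraction.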

Files

Original bundle

Name: 126075.pdf
Size: 726.2 KB
Format: Adobe Portable Document Format