SEALM: Semantically Enriched Attributes with Language Models for Linkage Recommendation
| dc.contributor.author | Traeger, Leonard | |
| dc.contributor.author | Behrend, Andreas | |
| dc.contributor.author | Karabatis, George | |
| dc.date.accessioned | 2025-03-11T14:42:50Z | |
| dc.date.available | 2025-03-11T14:42:50Z | |
| dc.date.issued | 2025-02-02 | |
| dc.description | International Conference on Enterprise Information Systems (ICEIS 2025). Porto, Portugal, April 4-6, 2025 | |
| dc.description.abstract | Matching attributes from different repositories is an important step in the process of schema integration to consolidate heterogeneous data silos. In order to recommend linkages between relevant attributes, a contextually rich representation of each attribute is essential, particularly when more than two database schemas are to be integrated. This paper introduces the SEALM approach to generate a data catalog of semantically rich attribute descriptions using Generative Language Models, based on a new technique that employs six variations of available metadata information. Instead of using raw attribute metadata, we generate SEALM descriptions, which are used to recommend linkages with an unsupervised matching pipeline that involves a novel multi-source Blocking algorithm. Experiments on multiple schemas yield a 5% to 20% recall improvement in recommending linkages with SEALM-based attribute descriptions generated by the smallest Llama3.1:8B model compared to existing techniques. With SEALM, we only need to process the small fraction of attributes to be integrated rather than exhaustively inspecting all combinations of potential linkages. | |
| dc.description.sponsorship | Leonard Traeger was partially supported by a Technology Catalyst Fund TCF24KAR11131049602 by UMBC and a grant project PLan CV (reference number 03FHP109) by the German Federal Ministry of Education and Research (BMBF) and Joint Science Conference (GWK). | |
| dc.format.extent | 12 pages | |
| dc.genre | conference papers and proceedings | |
| dc.genre | preprints | |
| dc.identifier | doi:10.13016/m2jexn-kacy | |
| dc.identifier.uri | http://hdl.handle.net/11603/37779 | |
| dc.language.iso | en | |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
| dc.relation.ispartof | UMBC Faculty Collection | |
| dc.relation.ispartof | UMBC Information Systems Department | |
| dc.relation.ispartof | UMBC Student Collection | |
| dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. | |
| dc.subject | UMBC Cybersecurity Institute | |
| dc.title | SEALM: Semantically Enriched Attributes with Language Models for Linkage Recommendation | |
| dc.type | Text | |
| dcterms.creator | https://orcid.org/0000-0002-2208-0801 |
Files
Original bundle
- Name: Semantically_Enriched_Entity_Linkages_with_Language_ModelsKopie.pdf
- Size: 2.84 MB
- Format: Adobe Portable Document Format
