Cyber-Security Knowledge Graph Generation by Hierarchical Nonnegative Matrix Factorization

dc.contributor.author: Barron, Ryan
dc.contributor.author: Eren, Maksim E.
dc.contributor.author: Bhattarai, Manish
dc.contributor.author: Wanna, Selma
dc.contributor.author: Solovyev, Nicholas
dc.contributor.author: Rasmussen, Kim
dc.contributor.author: Alexandrov, Boian S.
dc.contributor.author: Nicholas, Charles
dc.contributor.author: Matuszek, Cynthia
dc.date.accessioned: 2024-04-10T19:05:40Z
dc.date.available: 2024-04-10T19:05:40Z
dc.date.issued: 2024-03-26
dc.description.abstract: Much of human knowledge in cybersecurity is encapsulated within the ever-growing volume of scientific papers. As this textual data continues to expand, the importance of document organization methods becomes increasingly crucial for extracting actionable insights hidden within large text datasets. Knowledge Graphs (KGs) serve as a means to store factual information in a structured manner, providing explicit, interpretable knowledge that includes domain-specific information from the cybersecurity scientific literature. One of the challenges in constructing a KG from scientific literature is the extraction of ontology from unstructured text. In this paper, we address this topic and introduce a method for building a multi-modal KG by extracting structured ontology from scientific papers. We demonstrate this concept in the cybersecurity domain. One modality of the KG represents observable information from the papers, such as the categories in which they were published or the authors. The second modality uncovers latent (hidden) patterns of text extracted through hierarchical and semantic non-negative matrix factorization (NMF), such as named entities, topics or clusters, and keywords. We illustrate this concept by consolidating more than two million scientific papers uploaded to arXiv into the cyber-domain, using hierarchical and semantic NMF, and by building a cyber-domain-specific KG.
dc.description.sponsorship: This research was funded by the Los Alamos National Laboratory (LANL) Laboratory Directed Research and Development (LDRD) grant 20230067DR and LANL Institutional Computing Program, supported by the U.S. Department of Energy National Nuclear Security Administration under Contract No. 89233218CNA000001.
dc.description.uri: http://arxiv.org/abs/2403.16222
dc.format.extent: 6 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2gsuh-hsbz
dc.identifier.uri: https://doi.org/10.48550/arXiv.2403.16222
dc.identifier.uri: http://hdl.handle.net/11603/32978
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Student Collection
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights: Public Domain
dc.rights.uri: https://creativecommons.org/publicdomain/mark/1.0/deed.en
dc.subject: Computer Science - Artificial Intelligence
dc.title: Cyber-Security Knowledge Graph Generation by Hierarchical Nonnegative Matrix Factorization
dc.type: Text
dcterms.creator: https://orcid.org/0009-0005-5045-9527
dcterms.creator: https://orcid.org/0000-0002-4362-0256
dcterms.creator: https://orcid.org/0000-0001-9494-7139
dcterms.creator: https://orcid.org/0000-0003-1383-8120

Files

Original bundle

Name: 2403.16222.pdf
Size: 21.64 MB
Format: Adobe Portable Document Format