Matrix Factorization for Inferring Associations and Missing Links

dc.contributor.author: Barron, Ryan
dc.contributor.author: Eren, Maksim
dc.contributor.author: Truong, Duc P.
dc.contributor.author: Matuszek, Cynthia
dc.contributor.author: Wendelberger, James
dc.contributor.author: Dorn, Mary F.
dc.contributor.author: Alexandrov, Boian
dc.date.accessioned: 2025-04-23T20:32:01Z
dc.date.available: 2025-04-23T20:32:01Z
dc.date.issued: 2025-03-06
dc.description.abstract: Missing link prediction is a method for network analysis, with applications in recommender systems, biology, social sciences, cybersecurity, information retrieval, and Artificial Intelligence (AI) reasoning in Knowledge Graphs. Missing link prediction identifies unseen but potentially existing connections in a network by analyzing the observed patterns and relationships. In proliferation detection, this supports efforts to identify and characterize attempts by state and non-state actors to acquire nuclear weapons or associated technology - a notoriously challenging but vital mission for global security. Dimensionality reduction techniques like Non-Negative Matrix Factorization (NMF) and Logistic Matrix Factorization (LMF) are effective but require selection of the matrix rank parameter, that is, of the number of hidden features, k, to avoid over/under-fitting. We introduce novel Weighted (WNMFk), Boolean (BNMFk), and Recommender (RNMFk) matrix factorization methods, along with ensemble variants incorporating logistic factorization, for link prediction. Our methods integrate automatic model determination for rank estimation by evaluating stability and accuracy using a modified bootstrap methodology and uncertainty quantification (UQ), assessing prediction reliability under random perturbations. We incorporate Otsu threshold selection and k-means clustering for Boolean matrix factorization, comparing them to coordinate descent-based Boolean thresholding. Our experiments highlight the impact of rank k selection, evaluate model performance under varying test-set sizes, and demonstrate the benefits of UQ for reliable predictions using abstention. We validate our methods on three synthetic datasets (Boolean and uniformly distributed) and benchmark them against LMF and symmetric LMF (symLMF) on five real-world protein-protein interaction networks, showcasing improved prediction performance.
dc.description.sponsorship: This manuscript has been approved for unlimited release and has been assigned LA-UR-25-22115. This work was funded by a grant HDTRA1242032(CB11198) of BSA, from the Defense Threat Reduction Agency (DTRA) of the U.S. Department of Defense (DoD). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. Funds for the demonstration and/or assessment work were provided by the Los Alamos National Laboratory Technology Evaluation & Demonstration program. The research was also supported by the LANL Institutional Computing Program, and by the U.S. DOE NNSA under Contract No. 89233218CNA000001.
dc.description.uri: http://arxiv.org/abs/2503.04680
dc.format.extent: 35 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2rdke-2kzy
dc.identifier.uri: https://doi.org/10.48550/arXiv.2503.04680
dc.identifier.uri: http://hdl.handle.net/11603/38095
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights: Public Domain
dc.rights.uri: https://creativecommons.org/publicdomain/mark/1.0/
dc.subject: UMBC Interactive Robotics and Language Lab
dc.subject: Computer Science - Logic in Computer Science
dc.subject: Computer Science - Machine Learning
dc.subject: Computer Science - Artificial Intelligence
dc.title: Matrix Factorization for Inferring Associations and Missing Links
dc.type: Text
dcterms.creator: https://orcid.org/0000-0003-1383-8120

Files

Original bundle

Name: 2503.04680v1.pdf
Size: 2.46 MB
Format: Adobe Portable Document Format