Combining Knowledge and Data in Symbolic Regression
Author/Creator
Author/Creator ORCID
Date
2022-01-01
Type of Work
Department
Computer Science and Electrical Engineering
Program
Computer Science
Citation of Original Publication
Rights
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Abstract
Symbolic regression (SR) is a machine learning technique that generates models fitting data by constructing equations from variables of interest, constants, and mathematical operators. SR has inspired applications in data-driven discovery of scientific laws and equation-based physical models. Traditional SR algorithms aim for accuracy on the data and short equation length, but they do not relate models to background knowledge. This work explores augmenting SR methods with domain knowledge, expressed as symbolic constraints on the equations, to guide the search through equation space. Specifically, we apply this approach to the chemistry problem of adsorption (a gas sticking to a material), whose governing equations must satisfy certain thermodynamic constraints in the form of limiting behavior. We examine how Bayesian SR and genetic algorithm-based SR can each be augmented with these constraints to aid the search. We use a computer algebra system to check constraint satisfaction for each generated expression, and we find that this helps both SR algorithms generate more accurate, concise, and constraint-consistent models.
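As an illustration of what checking limiting-behavior constraints with a computer algebra system might look like, the following is a minimal sketch using SymPy. It is not the implementation used in this work: the function name `check_limits`, the two example constraints (zero loading at zero pressure, and a finite positive initial slope), and the choice of the Langmuir and Freundlich isotherms as candidate expressions are all assumptions made here for demonstration.

```python
import sympy as sp

# Pressure and model parameters, all assumed positive for this sketch.
p = sp.Symbol("p", positive=True)
q_max, K, k = sp.symbols("q_max K k", positive=True)


def check_limits(expr, var=p):
    """Check two illustrative limiting-behavior constraints on a candidate
    isotherm expression (hypothetical helper, not from the original work)."""
    # Constraint 1: loading vanishes as pressure approaches zero.
    at_zero = sp.limit(expr, var, 0, "+")
    # Constraint 2: the initial slope is finite and positive.
    slope_at_zero = sp.limit(sp.diff(expr, var), var, 0, "+")
    return {
        "zero_loading_at_zero_pressure": at_zero == 0,
        "finite_positive_initial_slope": (
            slope_at_zero.is_finite is True
            and slope_at_zero.is_positive is True
        ),
    }


# Two candidate expressions an SR search could plausibly generate.
langmuir = q_max * K * p / (1 + K * p)
freundlich = k * p ** sp.Rational(1, 2)

langmuir_report = check_limits(langmuir)
freundlich_report = check_limits(freundlich)
print(langmuir_report)
print(freundlich_report)
```

In a search loop, a report like this could be used either to reject candidates outright or to add a penalty to their score; which of these the SR algorithms here do is not stated in the abstract.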