Combining Knowledge and Data in Symbolic Regression

Date

2022-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Rights

Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Abstract

Symbolic regression (SR) is a machine learning technique that generates models to fit data by constructing equations from variables of interest, constants, and mathematical operators. SR has inspired applications in the data-driven discovery of scientific laws and equation-based physical models. Traditional SR algorithms aim for models that fit the data accurately with short equations, but they do not relate those models to background knowledge. This work augments SR methods with domain knowledge, expressed as symbolic constraints on the equations, to guide the search through equation space. Specifically, we apply this approach to the chemistry problem of adsorption (a gas sticking to a material), whose governing equations must satisfy certain thermodynamic constraints in the form of limiting behavior. We explore how Bayesian SR and genetic algorithm-based SR can each be augmented with these constraints. We use a computer algebra system to check constraint satisfaction for each generated expression, and we find that this helps both SR algorithms generate more accurate, concise, and constraint-consistent models.
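
As an illustration of what checking limiting behavior with a computer algebra system might look like, the following is a minimal sketch using SymPy. It is not the implementation used in this work. The two constraints shown (loading goes to zero at zero pressure, and the slope at zero pressure is finite and positive), the function name satisfies_limits, and the candidate expressions with their numeric constants are all assumptions chosen for the example.

```python
import sympy as sp

# Pressure, restricted to positive values (illustrative symbol name).
p = sp.Symbol("p", positive=True)

def satisfies_limits(expr):
    """Check two assumed limiting-behavior constraints on a candidate isotherm.

    1. Loading tends to zero as pressure tends to zero.
    2. The slope at zero pressure is finite and positive.
    These constraints are assumptions for this sketch, not taken from the thesis.
    """
    at_zero = sp.limit(expr, p, 0, "+")
    slope = sp.limit(sp.diff(expr, p), p, 0, "+")
    return bool(at_zero == 0 and slope.is_finite and slope.is_positive)

# Hypothetical candidates, as an SR search might propose after fitting constants.
langmuir_like = 6 * p / (1 + 2 * p)   # slope at p = 0 is 6
sqrt_like = 2 * sp.sqrt(p)            # slope diverges as p -> 0
offset_like = p + 1                   # does not pass through zero

for name, expr in [("langmuir_like", langmuir_like),
                   ("sqrt_like", sqrt_like),
                   ("offset_like", offset_like)]:
    print(name, satisfies_limits(expr))
```

In a search loop, a check of this kind could be used either to reject candidates outright or to penalize them in the fitness or prior. Which of those is preferable is a design choice this sketch does not settle.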