Josephson, Tyler R
Fox, Charles Elliott
2023-04-05
2023-04-05
2022-01-01
12611
http://hdl.handle.net/11603/27364

Symbolic regression (SR) is a machine learning tool that aims to generate models that fit data, by constructing equations from variables of interest, constants, and mathematical operators. SR has inspired applications for data-driven discovery of scientific laws and equation-based physical models. When traditional SR algorithms generate models, they attempt to achieve accuracy to data and short equation length, but they do not relate models to background knowledge. This work explores the augmentation of SR methods by incorporating domain knowledge (in the form of symbolic constraints on the equations) to guide the search through equation space. Specifically, we apply this to the chemistry problem of adsorption (when a gas sticks to a material), whose governing equations must satisfy certain thermodynamic constraints in the form of limiting behavior. This work explores how Bayesian SR and genetic algorithm-based SR can be augmented with these constraints to aid in the search. We use a computer algebra system to check constraint satisfaction for each generated expression, and we find this helps both SR algorithms generate more accurate, concise, and constraint-consistent models.

application:pdf
Adsorption; Genetic Algorithms; Machine Learning; Markov Chain Monte Carlo; Symbolic Regression
Combining Knowledge and Data in Symbolic Regression
Text
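
The abstract describes checking each generated expression against limiting-behavior constraints. As a rough illustration only, the sketch below probes two candidate adsorption models at very low pressure. It is a numerical stand-in written with the Python standard library, not the computer algebra check the work itself uses, which would evaluate such limits symbolically. The two constraints tested (loading vanishes as pressure goes to zero, and the low-pressure slope stays finite) are assumed examples, since the abstract does not list the constraints. The function names, parameter values, and tolerances are likewise hypothetical.

```python
import math


def langmuir(p, q_max=2.0, k=0.5):
    """Langmuir isotherm: loading q(p) = q_max * k * p / (1 + k * p)."""
    return q_max * k * p / (1.0 + k * p)


def freundlich(p, a=1.0, n=2.0):
    """Freundlich isotherm: loading q(p) = a * p**(1/n)."""
    return a * p ** (1.0 / n)


def check_low_pressure_limits(model, pressures=(1e-4, 1e-6, 1e-8),
                              zero_tol=1e-3, rel_tol=1e-2):
    """Numerically probe two assumed low-pressure constraints.

    zero_loading: the loading at the smallest probe pressure is near zero.
    finite_slope: q(p)/p has stopped changing between the two smallest
                  probe pressures, suggesting a finite limiting slope.
    This is a heuristic probe, not a proof of the limit.
    """
    loadings = [model(p) for p in pressures]
    slopes = [q / p for q, p in zip(loadings, pressures)]
    return {
        "zero_loading": abs(loadings[-1]) < zero_tol,
        "finite_slope": math.isclose(slopes[-1], slopes[-2], rel_tol=rel_tol),
    }


if __name__ == "__main__":
    for name, model in [("langmuir", langmuir), ("freundlich", freundlich)]:
        print(name, check_low_pressure_limits(model))
```

Under these assumptions the Langmuir form passes both checks, while the Freundlich form fails the slope check because q(p)/p grows without bound as p approaches zero. A search procedure could use such a pass/fail result to penalize or reject candidate expressions, though how the constraints enter the two SR algorithms is not specified in the abstract.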