AI Descartes: Combining Data and Theory for Derivable Scientific Discovery

Cornelio, Cristina; Dash, Sanjeeb; Austel, Vernon; Josephson, Tyler R.; Goncalves, Joao; Clarkson, Kenneth; Megiddo, Nimrod; El Khadir, Bachir; Horesh, Lior

dc.contributor.author	Cornelio, Cristina
dc.contributor.author	Dash, Sanjeeb
dc.contributor.author	Austel, Vernon
dc.contributor.author	Josephson, Tyler R.
dc.contributor.author	Goncalves, Joao
dc.contributor.author	Clarkson, Kenneth
dc.contributor.author	Megiddo, Nimrod
dc.contributor.author	El Khadir, Bachir
dc.contributor.author	Horesh, Lior
dc.date.accessioned	2022-04-05T14:18:25Z
dc.date.available	2022-04-05T14:18:25Z
dc.date.issued	2021-10-08
dc.description.abstract	Scientists have long aimed to discover meaningful formulae which accurately describe experimental data. One common approach is to manually create mathematical models of natural phenomena using domain knowledge, then fit these models to data. In contrast, machine-learning algorithms automate the construction of accurate data-driven models while consuming large amounts of data. Ensuring that such models are consistent with existing knowledge is an open problem. We develop a method for combining logical reasoning with symbolic regression, enabling principled derivations of models of natural phenomena. We demonstrate these concepts for Kepler's third law of planetary motion, Einstein's relativistic time-dilation law, and Langmuir's theory of adsorption, automatically connecting experimental data with background theory in each case. We show that laws can be discovered from few data points when using formal logical reasoning to distinguish the correct formula from a set of plausible formulas that have similar error on the data. The combination of reasoning with machine learning provides generalizable insights into key aspects of natural phenomena. We envision that this combination will enable derivable discovery of fundamental laws of science. We believe that this is a crucial first step for connecting the missing links in automating the scientific method.	en
dc.description.sponsorship	We thank J. Ilja Siepmann for initially suggesting adsorption as a problem for symbolic regression. We thank James Chin-wen Chou for providing the atomic clock data. Funding: This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) (PA-18-02-02). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. TRJ was supported by the U.S. Department of Energy (DOE), Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences and Biosciences (DE-FG02-17ER16362), as well as startup funding from the University of Maryland, Baltimore County. TRJ also gratefully acknowledges the University of Minnesota Institute for Mathematics and its Applications (IMA).	en
dc.description.uri	https://arxiv.org/abs/2109.01634	en
dc.format.extent	26	en
dc.genre	journal articles	en
dc.genre	preprints	en
dc.identifier	doi:10.13016/m2ygad-ss9s
dc.identifier.uri	https://doi.org/10.48550/arXiv.2109.01634
dc.identifier.uri	http://hdl.handle.net/11603/24517
dc.language.iso	en	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Chemical, Biochemical & Environmental Engineering Department Collection
dc.relation.ispartof	UMBC Faculty Collection
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department
dc.rights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.	en
dc.title	AI Descartes: Combining Data and Theory for Derivable Scientific Discovery	en
dc.type	Text	en
dcterms.creator	https://orcid.org/0000-0002-0100-0227	en

Files

Original bundle

Name: 2109.01634.pdf
Size: 1.35 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed to upon submission

Collections

UMBC Chemical, Biochemical & Environmental Engineering Department
UMBC Computer Science and Electrical Engineering Department
UMBC Faculty Collection