A Machine Learning Methodology for the Generation of a Parameterization of the Hydroxyl Radical: a Tool to Improve Computational-Efficiency in Chemistry Climate Models

Date

2022-03-09

Citation of Original Publication

Anderson, Daniel C. et al. A Machine Learning Methodology for the Generation of a Parameterization of the Hydroxyl Radical: a Tool to Improve Computational-Efficiency in Chemistry Climate Models. Geoscientific Model Development. https://doi.org/10.5194/gmd-2022-44

Rights

This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
Public Domain Mark 1.0

Abstract

We present a methodology that uses gradient boosted regression trees (a machine learning technique) and a full-chemistry simulation (i.e., the training dataset) from a chemistry climate model (CCM) to efficiently generate a parameterization of tropospheric hydroxyl radical (OH) that is a function of chemical, dynamical, and solar irradiance variables. This surrogate model of OH is designed to allow for computationally efficient simulation of nonlinear feedbacks between OH and tropospheric constituents whose primary sink is reaction with OH (e.g., carbon monoxide (CO), methane (CH₄), volatile organic compounds (VOCs)). Such a model framework is advantageous for studies that require multi-decadal simulations of CH₄ or multi-year sensitivity simulations to understand the causes of trends and variations of CO and CH₄. The methodology that we present makes it relatively easy to create a new parameterization in response to, for example, changes in the underlying CCM chemistry and/or dynamics schemes. We show that a parameterization of OH generated from a CCM simulation is able to reproduce OH concentrations with a normalized root mean square error of approximately 5 % and to capture the global mean methane lifetime within approximately 1 %. The accuracy of the parameterization is dependent on inputs being within the bounds of the training dataset. However, we show that the parameterization predicts large deviations in OH for an El Niño event that was not part of the training dataset, and that the spatial distribution and strength of these deviations are consistent with the event. This result gives confidence in the fidelity of the parameterization to simulate the spatial and temporal responses of OH to perturbations from large variations in the chemical, dynamical, and solar irradiance drivers of OH.
In addition, we discuss how two machine learning metrics, Gain feature importance and SHAP values, indicate that the behavior of the parameterization of OH generally comports with our understanding of OH chemistry, even though there are no physics- or chemistry-based constraints on the parameterization.
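To make the approach described in the abstract more concrete, the following is a minimal sketch of gradient boosted regression trees, written from scratch with depth-1 trees (stumps) and a squared-error loss. It is not the authors' implementation: the three input features, the target function, the hyperparameters, and every function name are hypothetical and purely synthetic, standing in for the chemical, dynamical, and solar irradiance inputs. The normalization of the root mean square error by the mean of the reference values is also an assumption, since the abstract does not state which normalization is used.

```python
import math
import random


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree: the one split that minimizes squared error."""
    n, n_features = len(X), len(X[0])
    total = sum(residuals)
    best = None  # (score, feature, threshold, left_mean, right_mean)
    for j in range(n_features):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            left_mean = left_sum / k
            right_mean = (total - left_sum) / (n - k)
            # Maximizing this score is equivalent to minimizing the squared error.
            score = k * left_mean ** 2 + (n - k) * right_mean ** 2
            if best is None or score > best[0]:
                best = (score, j, 0.5 * (lo + hi), left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    feature, threshold, left_value, right_value = stump
    return left_value if x[feature] <= threshold else right_value


def fit_gbrt(X, y, n_trees=200, learning_rate=0.1):
    """Boosting loop: each new stump is fit to the residuals of the current model."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, x) for p, x in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def nrmse(y_true, y_pred):
    """RMSE divided by the mean of the reference values (assumed normalization)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (sum(y_true) / len(y_true))


def feature_bounds(X):
    """Per-feature (min, max) over the training inputs."""
    return [(min(col), max(col)) for col in zip(*X)]


def in_bounds(bounds, x):
    """True if every input lies inside the range seen during training."""
    return all(lo <= v <= hi for (lo, hi), v in zip(bounds, x))


def synthetic_target(x):
    # Hypothetical nonlinear response; not a representation of OH chemistry.
    return 2.0 + math.sin(3.0 * x[0]) + x[1] ** 2


random.seed(0)
X_train = [[random.random() for _ in range(3)] for _ in range(300)]
y_train = [synthetic_target(x) for x in X_train]
X_test = [[random.random() for _ in range(3)] for _ in range(200)]
y_test = [synthetic_target(x) for x in X_test]

model = fit_gbrt(X_train, y_train)
test_nrmse = nrmse(y_test, [predict(model, x) for x in X_test])

# Reference point: a "model" that always predicts the training mean.
train_mean = sum(y_train) / len(y_train)
baseline_nrmse = nrmse(y_test, [train_mean] * len(y_test))

bounds = feature_bounds(X_train)
print(f"NRMSE: boosted stumps {test_nrmse:.3f}, constant baseline {baseline_nrmse:.3f}")
```

The `in_bounds` helper reflects the caveat that accuracy depends on inputs lying within the range of the training dataset; a surrogate used inside a model run could flag or handle separately any input for which it returns `False`. A production workflow would more likely rely on an established gradient boosting library with deeper trees than on this hand-written loop.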