Improving Forecasting Ability of GITM Using Data‐Driven Model Refinement

At altitudes below about 600 km, satellite drag is one of the most important and variable forces acting on a satellite. Neutral mass density predictions in the upper atmosphere are therefore critical for (a) designing satellites; (b) performing adjustments to stay in an intended orbit; and (c) collision avoidance maneuver planning. Density predictions have a great deal of uncertainty, including model biases and model misrepresentation of the atmospheric response to energy input. These may stem from inaccurate approximations of terms in the Navier‐Stokes equations, unmodeled physics, incorrect boundary conditions, or incorrect parameterizations. Two commonly parameterized source terms are the thermal conduction and eddy diffusion. Both are critical components in the transfer of the heat in the thermosphere. Determining how well the major constituents (N2, O2, and O) are as heat conductors will have effects on the temperature and mass density changes from a heat source. This work shows the effectiveness of using the retrospective cost model refinement (RCMR) technique at removing model bias caused by different sources within the Global Ionosphere Thermosphere Model. Numerical experiments, Challenging Minisatellite Payload and Gravity Recovery and Climate Experiment data during real events are used to show that RCMR can compensate for model bias caused by both inaccurate parameterizations and drivers. RCMR is used to show that eliminating model bias before a storm allows for more accurate predictions throughout the storm.

Attitude control is a related topic that requires properly estimating the drag-induced torques on a satellite to control its orientation. This could be important for instrumentation to function properly. Part of the attitude control problem is bounding torques to ensure systems do not get overwhelmed. Alternatively, over-engineering a powerful attitude control system costs extra money. The accuracy of torque prediction is reliant on low-error density estimation too. Moorthy et al. (2021) describes the importance of attitude control and the potential impact to expand our ability to explore extremely low Earth orbits (150-250 km). This region of Earth's atmosphere is under-explored due to the large drag force causing short expected lifetimes.
Accurately predicting the density in the thermosphere is a difficult task and atmospheric models are often called upon to make these density-driven drag estimations, but can be inaccurate by 20% (Bruinsma et al., 2004;Kuang et al., 2014;Marcos, 1990). The errors in the prediction are amplified during a geomagnetic storm, largely due to poor density estimation (Pachura & Hejduk, 2016). Drag inaccuracies can create positioning errors on the order of 10 km after just 1 day. In a short period of time, the satellites' trajectory can change enough such that JSpOC may need to reacquire them.
One of the models available to estimate density is NRLMSISE-00 (referred to as Mass Spectrometer and Incoherent Scatter Radar (MSIS)). MSIS is an empirical model (Hedin, 1983(Hedin, , 1987(Hedin, , 1991Picone et al., 2002) that uses a spherical harmonic fitting of ground-based and satellite measurements to estimate neutral densities and temperatures of the thermosphere for given solar conditions (F10.7) and geomagnetic activity (A p ). Empirical models incorporate data from remote observations so they are able to capture background neutral densities well, but do not have the same success during a solar storm due to limited time periods of enhanced activity. Wang et al. (2022) analyzed 265 storms, showed that MSIS under-predicted the density during storms, and fit coefficients to improve MSIS's peak density prediction during weak, moderate and intense storms.
The Jacchia-Bowman 2008 Empirical Thermospheric Density Model (JB2008) (B. Bowman et al., 2008) is an empirical model that estimates total mass density. JB2008 is a series of improvements upon the Jacchia 70 model (Jacchia, 1970) changing the input for the geomagnetic indices (from A p to D st ) and adding to the input for the solar indices using orbit-based sensor measurements of solar data in the EUV and far EUV (FUV) wavelengths. As part of the change from Jacchia 70, B. R. Bowman (2004) concluded that a Fourier time series and an altitude dependent, quadratic function could accurately replace the existing Jacchia 70 density functions used to compute the semidiurnal density variation. B. Bowman et al. (2006) introduced EUV and FUV solar indices into their temperature equation, replacing the standard Jacchia temperature equation. The accumulation of these changes led to lower standard deviation in errors, particularly during solar minimum conditions and during major geomagnetic storms.
There are two common issues with models: (a) bias during background conditions where mean densities from the model differ from mean measurements over a period of several days or longer and (b) enhanced errors over periods of a couple of days, driven by space weather events like storms. There are many ways people have tried to address these issues of poor density estimation.
The High Accuracy Satellite Drag Model (Storz et al., 2005) is an extension of JB2008 used by the US Space Force Combined Space Operations Center which uses observed drag effects from approximately 75 Earth-orbiting spheres to compute diurnal and semidiurnal variations to the thermosphere density. Doornbos et al. (2008) has done work with two-line element (TLE) data to directly create altitude-dependent multiplication factors to scale the densities of empirical models. Brandt et al. (2020) created the Multifacted Optimization Algorithm which similarly uses TLE data to incrementally adjust the drivers for MSIS within the orbital propagator (SpOCK) (Bussy-Virat et al., 2018). MOA adjusts the drivers of MSIS when MSIS has a large bias or misrepresents a storm to bring SpOCK-predicted orbits in line with TLEs from several small satellites. Finally, Kalafatoglu Eyiguler et al. (2019) showed that debiasing a model's background density prior to a storm may lead to improved performance for some models and recommends a few calculations for assessing storm-time performance.
Physics-based models estimate the thermosphere state variables using approximations of the Navier-Stokes equations. The idea is that correctly implemented physics could more accurately reproduce typical and highly variable thermosphere conditions as observed during storms. Coupled Thermosphere Ionosphere Model (CTIM) (Fuller-Rowell & Rees, 1980), Thermosphere Ionosphere Electrodynamics General Circulation Model (TIEGCM) (Richmond et al., 1992) and Global Ionosphere Thermosphere Model (GITM) (A. J. Ridley et al., 2006) are examples of Earth-based, physics models. The different numerical approximations, source terms included (or not included), and drivers in each model generates different temperatures, wind structures and densities. TIEGCM and CTIM use the hydrostatic assumption, whereas GITM does not make the same hydrostatic equilibrium assumption and solves a more complete vertical momentum and energy equation, but takes significantly longer to run. GITM makes use of the Flare Irradiance Spectral Model (FISM) (Chamberlin et al., 2008) fluxes to better represent the solar EUV entering the atmosphere. Matsuo et al. (2013) used an ensemble Kalman filter to assimilate Challenging Minisatellite Payload (CHAMP) measurements in the TIEGCM and in turn back out solar forcing terms such as the F10.7 index. Their work also demonstrated that electron density profiles from COSMIC can infer neutral states better than a single satellite's measurements of in-situ neutral densities. Progressing on this work, indirect and direct measurements of electron densities were used to determine the effectiveness of orbit propagation and quantify the improvements to ionosphere-thermosphere states (Dietrich et al., 2022;Matsuo & Hsu, 2021). Matsuo and Hsu (2021) also pointed out that after removing the bias, forecasting neutral densities remained reliable for the next 3 days in geomagnetically quiet conditions. It is important that a balance be struck when assimilating data without addressing the model drivers/parameterizations because during active time periods, the fundamental physics is needed to capture the fast changing states. Sutton (2018) points this out and developed the Iterative Reinitialization, Driver Estimation, and Assimilation technique which has demonstrated that assimilating satellite measurements in TIEGCM to modify solar and geomagnetic indices (drivers for models) improve active time period errors.
This study presents work on debiasing the background density in GITM using observational data. It also shows the impact of debiasing a model prior to a geomagnetic storm using satellite measurements and the MSIS model.

The Global Ionosphere Thermosphere Model (GITM)
Understanding the parameters that affect the thermosphere's neutral density are critical for improving physics-based models like GITM. GITM is a 3D spherical model that is used for Earth (A. J. Ridley et al., 2006), Mars (Bougher et al., 2015), and Saturn's moon Titan (Bell et al., 2010). In this study, the resolution of GITM was 2° latitude and 4° in longitude.
A. J. Ridley et al. (2006) explains the capabilities of the model, including the chemistry and numerical schemes. The vertical energy equation in GITM, including source terms, is (A. J. Ridley et al., 2006): where the first term is the time rate of change for the normalized, neutral temperature,  = ∕̄ . The second term is the advection of temperature gradients, while the third term is the adiabatic heating, which is a result of the divergence of the velocity. This is only the vertical component which depends on the vertical velocity, u r , radius of the Earth, r, and the temperature gradient. γ is the adiabatic index that is attached to the change in energy from the expansion of the gas. On the right-hand side, c v is the specific heat, k is Boltzmann's constant, ρ is the mass density, and is the mean mass of the neutrals. The various source terms are given by: where: Q EUV is the contribution from the solar extreme ultraviolet irradiance; the Q NO and Q O terms are the cooling to space from the 5.3 and 63 μm bands respectively. The last term is the collisional frictional heating and heat transfer between ions and neutrals. This is a function of the ion density (n i ), mass of the ion (m i ), mass of the neutrals (m n ), the ion-neutral collision frequency (ν in ), the ion velocity (v i ), neutral velocity (u n ), ion temperature (T i ), and the neutral temperature (T n ). Finally, the fourth term is the thermal conductivity, where κ eddy and κ c are the conductivity coefficients due to eddy diffusion and molecular heat conductivity respectively, and is the focus of this study.

Thermal Conductivity in the Upper Atmosphere
Thermal conductivity uncertainty is a serious issue in physics-based models (Banks & Kockarts, 1973;Pawlowski & Ridley, 2009;Schunk & Nagy, 2004). Most of the literature describes thermal conductivity in a laboratory setting where it is expressed as a function of temperature alone for specific species (Vargaftik et al., 1993). The theoretical expression for the thermal conductivity coefficient (κ c ) are complex and so it has been useful to simplify the coefficient to be a parameterization (Banks & Kockarts, 1973;Schunk & Nagy, 2004) as: where N i /N total is a weighting factor by number density of each neutral species, T is the thermosphere temperature, A i and s are species specific thermal conductivity coefficients to fit the total conductivity as needed. The summation includes the three species with the largest concentrations in the thermosphere. From Figure 1, above about 200 km, O is a dominant neutral species whereas in the lower thermosphere O 2 and N 2 densities are more prevalent and must be considered in the contribution to the heat exchange process. The temperature profile shows that above about 250 km, the atmosphere is roughly isothermal, so the conduction term can be quite small. This is the region where O is dominant. This implies that the N 2 term in the thermal conductivity is probably a more important term since N 2 is dominant below ∼250 km where the vertical temperature gradient is largest. Pavlov (2017) gives approximations from tabulated values in Vargaftik et al. (1993) for thermal conduction (denoted as λ in Pavlov (2017)) experiencing pressures much less than 0.1 MPa in temperature ranges of 160-2500 K for N 2 and 160-1500 K for O 2 . The full expressions a = 46.7 ( 1 + 2.228 × 10 −5 − 5.545 × 10 −9 2 ) 0.77 (7) Figure 2 shows the Pavlov (2017) values of 2 , 2 , κ O , as well as the corresponding Schunk and Nagy (2004) conductivities (assuming s = 0.75). In the bottom subplot, "best fit" lines are shown using the same parameterization scheme in Equation 4. The estimation of the coefficients and exponent in the parameterization are derived from data and theoretical expressions of the thermal conductivity of individual gases from Hilsenrath (1960), Reid et al. (1977), Lide (1997), Barlier et al. (1969), and Banks and Kockarts (1973). While Vargaftik et al. (1993) describes more complex expressions that best fit to an exponent close to 0.8. Although the parameterizations of the atomic oxygen, O, and nitrogen, N 2 , seem to match fairly well, there is a great deal of discrepancy for the estimation of the O 2 .
As described in Pawlowski and Ridley (2009), model bias can originate from incorrectly defined parameters like the thermal conductivity, eddy diffusion, or photoelectron heating efficiencies. Certain quantities such as the eddy diffusion, and lower boundary density and temperature affect model bias such that the best modeled physics equations can still result in inaccurate mass density calculations. It is therefore quite difficult to identify the cause of data-model comparison discrepancies.
For example, Masutti et al. (2016) explored a time period in which F10.7 increased over the course of several days and showed that GITM's mass density at approximately 400 km altitude overresponded to this change. Overall, there was an underestimate of a mass density when F10.7 was low and an overestimate when F10.7 was high. Since GITM's performance was a function of the solar irradiance, improved performance could possibly be captured through thermal conductivity adjustments based on solar activity, but may be possibly masking other incorrectly modeled physics.
The thermal conductivity is the focus of this study because its parameterization is a possible deficiency in GITM and it significantly changes the density results needed for orbit prediction. This is an opportunity to settle the discrepancy of parameterizations and compensate for neutral density model bias that may be caused by other incorrectly modeled physics, boundary conditions or drivers. For instance, inaccurate modeling of a term like the eddy diffusion coefficient could also influence neutral density results (Qian et al., 2009). Handling the eddy diffusion has been a topic of previous research in GITM (Goel et al., 2018;Malhotra et al., 2017), but the eddy diffusion is a term that also controls the composition and ionospheric density due to the changed turbulent mixing and its inclusion in the continuity, vertical momentum, and energy equations.

Manually Debiasing the Thermal Conductivity
This section outlines the need for debiasing models by describing an attempt to choose a single constant, thermal conductivity coefficient that allows GITM's mass density to better match CHAMP observations. Nine runs with varying thermal conductivity coefficients (Table 1) were performed. Six different contour maps are shown for the six different time periods simulated ( Figure 3). For each run, the eddy diffusion coefficient was set to 500, and s was set to 0.69. The percent difference in mass density from CHAMP measurements and GITM calculations were examined. GITM was ran for 10 days, but only the last 5 days of each run were used to allow GITM to reach a quasi-diurnally reproducible state before comparison. CHAMP and GITM densities were averaged over the orbital period (∼90 min).
Contours of percent error for each time period are shown in Figure 3. The September 2002 and September 2004 time periods were selected to tune Note. Multiply A(i) by 10 −4 to yield J m −1 s −1 K −1 . GITM, keeping the season and geomagnetic conditions similar, but allowing the solar activity to vary (see Table 2).
As the thermal conductivity is increased, the gradient in temperature in the lower thermosphere decreases. Since the lower boundary condition fixes the temperature, the temperature in the upper thermosphere must decrease. Pressure and density profiles are strongly controlled by the temperature, so as the temperature decreases, the density at a fixed altitude in the upper thermosphere also decreases. This means that the neutral density in GITM decreases as the thermal conductivity increases. Figure 3 shows that the molecular coefficient has a stronger effect than the atomic oxygen coefficient. This is because the thermal conductivity multiplies ∇T, which is largest in the lower thermosphere (∼100-200 km), where the major species O 2 and N 2 are dominant ( Figure 1). Hence, the thermal conductivity in the lower thermosphere dictates the middle and upper thermosphere temperature and density.
The top two plots of Figure 3 indicate that, for these two intervals, there is a span of atomic and molecular coefficients that reduce the model bias to extremely low levels, even with different solar irradiance. However, when the study was expanded to include other seasons and other conditions, it became clear that no combination reduced the bias universally. Times outside of September 2002 and 2004 needed to be considered to see that this overlapping parameterization space does not provide unbiased results at different parts of the solar cycle. The yellow zone overlayed on each subplot is the parameter space from the September 2002 and 2004 runs where the error was within ±5% for both times. These yellow zones show that a debiased set of thermal conductivity parameters for one set of times do not necessarily reduce the error to zero for other time periods. Figure 4 shows the sensitivity of the thermal conductivity exponent, s, for 16-26 September 2002. The molecular and diatomic coefficients were held constant at A(O) = A(O 2 , N 2 ) = 4.6 × 10 −4 J m −1 s −1 K −1 while "s" varied from 0.63 to 0.75 in increments of 0.03. For the GITM conditions during this time period, an exponent of 0.68 minimized the absolute error between GITM and CHAMP. As indicated in the sensitivity study of the molecular and diatomic coefficients, this is not expected to be a universal value due to the uncertainty in other terms within GITM. These runs show that GITM is more sensitive to the exponent than the molecular and diatomic coeffi-  The causes of model bias varying from event to event in Figure 3 could stem from incorrect drivers (EUV, lower boundary condition, aurora, etc…) or incorrect physics (ion variability, small-scale structures, turbulent heating, etc…). This is the reason an automated debiasing mechanism is needed. The difference in performance to estimate other state variables (aside from the neutral density) between the three parameters within the thermal conductivity coefficient was not studied in this work.

Retrospective Cost Model Refinement (RCMR)
Retrospective cost model refinement (RCMR) is a technique developed for parameter estimation in nonlinear systems (Morozov et al., 2011). The technique is a variation of retrospective cost adaptive control that was primarily developed for adaptive control applications in aerospace engineering (Santillo & Bernstein, 2010). In this work, RCMR is used to estimate thermal conductivity coefficients in a system modeled by Navier-Stokes partial differential equations. RCMR minimizes a cumulative cost function that is based on the difference between the density computed self-consistently by GITM and the density specified externally, such as that measured by a real satellite or estimated by a different model. This technique has been applied for estimation of (a) the eddy diffusion coefficient using total electron content as the comparison variable (Goel et al., 2018), (b) NO x cooling using simulated space-based measurements (D'Amato et al., 2013), (c) the photoelectron heating coefficient based on real satellite measurements (Burrell et al., 2015), and (d) the thermal conductivity coefficients using simulated density measurements (Goel et al., 2020). Each of these studies successfully estimated the corresponding unknown parameter using RCMR. This method is different from the other data assimilation methods talked about in the introduction because it does not use an ensemble (Matsuo et al., 2013) or run restarts (Sutton, 2018) which saves considerably on computational time. RCMR has been applied to parameters within GITM rather than directly updating the model states or modifying the drivers. For a more complete description of RCMR, refer to Goel et al. (2020). Figure 5 shows the block diagram used to estimate the unknown parameter within RCMR. As shown by the top block in Figure 5, the external drivers, including the solar EUV, frictional heating and auroral precipitation, force the real thermosphere's density, ρ. Thermal conductivity serves to move the energy vertically. When trying to reproduce nature's physics with a model (GITM), there are assumptions that try to emulate the true relationships. The empirical formulations, boundary conditions and other model necessities result in error accumulation. This is seen when comparing the model estimated density, with in situ measurements, as shown in Figure 3.
Reducing the error (z) is ideally done by correctly implementing equations that accurately and completely capture all dynamics, boundary conditions and drivers within the model. Low error could also be obtained by incorrect physics within the models that cancel each other out, inadvertently matching the measurements. This can occur when multiple incomplete physics terms compensate for each other. For example, having too low solar EUV heating along with too high frictional heating at high-latitudes could result in an orbit-averaged mass density that is more or less correct. In the case of RCMR, intentionally adjusting thermal conductivity coefficient(s) changes the error by altering the thermal balance between sources and sinks.
In Figure 5, the top block represents the true physical system with real drivers and boundary conditions. In the real system, κ is driven by the states and dynamics, making a complex, nonlinear system. GITM approximates the drivers and boundary conditions as well as approximating the dependence of κ on the system state as described above (i.e., κ = ∑A i T s ). RCMR takes the difference between the "actual" ρ and the GITM-estimated , and alters the κ (through the values of A i and/or s) to minimize the difference. In order to validate the integration of RCMR within GITM, RCMR was used to estimate κ (A(O 2 , N 2 )) using simulated truth density data obtained from a GITM simulation with a known value of κ. The density data was recorded and serves as the satellite measurements. Next, GITM was re-run with an intentionally incorrect A(O 2 , N 2 ) and RCMR updated the estimate A(O 2 , N 2 ) using the simulated truth density data. If RCMR was implemented correctly, RCMR's estimated A(O 2 , N 2 ) would converge to the true value of A(O 2 , N 2 ) used to generate the simulated truth data, validating the technique. When this is true, it is a good indication that when actual truth data (i.e., CHAMP, Gravity Recovery and Climate Experiment (GRACE), MSIS) is used, the convergence will provide the real thermosphere thermal conductivity coefficients.

Automating the Model Debiasing Process via RCMR
RCMR estimates the thermal conductivity coefficients every 60 s using density measurements from the CHAMP and GRACE satellites as well as Naval Research Laboratory's (NRL) MSIS empirical model (Picone et al., 2002). In order to implement this, GITM was ran independent of RCMR to obtain global density values from 16 to 26 September 2002 forming a truth data set. The thermal conductivity coefficients of A(O) = 4.6 × 10 −4 J m −1 s −1 K −1 , A(O 2 , N 2 ) = 4.6 × 10 −4 J m −1 s −1 K −1 and the exponent s = 0.69 were used. In comparison to CHAMP satellite data, this provided a low-biased mass density result (Figure 3a).
The orbit of the CHAMP satellite was used to extract densities from the GITM run (ρ 4.6 ) at a 1 min cadence to match the update frequency of RCMR. Using GITM densities at the satellite-position as inputs for RCMR (see Figure 6), a GITM simulation was run again during the same time, but used RCMR to change the molecular coefficient. This work was different from Goel et al. (2020) which used the simulated global maximum, minimum and mean densities instead of the densities directly at the satellite position. The thermal conductivity coefficient A(O 2 , N 2 ) was initialized to 1.0 × 10 −4 J m −1 s −1 K −1 , while the A(O) and exponent S were held constant at their previously set values above. The densities modeled by GITM with RCMR is denoted as ρ RCMR . RCMR used the ρ 4.6 data and ρ RCMR data to compute an error (z) to update the thermal conductivity estimation while the simulation progressed. Figure 6 shows two RCMR simulations that demonstrate that the independent dynamic adjustments of A(O 2 , N 2 ) and s in RCMR debias GITM. In the left column, the error z decreased to zero, while A(O 2 , N 2 ) converged to 4.6 × 10 −4 J m −1 s −1 K −1 after around 3 days. The right column shows the same simulation where s was the free parameter for RCMR to estimate. It is expected that s would converge to 0.69 to match the thermal conductivity exponent used to generate the truth data. Instead, RCMR estimated a value closer to 0.72 which is due to different versions of GITM being used for the truth data run and that of the most recent RCMR run. From here forward, RCMR only updates s due to GITM's sensitivity to this parameter.
In addition to the truth data and RCMR-adjusted mass densities, the density and error is shown when the incorrect parameterization was used and not corrected. This provides a quantification of the level of improvement that can be gained using RCMR.
This example shows that RCMR can correct for an incorrectly set thermal coefficient, but model bias can be caused by a variety of issues, as described above. For a second example of idealized RCMR runs, illustrated in Figure 7, GITM was run with consistent thermal conductivity parameters but incorrect drivers.
F10.7, the daily solar flux at wavelength 10.7 cm, is a proxy for solar spectra (Richards et al., 1994). An alternative to the F10.7 proxy is using FISM to describe the spectrum (Chamberlin et al., 2008). Near real time and for predictions, F10.7 is approximate and one of the only ways to describe the solar spectrum. If F10.7 is not right or does not describe the spectrum correctly, model bias could result. This second test explores whether RCMR can compensate for an incorrect specification of the F10.7. The RCMR estimated parameter for this run and future runs was the exponent s, with an initial value of s as 0.1.
Similarly to the previous run, the truth data being used was an extraction of GITM results where the F10.7 was updated based on the actual F10.7, which varied from 190 to 150 solar flux units. The RCMR run was intentionally run with an incorrect constant F10.7 of 125 solar flux units. Over time, the RCMR-debiased run converged to the truth data and the error decreased dramatically. The time it took to converge was longer than the first test by roughly 2 days. This was due to the densities being similar between the two runs for the first 2 days despite the very different run settings. In this case, a low F10.7 incorrect driver caused a low density, having a negative bias. At the same time, a low initial value of s caused a high density since the thermal conduction would be reduced leading to a high temperature. In this case, a positive bias would result. In combination, the biases mostly canceled and RCMR was relatively ineffective for the first 2 days. After this, RCMR was able to track the error and produced an "s" that adequately compensated for the incorrect specification of F10.7.

RCMR With CHAMP and GRACE Satellite Densities
In the previous section, the simulated densities generated from a GITM run represented the "true" thermosphere.
In this section, tests of RCMR with real satellite data are described. Initial tests were done using data from September of the years 2002 and 2004 as sample months for high and moderate F10.7 fluxes, respectively, since these were used for manual debiasing earlier in the study. Both time periods had relatively low levels of activity, with |D st | being less than 50 nT during each time period. The error (middle row) and thermal conductivity coefficient (bottom row) from using simulation data at CHAMP locations at a 1 min cadence is shown in blue for the retrospective cost model refinement assisted run, red for a constant, purposefully biased, constant parameterization, and black for a constant parameterization matching the truth data parameters. The orbit averaged errors are shown with a thicker line of their corresponding color.
The estimation of the thermal conductivity exponent s was explored using CHAMP and GRACE individually. Figures 8 and 9 show the 16-26 September 2002 period comparing the results of GITM with a constant thermal conductivity to the RCMR adjusted values against the satellite observations.
The RCMR and non-RCMR runs both converge to the CHAMP and GRACE measurements. With RCMR, the convergence is much faster with large improvements in mass density after around two to 3 days. As observed in Figures 8 and 9, the free parameter s converged to 0.70 which is similar to the constant value of 0.69 used in a typical GITM run. This set of thermal conductivity coefficients (4.6e−4, 4.6e−4, 0.69) matched the results found in the manual debiasing process.
Normalized root mean square (nRMS) and percent error are shown on the bottom right of Figures 8 and 9 to quantify the improvement with RCMR. These values were computed based on orbit-averaged densities for the final 5 days of the run (marked as t 0 on the figure). This gave sufficient time for RCMR to debias the model and allow GITM to reach a roughly diurnally reproducible state. In Figure 9, the nRMS and percent difference show improvement of ±33% percent error and nRMS to less than 3%.
Switching to the time period in 2004, a similar simulation was performed using CHAMP data to check the robustness of RCMR under different solar conditions. The F10.7 was considerably lower for this run mostly being between 90 and 110 W m −2 Hz −1 , while the seasonality and geomagnetic activity was similar. Recall that debiasing between September 2002 and 2004 was possible with similar thermal conductivity coefficients, and so running this time period gave RCMR the opportunity to demonstrate this. As shown in Figure 10, the RCMR and non-RCMR mass densities converged to CHAMP measurements with RCMR reducing the time to converge by nearly 7 days. In the bottom subplot, the estimated thermal conductivity exponent converges to right around 0.70 which is consistent with the RCMR test performed in 2002 and the manual debiased simulations. nRMS and percent error were used to quantify the improvement with RCMR. They showed a much larger improvement from a roughly −20% error and nRMS to less than 5%.

Storm-Time Debiasing and Forecasting
In this section, GITM was debiased by RCMR before the storm in August 2005. The F10.7 for this time period was lower than the previous runs shown. , and retrospective cost model refinement (RCMR). In the middle subplot, the errors are plotted over one another to observe how RCMR compares to a constant thermal conductivity typically used in GITM. The bottom subplot shows the consequent thermal conductivity exponent estimated in blue. In red is the constant value used when RCMR was not applied. The local time of ascending node for CHAMP was 13.4 LT. The altitude range and orbital inclination of the real satellites during this time period are shown in Table 3. It varied between 70 and 100 W m −2 Hz −1 . Comparisons between the typical GITM run, a purposefully biased GITM run, an RCMR-assisted GITM run with purposefully biased F10.7, and CHAMP data were made in an effort to improve forecasting of density enhancements during and after the storm. Figure 11 shows the interplanetary magnetic field (IMF), solar wind velocity, hemispheric power, and Dst prior to and through the storm on 24-25 August 2005. During the quiet time, the Dst never went below ∼−25 nT, while the hemispheric power was quite low most of the time. On 24 August the IMF B z turned negative as well as the solar wind speed increasing dramatically. This drove a large increase in the aurora and a significant development of the ring current as indicated by the nearly −200 nT Dst.
The storm took place between 24 and 26 August 2005. In the RCMR run, the debiasing took place from 14 to 21 August. The run continued through the storm from 21 to 28 August without the assistance of RCMR. During the storm, the exponent "s" was held constant at its last value specified by RCMR on 21 August. In Figure 12, the debiasing was done prior to the storm using CHAMP measurements. As was done before, the densities, errors, and dynamic thermal conductivity exponent are shown in comparison to the static runs.
As expected, the biased run with a constant F10.7 of 150 W m −2 Hz −1 was very different than the CHAMP measurements and a GITM run using real F10.7 measurements. It is important to point out that the parameter estimation from RCMR showed that the best exponent s was around 0.8 which was considerably larger than the other runs. The F10.7 of 150 W m −2 Hz −1 is higher than the true conditions artificially increasing mass densities.
To counteract this, an increased thermal conductivity was needed to dissipate this excess energy, reducing the mass density.  Figure 13 shows the runs proceeding through the storm and storm recovery. For the 3 days after RCMR was turned off, the densities stayed debiased. The storm was better represented because of this, although GITM with RCMR under predicted the storm response during the peaks. This is most likely due to the increased thermal conductivity, which pulled energy out of the thermosphere too quickly during the storms. This is relatively minor compared to the biased model results though. The RCMR run matched the recovery density after the storm quite well. Additional performance assessment metrics are shown in Table 4. The formulations for each metric is shown in Kalafatoglu Eyiguler et al. (2019). When comparing the RCMR run to the biased run, the RCMR run performed better in every metric. Each of these statistics help quantify the improvements that can be had to the mean and variability of the mass densities.
On the other hand, the calibrated model of GITM also performed better than the biased run. Comparing the RCMR run and the calibrated model of GITM, the Ratio avg of the default GITM simulation performed better than the RCMR run. RCMR was capable of improving the time delay (TD) of the storm peak, the mean average error, and nRMS error (NRMSE).

Debiasing Using an Empirical Model
Satellite measurements of the thermosphere are not always available, especially during real-time operations. For this reason, an empirical model such as MSIS may be useful as a source of "truth data." Whereas empirical models are not always skilled at correctly predicting highly perturbed events, like solar storms, they are useful for obtaining information on the background state. Further, satellite orbits may not be ideally placed to represent the global conditions, while an empirical model can be sampled anywhere (or everywhere). While satellite data is the ideal choice for debiasing, using an empirical model may help in some situations. For these reasons, a final test was run to attempt to debias GITM under conditions where satellite data was (in theory) not available.
In this run, MSIS mass densities at the subsolar point at 400 km altitude were used as the source of "truth data." The same time period in August 2005 was used for this. RCMR was allowed to debias GITM for 7 days and then proceed through the storm. During the storm, RCMR was turned off and the storm-time performance evaluation of GITM was checked against CHAMP data, as in the previous case.
In Figure 14 shows the mass density for different runs at the subsolar point at 400 km altitude, which is where the MSIS data was extracted. The biased run (labeled ρ 0.69 ) and RCMR run no longer had error induced by the F10.7. The only source of error in the RCMR run (ρ RCMR ) was the initial value of 0.1 given to the thermal conductivity exponent s. The thermal conductivity exponent s in the wrong tuning run was 0.69, whereas the best tuning had an exponent of 0.72 (ρ 0.72 ). At the 400 km, subsolar point each run converged to MSIS results within a few days of the run. As shown in the bottom subplot, RCMR estimated the "s" to be 0.71, using the MSIS results. Figure 15 shows the same runs proceeding through the storm and storm recovery, but now at the CHAMP positions. These densities are quite different than the subsolar density, since CHAMP is a high inclination satellite sampling the high latitudes, where the energy balance can be quite different. In this case, the biased run performed worst of the three runs. In Table 4, the same performance tools from Kalafatoglu Eyiguler et al. (2019) are shown. The RCMR run performed similarly or better than the biased run, but considerably worse than the calibrated GITM run. This is due to the difference between MSIS and CHAMP during the proceeding time period. Since RCMR was debiasing toward MSIS, the debiasing improvement is subject to the accuracy of MSIS. It is possible that debiasing with MSIS at locations other than at 400 km altitude at the subsolar point could improve this, but it was not explored in this work. This simulation does show that debiasing with an empirical model improves the performance of the biased model, but then is subject to other limitations.

Summary and Conclusion
In this work, GITM used RCMR with CHAMP and GRACE satellite measurements to correct for uncertain parameters and incorrect drivers. During these runs, it was shown that after sufficient error accumulation, RCMR was able to reduce the bulk of the error and nRMS to below 5% within 2-3 days. This work also showed the effectiveness of debiasing GITM prior to a storm in August 2005 with CHAMP measurements and MSIS. When debiasing was applied before a storm, the results during the storm were shown to improve in all metrics except the TD between a measured storm peak and the model-predicted peak (where they performed identically with and without RCMR). It was demonstrated that RCMR could use empirical models within GITM to debias the model, but this was reliant on MSIS results having low error during the pre-storm time period and the choice of where to sample the empirical model. Future work will show more runs and have a statistical approach to address how beneficial using MSIS for parameter estimation can be. This work also implied that reducing the model bias improved the forecasting performance along a specified path. Lower model bias in neutral densities help make GITM a more feasible model to assist in satellite drag calculations. Getting the acceleration due to drag correct is important for accurate estimation of future satellite positions and potential collision detection and prevention.  Note. The first two are dimensionless quantities. TD is the time difference between storm peak as seen from data and from the model computed in hours. The mean average error (MAE) has units of kg/m 3 . The normalized root mean square error (NRMSE) is shown as a percentage. The prediction efficiency (PE) is also a non-dimensional statistic. The columns are separated by run-type the first three columns being associated with debiasing with CHAMP data and the final three columns are associated with debiasing with MSIS. Figure 14. The densities and errors compared to Mass Spectrometer and Incoherent Scatter Radar at the 400 km altitude sub-solar point during August 2005 with retrospective cost model refinement (RCMR) on (blue) and RCMR off in two conditions. One run is with manually calibrated thermal conductivity values included (orange) and the other is with a constant, biased thermal conductivity exponent of 0.69 (red). The bottom subplot shows the consequent thermal conductivity coefficient estimated.