Cosmology with the Wide-Field Infrared Survey Telescope -- Multi-Probe Strategies

We simulate the scientific performance of the Wide-Field Infrared Survey Telescope (WFIRST) High Latitude Survey (HLS) on dark energy and modified gravity. The 1.6 year HLS Reference survey is currently envisioned to image 2000 deg$^2$ in multiple bands to a depth of $\sim$26.5 in Y, J, H and to cover the same area with slit-less spectroscopy beyond z=3. The combination of deep, multi-band photometry and deep spectroscopy will allow scientists to measure the growth and geometry of the Universe through a variety of cosmological probes (e.g., weak lensing, galaxy clusters, galaxy clustering, BAO, Type Ia supernova) and, equally, it will allow an exquisite control of observational and astrophysical systematic effects. In this paper we explore multi-probe strategies that can be implemented given WFIRST's instrument capabilities. We model cosmological probes individually and jointly and account for correlated systematics and statistical uncertainties due to the higher order moments of the density field. We explore different levels of observational systematics for the WFIRST survey (photo-z and shear calibration) and ultimately run a joint likelihood analysis in N-dim parameter space. We find that the WFIRST reference survey alone (no external data sets) can achieve a standard dark energy FoM of>300 when including all probes. This assumes no information from external data sets and realistic assumptions for systematics. Our study of the HLS reference survey should be seen as part of a future community driven effort to simulate and optimize the science return of WFIRST.


INTRODUCTION
In the current ΛCDM paradigm cosmic acceleration is caused by the Λ-term in the Einstein field equations (Einstein 1917). In terms of physical interpretation, Λ can be associated with the Universe's geometry or it can describe a new energy component of the universe, so-called dark energy. In 1998 two teams (Riess et al. 1998;Perlmutter et al. 1999) measured the energy density of Λ, Ω Λ , to be consistent with a value close to 0.7. To date, the science community lacks a convincing physics model for cosmic acceleration; constraining its properties and testing it against alternative theories is one of the main science drivers of ongoing and future surveys.
Major progress on this topic is made by the current (Stage 3) generation of photometric surveys, such as Kilo-Degree Survey (KiDS 1 ), the Hyper Suprime Cam (HSC 2 ), the Dark Energy Survey (DES 3 ) and spectroscopic surveys, such as the Baryon Oscillation Spectroscopic Survey (BOSS 4 ). These low redshift constraints of the (ΛCDM) model can be contrasted with CMB measurements from the early Universe made e.g., by the Planck 5 satellite, the Atacama Cosmology Telescope (ACT 6 ), and the South Pole Telescope (SPT 7 ). An emerging tension between these high and low redshift (ΛCDM) constraints may be indicative of new physics.
The potential tension between measurements, and with it the probability to discover new physics, increases with decreasing statistical uncertainty and better systematics control. With the advent of so-called Stage 4 surveys, e.g., the Dark Energy Spectroscopic Instrument (DESI, DESI Collaboration et al. 2016), the Prime Focus Spectrograph (PFS, Takada et al. 2014), the Large Synoptic Survey Telescope (LSST 8 , Ivezić et al. 2019), Euclid 9 (Laureijs et al. 2011), the Spectro-Photometer for the History of the Universe, Epoch of Reionization, and Ices Explorer (SPHEREx 10 , Doré et al. 2014), and the 4-metre Multi-Object Spectroscopic Telescope (4MOST, de Jong 2019) the science community can expect an abundance of data to study the late-time Universe at increased precision. Similarly, the next generation of CMB surveys, such as the Simons Observatory (SO, Ade et al. 2019) and CMB-S4 (Abazajian et al. 2016) will enable us to contrast high and low redshift at increased precision and to combine information from both eras to increase the constraining power on cosmological models.
The Wide-Field Infrared Survey Telescope (WFIRST 11 , Spergel et al. 2015) is a successor mission to NASA's groundbreaking telescope endeavors such as the Hubble Space Telescope (HST 12 ), the Spitzer Space Telescope 13 , and in the near future the James Webb Space Telescope (JWST 14 ). WFIRST's science portfolio ranges from exoplanets to astrophysics to cosmology, building on a variety of standalone survey components: a microlensing survey, direct imaging of exoplanets, a supernovae survey, a guest observer program, and the High Latitude Survey (HLS). The latter Figure 1. An example WFIRST survey strategy as taken from the SDT 2015 report (Spergel et al. 2015) and computed from the ETC v0.13 (Hirata et al. 2012). The individual survey components are colored into the timeline graphic: blue for the SN survey, Magenta for Microlensing, Red and Yellow for the HLIS and HLSS, respectively. The remaining time will be allocated as guest observer proposals to the science community.
is the main focus of this paper, in particular, we aim to quantify the HLS' constraining power on physics driving the late-time accelerated expansion of the Universe through a combination of multi-band imaging and spectroscopy.
WFIRST is designed as a highly versatile missions that can flexibly react to findings of the aforementioned surveys. Its launch is planned for the mid 2020's into an L2 orbit with a nominal mission length of 5 years, however, this primary survey can be extended given that there are no consumables that prevent a 10+ year mission. The exact composition of the survey, i.e. the time allocation for the different science cases and the survey strategy within each science case is one of the most important topics that the community will discuss over the coming years prior to launch. Figure 1 shows an example WFIRST survey scenario composed of a 1.6 year High Latitude Survey (HLS), 6 months of SN observations distributed over 2 years, an exoplanet and microlensing survey component, and a competed guest observer program that encompasses 25% of the overall observing time. For the purpose of this paper we mainly focus on the HLS component, which can be divided further into the HLIS (High Latitude Imaging Survey) and the HLSS (High Latitude Spectroscopic Survey).
The reference survey of the WFIRST HLS covers 2000 deg 2 with high-resolution, multi-band photometric imaging in four near infrared bands (HLIS) and deep grism spectroscopy (HLSS). This combination allows us to measure a variety of cosmological probes, e.g. weak lensing, galaxy clustering, galaxy clusters, redshift space distortions (RSD), Baryon Acoustic Oscillations (BAO). Together with the supernova survey, the reference HLS is designed to con-trol systematics with minimal uncertainties; it will place tight constraints on the expansion history and structure growth in the Universe addressing questions about the nature of cosmic acceleration, neutrino physics, modified gravity, and dark matter.
In this paper we develop a framework to simulate multi-probe strategies specifically for WFIRST. We outline the top-level concepts of combining cosmological probes including inference and covariance implementation in Sect. 2, where we also show the main results of the paper, i.e. the WFIRST forecast that includes weak lensing, galaxy-galaxy lensing, galaxy clustering (photometric and spectroscopic), galaxy clusters number counts, cluster weak lensing, and SNIa. We consider subsets of this joint analysis and explore the impact of systematics in Sects. 3, 4, 5. We conclude in Sect. 6.

MULTI-PROBE LIKELIHOOD ANALYSES
Contrasting and subsequently combining multiple probes is one of the most promising avenues to constrain cosmology: different probes are sensitive to different physics in the Universe, and they are affected differently by astrophysical uncertainties and observational systematics. Corresponding multi-probe strategies are relatively straightforward to implement if the observables are independent, e.g. when combining CMB temperature and polarization with BAO and SNIa, however, the story is much more complex when combining correlated probes. In the latter case one cannot simply combine the most sophisticated version of the single probe analyses a posteriori, but instead the analysis requires a joint covariances matrix that includes the statistical correlations and one must ensure the consistent modeling of systematics that affect the probes considered.
WFIRST's combination of spectroscopic and imaging instrumentation enables measuring a variety of LSS probes, such as weak lensing, galaxy clusters, galaxy clustering, and SNIa. The latter can be treated as independent information, though SN magnification in overdense regions could become non-negligible at some point in the future. The other probes however are tracers of the same underlying density field, where modes are significantly correlated due to nonlinear evolution of the late time density field. A corresponding likelihood analysis requires a multi-probe covariance matrix.

Analysis Choices
Designing a multi-probe analysis for the galaxies observed with the WFIRST reference survey can be broadly split into the following steps: (i) Choose broad categories of cosmological probes that are to be combined: For our WFIRST reference survey these are weak lensing, galaxy clusters, galaxy clustering (photometric and spectroscopic).
(ii) Define specific probe combinations and summary statistics that make up the data points of the data vector, which in our case are one-point functions and two-point functions that represent the corresponding probes. We do not consider higher-order correlation functions.
(iii) Define the galaxy samples that are associated with the aforementioned probes. We use the WFIRST exposure time calculator (ETC) (Hirata et al. 2012) to compute realistic survey scenarios for WFIRST's coverage of area and depth in a given band. We fix the time per exposure and vary the number of exposures to build up depth over the survey area of a given scenario. For the HLS Reference Survey this area is 2,000 deg 2 . The total survey time for a given number of exposures includes a simple prescription for overheads and is correct to approximately 10%. In order to obtain accurate redshift distributions we closely follow Hemmati et al. (2019) in applying the ETC results to the CANDELS data set (Grogin et al. 2011;Koekemoer et al. 2011), which is the only data set available that is sufficiently deep in the near-infrared to model WFIRST observations. The ETC has a built-in option to obtain a weak lensing catalog based on an input catalog of detected sources. The criteria for galaxies to be considered suitable for weak lensing are S/N>18 (J+H band combined, matched filter), ellipticity dispersion σ < 0.2, and resolution factor R>0.4, where we used the Bernstein & Jarvis (2002) We apply these selections to the CANDELS catalog and obtain our source sample for the WFIRST HLS 4 NIR band survey. For the lens sample we select CANDELS galaxies with S/N > 10 in each of the 4 WFIRST bands. Our WFIRST analysis assumes LSST photometry from the ground, hence we further down-select both samples by imposing a S/N > 5 cut in each LSST band except for u-band (we note that 50% of our galaxy sample has S/N > 5 in the u-band as well).

9.
10. Figure 3. The multi-probe covariance matrix for the HLIS survey, calculated under the Limber approximation, where we have highlighted some parts of the matrix to illustrate the correlation structure: (1) depicts the cosmic shear covariance matrix, comprised of 55 tomographic combinations of source bins, each with 20 fourier l-bins. (5) shows one of the tomographic combinations, and the individual l 1 , l 2 elements are clearly visible. (2) is the galaxy-galaxy lensing tomography covariance with (8) being the galaxy-galaxy combinations of the 4 th lens bin with all the non-overlapping source bins at higher redshifts. (3) is the clustering auto-probe matrix with 10 tomographic bins. (4) corresponds to the cluster number counts auto-probe matrix, which is comprised of 4 cluster redshift bins each with 4 richness bins (hardly distinguishable within in the 4 yellow squares). (5) is the auto-probe covariance of the cluster weak lensing part of the data vector, which uses the 4 cluster redshift bins as lens bins and the source sample as source bins. (10) zooms into the covariance of the 4 th cluster redshift bin, which again is split into 4 richness bins, all of which are then correlated with the highest 4 source galaxy redshift bins. One can see that the diagonal structure consists of 16 blocks that are each composed of 5x5 elements. The latter correspond to the covariance of the 5 cluster weak lensing l-bins, which range from l ∈ [4000 − 15000]. Zoom-in box (6) is a zoom into the first tomographic bin combination cosmic shear covariance matrix, (7) shows the cross-probe covariance of cosmic shear and galaxy-galaxy lensing. The impact of the k max scale cuts causes the blocks to be non-quadratic. The Limber approximation leads to non-Gaussian terms only for specific combinations of lens and source tomographic bins (all 3 source bins need to be behind the lens bin). (9) is the cross-probe covariance of galaxy clustering and cluster number counts, which only has non-zero elements when both probes overlap in redshift, i.e. in the range z ∈ [0.2 − 1.2]. The shape of the yellow rectangles is determined by the number of l-bins used in the clustering data vector, i.e. 20, and the number of richness bins in cluster number counts, i.e. four.
where Ω s is the WFIRST survey area. We impose a z min = 0.25 for the lens sample and define 10 tomographic bins for each sample such thatn i x =n x /10. These tomographic bins are then convolved with a Gaussian distribution, which is further described in Sect. 3.3.
We consider two different Gaussian photo-z scenarios: an optimistic variation with mean zero and narrow width of σ z = 0.01 and a more pessimistic scenario with broader kernel of σ z = 0.05. The resulting redshift distributions are depicted in Fig. 2.
• Source galaxy sample, for which we require position, photometric redshift, and galaxy shape measurements.
• Lens galaxy sample, for which we require position and photometric redshift measurements.
• Galaxy clusters, for which we require position, photometric redshift and optical richness estimates for galaxy clusters that are identified in the overall galaxy catalog. • Spectroscopic galaxy sample, which requires measurements of positions and spectroscopic redshifts.
(iv) Define exact analysis choices: Given that we are looking at 2-point functions as summary statistics, we need to decide on the exact auto and cross-galaxy samples that constitute a cosmological probe. Further, we need to define the exact binning within each Individual probes considered in this analysis, i.e. weak lensing, galaxy clustering, galaxy cluster number counts calibrated through cluster weak lensing, redshift space distortions power spectra including the BAO scale, and SNIa. Right: Multi-probe analyses starting from weak lensing only, then adding clustering and galaxy-galaxy lensing (3x2), then adding cluster number counts and cluster weak lensing, then adding RSD and BAO information, and lastly adding in SNIa based on the findings of (Hounsell et al. 2018). The FoMs for the individual and multi-probe chains can be found in Table 1. probe, in particular which angular scales and tomographic redshift binning are considered. The decision tree for these choices is complex and takes into account our ability to accurately model physics and systematics at specific angular scales and redshifts, and in particular our ability to model the correlations across all data points in the covariance matrix. For the WFIRST data vector that we use to simulate the HLS Reference Survey, we choose: • Source galaxies -cosmic shear: In terms of angular binning we universally choose 25 logarithmically spaced Fourier mode bins ranging from l min = 30 to l max = 15000 for all two-point functions in our data vector, however we impose different scale cuts for the different probes. The idea of universal binning across probes is driven by the desire to avoid computing cross-covariances of probes with different l-binning. For the cosmic shear part of the data vector we impose a scale cut of l max (cosmic shear) = 4000, which leaves 20 bins that carry information. The ten tomographic bins translate into 55 auto-and cross power spectra.
• Lens galaxies -photometric clustering: The redshift distribution for the lens sample is further detailed in Sect. 3.3 and divided into 10 tomographic bins. We exclude l−bins, if scales below R min = 2π/k max = 21 Mpc/h contribute to the Limber integral (see Eq. (5)), which imposes a redshift dependent scale cut in the l-binning.
• Lens × source galaxies -photometric galaxy-galaxy lensing: The galaxy-galaxy lensing part of the data vector assumes the lens galaxy sample as foreground and the source galaxy sample as background galaxies; we only consider source-lens combinations where the source bin is fully behind the lens bin in redshift. We again impose a cut-off at R min = 21Mpc/h.
• Galaxy cluster number counts: This is the one one-point function we include in our data vector. We split our cluster sample into four cluster redshift bins (0.4-0.6,0.6-0.8,0.8-1.0,1.0-1.2) and 4 cluster richness bins between λ min = 40 and λ max = 220 in each redshift bin.
• Galaxy clusters × source galaxies -cluster weak lensing: In order to calibrate the cluster mass-richness relation (Eq. 26), we consider the stacked weak lensing signal from all combinations of cluster redshift and richness bins with source galaxies, with the restriction that source galaxies are located at higher redshift than the galaxy clusters. Specifically, we use the cluster lensing power spectrum in the angular range 4000 < l < 15000, which corresponds mostly to the 1-halo cluster lensing signal.
• Spectroscopic × spectroscopic -spectroscopic galaxy clustering: While our analysis considers all cross-covariance terms for the 5 cosmological probes above, WFIRST's spectroscopic clustering is treated as an independent probe whose cosmological information is determined separately and added a posteriori. This is an approximation, however the derivation of a 2D+3D joint covariance is beyond the scope of this paper and deferred to future work. Our spectroscopic clustering data vector is comprised of 3D power spectrum fourier modes P(k, µ) and we select 100 logarithmic bins ranging from k min = 0.001 to k max = 0.3 h/Mpc, 10 linearly spaced µ bins from 0 − 1.0, and 7 density weighted redshift bins that start at 0.83 and range out to 3.7. This data vector captures both the BAO and RSD information.

Inference, Likelihoods, Covariances
Given the data vector D, we sample the joint parameter space of cosmological p c and nuisance parameters p n using the emcee 15 (Foreman-Mackey et al. 2013), which is based on the affine-invariant sampler of Goodman & Weare (2010). At each step we compute the posterior using Bayes' theorem (3) P r (p c , p n ) denotes the prior probability which in our case is based on the WFIRST SNIa survey forecast from (Hounsell et al. 2018). Specifically, we reran the "Imaging: Allz (optimistic)" scenario (c.f. Sect. 5.4 and Table 13 in Hounsell et al. 2018) centered it on the fiducial cosmology of our analysis (see Table 2). We did not include any information from CMB or BAO experiments, which explains the different contours compared to (Hounsell et al. 2018). The cosmological information from the HLS enters our simulations through the second term in Eq. (3), i.e. the likelihood, . We assume that the errors of this data vector are distributed as a multi-variate Gaussian terms reflecting our approximation that the cosmological information from HLSS and HLIS is independent. We note that future work should explore correlations between HLIS and HLSS and develop a joint covariance matrix for these measurements. N is a normalization constant.
Based on the analysis choices (probes, redshifts, scales) described in Sect. 2.1 we compute the data vectors and covariance matrices for HLIS and HLSS at the fiducial cosmology and systematics parameters (see Tables 2, 5, 7, for the different probes). In case of the HLSS survey the covariance matrix is diagonal and further described in Sect. 5, in case of the HLIS the matrix has significant off-diagonal terms. Figure 3 illustrates the structure of the matrix with the autoprobe matrices denoted as numbers 1-5 corresponding to cosmic shear (1), galaxy-galaxy lensing (2), galaxy clustering (3), cluster number counts(4), and cluster weak lensing (5). Calculation of the individual terms of the covariance can be found in the Appendix (Eqs: A2-A14) of .
Since this covariance matrix is calculated analytically and not estimated from either simulations or data, it can be considered noisefree and is easily invertible. It does not inherently limit the number of data points that can enter our analysis, which would be the case if the covariance were computed from a limited set of realizations (see e.g., Taylor et al. 2013;Dodelson & Schneider 2013, for details on these constraints).

COSMIC SHEAR AND GALAXY CLUSTERING
We start exploring WFIRST multi-probe analyses by looking at the HLIS weak lensing and photometric galaxy clustering probes, which when combined with galaxy-galaxy lensing form a so-called 3x2pt analysis. Here, we summarize the computation of angular (cross) power spectra for the different probes and the computation of galaxy cluster number counts. We use capital Roman subscripts to denote observables, A, B ∈ κ, δ g , δ λ α , where κ references lensing, δ g the density contrast of (lens) galaxies. The density contrast of galaxy clusters in richness bin α, δ λ α , will be considered in Sect. 4.

Modeling of observables
We calculate the angular power spectrum between redshift bin i of observable A and redshift bin j of observables B at projected Fourier Table 2. Fiducial parameters, flat priors (min, max) for cosmology and galaxy bias, and Gaussian priors (µ, σ) for observational systematics. We consider optimistic and pessimistic scenarios in this paper, which is indicated in the corresponding sections of the   mode l, C i j AB (l), using the Limber and flat sky approximations (we refer to e.g. Fang et al. 2019, for the potential impact when analyzing data): where χ is the comoving distance, q i A ( χ) are weight functions of the different observables given in Eqs. (6-7), and P AB (k, z) the , Ω m and S 8 (middle), and on modified gravity parameters Σ 0 and µ 0 for optimistic and pessimistic systematics scenarios for a 3x2 analysis. Note that the likelihood analysis in the left two panels assume GR to be the correct theory, only in the right panel we vary Σ 0 and µ 0 . The relative loss in information depicted here is quantified as FoMs in Table 3. three dimensional, probe-specific power spectra detailed below. The weight function for the projected galaxy density in redshift bin , is given the normalized comoving distance probability of galaxies in this redshift bin with n i lens (z) the redshift distribution of galaxies in (photometric) galaxy redshift bin i (c.f. Eq. 17), andn i lens the angular number densities of galaxies in this redshift bin (c.f. Eq. 1). For the convergence field, the weight function q i κ ( χ) is the lens efficiency, with n i source (z) the the redshift distribution of source galaxies in (photometric) source redshift bin i (Eq. 17),n i source the angular number densities of source galaxies in this redshift bin (Eq. 1), and a( χ) the scale factor.
The three-dimensional power spectra P AB (k, z) can be expressed through the matter density power spectrum P mm (k, z). For the purpose of this section P mm (k, z) corresponds to the density power spectrum P δδ (k, z), where we use the Takahashi et al. (2012) fitting formula to model nonlinear evolution. Noting that P AB = P B A , we describe the different cases in Eqs. (8,9,23). For A = κ, this is trivial, For quantities related to the galaxy density, we note that we only consider the large-scale galaxy distribution, where it is valid to assume that the galaxy density contrast on these scales can be approximated as the non-linear matter density contrast times an effective galaxy bias parameter b g (z)

Modified Gravity modeling
Since there is no compelling model of modified gravity, we adopt phenomenological modified gravity parameters (µ 0 , Σ 0 ) which we define similar as e.g., Simpson et al. (2013).
In this parameterization the expressions for the Newtonian potential Ψ and the curvature potential Φ that govern the perturbed Friedmann-Robertson-Walker metric are altered. Within general relativity Ψ = Φ holds. The (µ, Σ) parameters give additional freedom to the Newtonian gravitational potential Ψ experienced by non-relativistic particles and the lensing potential (Φ + Ψ) experienced by relativistic particles, specifically We assume that µ(a) and Σ(a) are both scale independent. Furthermore, since their motivation was to explain the dark energy phenomenon, we assume that the modified gravity parameters scale with the dark energy density, i.e., where Ω Λ is the present day dark energy density. Note that in the case of general relativity, µ 0 = Σ 0 = 0. The µ 0 parameter modifies the growth of linear density perturbation such that which changes the growth function, and consequently the densitydensity power spectrum P δδ and all projected power spectra described in Eq. (5).
The Σ 0 parameter only affects lensing related quantities, which in a 3x2pt analysis means the galaxy-shear and shear-shear power spectrum. Specifically, Eq. (5) is modified as  Fig. 6 as a baseline. The center panel shows the difference when only considering photo-z uncertainties and the right panel shows results when only considering shear calibration uncertainties. There are two main findings: 1) In the optimistic scenario, shear calibration and photo-z uncertainties are equally (un)important; 2) In the pessimistic case, we find that photo-z uncertainties are a significantly larger contribution to the systematics budget compared to shear calibration.

Systematics
We parameterize uncertainties arising from systematics through nuisance parameters, which are summarized with their fiducial values and priors in Table 2. Our default likelihood analysis includes the following systematics: Photometric redshift uncertainties The true redshift distribution as measured from the CANDELS data (c.f. Fig. 2) is convolved with a Gaussian photometric redshift uncertainty model to obtain the distribution within tomographic bin i where p z ph |z, x is the probability distribution of z ph at given true redshift z for galaxies from population x The resulting Gaussian tomographic bin is parameterized through scatter σ z (z) and bias between z − z ph , i.e. ∆ i z (z). The bias ∆ i z (z) has fiducial value of zero; the fiducial value for σ z is assumed to be the same for the lens and source sample and we choose σ z = 0.01 for the optimistic and σ z = 0.05 for the pessimistic scenario. The resulting distributions are shown in Fig. 2.
In this analysis we only consider Gaussian photometric redshift uncertainties, which are characterized by scatter σ z (z) and bias ∆ z (z). While these may in general be arbitrary functions, we further assume that the scatter can be described by the simple redshift scaling σ z, x (1 + z) and allow one (constant) bias parameter ∆ i z, x per redshift bin. For our 10 lens and source galaxy redshift bins, this model results in 22 parameters describing photo-z uncertainty, 10 photo-z bias, and one photo-z scatter parameter for each lens and source sample. Linear galaxy bias is described by one nuisance parameter per tomographic lens galaxy redshift bin, which is marginalized over using conservative flat priors in a likelihood analysis. The fiducial values of galaxy bias in lens bin i follow the simple description 1.3 + i × 0.1. We note that the actual fiducial value is not important for the constraining power; important is the range over which we marginalize (flat priors from 0.8-3.0) and the fact that we use one free parameter per redshift bin instead of a parameterized redshift evolution.
Future efforts should investigate several aspects of galaxy bias: 1) perturbative or simulation based parameterizations that allow the analyst to push to smaller scales; 2) improved parameterizations, in particular such that parameterize the redshift evolution with fewer parameters; 3) informative priors.
Multiplicative shear calibration is modeled using one parameter m i per redshift bin, which affects cosmic shear and galaxy-galaxy lensing power spectra via where the cluster lensing power spectra are affected analogously to the galaxy-galaxy lensing spectra. We marginalize over each m i independently with Gaussian priors (10 parameters). Similar to the photo-z scenarios we are looking at optimistic and pessimistic prior information shear calibration (which can come from either simulations or external data such as in Schaan et al. 2017).
Other systematics In this paper we only consider observational uncertainties (and galaxy bias), but neglect astrophysical systematics most notably baryonic physics uncertainties ( In the context of 3x2pt analyses for WFIRST and LSST, we explore the impact of baryonic physics and intrinsic alignment in a companion paper (Eifler et al. 2020).

GALAXY CLUSTERS
This section summarizes the halo model for galaxy cluster observables employed in this analysis. We consider galaxy clusters stacked in bins of optical richness, λ α , and relate their properties to dark matter halos using the probability distribution function p(ln λ|M, z), which describes the probability that a dark matter halo of mass M at redshift z hosts a cluster with richness λ. We will specify and explain our specific choice of cluster mass observable relation (MOR) further in Sect. 4.2. Throughout this paper we define halo properties using the overdensity ∆ = 200, which is defined with respect to the mean matter density, and employ the Tinker et al. (2010) fitting function for the halo mass function.

Modeling of observables Cluster Number Counts
The expected cluster count in richness bin α, with λ α,min < λ < λ α,max , and redshift bin i with z i λ,min < z < z i λ,max is given by where d 2 V/dzdΩ is the comoving volume element, dn/dM the halo mass function in comoving units for which we omitted the redshift dependence.
Galaxy cluster weak lensing Starting again from the Limber and flat-sky expression for projected power spectra, i.e. Eq. (5) we can express the weight function for the projected cluster density similar to Eqs. (6, 7) with Θ(x) the Heaviside step function. Note, that we neglect variations of the cluster selection function within redshift bins, as well as uncertainties in the cluster redshift estimate. Within the halo model, the cross power spectrum between cluster centers and matter density contrast can be written as the usual sum of two-and one-halo term, with P lin (k, z) the linear matter power spectrum. The mean linear bias of clusters in richness bin α reads where b h (M) the halo bias relation, for which we use the fitting function of Tinker et al. (2010). The Fourier transform of the radial matter density profile within a halo of mass M,ũ m (k, M), is modeled assuming the Navarro-Frenk-White (NFW) profile (Navarro Table 5. Fiducial parameters, flat priors (min, max), and Gaussian priors centered on the fiducial value with the σ given in brackets.

Systematics
Cluster mass-observable relation We chose to implement the MOR scatter defined in Murata et al. (2019) and further extend their parameterization to account for possible redshift dependence in the scatter of the mass-richness relation. Specifically, we assume a log-normal distribution with massand redshift-dependent mean and scatter σ ln The mean relation is defined as with normalization A, slope B, redshift dependence C, and the pivot mass M pivot = 3 × 10 14 M /h. The mass-and redshift dependent MOR scatter is defined as We assume fiducial values for (A, B, σ 0 , q M ) = (3.207, 0.993, 0.456, 0.0), which correspond to the findings in Murata et al. (2019). For the redshift-dependent MOR parameters which are newly introduced in this paper (C and q z ) we assume fiducial values of 0.
Our fiducial priors for σ 0 and q M are from the posterior distributions derived in Murata et al. (2019), i.e., a Gaussian prior centered at the fiducial values described above and with the width of 0.045 and 0.03, respectively, and a prior for q z is centered at 0 with the broader width of 0.1.
We note that this is conservative, since prior information on the MOR is expected to grow substantially in the coming years, nearterm with the full HSC survey, which will be one of the deepest imaging surveys yielding the most stringent constraints on galaxy cluster physics before the LSST and WFIRST era.
For example, the full HSC survey will have 20,000 opticallyselected clusters with a mean galaxy density of background sources of 20 arcmin −2 . Scaling the product of the number of clusters and the source number density in Murata et al. (2019), 8,000 clusters and 1 arcmin −2 , respectively, to the product of these numbers for the full HSC survey, translates into a factor of 7 improvement on In Fig. 7 we investigate the gain in constraining power for a perfectly known MOR, i.e. when fixing all the parameters in Table 5 to their fiducial values. The gain in information from blue contours to red serves as an upper limit for this particular choice of MOR parameterization. We note that we expected a larger improvement when assuming perfect knowledge of the MOR but we note that the redshift scaling in Eq. (26) is likely the reason to diminish the science return on dark energy.
Studying the most promising cluster MOR parameterization to optimize the cluster cosmology component of the WFIRST survey further will be important future work as the mission preparation progresses.
Other systematics We do not consider galaxy cluster miscentering, assembly bias and stochasticity in this paper but instead postpone studies of these effects to future work.

THE HIGH LATITUDE SPECTROSCOPIC SURVEY
In this section, we study the trade space of area versus depth for the High Latitude Spectroscopic Survey, starting from a baseline survey of 2000 deg 2 and a wavelength range of 1.05-1.85 microns. The section is split into two parts, where the first part focuses on dark energy parameter constraints using MCMC and the second part is a Fisher analysis of how well WFIRST will be able to measure the BAO scale s and the parameter combination f σ 8 for RSD. The assumptions and systematics modeling differ slightly but are clearly explained in each subsection.

Dark energy forecasts
We use the WFIRST exposure time calculator (ETC) version 16 of Hirata et al. (2012) to compute galaxy densities and redshift distributions for our baseline scenario (c.f. Table 6) and then consider doubling (halving) the survey area, doubling (halving) the galaxy number density, and decreasing the minimum scale which we include in our analysis (see Fig. 8).
Following (Seo & Eisenstein 2003;Wang et al. 2013) we model the cosmological information from redshift space distortions (RSD) and Baryon Acoustic Oscillations (BAO) through features in the  Figure 8. The impact of variations in area, depth, and scales to which we assume to be able to model P δ (k) for the HLSS part of the reference survey (0.6 months). We summarize the FoMs in Table 8. Table 7. Spectroscopic Survey: Fiducial parameters, flat priors (min, max), and Gaussian priors centered on the fiducial value with the σ given in brackets.

Parameter
Fiducial Prior where we assume that the 3D Fourier mode k can be decomposed into a line-of-sight k and a transverse k ⊥ component with µ = k /|k| as the cosine of the angle between the 3D vector and the line-of sight. The arguments for the observed power spectrum k ref ⊥ and k ref are computed at a reference cosmology, indicated through the superscript ref . In order to relate the observed power spectrum to the true underlying power spectrum a correction factor which accounts for the volume difference between the two cosmologies is introduced. The P shot term describes residual uncertainties that remain after subtracting the shot noise term computed from the inverse number density of galaxies. These residuals occur, e.g., because of galaxy clustering bias (Seljak 2000). Equation (28) accounts for residual redshift uncertainty in our measurement, e.g. from fitting emission lines, through the damping factor e −k 2 µ 2 σ 2 r, z . Following Wang et al. (2013) we consider the dewiggeled power spectrum where P 0 defines the normalization of the linear power spectrum at redshift zero, n s is the spectral index, and the (dewiggeled) transfer function T 2 dw (k, z) is given by where , c.f.) and f g (z) being the linear growth factor. The BAO transfer function is defined as the difference between the linear matter transfer functions with and without baryons, and the exponential damping due to nonlinear effects is only applied to the transfer function associated with BAO. The uncertainty in nonlinear effects that are still present in the power spectrum even after reconstruction Padmanabhan et al. 2012) is paramterized through In case no reconstruction algorithm is applied nonlinear effects in structure growth, galaxy bias, and redshift space distortions are fully present and p NL = 1.0. We assume an optimistic reconstruction algorithm in line with Wang et al. (2013) of p NL = 0.5, which corresponds to k * = 0.24h/Mpc. We allow for uncertainty in the reconstruction algorithm through varying k * and marginalize over a Gaussian prior with 10% uncertainty in the fiducial value. The dewiggled model characterized through Eq. 30 will break down on small scales where RSD couples with the damping factor but has been shown to work well on quasi-linear scales (Angulo et al. 2008).
We bin the observable power spectrum linearly in k (100 bins between k min = 0.001 and k max = 0.3) and µ (10 bins between 0 and 1) and assume 7 bins in redshift (c.f. Table 6). We model the fractional error of said power spectrum as detailed in Seo & Eisenstein (2003) σ(k, µ) = 2π 2 where n refers to the galaxy number density within a given redshift bin, which again are specified in Table 6.   Figure 8 shows the variation of the WFIRST and BAO and RSD measurements on w 0 and w a . We again use the emcee sampler to cover the parameter space; each chain is > 3M steps and, in addition to the cosmological parameters mentioned in Table 2, we sample the 11 systematics parameters specified in Table 7. Specifically, we account for uncertainties in the level of shot noise P shot (1 parameter), uncertainties in galaxy bias modeling parameterized through one free parameter b i in each redshift bin (7 parameters), uncertainties in redshift measurements σ 2 r,z (1 parameter), uncertainties in modeling peculiar velocities σ p in each redshift bin (7 parameters), uncertainty in residual nonlinear effects k * (1 parameter). Figure 8 shows the change in constraining power when increasing/decreasing the survey area (left), increasing/decreasing the number density of galaxies (middle panel) and when changing our fiducial k max from 0.3 to 0.25 and 0.2. Note that the observing time is not held fixed in the left and middle panel (as opposed to the calculations in Sect. 5.2), which means that when considering twice the area in the left panel this implies doubling the observing time compared to reference HLSS survey. We summarize the FoMs in Table 8 and find that the difference for different k max is negligible, and that there is a slight preference for going deeper compared to going wider.
We note that including an absolute measurement of the BAO scale imprinted in the CMB would notably increase the information compared to the HLSS survey alone. In Fig. 9 we include information from (Planck Collaboration et al. 2018) on the acoustic angular scale θ * = r * /(1 + z)D a , where r * is the comoving sound horizon at recombination and D a is the comoving angular diameter distance to the CMB. The combined likelihood of Planck TT, TE, EE, low-E measurements gives θ * = 0.0104109 ± 0.0000030, which we re-center to our fiducial cosmolgy and use as a prior in Fig. 9.

BAO scale and RSD measurement Fisher forecasts
In addition to the MCMC analysis in the previous subsection we explore the science return of the HLSS using a Fisher analysis on constraining the BAO scale s and RSD parameter combination f σ 8 as a function of redshift.
For this analysis we run the ETC in BAO survey mode, using either galaxies observed in Hα and The default scenario (black) has A = 2000 deg 2 , the wide scenario (green) has twice the area but half the depth, whereas the deep scenario (blue) is twice deeper but half the area. For both the BAO and RSD probes, a wider but shallower survey improves the constraints for z 2 whereas a deeper but narrower survey improves at z 2. Middle row shows results when varying the S/N cutoff (3.5, 5, 6.5) for the default area and depth scenario. A lower S/N cutoff yields better constraints everywhere in z, with more improvement at higher z. Bottom row shows results when covering a larger area of 13559 deg 2 corresponding to an extended spectroscopic survey time of 2 years at half the default depth. We vary again the S/N cutoff: 3.5, 5 and 6.5. luminosity functions given in Pozzetti et al. (2016), which were derived specifically for Euclid and WFIRST; in all cases, the [N II ] luminosity function (used to enhance the signal-to-noise of detected galaxies) is assumed to be 0.37 times the Hα luminosity function. For the [O III ] detections we use model 1992, an average of three luminosity functions: Mehta et al. (2015) and Colbert et al. (2013), two different analyses of the WFC3 grism, and Khostovan et al. (2015), based on ground-based narrow-band surveys. In both the Hα +[ N II ] and [O III ] scenarios, we use an updated galaxy size dis-tribution from a mock catalog based on COSMOS data originally based on Jouvel et al. (2009). The resulting redshift distributions are shown in Fig. 10, which are used as input for the trade-off studies of the forecast results in this section. In each panel, we vary the survey depth from 0.5x, 1x to 2x the fiducial depth (shown in thick, normal and thin lines respectively), while the different panels show the distributions obtained with different S/N cutoffs (6.5, 5 and 3.5). As expected, the number densities increase significantly when lower S/N cutoffs are chosen. Note that in this section, we explore the impact of survey depth at fixed observation time, so the area of the survey is scaled proportionally in what follows.
Using each of the above-mentioned distributions, we compute the fractional error σ p i /p i = [F −1 ] ii on parameter p i , where the Fisher information matrix for parameters p i and p j is given by assuming spatially constant galaxy density n, we have There are two separate Fisher matrices, one for the RSD cosntraints on f σ 8 , and another for the BAO constraints on s. For the RSD constraint, we follow McDonald & Seljak (2009) (using only one tracer) and model the observed galaxy power spectrum as in Eq.
(28) but without the distance ratios for changing cosmology as we fix the background cosmology and we marginalize over σ r,z = σ r,v (1 + z)/H(z). We adopt the fiducial value of σ r,v = 0.001 which is dominated by the observational redshift uncertainty of the grism. Furthermore, for the RSD forecast, we assume perfect reconstruction with k * = ∞.
For the BAO constraints, we calculate errors for the Hubble parameter H and the angular diameter distance D and report their best constrained combination s. Again we use Eq. (33) but this time, modeling the galaxy power spectrum as defined in Eq. 35 with the following differences: First, the fractional reconstruction capability p NL is set by how well the displacements can be determined given the level of shot noise in the data in linear theory. Second, the nonlinear damping of the BAO wiggles is modelled with a slightly different k −1 * = 8.0 Mpc/h × (σ 8 /0.8) p NL . Finally σ r,z is not marginalized for the BAO forecast but is fixed at the same fiducial value mentioned above.
For both BAO and RSD forecasts, we use the inverse galaxy number density for the galaxy shot noise, and the same linear bias model as in DESI Collaboration et al. (2016) for emission line galaxies (ELG) as is appropriate for the WFIRST GRS: b ELG (z)D(z) = 0.84, where D(z) is the growth factor normalized at z = 0.
The Fisher matrices are computed at a fixed flat cosmology consistent with Planck 2015 best-fit (baseline model 2.6) Ade et al. (2016) and we separately evaluate fractional errors on parameters for the H α + [N II ] and [O III ] samples before inverse-variance combining them. In Fig. 11 we show the combined fractional error on the BAO scale s (left) and RSD parameter f σ 8 (right). Note that the H α is the dominant sample up to z ≈ 1.9, beyond which the [O III ] sample becomes the only available sample.
We consider different survey strategies varying depth (top row of Fig. 11) and S/N (middle row) starting from a pilot survey with default area A = 2000 deg 2 and S/N cutoff 5. We fix the total HLSS observation time to 0.6 years in all cases. In the top panels, we show results for a deeper (twice deeper, half the area) and a wider survey (twice the area, half the depth) compared to the pilot survey. For both s and f σ 8 , the wide survey would improve the low-z constraints, whereas the deep survey is more powerful at higher z, as expected.
Since the aggregate constraint (shown in text beside each curve) is dominated by better errors at low-z, the wide survey would improve on the total constraint on parameters compared to the deep survey (e.g. 0.3% vs 0.4% for s and 0.7% vs 1.1% for f σ 8 ). On the other hand, if dark energy behaviour at higher z becomes an important science case, the deep survey improves constraining power by almost a factor of 2 − 3 over the wide option.
In the middle row of Fig. 11, we also show the impact of different S/N cutoffs for galaxy detections at fixed area and depth. We compare our default case of S/N = 5 with a conservative S/N = 6.5, and a more optimistic S/N cutoff of 3.5. As expected, a lower S/N cutoff yields better constraints everywhere in z, with more improvement at higher z as fainter and distant galaxies are more affected by the cut. There is factor of 2 improvement at high z between the curves at S/N = 6.5 and 5. The same is true for 5 and 3.5, we however note that S/N = 3.5 is not likely going to be a realistic value for reliable detections.
We perform a similar analysis but for an extended HLSS survey that lasts 2 years instead of 0.6 years and at only half the depth of the pilot survey, which allows us to survey 13,559 deg 2 (see bottom row of Fig. 11). We show results for 3 different S/N cut and again find unsurprisingly that a S/N cut of 3.5 improves constraining power substantially compared to the more realistic S/N = 5 and the conservative S/N = 6.5 cuts.

CONCLUSIONS
WFIRST's wide-field instrument will join the concert of cosmological endeavors after DESI, LSST, SPHEREx, and Euclid have already made initial measurements. These measurements will inform the design of an optimal WFIRST survey, which can be finalized shortly before launch. The unique versatility of WFIRST's wide-field instrument, ranging from multi-band imaging to highresolution slitless spectroscopy, in combination with the fact that WFIRST carries enough propellant for at least 10 years of observations with no active cryogens, make it an ideal observatory to flexibly target the most interesting science aspects after its launch in the mid 2020s.
In this paper we study the WFIRST reference survey's science return on dark energy, structure growth, and modified gravity accounting for a variety of observational systematics. We present results for the joint analysis of weak lensing, galaxy clustering (photometric), galaxy cluster number counts, BAO and RSD features in the spectroscopic clustering power spectrum, and combine this with SNIa information from WFIRST (as detailed in Hounsell et al. 2018). We outline strategies for optimizing WFIRST's science return and to identify and retire risks from systematic effects early.
For each cosmological probe examined in this paper we identify important areas of future research to further increase the level of realism of our WFIRST simulations, to improve the parameterization of systematics, or to shrink the prior range on existing parameterizations. For example, we postpone modeling and mitigation of baryons (e.g., van Daalen et al. 2011;Eifler et al. 2015;Chisari et al. 2018;Huang et al. 2019;Chisari et al. 2019) or intrinsic galaxy alignment (e.g., Hirata & Seljak 2004;Mandelbaum et al. 2006;Joachimi & Bridle 2010;Krause et al. 2016;Vlah et al. 2019;Blazek et al. 2019;Samuroff et al. 2019) for lensing based measurements to future studies; a decision that is in part driven by the fact that these uncertainties have different levels of modeling maturity for the different probes considered in this paper. We explore corresponding uncertainties in a companion paper Eifler et al. (2020), which focusses on 3x2 (weak lensing and photometric galaxy clustering) synergies of WFIRST and LSST.
We impose conservative scale cuts on photometric clustering information due to uncertainties in modeling galaxy bias. Improved galaxy bias modeling for the spectroscopic and photometric galaxy clustering to include small scale information (see e.g., Ivanov et al. 2019;Salcedo et al. 2020;Wibking et al. 2020) should become another important area for WFIRST optimization.  have explored a Halo Occupation Density model to access small scale information in a similarly high-dimensional parameter space (but simulating an LSST 3x2 analysis), and found that tapping into corresponding information is worth the increased modeling complexity.
Our modeling of the cluster mass observable relation is based on Murata et al. (2019) but extended to account for possible redshift dependence in the scatter of the mass-richness relation. This again is a conservative choice and tightening priors on the existing parameterization or improving the parameterization itself can significantly change the constraining power from galaxy clusters. Precise modeling of cluster cosmology is an active research field (e.g. see Costanzi et al. 2019;DES Collaboration et al. 2020) and studying multi-wavelength strategies including external data sets will be important.
We quantify all statements in this paper using the well-known FoM metric, however we note that the FoM metric reduces a complex answer to a one-dimensional statement. This compression of information is not lossless, for example the FoM depends on analysis choices: scales considered and excluded in the analysis, redshift distribution binning choices, cosmological parameters and priors, systematics parameterization and priors, which covariance and cross-correlations to include, and how to model the covariance in general, which external data sets to include, are all choices by the analyst. Multiple options are justifiable and for some the impact on the FoM can be significant.
While the decision on the optimal WFIRST survey strategy can be made shortly before launch, it is critical to develop realistic survey simulation capabilities now in order to characterize the trade space of statistical power and systematic dangers accurately. Some of these systematics will have subdominant uncertainties, which means they can be corrected and need no further parameterization in a likelihood analysis. This type of systematics will hardly change the error bars presented in this paper, it will only move the best-fit value in a likelihood analysis based on data.
It is important to note that complexity of modeling and covariance code such as the one used in this paper will become a challenge for the community. Increased complexity in a prediction and later in an analysis framework does not automatically increase the precision but it certainly increases the potential for errors. Increased model complexity for systematics must to be rigorously justified by residual uncertainties that are non-negligible, given the constraining power of the survey. This requires a demonstration of the impact of the systematic effect in the presence of a realistic systematics budget overall; it is not sufficient to demonstrate the impact of the systematic as a standalone effect on cosmological parameters. This work contributes to developing such a framework for WFIRST, but several extensions are forthcoming in future work. More realistic systematics models, best informed by actual observations and realistic synergy studies across the whole spectrum of multi-messenger astronomy, which includes optical NIR imaging and spectroscopy but also Cosmic Microwave Background, gravitational waves, and radio observations, should be considered to design a survey that fully utilize WFIRST's potential.