# Computational Issues: EnKF - 4dvarenkf.cima.fcen.uba.ar Computational Issues: An EnKF Perspective Jeff Whitaker NOAA Earth System Research Lab ENIAC 1948 Roadrunner 2008 EnKF cycle 1) Run ensemble forecast for each ensemble member to get xb for next

analysis time. 2) Compute Hxb for each ensemble member. 3) Given Hxb, xb compute analysis increment (using LETKF, EnSRF etc) EnKF Cycle (2) Step 1: Background Forecast 4DVar - a single run of the (high-res) nonlinear forecast model for each outer loop,

many runs of (low-res) TLM/adjoint in inner loop in sequence. EnKF - N simultaneous runs of the non-linear forecast model (embarassingly parallel). Bottom line - total cost similar, but EnKF may scale better. Step 2: Forward operator 4DVar - compute full nonlinear Hxb in each outer loop. In each inner loop, use linearized

H (faster, especially for radiances). EnKF - compute full nonlinear Hxb once for each ensemble member simultaneously. Could use linearized H for ensemble perturbations. Bottom line - total cost similar, but EnKF may be scale better. Step 3: Calculating the increment For EnKF, depends on algorithm

Perturbed obs EnKF (Env. Canada - obs processed serially in batches) ? Local Ensemble Transform KF (LETKF developed at U. of Md, being tested at JMA and NOAA) Serial Ensemble Square-Root Filter (EnSRF NCARs DART, NOAA ESRL, UW real-time WRF) Serial EnSRF algorithm Whitaker and Hamill, 2002: MWR, 130, 1913-1924 Anderson, 2003: MWR, 131, 634-642

Assume ob errors uncorrelated (R diagonal). Loop over all L obs (m=1,L). K = Ens. size 1) Update Nloc nearby state variables with this observation. Covariance (PbHT) costs O(K* Nloc) 2) Update Lloc-m nearby observation priors (for obs not yet processed) with this observation. Covariance (HPbHT) costs O(K*(Lloc -m)) Total cost estimate O(K*L*Nloc) + O(K*L*Lloc) where Lloc=av. # of nearby ob priors and

Nloc=av. # of nearby state elements (for each ob). EnSRF parallel implementation Anderson and Collins, 2007: Journal of Atmospheric and Oceanic Technology A, 24 1452-1463 Update subset of model state and observation priors on each processor. Loop over all obs on each processor get ob priors from processor on which it is updated via MPI_Bcast of K values.

LETKF Algorithm Ob error in local volume is increased as a function of distance from red dot, reaching infinity at edge of circle. LETKF cost estimate (Szyunogh et al 2008: Tellus, 60A, 113-130) Each state variable can be updated independently (perfectly parallel, no communication needed).

Assume diagonal R. Most expensive step is YbR-1YbT, where Y is K x Lloc matrix of observation priors. Lloc is average number of obs in each local region. Cost is O(K2*Lloc*N) vs O(K* L*Nloc) + O(K*L*Lloc) for EnSRF (neglecting communication cost) For L <= N, EnSRF faster For L > K*N, LETKF faster For N~L, LETKF is should be about O(K*Llocal/L) slower.

Benchmarks Compares only cost of computing increment (no I/O, no forward operator). 2100 km, 1.5 scale height localization, K=64 ensemble members. Two cases: 384x190 (T126) analysis grid, two tracers updated. N=23420160, L=33301. 128x64 analysis grid, no tracers updated . N=449820, L=949352. 8 core intel cluster, infiniband, mvapich2, intel fortran

10.1/MKL. Load balancing using Grahams algorithm - assign each grid pt to processor with least work assigned so far. Case 1: N = O(100L) LETKF scales perfectly, but is 37 times slower than EnSRF. EnSRF scales

better than linear (better cache coherence when # of state vars per proc gets small). Case 2: N = O(L) LETKF scales perfectly. EnSRF doesnt

scale when # of variables updated on each proc is too small. Cost of running ensemble dominates as resolution increases Because of CFL condition, cost of running model increases by a factor of 8 when horizontal resolution doubles. This affects calculation of increment in 4DVar.

Calculation of increment in EnKF scales like number of grid points, goes up by a factor of 4. Even for modest global resolutions (100-200 km) we find that ensemble forecast step dominates computational cost. For EnKF model forecast step scales perfectly, for 4D-Var it depends on model scaling. Serial EnSRF

Loop over observations (yn, n=1..N) nth observation prior (jth ens member) b = H b, yjnb = H xjb, where <..> = M-1j=1..M (1st moment) or (M-1)-1j=1..M (2nd moment) Letddn = yjnb yjnb> + Rn, n = (1 + {Rn/dn} 1/2 )-1 For the ith state variable xijb = b + xijb Kin = xjib yjnb >/dn Kalman Gain b = b + Kin(yn - b) update mean for ith state var xijb = xijb - nKin yjnb update perturbations for ith state var For the mth observation prior (yjmb, m=n..N)

Kmn = yjmb yjnb >/dn Kalman Gain b = b + Kmn(yn - b) update mean for mth ob prior yjmb = yjmb - nKmn yjmb update perturbation for mth ob prior Go to (n+1)th observation (xb now includes info from obs 1 to n).