Rapid solution of the cryo-EM reconstruction problem by frequency marching

Determining the three-dimensional structure of proteins and protein complexes at atomic resolution is a fundamental task in structural biology. Over the last decade, remarkable progress has been made using"single particle"cryo-electron microscopy (cryo-EM) for this purpose. In cryo-EM, hundreds of thousands of two-dimensional images are obtained of individual copies of the same particle, each held in a thin sheet of ice at some unknown orientation. Each image corresponds to the noisy projection of the particle's electron-scattering density. The reconstruction of a high-resolution image from this data is typically formulated as a nonlinear, non-convex optimization problem for unknowns which encode the angular pose and lateral offset of each particle. Since there are hundreds of thousands of such parameters, this leads to a very CPU-intensive task---limiting both the number of particle images which can be processed and the number of independent reconstructions which can be carried out for the purpose of statistical validation. Here, we propose a deterministic method for high-resolution reconstruction that operates in an ab initio manner---that is, without the need for an initial guess. It requires a predictable and relatively modest amount of computational effort, by marching out radially in the Fourier domain from low to high frequency, increasing the resolution by a fixed increment at each step.

1. Introduction. Cryo-electron microscopy (cryo-EM) is an extremely powerful technology for determining the three-dimensional (3D) structure of individual proteins and macromolecular assemblies. Unlike X-ray diffraction-based methods, it does not require the synthesis of high quality crystals, although there are still significant sample preparation issues involved. Instead, hundreds of thousands of copies of the same particle are held frozen in a thin sheet of ice at unknown, random orientations. The sample is then placed in an electron microscope and, using a weak scattering approximation, the resulting two-dimensional (2D) transmission images are interpreted as noisy projections of the particle's electron-scattering density, with some optical aberrations that need to be corrected. Much of the recent advance in imaging quality is a result of hardware improvements (direct electron detectors), motion correction, and the development of new image reconstruction algorithms. Resolutions at the 2-4Å level are now routinely achieved.
There are a number of excellent overviews and texts in the literature to which we refer the reader for a more detailed introduction to the subject [8,32,27,30,51,52]. One important issue to keep in mind, however, is that the field is currently lacking a rigorous method for assessing the accuracy of the reconstructed density map, although there are a variety of best practices in use intended to avoid overfitting [8,37,19,21].
In this paper, we concentrate on the reconstruction problem, that is, converting a large set of noisy experimental cryo-EM images into a 3D map of the electron-scattering density. As a rule, this is accomplished in two stages: the generation of a low-resolution initial guess, either experimentally or computationally, followed by an iterative refinement step seeking to achieve the full available resolution permitted by the data. It is generally the refinement step which is CPU-intensive. Although our ab initio method does not require an initial guess, we briefly review some of the existing approaches for the sake of completeness.
For methods requiring a good, low-resolution initial guess in which the user has not supplied one experimentally, many methods are based on the common lines principle. This exploits the projection-slice theorem (see Theorem 2.3 and Figure 1) and was first introduced in [50,49]. In the absence of noise, it is possible to show that any three randomly oriented experimental images can be used to define a coordinate system (up to mirror image symmetry). Given that coordinate system, the angular assignment for all other images can be easily computed. Unfortunately, this scheme is very sensitive to noise. Significant improvement was achieved in the common lines approach in [35], where all angles are assigned simultaneously by minimizing a global error. However, this requires a very time-consuming calculation, as it involves searching in an exponentially large parameter space for all possible orientations of all projections. More recently, the SIMPLE algorithm [12] used various optimization strategies, such as simulated annealing and differential evolution optimizers, to accelerate this search process. Singer, Shkolnisky, and coworkers [45,9,46,42] proposed a variety of rigorous methods based on the eigendecomposition of certain sparse matrices derived from common line analysis and/or convex relaxation of the global least-squares error function. Wang, Singer, and Wen [55] subsequently proposed a more robust self-consistency error, based on the sum of absolute residuals, which permits a convex relaxation and solution by semidefinite programming.
Another approach, introduced in the late 1980s, is the method of moments [14,15], which is aimed at computing second order moments of the density from second order moments of the 2D experimental images. Around the same time, Provencher and Vogel [36,53] proposed representing the electron-scattering density as a truncated expansion in orthonormal basis functions in spherical coordinates and estimating the maximum likelihood of this density. Recently, a machine-learning approach was suggested, using a sum of Gaussian "pseudoatoms" with unknown locations and radii to represent the density, and Markov chain Monte Carlo sampling to estimate the model parameters [23].
Once a low-resolution initial guess is established, the standard reconstruction packages EMAN2 [2,47], SPIDER [13], SPARX [22], RELION [39,40], and FREALIGN [17,18,26] all use some variant of iterative projection matching in either physical or Fourier space. Some, like RELION, SPARX, and EMAN2, make use of soft matching or a regularized version of the maximum likelihood estimation (MLE) framework introduced by Sigworth [43,44]. For this, a probability distribution for angular assignments is determined for each experimental image. Others, like FREALIGN, assign unique angular assignments to each image. Two drawbacks of these methods are that they can be time-consuming, especially those based on MLE, and that it can be difficult to determine if and when a global optimum has been reached.
In order to accelerate the MLE-based methods, Dvornek, Sigworth, and Tagare suggested in [11] that particle images and structure projections be represented in low-dimensional subspaces that permit rotation, translation, and comparison by defining suitable operations on the subspace bases themselves. They demonstrated 300-fold speedups in reconstruction. Recently, Punjani, Brubaker, and Fleet [5] introduced a new scheme based on a probabilistic generative model, marginalization over angular assignments, and optimization using stochastic gradient descent and importance sampling. They demonstrated that their method was both efficient and insensitive to the initial guess. Finally, there has been significant effort aimed at harnessing high-performance computing hardware including GPUs for the most computation-intensive tasks in the cryo-EM reconstruction pipeline [25].
In this paper, we will focus on a new method for refinement, assuming that all particle images are drawn from a homogeneous population. One of our goals is the creation of a method for refinement that is sufficiently fast that it can be run multiple times on the same data, opening up classical jackknife, bootstrap, or cross-validation statistics to be used for validation and resolution assessment. At present, the gold standard is based on methods such as "Fourier shell correlation" [20,33,37], which can be viewed as a jackknife with two samples.
The basic intuition underlying our approach is already shared with many of the standard software packages, such as RELION and FREALIGN, as well as the stochastic optimization method of [5], namely, that the best path to a refined structure is achieved by gradually increasing resolution. The main purpose of the present paper is to propose a deterministic and mathematically precise version of this idea, carried out in the Fourier domain. Unlike existing schemes, it involves no global optimization. Instead, we use resolution (defined by the maximum frequency content of the current reconstruction) as a homotopy path. For each small step along that path, we solve only uncoupled projection matching problems for each experimental image. More precisely, let k ∈ [0, K] denote the band-limit of the model at the current step of the reconstruction algorithm, where K denotes the maximum resolution we seek to achieve. For the model, we generate a large number of "templates"-simulated projections of the model-at a large number of orientations. For each experimental image, we then find the template that is a best match and assign the orientation of that template to the image. Given the current angular assignments of all experimental images, we solve a linear least-squares problem to build a new model at resolution k + ∆k and repeat the process until the maximum resolution K is reached (see section 6).
With M images, each at a resolution of K × K pixels, our scheme requires O(M K 4 ) or O(M K 5 ) work, depending on the cost of template-matching. If this is done by brute force, the second estimate applies. If a hierarchical but local search strategy is employed for template-matching, then the cost of the least-squares procedures dominates and the O(M K 4 ) complexity is achieved. (The memory requirements are approximately 8M K 2 + 8K 3 bytes.) Remark 1. Some structures of interest have nontrivial point group symmetries. As a result, there may not be a unique angular assignment for each experimental image. In the most extreme case, one could imagine imaging perfect spheres, for which angular assignment makes no sense at all. Preliminary experiments with noisy projection data have been successful in this case, using the randomized assignment scheme discussed in section 4.2. We believe that our procedure is easily modified to handle point group symmetries as well but have not explored this class of problems in detail (see [38] for further discussion).
Our frequency marching scheme was inspired by the method of recursive linearization for acoustic inverse scattering, originally introduced by Chen [6,7] (also see [1]). We show here that high resolution and low errors can be achieved systematically, with a well-defined estimate of the total work required. As noted above, unlike packages such as RELION and EMAN2, we do not address the issues that arise when there is heterogeneity in the data sets due to the presence of multiple quasi-stable conformations of the particles being imaged. We also assume that the experimental images have known in-plane translations. Although fitting for such translations would be desired in a production code, we believe that this can be included in our algorithm with only a small constant factor increase in computation time (see the concluding section for further discussion). We do, however, include in our algorithm a known contrast transfer function (CTF) which models realistic aberrations of each experimental image.
In sections 2-5, we introduce the notation necessary to describe the algorithm in detail, as well as the various computational kernels that will be needed. The frequency marching procedure is described in section 6. Section 7 presents our numerical results, which use simulated data derived from known atomic positions for three relevant protein geometries of interest. The use of simulated data allows us to investigate the effect of signal-to-noise ratio (SNR) on reconstruction quality. We draw conclusions in section 8.
2. Mathematical preliminaries. We begin by establishing some notation. Throughout this paper, the unknown electron-scattering density will be denoted by f (x), where x = (x, y, z) in Cartesian coordinates. We assume, without loss of generality, that the unit of length is chosen so that the particle (support of f ) fits in the unit ball at the origin. The Fourier transform of f (x) will be denoted by F (k), where k = (k 1 , k 2 , k 3 ) in Cartesian coordinates and k = (k, θ, φ) in spherical coordinates, with k 1 = k sin θ cos φ, k 2 = k sin θ sin φ, and k 3 = k cos θ. Since we will use a spherical discretization of Fourier space, we write the standard Fourier relations in the following form: f (x, y, z)e ik(x sin θ cos φ+ y sin θ sin φ+ z cos θ) dx dy dz x sin θ cos φ+ y sin θ sin φ+ z cos θ) k 2 sin θ dθ dφ dk.
We will make use of Clenshaw-Curtis (Chebyshev) quadrature in cos θ for the inner integral (see section 2.2), since this is spectrally accurate for smooth integrands and results in nodes that are equispaced in the parameter θ, which simplifies the task of interpolation.

Rotation and projection operators.
Let (1, α, β) denote the spherical coordinates of a point on the unit sphere, which we will also view as an orientation vector. With a slight abuse of notation, we will often identify (α, β) with the vector (1, α, β). Rather than assuming the electron beam orientation is along the z-axis and that the particle orientations are unknown, it is convenient to imagine that the particle of interest is fixed in the laboratory frame and that each projection obtained from electron microscopy corresponds to an electron beam in some direction (α, β).
There is a simple connection between the projection of a function f , namely P α,β [f ], and its Fourier transform F , given by Theorem 2.3 and illustrated in Figure 1. For this, we will need to define a central slice of F .  . Let f (x) denote a compactly supported function in R 3 , let F (k) denote its Fourier transform, and let (α, β) denote an orientation vector.
Let P α,β [f ] denote the corresponding projection of f, and let P α,β [f ] denote its 2D Fourier transform (with normalization analogous to (2.1)-(2.2)). Then P α,β [f ] corresponds to an equatorial slice through F (k), with orientation vector (α, β). That is, The proof of this theorem is straightforward (see, for example, [29]). There is a third angular degree of freedom which must be taken into account in cryo-EM, easily understood by inspection of Figure 1. In particular, any rotation of the image plane about the orientation vector (α, β) could be captured during the experiment.
To take this rotation into account, we need to be more precise about the polar coordinate system (r, ψ) in the image plane. We will denote by ψ the angle subtended in the image plane relative to the projection of the negative z-axis in the fixed laboratory frame. (In the case of projections with directions passing through the poles α = 0 and π, the above definition becomes ambiguous, and one may set ψ to be the standard polar angle in the xy plane.) Then γ specifies in-plane rotation of a projection. Thus, the full specification of an arbitrary projection takes the form Remark 2. The angles (α, β, γ) correspond to a particular choice of Euler angles that define an arbitrary rotation of a rigid body in three dimensions. The notation introduced here is most convenient for the purposes of our method (corresponding to an extrinsic rotation of α about the y-axis, an extrinsic rotation of β about the z-axis, and an intrinsic rotation of γ about the new z-axis).
We will denote the set of experimental images obtained from electron microscopy (the input data) by M, with the total number of images given by M = |M|. Each image M (m) ∈ M has support in the unit disc. We will often omit the superscript (j) when the context is clear. Representing the image in Cartesian coordinates, we let M (m) denote its 2D Fourier transform in polar coordinates as follows: y)e ik(cos ψ x+ sin ψ y) dx dy.

Discretization.
We discretize the full 3D Fourier transform domain (k-space) in spherical coordinates as follows. We choose N r equispaced quadrature nodes in the radial direction, between zero and a maximum frequency K. The latter sets the achievable resolution. On each sphere of radius k, we form a 2D product grid from N φ equispaced points in the φ direction, and the following N θ equispaced points in the θ direction: Writing the volume element k 2 sin θ dθ dφ dk as k 2 dµ dφ dk, where µ = cos θ is the scaled zcoordinate, we note that the scaled z-coordinates of the nodes µ j = cos θ j are thus located at the classical (first kind) Chebyshev nodes on [−1, 1]. The advantage of this particular choice for θ (or µ) nodes is that they will be convenient later for local interpolation in θ near the poles. The complete spherical product grid of N r N θ N φ nodes is used for k-space quadrature, with the weight associated with each node being (2π/N φ )k 2 w j , where w j is the weight corresponding to the node µ j on [−1, 1]. (Here the µ quadrature is sometimes known as Fejér's first rule [56].) In particular, this quadrature will be used in computing the final density in physical space by means of the inverse Fourier transform (2.2).
It is important to note that because the support of f is in the unit ball, we have very precise bounds on the smoothness of F (k). Namely, F , as a function of k, is band-limited to unit "frequency" (note that here, and only here, we use "frequency" in the reverse sense to indicate the rate of oscillation of F with respect to the Fourier variable k). Furthermore, for targets x in the unit ball, the same is true of the exponential function in (2.2). Thus, the integrand is band-limited to a "frequency" of 2. This means that the above quadrature scheme is superalgebraically convergent with respect to N φ and N θ , due to well-known results on the periodic trapezoid and Chebyshev-type quadratures [56]. Furthermore, for oscillatory periodic band-limited functions, the periodic trapezoid rule reaches full accuracy once "one point per wavelength" is exceeded for the integrand [48] (note that this is half the usual Nyquist criterion). These results are expected to carry over to the spherical sections of 3D band-limited functions that we use, allowing for superalgebraic error terms. Since the most oscillation with respect to θ and φ occurs on the largest k sphere, the above sampling considerations imply N φ ≥ 2K and N θ ≥ K. In practice, to ensure sufficient accuracy, we choose values slightly larger (i.e., up to a factor of 1.5) than these bounds.
Remark 3. In practice, the sphere can be sampled more uniformly and more efficiently by choosing N θ to vary with k and by reducing the number of azimuthal points N φ near the poles. For simplicity of presentation, we keep them fixed here (see section 8 for further discussion).
Quadrature in the k direction is only second order accurate, as we are relying on the trapezoidal rule with an equispaced grid. Spectral convergence could be achieved using Gauss-Legendre quadrature, but we find that accuracy with a regular grid is sufficient using a node spacing δk := K/N r ≈ 2, so that N r = O(K).
In our model, rather than using the spherical grid points themselves to sample F (k), we will, for the most part, make use of a spherical harmonic representation. That is, for each fixed radial value k, we will represent F (k, θ, φ) on the corresponding spherical shell in the form Here, where P n (x) denotes the standard Legendre polynomial of degree n, the associated Legendre functions P m n are defined by the Rodrigues formula The functions Y m n are orthonormal with respect to the L 2 inner product on the unit sphere Remark 4. Note that the degree of the expansion in (2.5) is a function of k, denoted by p(k). It is straightforward to show that spherical harmonic modes of degree greater than k are exponentially decaying on a sphere of radius k given that the original function f (x, y, z) is supported in the unit ball in physical space. This is the same argument used in discussing grid resolution above for N r , N φ , and N θ . In short, an expansion of degree O(k) is sufficient, and, throughout this paper we simply fix the degree p(k) = k + 2.
Remark 5. The orthogonality of the Y m n allows us to obtain the coefficients of the spherical harmonic expansion f nm above via projection. Moreover, separation of variables permits all the necessary integrals to be computed in O(p(k) 3 ) operations.
3. Template generation. In any projection matching procedure, a recurring task is that of template generation. That is, given a current model defined by f (x) or F (k), we must generate a collection of projection images of the model for a variety of orientation vectors. We will denote those orientations by (α i , β j ), leaving (θ, φ) to refer to the angular coordinates in Fourier space for a fixed frame of reference. From the projection-slice theorem (Theorem 2.3), we have In fact, our projection matching procedure will work entirely in 2D Fourier space, i.e., using where (k, ψ) are polar coordinates in the projection plane, so our task is simply to compute the central slice S α i ,β j [F ] on a polar grid for each i and j (see Figure 2). The real-space projections P α i ,β j [f ] will not be needed.
In order to generate all templates efficiently, let a discrete set of desired orientation vectors be denoted by We have found that, to achieve acceptable errors at realistic noise levels, these grid sizes N θ and N φ can be chosen to be the same as for the quadrature points via the Nyquist criterion in the previous section. Each α corresponds to an initial rotation about the y-axis in the laboratory frame, while each β corresponds to a subsequent rotation about the z-axis in the laboratory frame. (The fact that projection matching can be simplified by creating a sufficiently fine grid on the sphere was already discussed and used in [31,34].) To generate the points at a given radius k within each slice, we must parametrize points in a polar coordinate system on the equatorial plane in the laboratory frame that have been rotated through the preceding actions. Using Cartesian coordinates for the moment, it is easy to see that Note that the z-coordinate (and therefore the spherical coordinate θ) in the laboratory frame is independent of β, while the x-, y-coordinates in the laboratory frame depend on α, β, and ψ.
For each sampled normal orientation vector (α i , β j ) for the slice, let {ψ l |l = 1, . . . , N φ } denote equispaced points on [0, 2π]. From above, the sample points with radius k, which are located at (k cos ψ l , k sin ψ l , 0) before rotation, move to Fixing the radius k for now, we denote the spherical coordinates of these points (x ijl , y ijl , z il ) by (k, θ il , φ ijl ). To generate the template data, we now need to evaluate the spherical harmonic expansion at these points, namely, Note that only the degrees at least as large as the magnitude of the order |m| contribute. For each radius k, the cost of computing the set , both terms have the asymptotic complexity O(K 4 ). This generates O(K 3 ) template data points on a single spherical shell. Then, summing over all k shells, the cost to generate the full spherical grid templates up to a resolution of K requires O(K 5 ) work and generates O(K 4 ) data.
Letting k q ∈ {k 1 , . . . , k Nr } denote the discretization in k, the above procedure evaluates the O(K 2 ) samples comprising the polar grid (k q , ψ l ) for each of the O(K 2 ) central slices. However, it is more convenient to store these templates in terms of their angular Fourier series for each slice, i.e., with 2N q +1 Fourier modes used on the qth ring. By band-limit considerations of the spherical harmonics, one need only choose N q = p(k q ), the maximum degree for each k shell. The coefficients S ij n (k q ) are evaluated by applying the fast Fourier transform (FFT) to the template data along the ψ grid direction, requiring O(K 4 log K) work.

Projection matching.
Given an experimental image M ∈ M (or, more precisely, its 2D Fourier transform M(k, ψ) discretized on a polar grid), we seek to rank the templates defined by (α i , β j ) and the rotational degree of freedom γ in terms of how well they match the image. For this we need a generative model for images. We will then present an algorithm for matching γ.
4.1. Image model with contrast transfer function correction. For reasons having to do with the physics of data acquisition, each particle image M (m) is not simply a projection of the electron-scattering density, but may be modeled as the projection of the electron-scattering density convolved with a CTF [28], plus noise. The CTF includes diffraction effects due to the particle's depth in the ice sheet, linear elastic and inelastic scattering, and detection effects. In the simplest case, the CTF is radially symmetric and its Fourier transform is real-valued and of the form C (m) (k), depending only on k. For the purposes of the present paper, we will assume that C (m) (k) is known for each experimental image. By the convolution theorem, the CTF acts multiplicatively on the Fourier transform image. Thus, the expectation for the mth Fourier image given by beam direction (α, β) and in-plane rotation γ, deriving from a k-space scattering density F , is To this is added measurement noise, which is assumed to be Gaussian and i.i.d. (independent and identically distributed) on each pixel in the image.

4.2.
Fitting the best orientation for each image. It follows from the above noise model that the correct quantity to minimize when searching for the best match would be the L 2 norm of the difference between the template and the image, which is equivalent to the L 2 norm of their difference in the Fourier image plane. This may be interpreted as minimizing the negative log likelihood, i.e., finding the maximum likelihood. Let M = M (m) be the current experimental image in question, and let C = C (m) be its corresponding known CTF. We will denote by M γ the rotated Fourier image For the template index pair (i, j), a quantity to be minimized over the rotation γ would thus be where, for a function g(k, γ) in polar coordinates, the L 2 norm (up to a band-limit of K) is defined by However, typically there is uncertainty about the overall normalization (a multiplicative prefactor) associated with experimental images (as discussed, for instance, by Scheres et al. [41]). It is straightforward to check that minimizing (4.1) over i, j, γ including an unknown normalization factor for the image is equivalent to maximizing over i, j, γ the normalized inner product Here the inner product corresponds to the norm (4.2), and (4.3) may be interpreted as the cosine of the "angle" between the image and the template in the abstract vector space with norm (4.2). We refer to maximizing (4.3) over i, j, and γ as projection matching.
For each k q ∈ {k 1 , . . . , k Nr }, we precompute a Fourier series representation of the image, with 2N q + 1 Fourier modes on the qth ring: This requires O(K 2 log K) work, done once for each experimental image.
To rank the template matches, we loop over all projection directions indexed by i, j. For each of these directions (α i , β j ), since the denominator of (4.3) is fixed, we need to maximize the inner product as a function of γ, Elementary Fourier analysis shows that  In practice we implement two useful adjustments to the above procedure as follows: (1) We search first over a coarse grid of template directions (with angle resolution five times coarser than the grid defined by N θ and N φ ), and then only search over the set of grid points that are within one coarse grid point of the global maxima found on the coarse grid. This improves efficiency by a constant factor, and in our tests does not degrade accuracy noticeably.
(2) We define a "randomization parameter" f rand . If f rand = 0, we return the single best-fitting orientation: (α m , β m , γ m ) defined above. However, if f rand > 0, we instead choose A m randomly from the set of all discrete orientations (α, β, γ) that produce a normalized inner product (4.3) greater than 1 − f rand . Typically we choose f rand small, e.g., 0.02. This uniformizes the distribution of orientations over the set which fit the image almost equally well. However, it is sometimes beneficial to increase f rand for improved convergence rate in the least-squares procedure of the following section; see also section 7.3.
It should be noted that the idea of using Fourier methods to find the optimal third Euler angle γ can be found in [10], and the complexity of various alignment schemes using polar grids in physical space is discussed in [24]. Remark 6. In the simplest version of the frequency marching scheme that we present, the c n coefficients in (4.4) are evaluated using all frequencies from zero to κ, since F , and hence the templates, get updated over this frequency range at each iteration. We believe that updating the model F only in the current k shell could result in a faster algorithm without sacrificing accuracy. In that case, the c n values are already known for a smaller κ, and they can be incremented according to the formula c n (κ 2 ) = c n (κ 1 ) + Remark 7. In our present implementation, the values of (α m , β m , γ m ) are selected from the set of computed template angles, as discussed in the previous section. A more sophisticated projection matching procedure could generate "off-grid" values for α, β, and γ. The reconstruction procedure below works in either case.
We seek to reconstruct a (new) model F (k, θ, φ) that is consistent with all of the data in a least-squares sense. For each fixed spherical shell of radius k, we use the representation (2.5), and data M From now we shall use a single index i = 1, . . . , N tot to reference these entries, and use (θ i , φ i ) for the spherical coordinates for the corresponding point on the great circle. (In our present implementation we ignore the use of image normalization factors that could be extracted from the orientation fitting algorithm presented in section 4.2. This is justified since our synthetic noisy images will be generated without varying normalization factors.) Let f denote the vector of "unrolled" spherical harmonic coefficients in the representation for F on the current k sphere, so that f is a complex vector of length (p(k) + 1) 2 . Then we define the complex-valued matrix S, of dimension N tot × (p(k) + 1) 2 , by Thus, S evaluates the spherical harmonic expansion with coefficients f at all sphere points.
We also let C i (k) be the corresponding CTF for each image at frequency k (note that C i (k) is the same for all i belonging to the same image) and let C be the diagonal matrix with diagonal entries C i (k). We find the desired solution by solving the problem CSf = d in a least-squares sense. Note that the use of the l 2 norm corresponds to the same assumption of i.i.d. Gaussian noise on the images as in the previous section. For this, we use conjugate gradient (CG) iteration on the normal equations It is sufficient to use q in the range 5-10. We denote by S reg the mapping from spherical harmonic coefficients to values on the N φ × N θ grid, and by T the (sparse) interpolation matrix from the N φ × N θ grid points to the N tot arbitrary locations. In other words, up to interpolation error, Sf ≈ T S reg f , and applying S in this manner requires only O(M q 2 K + K 3 ) work. Clearly, where S H reg can be applied by projection from a regular grid to a spherical harmonic expansion, which is also O(K 3 ). Thus each CG iteration requires only O(M q 2 K + K 3 ) work. As long as the N tot points have reasonable coverage of the sphere (specifically, there are no large "pockets" on the sphere which are empty of points), the system is well conditioned and requires only a modest number of iterations, around 20-50, to achieve several digits of accuracy. It is worth noting (as in other Fourier-based schemes, such as FREALIGN [17,18,26]) that this leastsquares procedure is the step in the reconstruction that is responsible for denoising. The more experimental images that are available (with poor SNRs but correctly assigned angles), the more accurately we are able to estimate f and hence F (k) and the electron-scattering density f (x).
The above describes the computation of F on a single k shell. One advantage of our representation is that to build the complete new model F , the least-squares solve for each k shell in the grid {k 1 , . . . , k Nr } may be performed independently. This gives an overall complexity for the reconstruction of O(M q 2 K 2 + K 4 ).
Definition 5.1. Let F indicate the entire set of model coefficients {f (k 1 ), . . . , f (k Nr )}. We will refer to its least-squares approximation (on all k shells) by F * . We summarize this by where C (m) is the CTF for the mth image and, by analogy with Here, F is determined from the coefficients F via the spherical harmonic representation (2.5).
6. The full inverse problem. As discussed in the previous section, if the angular assignments A = {A 1 , . . . , A M } of the experimental images were given, F would be recovered by solving the least-squares problem (5.3). Since the angles A are unknown, however, it is standard to write the cryo-EM reconstruction problem in terms of the nonlinear and nonconvex optimization task in the joint set of unknowns: As noted above, for simplicity of presentation of the scheme, we omit image normalization prefactors discussed in section 4.2.
One well-known approach to solving (6.1) is to start with an initial guess F (0) for the representation F, and then to iterate as follows: Classical iterative refinement. 3. i ← i + 1. Compute F from the final F via (2.5); then take the inverse Fourier transform (2.2) to recover f . This iteration can be viewed as coordinate descent, alternating between fitting the best angle assignment and fitting the best model density. It will converge, but to a local minimumnot necessarily the correct solution [8]. Various attempts to overcome this convergence failure have been proposed, including annealing strategies and stochastic hill climbing (see, for example, [12,8]), but robustness has remained an issue.
6.1. Frequency marching (recursive linearization). As in the iteration above, we alternate between projection matching to obtain estimates for the angles A and solving leastsquares problems to determine the best set of density coefficients F. However, by continuously increasing the resolution-measured in terms of the maximal spatial frequency used in the representation for F-we bypass the difficulties associated with multiple minima in existing attempts at iterative refinement. In the language of optimization, this can be viewed as a homotopy method using the maximum spatial frequency (resolution) as the homotopy parameter. The basic intuition underlying our scheme is motivated by the success of recursive linearization in inverse acoustic scattering [1,4,6,7].
More precisely, let M([0, k]) denote the set of Fourier transforms of all experimental images restricted to the disk of radius k, and let F([0, k]) denote the density coefficients only up to frequency k, i.e., using the shells for which k q ≤ k. The full objective function minimization restricted to maximum frequency k is which is still a nonlinear and nonconvex optimization problem. However, if F([0, k]) is known and we only seek to find F([0, k+δk]) for sufficiently small δk, then the solution can be reached by a suitable linearization. Moreover, at low frequency, say for k ≤ k 1 = 2, the landscape is extremely smooth so that the global minimum is easily located. Thus, we propose simply assigning random angles A m = (α m , β m , γ m ) to the images at k min = k 1 = 2 and iterating as follows: Solution by frequency marching.
On input, we define a sequence of frequency steps from k 1 to k Nr = K with a step of δk = k i+1 − k i .
• Set A (0) to uniform random values over the allowed ranges. The key feature of refinement by recursive marching is that it is a deterministic procedure involving only linear solves and angular assignments of images. The overall complexity, combining the estimates from sections 4 and 5, and summing over k, is O(M K 5 + K 6 + q 2 M K 3 ), where the first two terms come from angle matching and the last from the least-squares solution. Since in current cryo-EM applications, K ∼ 10 2 , while M ∼ 10 5 to 10 6 , and q 2 < K, the dominant cost is O(M K 5 ), assuming we use the global angle matching procedure described above. However, we find that in our examples, due to the number of CG iterations, the time for least-squares fitting is actually quite similar to that for angle matching, i.e., they are close to balanced.
While we are not able to provide a proof in the general case, we believe that under fairly broad conditions this iteration will converge with high probability for sufficiently small δk (the radial frequency grid spacing). Informally speaking, we believe that a solution near the global minimum is often reached at low k in the first few iterations and that a path to the global minimum at K is then reached as k increases continuously.
In the experiments below we show that, even with very noisy data, a harsh random start at k min = 2 is sufficient. We will return to this question in the concluding section.
In practice, we have implemented a further acceleration to the above scheme: if the mean absolute change in angles between A (i+1) and A (i) is less than 10 −3 , the next increment of the index i is set to five rather than its usual value of one. This greatly reduces the number of steps in the marching procedure once the majority of angles has locked in with sufficient accuracy.
7. Numerical experiments with synthetic data. We implemented the above frequency marching algorithm and performed experiments using simulated images to reconstruct electronscattering densities for three proteins.
7.1. Choice of ground-truth densities and error metric. The three proteins we used for numerical experiments were (see Figure 3) spinach rubisco enzyme (molecular weight 541kDa, 133Å longest dimension), lipoxygenase-1 (weight 95kDa, 107Å longest dimension), and scorpion protein neurotoxin (weight 7.2kDa, 33Å longest dimension). Atomic locations inÅ are taken from the protein data bank (PDB) entries [3], with codes 1RCX, 1YGE, 1AHO, respectively, but were shifted to have their centroid at the origin. Only in the last of the three are hydrogen atom locations included in the PDB. Thus, the number of atoms used were 37456, 6666, and 925, respectively. Note that the last of these, the neurotoxin, is much smaller than the ≈ 100 kDa lower weight limit that can be imaged through cryo-EM with current detector resolution and motion-correction technology; our point in including it is to show that, given appropriate experimental image resolution, reconstruction of small molecules does not pose an algorithmic problem.
For each protein, a ground-truth electron-scattering density f (x) was produced using the following simple model. First, a nondimensionalization length D was chosen giving the number of physicalÅ per unit numerical length, such that the support of f in such numerical units lies within the unit ball. For rubisco, D = 70Å; for lipoxygenase-1, D = 60Å; and for the neurotoxin, D = 25Å. Thus, all physical distances in what follows are divided by D in the numerical implementation. We summed 3D spherically symmetric Gaussian functions, one at each atomic location, giving each Gaussian a standard deviation (0.5r) 2 + b 2 , where r is the atomic radius for the type of atom (which varies from 0.42Å to 0.88Å). The factor 0.5 results in around 74% of the mass of the unblurred Gaussian falling within radius r, and b is a convolution (blurring) radius used to make f smooth enough to be accurately reconstructed using the image pixel resolution. The radii b used were 2.5Å for rubisco, 2.0Å for lipoxygenase-1, and 1.0Å for scorpion toxin, chosen to be around twice the simulated pixel spacing (see the next section). Thus, when we assess reconstruction errors, we are doing so against an appropriately smoothed ground-truth f . For simplicity, each such Gaussian was given a unit peak amplitude.
Our error metric for a reconstructed densityf is the relative L 2 -norm, which is estimated using quadrature on a sufficiently fine uniform 3D Cartesian grid (we used 100 points in each dimension).

Rubisco
Lipoxygenase-1 Neurotoxin Figure 3. Top row: the three protein structures (derived from X-ray crystallography). Bottom row: the corresponding atomic densities f (x) used as a ground truth for the numerical experiments, shown via an isosurface at approximately half the maximum density. Note that the visual length scale for the three molecules is different.
Remark 8. Sincef generally acquires an arbitrary rotation (α, β, γ) relative to the groundtruth f , we must first rotate it to best fit the original f . This is done by applying the procedure of section 4 to a small number (typically 10) of random projections.f is then rotated using the 3D nonuniform FFT [16] to evaluate its Fourier transformF at a rotated set of spherical discretization points as in section 2.2. A second nonuniform FFT is then applied to transform back to real space. Finally, the error (7.1) is evaluated.
Some researchers use a normalized cross-correlation metric to report errors (e.g., [23]). We note that, when errors are small, this metric is close to 1 − 2 /2, with given by (7.1).

Generation of synthetic experimental images.
For each of the three proteins, M (of order 50,000) synthetic experimental images were produced as follows. For each image m we first defined a realistic radially symmetric CTF function C (m) (k) using the standard formulae [28,54] (7.2) Here χ is called the phase function; θ 0 = 0.002 sets the microscope acceptance angle; w 2 = 0.07 controls the relative inelastic scattering, with w 2 1 + w 2 2 = 1; and the spherical aberration is C s = 2 × 10 7Å . The defocus parameter z was different for each image, chosen uniformly at random in the interval [1,4]×10 4Å . Finally, in the above, angles θ are related to wavenumbers k in the numerical experiments via where λ = 0.025Å is the free-space electron wavelength (a typical value corresponding to a 200 keV microscope), and for convenience the distance scaling D has been included. In our setting, θ is of order 10 −5 times the numerical wavenumber k. The images were sampled on a uniform 2D grid at a standard resolution of 100×100 pixels covering the numerical box [−1, 1] 2 . The pixel spacing thus corresponded to D/50, or between 0.5Å for the neurotoxin (this is smaller than what is currently achievable experimentally) and 1.4Å for rubisco. The noise-free signal images were produced by inverse Fourier transforming (via the 2D nonuniform FFT) slices taken at random orientations through the ground-truth Fourier density F (k), after multiplication by the radial CTF C (m) (k) particular to each image. Finally, i.i.d. Gaussian noise was added to each pixel, with variance chosen to achieve a desired SNR. The standard definition of SNR in imaging is the ratio of the squared L 2 -norm of the signal to that of the noise (for this we used the domain [−1, 1] 2 since the molecule images occupy a large fraction of this area). We generated images at SNR values of ∞ (no noise), 0.5, 0.1, and 0.05. A typical value in applications is 0.1.

Results.
All numerical experiments were run on desktop workstations with 14 cores, Intel Xeon 2.6GHz CPU, and 128 GB RAM. Our implementation is in Fortran, with OpenMP parallelization. We used frequency marching with step size δk = 2, starting with random angular assignment at k 1 = 2, and marching to full resolution at K = 70. This K was sufficient to resolve the decaying tails of F without significant truncation error, given our choice of blurring radius b at approximately two pixels. We used interpolation order q = 7, and we started with randomization factor f rand = 0.02. The latter was changed adaptively during marching to balance solution time and accuracy as follows: if at some point the leastsquares CG did not converge after 100 iterations, f rand was doubled and the least-squares solve repeated. If instead CG required fewer than 50 iterations, f rand was halved. The speed of our algorithm allowed us to carry out multiple runs for each protein, using either the same or a fresh set of synthetic images. In the present paper, we carried out five runs for each protein.
To assess how close our relative L 2 error was to the best achievable, given the image sampling, Fourier representation, and SNR, we computed for comparison the best possible Table 1 Relative L2 errors (see (7.1)) of the reconstructions using frequency marching, compared to the best possible reconstruction using known angles for the experimental images. The frequency marching results are given as averages over five runs, with estimated standard deviation. achieved by reconstructing f through a single application of the least-squares procedure (section 5) with all image orientations set to their true values. Table 1 shows that the errors resulting from applying our proposed frequency marching algorithm exceed this best possible error by only around 10 −2 for all molecules and noise levels.  Table 2 shows that it took approximately two hours to do the reconstructions using frequency marching on 14 cores for all three models. This table also shows that the time is not significantly affected by the noise levels in the data. In addition, we expect that the quality of the reconstruction will improve as we increase the number of experimental images. This effect is visible in Figure 7, where the median squared error of the runs is surprisingly well fit by the functional form of a constant (accounting for Fourier and image discretization errors) plus a constant times 1/M , accounting for the usual reduction in statistical variance with a growing data set size M .

Conclusions.
We have presented a fast, robust algorithm for determining the threedimensional (3D) structure of proteins using "single particle" cryo-electron microscopy (cryo-EM) data. This problem is typically formulated in the language of global optimization, leading to a nonconvex objective function for the unknown angles and lateral offsets of each particle in a collection of noisy projection images. With hundreds of thousands of images, this leads to a very CPU-intensive task. By using a recursive method in the Fourier domain, we have shown that an essentially deterministic scheme is able to achieve high-resolution reconstruction with predictable and modest computing requirements; in our initial implementation, 50,000 images can be processed in about one hour on a standard multicore desktop workstation.
One important consequence is that, with sufficiently fast reconstruction, one can imagine making use of multiple runs on the same data, opening up classical jackknife, bootstrap, or cross-validation statistics for quality and resolution assessment. In this work, we illustrate the power of doing so with only five runs.
Practical considerations. There are several features which need to be added to the algorithm above for it to be applicable to experimental data. Most critically, we need to extend the set of unknowns for each image to include the translational degrees of freedom (unknown lateral offsets for the particle "center"). We believe that frequency marching can again be used to great effect by defining a fixed search region (either a 3 × 3 or a 5 × 5 offset grid) SNR 0.5 SNR 0.05 whose spatial scale corresponds to the current resolution. That is, we define a spatial step in each lateral direction of the order O(1/k), so that large excursions are tested at low resolution and pixel-scale excursions are tested at high resolution. This would increase the workload by a constant factor. If done naively, the increase would correspond to a multiplicative factor of 9 or 25 on a portion of the code that currently accounts for 50% of the total time. We suspect that improvements in the search strategy (using asymptotic analysis) can reduce this overhead by a substantial factor and that other code optimizations will further reduce the run time (described below). Thus, we expect only a modest net increase in cost over our current timings.
Code optimization. Significant optimizations are possible to further reduce the run time of our template-matching and least-squares reconstruction steps. These include the following: 1. Using more uniform and sparser sampling in Fourier space. At frequency k, only O(k 2 ) templates are needed to resolve the unknown function F (k, θ, φ). Our initial implementation uses O(K 2 ) templates, where K is the maximum frequency of interest. A factor of order 5 speedup is available here. Moreover, by reducing the number of azimuthal points near the poles, the number of templates can be reduced by an additional factor of π/2.

2.
Updating only the last shell in Fourier space via the least-squares solve. Our initial implementation rebuilds the function F (k, θ, φ) on all spheres out to radius k at every iteration (see Remark 6). 3. Using coarser grids at low resolution in frequency marching. At present, we use the same full resolution spherical grid for every k. 4. Restricting the angle search in (α, β) to O(1) angles in the neighborhood of the best guess at the previous frequency k. This would reduce the overall complexity of the step from O(M K 5 ) to O(M K 3 ). Similar strategies have been found to be very effective in algorithms that marginalize over angles [5]. While it would increase the cost, it is also worth exploring the use of marginalization over angles, as in full expectation-maximization (EM) approaches, as well as the use of subgrid angle fitting in the projection matching step. One could also imagine exploring the effectiveness of multiple iterations of frequency marching (although preliminary experiments with noisy data did not show significant improvements).
Finally, as noted above, our random initialization is subject to occasional failure. We will investigate the possibility of enforcing convergence both by carrying out a more sophisticated nonlinear search for a low resolution initial guess and by finding metrics by which to rapidly discard diverging trajectories. On a related note, we also plan to develop metrics for discarding images that appear to correspond to erroneously included "nonparticles." In frequency  marching, we suspect that the hallmark of such nonparticles (at least in the asymmetric setting) will be the failure of template-matching to find more and more localized matches. On a more speculative note, we suspect that frequency marching will be able to handle structural heterogeneity without too many modifications, at least in the setting where there is a finite number of well-defined conformations, say R, present in the data set. The simplest approach would be to assign a conformation label (1, . . . , R) as an additional parameter for each image and to reconstruct R Fourier space models in parallel, with random initialization of the labels, and assignments of the best label recomputed at each stage. We will report on all of the above developments at a later date.