Blind Ptychographic Phase Retrieval via Convergent Alternating Direction Method of Multipliers

Ptychography has risen as a reference X-ray imaging technique: it achieves resolutions of one billionth of a meter, macroscopic field of view, or the capability to retrieve chemical or magnetic contrast, among other features. A ptychographyic reconstruction is normally formulated as a blind phase retrieval problem, where both the image (sample) and the probe (illumination) have to be recovered from phaseless measured data. In this article we address a nonlinear least squares model for the blind ptychography problem with constraints on the image and the probe by maximum likelihood estimation of the Poisson noise model. We formulate a variant model that incorporates the information of phaseless measurements of the probe to eliminate possible artifacts. Next, we propose a generalized alternating direction method of multipliers designed for the proposed nonconvex models with convergence guarantee under mild conditions, where their subproblems can be solved by fast element-wise operations. Numerically, the proposed algorithm outperforms state-of-the-art algorithms in both speed and image quality.

1. Introduction. Ptychographic phase retrieval (Ptycho-PR) [33,35,26] is an increasingly popular imaging technique used in scientific fields as diverse as condensed matter physics, cell biology or materials science, among others. In a ptychog- Fig. 1. Ptychographic phase retrieval (Far-field): A stack of amplitudes a j = |F (ω • S j u)| is collected, with ω being the localized coherent probe, and u being the image of interest (sample). The white dots represent the scanning lattice points, with D denoting the sliding distance between centers of two adjacent frames. raphy experiment (Figure 1), a localized coherent X-ray probe (illumination) ω scans through an image (sample) u, while a 2D detector collects a sequence of phaseless intensities in the far field. Throughout the paper, we consider Ptycho-PR in a discrete setting as follows: A 2D image is expressed as u ∈ C n with √ n × √ n pixels, ω ∈ Cm is a localized 2D probe with √m × √m pixels (u and ω are both written as a vector by a lexicographical order), and a j ∈ Rm + ∀0 ≤ j ≤ J − 1 corresponds to a stack of phaseless measurements, collected with a j = |F(ω • S j u)|. Operation | · | represents the element-wise absolute value of a vector, • denotes the element-wise multiplication, S j ∈ Rm ×n is a binary matrix that defines a small window with the index j and sizem over the entire image u, and F denotes the normalized discrete Fourier transformation. In practice, as the probe is almost never completely known, one has to solve a blind ptychographic phase retrieval (BP-PR) problem [21]: BP-PR: To find ω ∈ Cm and u ∈ C n , s.t. |A(ω, u)| = a, where bilinear operators A : Cm × C n → C m and A j : Cm × C n → Cm ∀0 ≤ j ≤ J − 1, are denoted as follows: A(ω, u) := (A T 0 (ω, u), A T 1 (ω, u), · · · , A T J−1 (ω, u)) T , A j (ω, u) := F(ω • S j u), and a := (a T 0 , a T 1 , · · · , a T J−1 ) T ∈ R m + . 1.1. State-of-the-art algorithms. When the probe is exactly known, Ptycho-PR corresponds to a general phase retrieval (PR) problem. Multiple efficient algorithms have been designed during the last years to solve both general phase retrieval [27,36], and non-blind (no probe retrieval) Ptycho-PR problems [35,19,41,29,34,39,30,11,43]. An early work by Chapman [9] to solve the blind problem used the Wigner-Distribution deconvolution method to retrieve the probe (zone-plate pupil). The more recent algorithms that focus on the blind Ptycho-PR problem (1.1) are described below.
1.1.1. Projection Algorithms. In the optics community, projection algorithms such as Alternating Projections (AP) are very popular for non-blind PR problems [27]. Some projection algorithms have also been applied to BP-PR problems, e.g. Douglas-Rachford (DR) algorithm by Thibault et al. [37], extended ptychographic engine (ePIE) by Maiden and Rodenburg [26], and relaxed averaged alternating reflections [25] based projection algorithm by Marchesini et al. [28]. Projection algorithms for PR involve a projection onto a nonconvex modulus constraint set, and therefore, their convergence studies are very challenging. Although some progress has been made for the general PR problem using projection algorithms [20,30,10], the corresponding convergence for the blind problem is still unclear. Arguably, the most popular projection algorithms for BP-PR are ePIE and DR, described in detail below.
ii) ePIE. This iterative algorithm can be expressed as an AP method for BP-PR as follows: To find Ψ n k belonging to the intersection as Ψ n k ∈ {|FΨ n k | = a n k } ∩ {Ψ n k : ∃ω ∈ Cm, u ∈ C n , s.t. ω • S n k u = Ψ n k }, with a random frame index n k . By first computing the projection ψ k n k by P n k 1 (ψ k n k ), and then updating ω k+1 and u k+1 by the gradient descent algorithm (inexact projection), the ePIE algorithm [26] can be expressed by updating ω k+1 and u k+1 in parallel as with frame index n k ∈ {0, 1, · · · , J −1} generated randomly, and positive parameters d 1 and d 2 , where ω k , u k are the solutions in the k th iteration, (·) * denotes the complex conjugate of a vector pointwise, and ω ∞ := max t |ω(t)|.
Under the assumption of boundedness of iterative sequences, the convergence of PALM to stationary points of (1.5) was proved [21]. The parallel version of PALM was also provided, presenting multiple similarities with ePIE. The main difference between them is that for ePIE, only the gradient of F w.r.t. a randomly selected single frame is adopted to update ω and u per outer loop, while for PALM, each block of ω and u can be updated in parallel by employing the gradient w.r.t. all adjacent frames. Therefore, the parallel version of PALM is numerically more robust than ePIE.

Motivations and Contributions.
Existing algorithms for BP-PR still have several limitations. The convergence speed of PALM [21] is not satisfactory (See the convergence curves in cyan in the first row of Figure 6). AP algorithms for BP-PR as DR [37] and ePIE [26] tend to be unstable: visible drifts of the probe happen during iterations of DR (See Figure 7 (a)), and the iterative sequence of ePIE could diverge if the measurements are few or noisy. In addition, the existing algorithms reviewed in section 1.1 for BP-PR were derived based on a least squares model that ignored the impact of experimental noise, e.g. Poisson noise.
The alternating direction method of multipliers (ADMM) [17,42,4] was adopted to solve non-blind PR problems in [41,6], where it demonstrated fast convergence and robustness. However, it has never been applied to BP-PR problems (1.1). On the theoretical side, only the subsequence convergence of ADMM was proved for non-blind PR problems in [41], which relied on a relatively strong assumption of the convergence for the successive error of the multiplier.
In this paper we propose a more efficient method that employs the generalized form of ADMM [15,47,18,13], a typical extension of standard ADMM 2 . The main contributions of this paper are summarized below: • We propose a nonlinear least squares model for BP-PR with constraints on the image and probe, driven by the maximum likelihood estimation of the Poisson noise. We also formulate a variant optimization model by incorporating the Fourier magnitudes of the probe to attenuate artifacts caused by the periodical lattices based scanning. • We design fast generalized ADMMs to solve the proposed models, whose subproblems can be efficiently solved with comparatively few elementwise operations. With suitable stepsizes, we prove their convergence to stationary points for the proposed models under mild conditions. • We conduct numerous experiments to show the convergence and efficiency of the proposed algorithms. Compared with the state of the art, the proposed algorithms present a significantly better convergence speed: their R-factor decreases faster, while producing higher quality images. When using additional information from the probe, the proposed algorithms also successfully attenuate artifacts from the periodical lattice based scanning. We report more than 10× acceleration by using a non-optimized GPU implementation in Matlab, which demonstrates how the method can be trivially parallelized, showing great potential for large-scale problems. The reminder of the paper is organized as follows: Section 2 gives the nonlinear least squares models for BP-PR, as well as the Lipschitz property. Fast generalized ADMMs are given in section 3 with convergence guarantees shown in section 4. Numerous experiments are conducted to verify the effectiveness of proposed algorithms in section 5. Section 6 summarizes this work.
2. Proposed Models. Instead of directly solving the quadratic multidimensional systems in (1.1), following [34] for non-blind Ptycho-PR, a nonlinear least squares model for BP-PR can be given as min ω∈Cm,u∈C n 1 2 |A(ω, u)| − a 2 . In order to deal with data contaminated by different types of noise, based on the maximum likelihood estimation (MLE), more general mappings [8] to measure the distance between the recovered intensity g ∈ R m + and the collected noisy intensity f ∈ R m + , have been extensively studied for the PR problem: Consequently, a nonlinear optimization model for BP-PR is given as: Model I: min ω∈Cm,u∈C n G(A(ω, u))+I X1 (ω) + I X2 (u), with G(z) := B(|z| 2 , f ), where the amplitude constraints of the probe and image as [21,28] are incorporated, where X 1 := {ω ∈ Cm : ω ∞ ≤ C ω } and X 2 := {u ∈ C n : u ∞ ≤ C u } with two positive constants C ω , C u . If the collected data is noise free, the minimizer to (2.3) exists if the solution set {(ω, u) ∈ X 1 × X 2 : |A(ω, u)| = a} = ∅. For the data possibly contaminated by noise, the following proposition shows the existence of the minimizer to Model I. Proposition 1. Model I admits at least one minimizer.
It is standard to show the existence of a minimizer to Model I, and we omit the details. The generalized derivative for complex-valued variables are adopted as in [21,6] by separating the real and imaginary parts of the variables and operators defined in the complex space. Below we denote the Lipschitz property of the function G defined on C m .
Definition 1 (Lipschitz Property). The function Φ : C m → R has the Lipschitz property in C m with a Lipschitz constant L if 1) It is Lipschitz differentiable (its gradient is Lipschitz continuous), i.e.
2) It has the descent property as Following [1], letting Φ be smooth in C 1 , the second relation (2.5) holds if it is Lipschitz differentiable as (2.4). It is well known that the Lipschitz property of the objective functional is of high importance to prove the convergence of first-order splitting algorithms for a nonconvex minimization problem [44,3,40,22]. We show the Lipschitz property of G in the following lemma.
Lemma 2.1. The function G(·) has the Lipschitz property. Proof. The function G(z) with each metric in (2.2) is C ∞ smooth, and we just need to show the Lipschitz continuity of its gradient. The corresponding first order derivatives are given below: A basic estimate is given below: In order to prove the first property in (2.4), ∀ v 1 , v 2 ∈ C m , by (2.6), we have Similarly, in the case of pIPM, we have For BP-PR (1.1), or Model I in the noise free case (without constraints), trivial solutions always exist, i.e. scaling, reflection or translation. Moreover, there exists nontrivial solutions, which possibly depends on the scanning geometry. Especially, with periodical-lattice based scanning, there exists evident structural artifacts in the recovered images [37]. Specifically, assuming that (ω, u) is a solution to (1.1) for the noiseless case, the nontrivial solution (ω,û) can be generated as (ω,û) := (p 1 • ω, p 2 • u) with two periodical functions p 1 , p 2 (not constants) satisfying S j1 p 2 = S j2 p 2 ≡ 1m p1 ∀0 ≤ j 1 , j 2 ≤ J − 1, such that p 1 • S j p 2 ≡ 1m. Readily, it can be seen how (ω,û) is a solution to (1.1), sincê Due to the existence of nontrivial solutions to (1.1), the existing algorithms may get trapped into finding a solution with periodical structures (Figure 4 (b, k) and Figure 5 (b, k)), which completely breaks features of the image of interest. In order to deal with nontrivial ambiguities, non-periodical lattices based scanning can be considered experimentally to remove the periodicity of the scanning geometry, e.g. adding a small amount of random offsets to a set of square-lattice (or raster grid) [26], using a circular scan geometry [14] or Fermat spiral scan geometry [23]. However, in practice, square lattice or other periodical lattices based scanning are very popular for being straightforward and easy to implement, and we have to explore more a-priori information of the unknown probe to attenuate artifacts with the periodical scanning geometry.
In real experiments, the specimen can be simply removed to collect the diffraction pattern of the unknown probe in far-field. Hence, in the following optimization problem, we leverage the additional data c ∈ Rm + to eliminate structural artifacts and further increase the reconstruction accuracy: where τ is a positive parameter, the additional measurement c is the diffraction pattern (absolute value of Fourier transform of the probe) as c := |Fu|, and G(z) := B(|z| 2 , c 2 ), with c 2 denoting the pointwise square of the vector c. For simplicity, assume that G andĜ adopt the same metric. We remark that such information can be used as a constraint condition as in [28]. Similarly to Proposition 1, the existence of a minimizer to Model II can be readily derived.
3. Numerical Algorithms. The proposed models are non-convex and non-smooth, and they can be solved by a block coordinate descent method with proximal linear version [44], or PALM [3], equivalently. We conduct an experiment using PALM [3] for Model I. Equivalent results can be obtained using the block coordinate descent method, BCD [44], and the detailed iterative scheme can be found in Appendix A. The recovery results using PALM (BCD) are presented in Figure 2. By observing the recovery images in Figure 2 (b, c), obvious artifacts including the periodical artifacts and ringing artifacts of the edges can be observed. Moreover, one can readily see that PALM (BCD) for Model I converges very slow inferred from Figure 2 (d), since in order to make these iterative algorithms be stable, very small stepsizes are selected manually, which results in slow convergence speed.
Alternatively, the generalized ADMM will be adopted to solve the proposed models, which permits bigger stepsizes by avoiding directly calculating the gradient of the objective functional such that fast convergence speed is gained. By introducing new variables, the proposed optimization problems are decoupled by operator-splitting of ADMM and the resulted subproblems can be easily solved. Additionally, proximal terms in the subproblems are introduced in order to guarantee the sufficient decrease of the augmented Lagrangian, such that the convergence of the proposed algorithm can be derived under mild conditions.
3.1. Generalized ADMM for Model I. An auxiliary variable z = A(ω, u) ∈ C m is introduced such that an equivalent form of (2.3) is formulated as follows: The corresponding augmented Lagrangian reads with the multiplier Λ ∈ C m and a positive parameter β, where ·, · corresponds to the L 2 inner product in complex Euclidean space and (·) denotes the real part of a complex number. Consequently, instead of minimizing (2.3) directly, one seeks a saddle point of the following problem: A natural scheme to solve the above saddle point problem is to split them, which consists of four-step iterations for the generalized ADMM (only the subproblems w.r.t. ω or u have proximal terms), as follows: Step 2: u k+1 = arg min Step 3: z k+1 = arg min Step 4: with diagonal positive semidefinite matrices M k 1 ∈ Rm ×m + and M k 2 ∈ R n×n + (we call them preconditioning matrices as in [47]) and two penalization parameters α 1 , α 2 > 0, where ω 2 given the approximated solution (ω k , u k , z k , Λ k ) in the k th iteration. Here we remark that these two matrices M k 1 , M k 2 are assumed to be diagonal so that subproblems in Step 1 and Step 2 have closed form solutions. In practice, these two matrices are chosen heuristically, and a special strategy to guarantee the convergence will be specified in section 4 (See (4.12)). 3.1.1. Subproblems w.r.t. ω and u. First, we consider the subproblem in Step 1 of (3.4) w.r.t. the probe ω: where ω(t) denotes the t th element of ω∀0 ≤ t ≤m − 1. Essentially this subproblem w.r.t. each element ω(t) is independent, such that one just needs to solve the following one-dimension constraint quadratic problem: The derivative of ρ k t (x) is calculated as the close form solution of Step 1 is given as with the projection operator 3 onto X 1 defined as Proj(ω; C ω ) := min{C ω , |ω|} • sign(ω). Second, we consider the subproblem in Step 2 of (3.4) w.r.t. the variable u as Similarly to (3.6), letting M k 2 satisfy where the proximal operator [32] and it is simplified by Prox G (z) in the case of β = 1.
For the case of pIPM, we have The solution can be expressed as The first optimality condition of the above optimization problem without constraints yields a quartic equation, such that it is possible to compare the function values at all non-negative roots. The root with the minimal value is the exact solution to this optimization problem. However, the computation cost to calculate all roots of a quartic equation is still relatively high. Alternatively, the gradient projection scheme can be expressed as: with the stepsize δ > 0, and x 0 := |z k (t)|. For the case of pAGM, similarly to the case of pIPM, the solution z k+1 can be given by (3.10), and the iterative scheme for ρ (t) is given as: Based on the above calculations, the generalized ADMM for Model I is summarized in Algorithm 1.
3 It is the close form expression for the minimizer to the problem min ω ∞≤Cω

5:
Update the multiplier as Step 4 of (3.4). 6: end for 3.2. Generalized ADMM for Model II. By introducing two auxiliary variables z 1 ∈ C m and z 2 ∈ Cm, an equivalent form of (2.8) can be obtained as: The corresponding augmented Lagrangian reads (3.14) with two multipliers Λ 1 ∈ C m , Λ 2 ∈ Cm and positive parameters β 1 and β 2 . Consequently, one needs to solve a saddle point problem as follows: The generalized ADMM is also used to solve the above problem, and in the following discussion, we focus on the subproblems of ADMM. First, we consider the subproblem w.r.t. ω: with the penalization parameterᾱ 1 > 0 and the preconditioning matrixM k 1 . Similarly to (3.6), by computing the first-order derivative, one readily obtains the solution: Next, we consider the subproblem w.r.t. u. According to (3.8), its solution is directly given below: with the penalization parameterᾱ 2 > 0, and the preconditioning matrixM k 2 . Regarding the subproblems w.r.t. z 1 and z 2 , their closed form solutions can be directly derived as follows: where the proximal operators can be solved similarly to (3.10). The generalized ADMM for Model II is summarized in Algorithm 2.
Since F is an orthogonal operator, and the operator A is block-wise defined based on element-wise multiplications of the probe and image for Ptycho-PR, their subproblems w.r.t. ω and u have closed form solutions. We remark that, for the z−subproblem, very few iterations are needed to guarantee the convergence of the entire ADMM in practice, due to the error-forgetting property [46].
1: for k = 0 to Iter M ax − 1 do

5:
Update multipliers as Λ k+1 6: end for 4. Convergence Analysis. First, we briefly review the framework for the convergence analysis of an iterative algorithm for nonconvex optimization problems. Consider an optimization problem min v Φ(v), with the functional Φ being proper, lower semi-continuous, and bounded below. Let an iterative sequence {v k } ∞ k=0 be generated by some algorithm to solve the above minimization problem. In light of [1,3,22,40,24,31,45], in order to prove that the iterative sequence converges to stationary points of Φ, the following three conditions should be satisfied: i) Sufficient decrease condition as ii) Relative error condition as , and a positive constant c 2 .
The first two conditions can guarantee that each limiting point of the iterative sequence is a stationary point of Φ, which demonstrates the subsequence convergence. Furthermore, by the KL property of Φ, the iterative sequence can be proved to be a Cauchy sequence and therefore convergence can be reached. Along this line, the convergence for ADMM was established for noncovex problems [22,40,24,31,45].
In this section we analyze the convergence of the proposed generalized ADMM algorithm for this nonconvex optimization problem with the bilinear constraint. The sufficient-overlap condition of iterative sequences {ω k } and {u k }, which will be assumed to be bounded below by a positive constant, the difficulty brought by the bilinear constraint can be overcome. Thanks to the Lipschitz property of pAGM/pIPM based G (in Lemma 2.1), and the KL property of the objective functional of the proposed models, we shall be able to prove the convergence of the proposed algorithms.
The following notation is used for the successive error of the iterative sequences:
We first estimate the differences of the augmented Lagrangian between two successive iterations to show the sufficient decrease as in (4.1). Lemma 4.1 (Sufficient decrease condition). For all k ≥ 1, we have where I k u ∈ R + and I k ω ∈ R + are defined below: The corresponding proof can be found in Appendix B. We show the boundedness of the iterative sequence {X k } in the following lemma: L, the sequence {X k } generated by Algorithm 1 is bounded. Furthermore, the sequence of augmented Lagrangian {Υ β (X k )} is bounded and non-increasing.
The proof can be found in Appendix C. In order to make sure that the decreasing quantity of the augmented Lagrangian is bounded below by X k+1 − X k 2 , we need the following assumption: Assumption 1 (Sufficient-Overlap). The sequences {I k u } ⊆ R + , {I k ω } ⊆ R + , defined in (4.9), are bounded below by two constants C 1 and C 2 , respectively, i.e.
Remark 4.1. We discuss the above assumption for standard ADMM by setting M k 1 = 0, M k 2 = 0 in Algorithm 1, since it can work well in practice (further details can be found in Section 5). Assuming that the iterative sequences {ω k } and {u k } converge to the ground truth solutions ω and u for (2.3), which are also assumed to be nonzeros, under Assumption 1, the following inequalities hold: which is a necessary condition for Assumption 1. Equation (4.11) holds when the sample is scanned with a small enough sliding distance so that each scan position sufficiently overlaps with the next. There must exist a positive number k 0 ≥ 0, such that for all k ≥ k 0 the iterative sequences {I k u } k≥k0 and {I k ω } k≥k0 are bounded below by some strictly positive constants. That explains the reasonability of Assumption 1. In practice, in order to guarantee the convergence for Ptycho-PR, scanning with smaller enough sliding distances is always needed so that enough redundancy between measurements can be attained. However, such assumption for standard ADMM is difficult to verify.
Due to the introduction of the proximal terms, these matrices can also be given by the following strategy: ∞ , s 1 , s 2 are positive safeguard parameters, and r 1 , r 2 are positive constants. One readily knows that Assumption 1 holds with C j = 2sj αj β min {1, r j } > 0, j = 1, 2 in (4.10). This strategy also guarantees that (3.5) and (3.7) hold, and the sequences of these two preconditioning matrices are bounded due to the boundednesses of {ω k } and {u k }.
The relative error condition, similarly to (4.2), is derived in the following lemma:  The proof can be found in Appendix E. We remark that Assumption 1 can be verified when setting the preconditioning matrices as in (4.12), as well as the boundedness of the sequences of preconditioning matrices. Therefore, with suitable stepsizes and careful selection of the preconditioning matrices, one can derive the convergence of the proposed algorithms for Model I.

Convergence of Algorithm 2.
Similarly to Assumption 1: Assumption 2 (Sufficient-Overlap). There exists a constant C 1 independent of k, such that I k ω defined in (4.9) is bounded below, i.e. I k ω ≥ C 1 > 0. Based on the above two assumptions, we list the theorem below without proof, since it directly follows the proof of Theorem 1.
In order to derive the convergence of Algorithm 2, only the positive lower bound of I k ω is assumed. Hence, the convergence analysis of Algorithm 2 needs weaker conditions, compared to Algorithm 1. For the convergence guarantee of Algorithm 2, the boundedness of I k ω can be verified after the careful selection of the preconditioning matrixM k 2 , similarly to (4.12).

Numerical Experiments.
In the following experiments we consider that the probe scans the image of interest with periodical boundary condition on square, hexagonal or random lattices. The random lattice is generated based on the square lattice with additional random offsets within one pixel. The ground truth image and probe employed in this experiments can be found in Figure 4 (a) and Figure 13.
In order to evaluate performances of different algorithms, we introduce two criteria. First, the R-factor for Algorithms 1 and 2, defined as: If the R-factor of the iteration solutions of Algorithm 1 or 2 tends to zero as k → ∞ for the noiseless case, they converge to global minimizers of the proposed models, which are also the stationary points. Therefore, in the case of noiseless data, R-factors can be used to verify the convergence. Secondly, the SNR (Signal-to-Noise Ratio) of u k is also employed, denoted as: with the ground truth image u g , where the error is computed up to the translation T * , phase shift and scaling factor ζ * determined by (ζ * , T * ) := arg min Similarly, we can denote the SNR(ω k , ω g ) for the probe, with ω g being the ground truth. We also consider measurements contaminated by Poisson noise. In order to simulate different peak values, a factor η > 0 is introduced [7] on the image, such that the Poisson noise corrupts the clean intensity as f (t) i.d.d.
In the following tests, we set u 0 = 1 n and ω 0 = 1 J F −1 ( j f j ), where the initial guess of the probe has zero phase for Algorithm 1-2. The parameters C ω , C u need to be sufficiently large, and we employ C ω = C u = 1.0 × 10 8 for the constraint sets in Models I and II. The same initial values of u and ω are used for other compared algorithms. Other variables including auxiliary variables and multipliers are initialized as in the initialization stage in Algorithms 1-2. The parameters for the proposed algorithms are problem-dependent, which will be specified for different experiments. The codes are implemented in Matlab.
We remark that no obvious acceleration of the proposed algorithms can be observed when using more inner iterations for the subproblems of z k in Algorithm 1 and z k 1 , z k 2 in Algorithm 2. Due to the page limit, we do not report further details. Therefore, to reduce the overall computation cost, only one inner iteration is adopted.

Performance of Algorithm 1.
We show the performance of Algorithm 1 with pAGM (ε = 1.0 × 10 −8 × f ∞ ), and then compare it with the state-of-the-art algorithms: DR [37], ePIE [26] and PALM [21]. The comparison algorithms are tuned and executed with the parameters that maximize their performance.

Comparison with DR, ePIE and PALM.
We conduct experiments in the case of both noiseless and noisy measurements.
i) Noiseless case. We conduct the experiments with two different sliding distances D = 24, 16. Two different scanning lattices including square and random lattices are considered. We show the performances of Algorithm 1 in the following two cases: the standard ADMM with M k 1 = M k 2 = 0 (referred below as ADMM) and the generalized version (4.12) with two additional proximal terms (ADMM-Prox), with r 1 = 1.0 × 10 −6 , r 2 = 1.0 × 10 −3 , s 1 = s 2 = 1.0 × 10 −6 , and α 1 = α 2 = β. First of all, in order to verify the sufficient-overlap condition, we plot the histories of I k u and I k ω in Figure 3 (D = 24). One readily knows that they quickly become stable, without approaching infinity or zero. Remarkably, without the proximal terms, the iterative sequences {I k u } and {I k ω } by the standard ADMM also quickly tend to non-zero values. With the additional proximal terms, the sequences of ADMM-Prox get stable faster than those of the standard ADMM. The results of the recovered images and probes are shown in Figure 4, and their corresponding zoom-in views in Figure  5. The recovery results demonstrate how ePIE and PALM are more sensitive to the number of measurements. With D = 24, ePIE fails due to blow-up of its iterative solutions. The recovered images using PALM (Figure 4 (c, g), and its zoom-in Figure 5 (c, g)) present obvious stripe structures and ringing artifacts. With fewer number of measurements (or larger sliding distances), DR and Algorithm 1 work better than ePIE and PALM. In the first and third rows of  (Figure 4 and 5, fourth row), all algorithms produce higher quality reconstructions. However, it can be noted in Figure 5 (zoom-in views) how ADMM, ADMM-Prox and DR can recover much sharper edges than ePIE and PALM.
The convergence analysis is presented in Figure 6. The R-factor in the case of ADMM and ADMM-Prox decreases faster than with the other compared algorithms w.r.t. the number of iterations. Furthermore, within one thousand iterations, the proposed algorithms reach the given tolerance. Those results experimentally demonstrate the convergence of the proposed algorithms, which is consistent with the previous convergence analysis. Specifically, inferred from the first row of Figure 6, ADMM and ADMM-Prox converge the fastest among all the compared algorithms: neither ePIE, PALM nor DR reach the desired tolerance within one thousand iterations. Due to the property of fast convergence of ADMM and ADMM-Prox, the SNRs of recovered images and probes are much higher than those by other algorithms. Execution times for the compared algorithms can be found in Table 1. Due to the additional update of the multipliers, the single step complexity of ADMM and ADMM-Prox is slightly higher than the compared algorithms. Because the proposed ADMM algorithms require fewer iterations, they achieve considerable lower execution times compared with DR, ePIE and PALM. ADMM and ADMM-Prox are 1.7 and 2 times faster, on average, for square-and random-lattice scanning, respectively.
The recovery images of ADMM-Prox with a careful selection of the preconditioning matrices have higher quality than those of standard ADMM: about 0.6 and 2.0 dB increase of SNRs from the images in Figure 6 (e, g) is gained. For the random-lattice scanning, they have very similar performances. In the following results, we run proposed algorithms simply setting the preconditioning matrices to zeros. Note that ADMM-Prox require more parameters to be tuned to gain the best performance but ADMM produces comparable results by tuning a single parameter. Table 1 Runtimes of all compared algorithms. The format of the data "·/ · (·)" denotes "Elapsed time (in seconds)/Number of iterations (elapsed time per iteration)". "-/-" means that the iterative solutions blow up. Note that the iteration number of ePIE means the number of cycles. One should notice the obvious drift existing in the probe by DR in Figure 7 (a), with peak level η = 0.1. For this reason, the quality of the recovered images of DR is the lowest among all compared algorithms, since DR does not have fixed points (a) Truth (b) DR [37] (c) PALM [21] (d) ADMM (e) ADMM-Prox (f) DR [37] (g) PALM [21] (h) ADMM (i) ADMM-Prox (j) ePIE [26] (k) DR [37] (l) PALM [21] (m) ADMM (n) ADMM-Prox (o) ePIE [26] (p) DR [37] (q) PALM [21] (r) ADMM (s) ADMM-Prox Fig. 4. Results of recovered images and probes by ePIE [26], DR [37], PALM [21] and ADMM (Algorithm 1) from left to right. D=24 (m = 6.25n) in the 1st-2nd rows; D=16 (m = 16n) in 3rd-4th rows. Scanning on square-lattice in 1st and 3rd rows and random-lattice in 2nd and 4th rows. (a): Ground truth for the image and probe with resolutions n = 256 × 256 andm = 64 × 64 respectively.
for an inconsistent feasibility problem [2]. At this noise level, ADMM and PALM can produce visually acceptable results. When the noise gets weak (peak level η = 1) all algorithms can achieve high quality reconstructions, Figure 7 (second row). The zoom-in views of this experiment are presented in Figure 8. In this view we can see how the proposed ADMM algorithm, Figure 8 (c), presents much weaker ringing artifacts compared to PALM and DR, Figure 8 (a, b). At peak level η = 1, the recovered image from ADMM is nearly artifact-free, Figure 8 (g), while the results by the other two algorithms contain evident ringing artifacts, Figure 8 (e, f).
Convergence histories for this experiment can be found in Figure 9. The results of Figure 9 (a, c) demonstrate how DR is not stable with strong noise (η = 0.1). The R-factors of ADMM are smaller than those obtained by PALM and DR, and the SNRs of the recovery results of ADMM are also higher than those obtained by PALM, which is consistent with the results shown in Figures 7 and 8. In summary, ADMM and PALM are more stable than DR and ePIE, and ADMM can recover images with higher quality than all the other compared algorithms.

Different metrics.
We show the performance of Algorithm 1 with four different metrics: pAGM, pIPM, IGM and wIGM (the latter two are defined in (2.1)). We conduct these experiments with D = 16 and random-lattice scanning.
First, noiseless measurements are considered, with convergence histories shown in Figure 10. Within the first 50 iterations, R-factors with four different metrics decrease with almost the same speed. As the iterations go on, Algorithm 1 with IGM converges the slowest, and it is further accelerated with the weighted norm in wIGM. Algorithm 1 with pAGM/pIPM converges faster than when using the other two metrics.
In summary, when dealing with noiseless measurements, pAGM or pIPM based metrics are recommended in order to gain faster convergence speed. For Poisson noisy data, pIPM should be used instead, in order to obtain higher quality reconstruction results. Due to the existence of noise, the recovered images seem blurry, which can be further improved by considering sparse prior information as in [6]. This study will be considered as future work for this research 5.2. Performance of Algorithm 2. In this section only noiseless data is considered, and experiments are performed with periodical scanning on square and hexagonal lattices. The reconstruction results of Algorithm 2 can be found in Figure 11, and its converge histories are reported in Figure 12.
With additional prior information of the probe, Algorithm 2 reconstructs images with higher SNRs, Figure 11 (b, d), in contrast with the results of Figure 11 (a, c) by Algorithm 1. We can see how Algorithm 2 is able to completely remove the structural artifacts. Figure 12 (a) shows how the R-factors of Algorithm 2 decrease faster compared with those from Algorithm 1, and fewer iterations are needed to reach the given tolerance. It is important to note that, as shown in Figure 12 (b, c), the SNRs of the reconstructed images and probes are greatly increased, which demonstrates the advantage of incorporating the prior information of the probe to the proposed algorithm.

GPU Acceleration.
We conduct this experiment using Algorithm 1 with different D ∈ {12, 8, 6, 4, 3} to produce large-scale data, since the number of measurements m satisfies m ≈ n D 2 . Table 2 reports the corresponding execution times and GPU speedup ratios (SR) as SR := t CP U /t GP U . The GPU speedup ratios range between 12 ∼ 17. Considering that the GPU Matlab implementation has not been optimized, the proposed algorithm shows great potential for large-scale BP-PR problems. The number of iterations (IterN um) to reach the given tolerance are about 300 ∼ 340 with different D, which demonstrates that the convergence speed is not sensitive to the number of measurements for a large-scale problem. Although it is convenient to produce more measurements experimentally, with our algorithms, using fewer number of measurements can save memory and reduce the computation cost. One can also observe the computation cost for CPU (or GPU) is almost   Table 2 Performances of Algorithm 1 on GPU v.s. CPU. IterN um denotes the iteration number to reach the given tolerance, t CP U and t GP U denote the elapsed time in seconds for CPU and GPU, respectively. SR denotes the speedup ratio. Algorithm 1 stops if the R-factor is less than 1 × 10 −5 or if the maximum iteration number reaches 1000. The code runs on a desktop with CPU (Intel I7-4820K, 64GB RAM) and GPU (Nvidia TITAN-X with 12 GB GDDR5X) in single precision mode. The GPU version is directly implemented using the build-in functions of Matlab. In order to test the robustness of proposed algorithms, we conduct more experiments on a different probe (non-circularly symmetric) with its absolute value shown in Figure 13 (c). Two other images shown in Figure 13 (a)-(b) are considered with scanning on square-lattice and random-lattice. The convergence curves of the R-factor, SNRs of recovered images and probes are reported in Figure 14, where one can readily see that the R-factors by proposed algorithms decrease faster, and the recovery results have better quality (bigger SNRs). Especially the test reported in the first column of Figure 14 shows that ADMM-Prox, with the help of additional proximal terms, becomes more stable and faster compared with standard ADMM.
6. Conclusions. In this paper we propose different nonlinear optimization models with and without prior information of the probe for the blind phase retrieval ptychography problem. Efficient generalized ADMMs are designed to solve the proposed models, and numerous numerical experiments demonstrate the superior performance of the proposed algorithms in DR [37] PALM [21] ADMM Fig. 7. Results of DR [37], PALM [21] and ADMM (Algorithm 1) with random-lattice scanning for Poisson noisy data. Peak level η = 0.1 (top), η = 1 (bottom). β = 0.5, 0.2 for Algorithm 1 with η = 0.1, 1, respectively. All compared algorithms stop when the iteration number reaches Iter M ax = 300.
(a) DR [37] (b)PALM [21] (c) ADMM (d) Truth (e) DR [37] (f)PALM [21] (g) ADMM both speed and reconstruction quality compared with the state-of-the-art algorithms. Especially, the performance on GPU reported in this paper have motivated the development of a high performance C++ multi-GPU implementation of the ADMM algorithm proposed in this paper. The multi-GPU implementation of ADMM was reported [16] to process more than 3 million measured samples per second on a single GPU, and it has already been installed and being used on the microscopes of the Advanced Light Source (ALS), at Lawrence Berkeley National Laboratory for real-time analysis [12] and partial coherence analysis [5].   Convergence histories of R-factor, SNRs of images and probes from up to down, respectively by ePIE [26], DR [37], PALM [21] and ADMM (Algorithm 1), with D = 24 and scanning on square-lattice in 1st and 2nd columns and random-lattice in 3rd and 4th columns. All compared algorithms stop if R − factor k ≤ 1.0 × 10 −6 or iteration number reaches 1000. Results for the complex-valued image (Figure 13 (a)) are put in the 1st, 3rd columns, and for the real-valued image (Figure 13(b)) in other columns.
Appendix E. Proof of Theorem 1.