Primal-dual extragradient methods for nonlinear nonsmooth PDE-constrained optimization

We study the extension of the Chambolle--Pock primal-dual algorithm to nonsmooth optimization problems involving nonlinear operators between function spaces. Local convergence is shown under technical conditions including metric regularity of the corresponding primal-dual optimality conditions. We also show convergence for a Nesterov-type accelerated variant provided one part of the functional is strongly convex. We show the applicability of the accelerated algorithm to examples of inverse problems with $L^1$- and $L^\infty$-fitting terms as well as of state-constrained optimal control problems, where convergence can be guaranteed after introducing an (arbitrarily small, still nonsmooth) Moreau--Yosida regularization. This is verified in numerical examples.

where F : Y → R := R ∪ {+∞} and G : X → R are proper, convex, and lower semicontinuous functionals, and K : X → Y is a (nonlinear) Fréchet-differentiable operator between two Hilbert spaces X and Y with locally Lipschitz-continuous derivative K′. Such problems arise for example in inverse problems with nonsmooth discrepancy or regularization terms or in optimal control problems subject to state or control constraints. We are particularly interested in the situation where K is a nonlinear operator involving the solution of a partial differential equation and F is a nonsmooth discrepancy or tracking term. To fix ideas, a prototypical example is the L¹ fitting problem, i.e., G(u) = (α/2)‖u‖²_{L²}, F(y) = ‖y‖_{L¹}, and K(u) = S(u) − y^δ, where S maps u to the solution y of −∆y + uy = f for given f and y^δ is a given noisy measurement; see [ ]. Here, σ, τ > 0 are appropriately chosen step lengths, K′(u)* denotes the adjoint of the Fréchet derivative of K, and prox_{F*} denotes the proximal mapping of the Fenchel conjugate of F; we postpone precise definitions to later and only remark that if F* is the indicator function of a convex set C, the proximal mapping coincides with the metric projection onto C. Such methods do not require (for linear K) choosing the initial guess sufficiently close to the solution to ensure convergence or solving (possibly ill-conditioned) linear systems in each iteration. Consequently, they have recently received increasing interest also in the context of optimal control; see, e.g., [ , ]. In addition, other proximal point methods for optimal control problems have been treated in [ ] and [ ]; in particular, the latter is concerned with classical forward-backward splitting for sparse control of linear elliptic PDEs. However, so far these methods have only been considered in the finite-dimensional setting, i.e., after discretizing ( . ), or for specific (linear) problems.
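To illustrate the last remark in finite dimensions: for an indicator function, the proximal mapping is independent of the step length σ and reduces to a projection. The following sketch (all values hypothetical) shows this for a pointwise interval C = [a, b]:

```python
import numpy as np

def prox_indicator(p, a, b):
    # The prox of the indicator of C = {p : a <= p <= b} (componentwise) is
    # the metric projection onto C, for any step length sigma.
    return np.clip(p, a, b)

p = np.array([-2.0, 0.3, 1.7])
print(prox_indicator(p, -1.0, 1.0))  # componentwise projection onto [-1, 1]
```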
One of the goals of this work is therefore to show convergence of Algorithm in Hilbert spaces and to demonstrate that it can be applied to problems of the form ( . ).
While the general convergence theory is a straightforward extension of the analysis in [ ] (in fact, the proof is virtually identical), it requires verifying a set-valued Lipschitz property (known as the Aubin or pseudo-Lipschitz property) on the inverse of a monotone operator H_u encoding the optimality conditions. This is also called metric regularity of H_u. This verification is significantly more involved in infinite dimensions. For problems of the form ( . ) where F and G are given by integral functionals for regular integrands, we can apply the theory from [ ] to obtain an explicit, verifiable condition for metric regularity to hold. While our analysis will show that for problems such as ( . ), this condition does in fact not hold in general unless a Moreau-Yosida regularization is introduced (or the data y^δ and the fitting term are finite-dimensional), we do obtain convergence for arbitrarily small regularization parameter, and numerical examples show that this can be observed in practice independent of the discretization. Similarly, although for nonlinear operators the convergence is only local (since smallness conditions on the distance to the solution enter via bounds on the nonlinearity of the operator), in contrast to semismooth Newton methods we actually observe convergence for any starting point and arbitrarily small regularization parameter without the need for continuation.
In addition, Moreau-Yosida regularization results in a strongly convex functional, which can be exploited for accelerating Algorithm as in [ ] via adaptive step length and extrapolation parameters. This leads to the following iteration.
Algorithm Accelerated nonlinear primal-dual extragradient method.
Here, µ ≥ 0 is a fixed acceleration parameter; setting µ = 0 coincides with the unaccelerated Algorithm . The appropriate choice of µ > 0 is related to the constant of strong convexity of F*, and in the convex case yields the optimal convergence rate of O(1/k²) for the functional values rather than the rate O(1/k) for the original version; see [ , , ]. A similar acceleration is possible if G is strongly convex by swapping the roles of σ_i and τ_i in Line ; we will refer to both variants as Algorithm in the following. Such an acceleration was not considered in [ ]. While a proof of the optimal convergence rate is outside the scope of this work, we show that Algorithm converges (locally) in infinite-dimensional Hilbert spaces under the same conditions as for Algorithm and demonstrate the accelerated convergence in numerical examples.
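To make the structure of the iteration concrete, the following toy sketch applies the scheme to a finite-dimensional problem min_u ½‖u‖² + ½‖Au − b‖², i.e., with linear K(u) = Au (so that handling the nonlinearity is trivial) and with the acceleration driven by the strong convexity of G (the variant with the roles of σ_i and τ_i swapped). All data, initial step sizes, and the value of µ are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))    # hypothetical linear K, so K'(u)* = A.T
b = rng.standard_normal(5)

# min_u G(u) + F(Au) with G(u) = 0.5||u||^2, F(y) = 0.5||y - b||^2;
# prox_{tau G}(v) = v/(1+tau), prox_{sigma F*}(q) = (q - sigma*b)/(1+sigma)
Lip = np.linalg.norm(A, 2)
tau, sigma = 0.9 / Lip, 0.9 / Lip  # sigma*tau*Lip^2 < 1, preserved below
mu = 0.9                           # acceleration parameter, mu < gamma_G = 1
N = 100                            # stop the acceleration after N iterations
u = np.zeros(4); p = np.zeros(5); u_bar = u.copy()

for i in range(2000):
    u_new = (u - tau * (A.T @ p)) / (1.0 + tau)          # primal prox step
    omega = 1.0 / np.sqrt(1.0 + mu * tau) if i < N else 1.0
    u_bar = u_new + omega * (u_new - u)                  # extrapolation
    p = (p + sigma * (A @ u_bar - b)) / (1.0 + sigma)    # dual prox step
    if i < N:
        tau, sigma = omega * tau, sigma / omega          # adaptive steps
    u = u_new

u_star = np.linalg.solve(A.T @ A + np.eye(4), A.T @ b)   # closed-form solution
print(np.linalg.norm(u - u_star))
```

Note how the product of the step sizes is kept constant while τ_i decreases and σ_i grows, and how the acceleration (and only the acceleration) is stopped after N iterations.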
This work is organized as follows. In the remainder of this section, we summarize some notation and definitions necessary for what follows. Section  is concerned with the convergence analysis of the accelerated Algorithm in infinite-dimensional Hilbert spaces, where we distinguish the cases of F* (Section . ) or G (Section . ) being strongly convex. We also briefly address the verification of metric regularity for functionals of the form ( . ) in Section . . A more detailed discussion for the specific case of the motivating problems (L¹ fitting, L∞ fitting, and optimal control with state constraints) is given in Section , where we also derive the explicit form of the accelerated Algorithm in these cases. Section  concludes with numerical examples for the three model problems.
. Convex analysis. We assume G : X → R and F : Y → R to be convex, proper, lower semicontinuous functionals on Hilbert spaces X and Y, satisfying int dom G ≠ ∅ and int dom F ≠ ∅. We call, e.g., F strongly convex with constant γ_F > 0 if

F(ỹ) ≥ F(y) + ⟨ξ, ỹ − y⟩ + (γ_F/2)‖ỹ − y‖²  for all y, ỹ ∈ Y and ξ ∈ ∂F(y),

where ∂F denotes the convex subdifferential of F. We denote by F* the Fenchel conjugate of F, which is convex, proper, and lower semicontinuous. As usual, we identify the topological dual Y* of Y with itself. The Moreau-Yosida regularization of F for the parameter γ > 0 is defined as

F_γ(y) := min_{z ∈ Y} F(z) + (1/(2γ))‖y − z‖²,

whose Fenchel conjugate is F*_γ = F* + (γ/2)‖·‖² (cf., e.g., [ , Prop. . (i)]). Note that F*_γ is strongly convex with constant at least γ.

For convex F, G and continuously Fréchet-differentiable K, we can apply the calculus of Clarke's generalized derivative (which reduces to the convex subdifferential for convex functionals; see, e.g., [ , Chap. . ]) to deduce for ( . ) the overall system of critical point conditions. Algorithm can be derived from these conditions with the help of the proximal mapping (or resolvent) of G, and similarly for F*. We recall the following useful calculus rules for proximal mappings, e.g., from [ , Prop. . (i), (viii)]:
( ) For any σ > 0 it holds that
( ) For any γ > 0 it holds that

Set-valued analysis. We first define for U ⊂ X the set of Fréchet (or regular) normals to U at u ∈ U and the corresponding set of tangent vectors. For a convex set U, these coincide with the usual normal and tangent cones of convex analysis. For any cone V ⊂ X, we also define the polar cone V°. We use the notation R : Q ⇒ W to denote a set-valued mapping R from Q to W; i.e., for every q ∈ Q it holds that R(q) ⊂ W. For R : Q ⇒ W, we define the domain dom R := {q ∈ Q | R(q) ≠ ∅} and the graph Graph R := {(q, w) ∈ Q × W | w ∈ R(q)}. The regular coderivatives of such maps are defined graphically with the help of the normal cones. Let Q and W be Hilbert spaces, and R : Q ⇒ W with dom R ≠ ∅.
We then define the regular coderivative D*R(q|w) : W ⇒ Q and the graphical derivative DR(q|w) : Q ⇒ W graphically via the normal and tangent cones to Graph R, respectively. Finally, we say that the set-valued mapping R : Q ⇒ W is metrically regular at w̄ for q̄ if Graph R is locally closed and there exist ρ, δ, κ > 0 such that

dist(q, R⁻¹(w)) ≤ κ dist(w, R(q))

for any q, w with ‖q − q̄‖ ≤ δ and ‖w − w̄‖ ≤ ρ.
We note that metric regularity of R is equivalent to the Aubin property of R⁻¹. Hence, for the sake of consistency with [ ], we denote the infimum over valid constants κ by |R⁻¹|(w̄|q̄), or |R⁻¹| for short when there is no ambiguity about the point (w̄, q̄).

We now show the convergence in infinite-dimensional Hilbert spaces of Algorithm , where the acceleration is stopped at some iteration N. We begin by observing from the definition of the proximal mapping that Algorithm may be written in terms of the family, over a base point ū ∈ X, of monotone operators H_ū, the preconditioning operators M_i, and the discrepancy terms ν_i. Observe (or see [ , Lem. . ]) that ‖ν_i‖ ≤ C‖u_{i+1} − u_i‖ for some constant C > 0. This is the only property needed from ν_i for the convergence proof. Therefore ν_i can also incorporate further discrepancies, e.g., from inexact evaluation of K, which can be useful in the context of PDE-constrained optimization. Throughout, we write q ∈ X × Y for the primal-dual pair with primal part u and set w = (ξ, η) ∈ X × Y, extending this notation to q̄, etc., in the obvious way. Here we fix R > 0 such that there exists a solution q̄ to 0 ∈ H_ū(q̄). Note that this condition is equivalent to the necessary optimality condition ( . ) for ( . ). Regarding the operator K : X → Y and the step length parameters σ_i, τ_i > 0, we require that K is Fréchet-differentiable with locally Lipschitz-continuous derivative K′. Observe that σ_iτ_i = σ_0τ_0 is maintained under acceleration schemes such as the one in Algorithm ; it is therefore sufficient to ensure this condition for the initial choice. We denote by L the local Lipschitz factor of u ↦ K′(u) on the closed ball B(0, R) ⊂ X. We define the uniform condition number ( . c) κ := Θ/θ based on the constants Θ and θ from the preceding condition. If τ_i, σ_i > 0 are constant, ‖u_i‖ ≤ R, and ( . b) holds, such θ and Θ exist [ , Lem. . ]. This easily extends to 0 < c ≤ τ_i, σ_i ≤ C < ∞.
Remark . . The bound ( . ), on which the analysis from [ ] depends, is the reason we need to stop the acceleration: Since one of the step lengths τ_i, σ_i tends to zero and the other to infinity, no uniform bound exists for M_i if the acceleration is not stopped. Possibly the convergence proofs from [ ] can be extended to the fully accelerated case, but such an endeavour is outside the scope of the present work. In numerical practice, in any case, we stop the algorithm (and hence a fortiori the acceleration) at a suitable iteration N.
We now distinguish whether F* or G is strongly convex. The former is always guaranteed by Moreau-Yosida regularization of F, while the latter (if it holds in addition, which is the case in the examples considered here) might allow stronger acceleration, independent of the Moreau-Yosida parameter. In both cases, the convergence proof follows closely the original proof in [ , § - ]. Although this was stated in finite-dimensional spaces, none of the arguments rely on this fact. Aside from the inverse mapping theorem for set-valued functions extracted from [ , Lem. . ], which holds in general complete metric spaces, the arguments are entirely algebraic manipulations. They therefore hold in infinite-dimensional Hilbert spaces as well.
Some modifications are, however, necessary since the accelerated step sizes are no longer constant. The original proof starts with a basic descent inequality obtained from the monotonicity of H and the assumed strong convexity properties. It then modifies this inequality through a sequence of lemmas to obtain an estimate from which a generic telescoping result quickly produces convergence [ , Thm. . ]. In the following, we detail the first two elementary steps, which contain changes to the original proof (for F* strongly convex only the second step changes, for G strongly convex both do). The remaining steps heavily employ the metric regularity of H and are unchanged; they are therefore only summarized briefly.
. F*. We begin by considering the case of F* being strongly convex, which is closest to the setting of [ ]. In this case, we choose for µ ≥ 0 the acceleration sequence

( . )  σ_{i+1} := ω_iσ_i and τ_{i+1} := τ_i/ω_i with ω_i := 1/√(1 + µσ_i).
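For illustration, the following sketch (hypothetical values of µ and of the initial step lengths, and assuming the extrapolation factor ω_i = 1/√(1 + µσ_i)) shows that the rule keeps the product σ_iτ_i invariant while σ_i decreases and τ_i grows:

```python
import numpy as np

mu = 0.5               # hypothetical acceleration parameter
sigma, tau = 1.0, 0.5  # hypothetical initial step lengths
prod0 = sigma * tau

for i in range(50):
    omega = 1.0 / np.sqrt(1.0 + mu * sigma)  # omega_i < 1
    sigma, tau = omega * sigma, tau / omega  # accelerated step-length update

# sigma_i * tau_i stays equal to sigma_0 * tau_0 up to rounding
print(sigma * tau, prod0)
```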
Under the above assumptions, and if metric regularity holds for H_u, Algorithm locally converges to a solution of ( . a).
We begin from the basic descent estimate obtained from the monotonicity of H_{u_i} and the assumed strong convexity.
Remark . . Strong convexity of F* with factor γ_{F*} is equivalent [ ] to strong monotonicity of ∂F* in the sense that

⟨ξ − ζ, p − p̃⟩ ≥ γ_{F*}‖p − p̃‖²  for all ξ ∈ ∂F*(p), ζ ∈ ∂F*(p̃);

observe that there is no factor 1/2 in the latter, unlike mistakenly written at [ , the end of page ]. Hence the slight difference in the statement of ( D -loc-γ -F*) in comparison to the similarly-named equation in [ ]. In the cited article, the exact factors make no difference; in the present work, however, they do matter for the acceleration.
Note that ( D -loc-γ -F*) still uses the old norm ‖·‖_{M_i} for the new iterate. To pass to ‖·‖_{M_{i+1}} under acceleration requires replacing [ , Lem. . ]. For this, we first need the following bound on the step lengths.
Proof. We first note that Thus the claim holds if i.e., after multiplying both sides by ω_i and using the definition of ω_i, In other words, we need to show that But using the concavity of the square root, we can estimate This proves ( . ).
The following lemma is the crucial step towards extending the results of [ ] to the accelerated case.

If
holds.
Proof. Using ( . ) and the property ‖q̄‖ ≤ R/2 from ( . a), we have Choosing C ≤ R/(2κ) and using ( . ) and ( . ), we thus get As both ‖q_i‖ ≤ R and ‖q_{i+1}‖ ≤ R, by local Lipschitz continuity of K we have again We now expand In the final step, we have used the fact that {τ_i}_{i∈N} is non-decreasing. Using ( . ), we further derive by application of Young's inequality Using once more Young's inequality, ( . ), and Lemma . , we deduce By application of ( . ) and ( D -loc-γ -F*), we bound Setting c := Lγ_{F*} − µ and C := (1 − ξ)θcκ and using ( . ) therefore yields Using ( . ) and this estimate in ( D -loc-γ -F*), we obtain ( D -M).
The remaining proof now proceeds as in [ ]. Metric regularity (whose verification is the main difficulty in function spaces and will be investigated, based on the results of [ ], at the end of this section) allows removing the squares from ( D -loc-γ -F*) and bridging from the perturbed local solutions to local solutions. This is done through a sequence of technical lemmas in [ , § . - . ] which culminate in the general descent estimate ( D) of [ , Thm. . ]. From there, a generic telescoping argument given in [ , Thm. . ] yields convergence, which we summarize in the following statement.
Theorem . . Let ( . ) be satisfied with the corresponding constants R, Θ, κ, and L, and suppose F* is strongly convex with factor γ_{F*}. Let q̄ solve 0 ∈ H_ū(q̄) and let H_ū be metrically regular at 0 for q̄ with ( . ). If µ ∈ [0, γ_{F*}) and we use the rule ( . ) for i = 0, . . . , N for some N ∈ N, after which τ_i = τ_N and σ_i = σ_N for i > N, then there exists δ > 0 such that for any q_0 ∈ X × Y with ‖q_0 − q̄‖ ≤ δ, the iterates q_{i+1} generated by Algorithm converge to a solution q* of ( . ).
Proof. The proof is identical to that of [ , Thm. . ] and is given here for the sake of completeness. Under the given assumptions we can, for some ξ > 0, obtain from ( D -M) as in the proof of [ , Thm. . ] the inequality and consequently an application of ( . ) shows that This says that {q_i}_{i=0}^∞ is a Cauchy sequence and hence converges to some q̃. It also implies that [ , Lem. . ] states that under the given assumptions, it follows from ( D) that ν_i → 0 and hence that By ( . ), we moreover have −z_i ∈ H_{u_i}(q_{i+1}). Using K ∈ C¹(X; Y) and the outer semicontinuity of the subgradient mappings ∂G and ∂F*, we see that Here the lim sup is in the sense of an outer limit [ ], consisting of the limits of all converging subsequences of elements w_i ∈ H_{u_i}(q_{i+1}). As by ( . ) we have −z_i ∈ H_{u_i}(q_{i+1}), it follows in particular that 0 ∈ H_ũ(q̃), which is precisely ( . ).
Remark . . Theorem . holds if F* is merely strongly convex on the "nonlinear" subspace Y_NL, i.e., if ( . ) holds merely for all arguments in Y_NL. In this case, the norm in ( . ) can be replaced by that of P_NL, the orthogonal projection onto Y_NL, and a straightforward modification of Lemma . yields ( D -M).
Since the Moreau-Yosida regularization, required for metric regularity in our examples, already implies strong convexity on the full space, we do not treat this more general case in detail.

. G
Under the above assumptions, and if metric regularity holds for H_u, Algorithm converges to a solution of ( . a) as before. First, a trivial modification of the proof of [ , Lem. . ] again yields the basic descent estimate.
If G is strongly convex on X with constant γ_G > 0, then we have the corresponding descent estimate. Analogously to Lemma . , one now derives the following bounds.
We can now show the main lemma to account for the acceleration in the case of strongly convex G.
Proof. Proceeding as in the proof of Lemma . : since now {σ_i}_{i∈N} is non-decreasing, we derive from ( D -loc-γ -G), instead of ( . ), the estimate Applying Young's inequality, ( . ), and Lemma . , we deduce We now conclude analogously to the proof of Lemma . .
The remaining proof now follows as in the case of strongly convex F * , and we obtain the following convergence result.
Theorem . . Let ( . ) be satisfied with the corresponding constants R, Θ, κ, and L, and suppose G is strongly convex with factor γ_G. Let q̄ solve 0 ∈ H_ū(q̄) and let H_ū be metrically regular at 0 for q̄. If µ ∈ [0, γ_G) and we use the rule ( . ) for i = 0, . . . , N for some N ∈ N, after which τ_i = τ_N and σ_i = σ_N for i > N, then there exists δ > 0 such that for any q_0 ∈ X × Y with ‖q_0 − q̄‖ ≤ δ, the iterates q_{i+1} generated by Algorithm converge to a solution q* of ( . ).
. We finally address the verification of metric regularity in infinite-dimensional Hilbert spaces required for the convergence of Algorithm . Motivated by the problems considered in the next section, we assume that F and G are given for a proper, convex, lower semicontinuous f* and (after rescaling F + G, see below). We wish to apply the results from [ ]. Towards this end, we consider the Moreau-Yosida regularization ( . ) of F for some parameter γ > 0, and assume (using ( . )) that the convexified graphical derivative of the regularized subdifferential satisfies, at least at non-degenerate points, the corresponding representation for some cone V_{∂F*}(·|η) and a pointwise-defined self-adjoint positive semi-definite linear superposition operator T. Using the sum rule for graphical coderivatives from [ , Cor. . ], we deduce that D[∂F*_γ] has the same type of structure. For the Moreau-Yosida regularized problem, we denote the corresponding operator H_u by H_{γ,u}. Then we have the following result.
Proposition . ([ , Prop. . ]). Assume ( . ) holds and K ∈ C¹(X; Y). Suppose further that q̄ solves 0 ∈ H_{γ,ū}(q̄) for some γ ≥ 0. Then H_{γ,ū} is metrically regular at 0 for q̄ if and only if T + γI ⪰ βI for some β > 0, or b̄(q̄|0; H_{γ,ū}) > 0. Proof. In [ , Prop. . ], we actually take T = 0. However, the only place where this specific structure is used is [ , Lem. . ]. In Lemma . in Appendix , we have updated the sufficient conditions of the former to be able to deal with general T.
This implies convergence for any choice of the Moreau-Yosida regularization parameter γ > 0. On the other hand, if γ = 0, we typically have to prove the existence of a lower bound for b̄. This is significantly more difficult. We will address the issue of verifying (or disproving) the lower bound on b̄ with specific examples in the next section.
We now discuss the application of the preceding analysis to the motivating examples of L¹ fitting, L∞ fitting, and optimal control with state constraints. Since this will depend on the specific structure of the mapping S, we consider as a concrete example the problem of recovering the potential term in an elliptic equation.
This operator has the following useful properties [ ]:
( ) The operator S is uniformly bounded in U ⊂ X and completely continuous: If for u ∈ U the sequence {u_n} ⊂ U satisfies u_n ⇀ u in X, then S(u_n) → S(u) in Y.
( ) S is twice Fréchet differentiable.
( ) There exists a constant C > 0 such that ( ) There exists a constant C > 0 such that Furthermore, from the implicit function theorem, the directional Fréchet derivative S′(u)h at u ∈ U for given h ∈ X can be computed as the solution w ∈ H¹(Ω) to

⟨∇w, ∇v⟩ + ⟨uw, v⟩ = −⟨yh, v⟩  (v ∈ H¹(Ω)),

where y = S(u).
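The action of S and of its directional derivative can be sketched numerically. The following illustration uses hypothetical data and a finite-difference discretization of −y″ + uy = f on (0, 1) with homogeneous Dirichlet conditions (rather than the finite element setting used later), and verifies the first-order Taylor expansion S(u + εh) ≈ S(u) + εS′(u)h, where w = S′(u)h solves the linearized equation with right-hand side −yh:

```python
import numpy as np

def solve_pde(u, rhs, h):
    """Solve -w'' + u w = rhs on (0,1), w(0)=w(1)=0, by central differences."""
    n = u.size                                  # number of interior grid points
    main = 2.0 / h**2 + u
    off = -np.ones(n - 1) / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(A, rhs)

n, h = 99, 1.0 / 100
x = np.linspace(h, 1.0 - h, n)
f = np.ones(n)                                  # hypothetical right-hand side
u = 1.0 + x**2                                  # hypothetical potential
y = solve_pde(u, f, h)                          # y = S(u)

# directional derivative S'(u)h_dir: solve the linearized equation
h_dir = np.sin(np.pi * x)
w = solve_pde(u, -y * h_dir, h)

# first-order Taylor check: ||S(u + eps*h_dir) - S(u) - eps*w|| = O(eps^2)
eps = 1e-4
resid = solve_pde(u + eps * h_dir, f, h) - y - eps * w
print(np.linalg.norm(resid))
```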
Similar expressions hold for S″(u)(h₁, h₂) and (S′(u)*h₁)h₂. Hence, the above assumptions hold for S′(u)* and S″(u)* for given directions as well.
Other operators satisfying the above assumptions are mappings from a Robin or diffusion coefficient to the solution of the corresponding elliptic partial differential equation; cf. [ ].
. L¹. First, we consider the L¹ fitting problem ( . ). In order to make use of the strong convexity of the penalty term for the acceleration, we rewrite this equivalently, i.e., we set G(u) = ½‖u‖²_{L²}, K(u) = S(u) − y^δ, and F(y) = α⁻¹‖y‖_{L¹} in ( . ). Hence F* = ι_C for C := {p : |p(x)| ≤ α⁻¹ a.e.}, where ι_C denotes the indicator function of the convex set C in the sense of convex analysis [ ].

Using rule ( ) above, we thus obtain for the Moreau-Yosida regularization
Since G is strongly convex with constant γ_G = 1, we can use the acceleration scheme ( . ) for any µ < 1. Algorithm thus has the following explicit form, where we denote the dual variable by p to be consistent with the notation in this section.
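The dual step in this explicit form reduces to a pointwise operation. As a sketch (assuming, as in the rescaling above, that F* is the indicator of a pointwise interval [−a, a]; the values of a, σ, γ here are hypothetical), standard prox calculus gives prox_{σF*_γ}(p) = proj_{[−a,a]}(p/(1 + σγ)):

```python
import numpy as np

def prox_dual_l1(p, a, sigma, gamma):
    # prox of sigma * F*_gamma with F* the indicator of {|p| <= a} and
    # F*_gamma = F* + gamma/2 |.|^2: first shrink by 1/(1 + sigma*gamma),
    # then project pointwise onto [-a, a]
    return np.clip(p / (1.0 + sigma * gamma), -a, a)

p = np.array([-3.0, 0.01, 2.5])
print(prox_dual_l1(p, a=2.0, sigma=1.0, gamma=0.0))  # plain projection
```

For γ = 0 this is the plain projection onto C; for γ > 0 the Moreau-Yosida term only adds the scalar shrinkage factor.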
(We remark that in the case of finite-dimensional data y^δ ∈ Y_h ⊂ Y, replacing F by F ∘ P_h, where P_h denotes the orthogonal projection onto Y_h, there exists a constant c > 0 such that b̄(q̄|0; H_{u,h}) ≥ c > 0 holds; see [ , § . ]. Hence, regularization is not necessary in this case.) We summarize the above discussion on the convergence for the infinite-dimensional L¹ fitting problem ( . ) in the next corollary.
Proof. Note that G is strongly convex with factor 1, while the Moreau-Yosida regularization makes F*_γ strongly convex with factor γ. By Proposition . , H_{γ,ū} is metrically regular at 0 for q̄. The claim now follows from Theorem . .
Remark . . In general, ensuring that the iterates generated by Algorithm remain feasible, i.e., satisfy u_i ∈ U, requires adding an explicit constraint to ( . ). This would lead to a nonsmooth G(u) = ½‖u‖²_{L²} + ι_{[ε,∞)}(u) (where the indicator function is to be understood pointwise almost everywhere), which was not considered in [ ]. The analysis there could be extended to cover this case; specifically, all non-degenerate cases would be covered by improving [ , Lem. . ] to include the case V_Ḡ = {0} instead of just V_Ḡ = X; see Lemma . in Appendix .
However, to be able to directly apply the theory as stated in [ ], and since in our numerical examples the iterates are always feasible as long as the minimizer and the initial guess are sufficiently far from the lower bound, we omit the constraint in our model problems.
. L∞. We next consider the L∞ fitting ("Morozov") problem from [ ], i.e., now F(y) = ι_{[−δ,δ]}(y) (again to be understood pointwise almost everywhere) with G and K as before. Again, it is well known that the Moreau-Yosida regularization of pointwise constraints is given by their quadratic penalization. Hence, now F*(p) = δ‖p‖_{L¹}. In this case, the proximal mapping of F* is given by pointwise soft-thresholding; for the Moreau-Yosida regularization F*_γ, we obtain a similar expression after some simplification. Again, we use the acceleration scheme ( . ) for µ < γ_G = 1. Algorithm now has the following explicit form.
Algorithm Accelerated primal-dual algorithm for L∞ fitting: choose u_0, p_0; for i = 0, . . . , N do
As before, we deduce from the characterization of D[∂F*] from [ , Cor. . ] that ( . ) holds for F*, while the discussion in [ , § . ] shows that metric regularity of H_{γ,ū} only holds for γ > 0 (or finite-dimensional data). Summarizing, we have the following convergence result for the infinite-dimensional L∞ fitting problem ( . ).
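The pointwise dual update in Algorithm above can be sketched as follows (all values hypothetical; here F*(p) = δ‖p‖_{L¹}, so that prox_{σF*} is pointwise soft-thresholding and the Moreau-Yosida term contributes an additional shrinkage factor via the same calculus rule as before):

```python
import numpy as np

def soft(p, t):
    # pointwise soft-thresholding: the prox of t * |.|
    return np.sign(p) * np.maximum(np.abs(p) - t, 0.0)

def prox_dual_linf(p, delta, sigma, gamma):
    # prox of sigma * F*_gamma with F*(p) = delta * |p|_1 and
    # F*_gamma = F* + gamma/2 |.|^2: shrink, then soft-threshold
    s = 1.0 + sigma * gamma
    return soft(p / s, sigma * delta / s)

p = np.array([-2.0, 0.05, 1.5])
print(prox_dual_linf(p, delta=0.1, sigma=1.0, gamma=0.0))
```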
. Finally, we address the state-constrained optimal control problem in Ω.
In this case, G is as before and F(y) = (2α)⁻¹‖y − y_d‖²_{L²} + ι_{(−∞,c]}(y) with K(u) = S(u). For simplicity, we assume here that the upper bound c is constant; the extension to variable c ∈ L∞(Ω) (as well as to lower bounds) is straightforward.
For F_γ, we directly use the definition ( . ) to compute pointwise. The corresponding regularized optimality conditions are again given as above. It remains to compute F*. Since y_d ∈ L²(Ω) is measurable, the integrand is proper, convex, and normal, and hence we can proceed by pointwise computation. Let x ∈ Ω be arbitrary. For the Fenchel conjugate with respect to y, we consider the first-order necessary conditions for the maximizer. Inserting this into the definition and making the case distinction whether αz + y_d(x) ≤ c yields the expression for f*(x, z). The subdifferential (with respect to z) follows accordingly. (Note that the cases agree for z = (c − y_d(x))/α, i.e., z ↦ ∂f*(x, z) is single-valued and hence z ↦ f*(x, z) is continuously differentiable for almost every x ∈ Ω.) To compute the pointwise proximal mapping prox_{σf*(x,·)}(v) for given x ∈ Ω, we use the resolvent formula prox_{σf*(x,·)}(v) = (Id + σ∂f*(x, ·))⁻¹(v) =: w, i.e., v ∈ {w} + σ∂f*(x, w), together with ( . ), and distinguish the two cases. Again, we use the acceleration scheme ( . ) for µ < γ_G = 1. Algorithm now has the following explicit form, where [P], for a logical proposition P depending on x ∈ Ω, denotes the pointwise Iverson bracket, i.e., [P](x) = 1 if P(x) is true and 0 else.
Algorithm Accelerated primal-dual algorithm for state constraints: choose u_0, p_0; for i = 0, . . . , N do
Let us assume that strict complementarity holds, i.e., αp̄(x) ≠ c − y_d(x) for a.e. x ∈ Ω.
Then it follows from Corollary . in Appendix that ( . ) is satisfied for F*. Furthermore, since t(x) ∈ {0, α} for a.e. x ∈ Ω and V_{∂F*}(·|η) = L²(Ω) and V_{∂F*}(·|η)° = {0} locally in a neighbourhood of (ū, p̄), we deduce the corresponding representation. However, the lower bound does not hold in general. This can be seen by taking any orthonormal basis of L²(Ω), which converges weakly but not strongly to zero, and using the fact that S′(u) is a compact operator from L²(Ω) to L²(Ω) due to the Rellich-Kondrachov embedding theorem. Therefore, also b̄(q̄|0; H_u) = 0. By Proposition . , there is thus no metric regularity without regularization (γ > 0). (Similarly to the L¹ fitting problem, if the state constraints are only prescribed at a finite number of points, it is possible to show metric regularity for γ = 0 as well.) The next corollary, which follows similarly to Corollary . , summarizes the convergence results from Theorems . and . for the infinite-dimensional state-constrained optimal control problem ( . ).
We now illustrate the convergence behavior of the primal-dual extragradient method for the three model problems in Section . Since we are interested in the properties of the algorithm in function spaces, we consider here the case of d = 1 dimension to allow for very fine discretizations with reasonable computational effort. We have also tested the model problems in d = 2 dimensions and observed very similar behavior.
In each case, the operator S corresponds to the solution of ( . ) for Ω = (− , ) and constant right-hand side f. For the implementation, we use a finite element approximation of ( . ) on a uniform grid with n elements (unless stated otherwise), with a piecewise constant discretization of u and a piecewise linear discretization of y as in [ ]. The functional values are computed using an approximation of the integrals by mass lumping, which amounts to a proper scaling of the corresponding discrete sums. In this way, the functional values are independent of the mesh size.
The parameters in the primal-dual extragradient method are chosen as follows: The Moreau-Yosida parameter is fixed at γ = − unless otherwise stated, and we compare the two cases of µ = 0 (no acceleration) and µ < 1 = γ_G chosen close to 1 (full acceleration). We point out that this value of γ is significantly smaller than those for which semismooth Newton methods tend to converge even with continuation; cf. [ , ]. As a starting value, we take in each case u ≡ and p ≡ . The (initial) step sizes are set to σ_0 = L̃⁻¹ and τ_0 = . L̃⁻¹, where L̃ = max{1, ‖S′(u_0)u_0‖/‖u_0‖} is a very simple estimate of the Lipschitz constant of K′ = S′. The algorithm (and the acceleration) is terminated after a prescribed number N of iterations. The MATLAB implementation used to generate the results in this section can be downloaded from https://github.com/clason/nlpdegm.

. L¹. We first consider the L¹ fitting problem ( . ) using the example from [ ]: We choose the exact parameter u†(x) = − |x| and corresponding exact data y† = S(u†) and add random-valued impulsive noise, where for each x ∈ Ω, ξ(x) is an independent normally distributed random value with mean 0 and variance δ². For the results shown, we take r = . and δ = . , i.e., % of data points are corrupted by % noise. We then apply Algorithm with N = iterations and α = − fixed. Figure compares the convergence behavior of the functional values with µ = 0 and µ ≈ 1 (for the same data y^δ). The effect of acceleration can be seen clearly. Note that the convergence is nonmonotone due to the acceleration (and the aggressive choice of step lengths).
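One plausible reading of the impulsive-noise construction can be sketched as follows (the stand-in data, the corruption fraction r, and the noise level delta are all hypothetical, since the values above are not fully specified):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
y_true = np.sin(np.pi * np.linspace(0.0, 1.0, n))  # stand-in for exact data y†
r, delta = 0.3, 0.1                                # hypothetical fraction / noise level

# corrupt a random fraction r of the data points by Gaussian noise of std delta
mask = rng.random(n) < r
y_noisy = y_true.copy()
y_noisy[mask] += delta * rng.standard_normal(mask.sum())

print(mask.mean())  # fraction of corrupted points, close to r
```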
The convergence behavior for different mesh sizes is illustrated in Figure (using the same y^δ in order to mitigate the influence of the random data). As can be observed, the number of iterations to reach a given functional value is virtually independent of the mesh size. This property, shared by many function-space algorithms, is often referred to as mesh independence. Finally, we report on the effect of the Moreau-Yosida parameter γ on the performance of the algorithm.

Figure: L¹ fitting: convergence for different values of the Moreau-Yosida regularization parameter γ without and with acceleration.

. L∞. For the L∞ fitting problem, y^δ is obtained from y† = S(u†) (with u† as above) by quantization. Specifically, we set the quantized data, where n_b denotes the number of bins and [s] denotes the nearest integer to s ∈ R (i.e., the data are rounded to n_b discrete equidistant values). Here we take n_b = and apply Algorithm for N = iterations. Again, Figure shows the corresponding convergence behavior.

Finally, we consider the state-constrained optimal control problem ( . ). Here, we choose the desired state y_d = S(u†) (with u† again as before) and the constraint c = . . The control costs are set to α = − , and we again terminate acceleration (and the algorithm) after N = iterations.
As before, Figures and illustrate the benefit of acceleration and the mesh independence of the algorithm, respectively. Since in this example the solution only becomes feasible for very small values of γ, the visual comparison of the effect of γ on the performance is more delicate.

Accelerated primal-dual extragradient methods with nonlinear operators can be formulated and analyzed in function space. Their convergence rests on metric regularity of the corresponding saddle-point inclusion, which can be verified for the class of PDE-constrained optimization problems considered here after introducing a Moreau-Yosida regularization. Unlike semismooth Newton methods (which also require Moreau-Yosida regularization in function space; cf., e.g., [ , ]), the proposed methods converge even for very small regularization parameters.

This work can be extended in a number of directions. We plan to investigate the possibility of obtaining convergence estimates on the primal variable alone under lesser assumptions. An alternative would be to exploit the uniform stability with respect to regularization for fixed discretization, and with respect to discretization for fixed regularization, to obtain a combined convergence for a suitably chosen net (γ, h) → (0, 0). This is related to the adaptive regularization and discretization of inverse problems [ , ]. Furthermore, it would be of interest to extend our analysis to include nonsmooth regularizers G, which were excluded in the current work for the sake of the presentation.

Here we improve the sufficient condition [ , Lem. . (i)] to allow for a more general linear operator F̃ than F̃ = γI as well as for a more general cone V_Ḡ than V_Ḡ = X, arising from the graphical derivative of ∂F*_γ and ∂G, respectively. These modifications are necessary for the treatment of state constraints; the latter is also the basis for extending the analysis in [ ] to cover pointwise constraints on the primal variable as mentioned in Remark . .