Splines are Universal Solutions of Linear Inverse Problems with Generalized-TV regularization

Splines come in a variety of flavors that can be characterized in terms of some differential operator L. The simplest piecewise-constant model corresponds to the derivative operator. Likewise, one can extend the traditional notion of total variation by considering more general operators than the derivative. This leads us to the definition of the generalized Beppo-Levi space M, which is further identified as the direct sum of two Banach spaces. We then prove that the minimization of the generalized total variation (gTV) over M, subject to some arbitrary (convex) consistency constraints on the linear measurements of the signal, admits nonuniform L-spline solutions with fewer knots than the number of measurements. This shows that non-uniform splines are universal solutions of continuous-domain linear inverse problems with LASSO, L1, or TV-like regularization constraints. Remarkably, the spline-type is fully determined by the choice of L and does not depend on the actual nature of the measurements.


Introduction
Imposing sparsity constraints is a powerful paradigm for solving ill-posed inverse problems and/or for reconstructing signals at super-resolution [6]. This is usually achieved by formulating the task as an optimization problem that includes some form of 1 regularization [49]. The concept is central to the theory of compressed sensing (CS) [9,19] and is currently driving the development of a new generation of algorithms for the reconstruction of biomedical images [36]. The primary factors that have contributed to making sparsity a remarkably popular research topic during the past decade are as follows: • the possibility of recovering the signal from few measurements (CS) with a theoretical guarantee of perfect recovery under strict conditions [7,10,19]; • the availability of fast iterative solvers for this class of problems [4,14,26,39]; • the increasing evidence of the superiority of the sparsity-promoting schemes over the classical linear reconstruction (including the Tikhonov 2 regularization) in a variety of imaging modalities [36].
The approach developed in this paper is also driven by the idea of sparsity. However, it deviates from the standard paradigm because the recovery problem is formulated in the continuous domain under the practical constraint of a finite number of linear measurements. The ill-posedness of the problem is then dealt with by searching for a solution that is consistent with the measurements and that minimizes a generalized version of the total-variation (TV) seminorm-the continuous-domain counterpart of 1 regularization. Our major finding (Theorem 1) is that the extremal points of this kind of recovery problem are nonuniform splines whose type is matched to the regularization operator L. The powerful aspect is that the result holds in full generality, as long as the problem remains convex. The only constraint is that the linear inverse problem should be well-posed over the (very small) null space of the regularization operator, which is the minimal requirement for any valid regularization scheme. In particular, Theorem 1 gives a theoretical explanation of the well-documented observation that total variation regularization-the simplest case of the present theory with L = D (derivative operator)-tends to produce piecewise-constant solutions [11,41]. Recognizing the intimate connection between linear inverse problems and splines is also helpful for discretization purposes because it provides us with a parametric representation of the solution that is controlled by the regularization operator L. In that respect, our representer theorems extend some older results on spline interpolation with minimum L 1 -norms, including the adaptive regression splines of Mammen and van de Geer [38] and the functional analytic characterization of Fisher and Jerome [27]. There is a connection as well with the work of Steidl et al. on splines and higher-order TV [48], although their formulation is strictly discrete and restricted to the denoising problem.

Linear Inverse Problems: Current Status and Motivation
Our notational convention is to use bold letters to denote ordinary vectors and matrices to distinguish them from their infinite-dimensional counterparts; that is, functions (such as s) and linear operators (such as L). Simply stated, the inverse problem is to recover a signal s from a finite set of linear measurements y = y 0 (s) + n ∈ R M where n is a disturbance term that is usually assumed to be small and independent of s. In most real-world problem the unknown signal lives in the continuum so that it is appropriate to view it as an element of some Banach space B. Then, by the assumption of linearity, there exists a set of functionals ν m ∈ B (the continuous dual of B) with m = 1, . . . , M such that the noise-free measurements are given by y 0 = ν(s) = ( ν 1 , s , . . . , ν M , s ). The measurement functionals ν m are governed by the underlying physics (forward model) and assumed to be known. Since the signal s ∈ B is an infinite-dimensional entity and the number of measurements is finite, the inverse problem is obviously ill-posed, not to mention the fact that the true measurements y are typically only approximate versions of y 0 since they are corrupted by noise.

Finite-Dimensional Formulation
The standard approach for the resolution of such inverse problems is to select some finitedimensional reconstruction space V = span{ϕ n } N n=1 ⊂ B. Based on the (simplifying) assumption that s ∈ V, one then converts the original noise-free forward model into the discretized version y 0 = Ax, where x ∈ R N represents the expansion coefficients of s in the basis {ϕ n } N n=1 . Here, A is the so-called sensing matrix of size (M × N ) whose entries are given by [A] m,n = ν m , ϕ n .
The basic assumption made by the theory of compressed sensing is that there exists a finitedimensional basis (or dictionary) {ϕ n } N n=1 that "sparsifies" the class of desired signals with the property that x 0 ≤ K 0 for some fixed K 0 which is (much) smaller than N ; in other words, it should be possible to synthesize the signal exactly by restricting the expansion to no more than K 0 atoms in the basis {ϕ n } N n=1 [20,23,40]. The signal recovery is then recast as the constrained optimization problem arg min where the minimization of the 1 norm promotes sparse solutions [49]. The role of the righthand-side inequality is to encourage consistency between the noisy measurements y and their noise-free restitution y 0 = Ax. The popularity of (1) stems from the fact that the theory of CS guarantees a faithful signal recovery from M > 2K 0 measurements under strict conditions on A (i.e., restricted isometry) [7,10,19]. Instead of basing the recovery on the synthesis formula s = n x n ϕ n ∈ V, one can adopt an alternative analysis or regularization point of view. To that end, one typically assumes that s is discretized in some implicit "pixel" basis with expansion coefficients s = (s 1 , . . . , s N ) ∈ R N , where the s n are the samples of the underlying signal. The corresponding system matrix (forward model) is denoted by H : R N → R M . Given some appropriate regularization operator L : R N → R N , the idea then is to exploit the property that the transformed version of the signal, Ls, is sparse. This translates into the optimization problem arg min which is slightly more involved than (1). The two forms are equivalent only when N = N and L is invertible, the connection being A = HL −1 . For computational purposes, (2) is often converted into the equivalent unconstrained version of the problem arg min where λ ∈ R + is an adjustable regularization parameter that needs to tuned such that y − Hs 2 2 = 2 . One of the preferred choices for L is the finite-difference operator-or the discrete version of the gradient in dimensions higher than one. This corresponds to the "total-variation" reconstruction method, which is widely used in applications [3,11,30,41].
The sparsity-promoting effect of these discrete formulations and the conditions under which the expansion coefficients of the signal can be recovered are fairly well understood [28,54]. What is less satisfactory is the intrinsic interdependence between the sparsity constraints and the choice of the appropriate reconstruction space, which makes it difficult to deduce rates of convergence and error estimates relating to the underlying continuous-domain recovery problem.

Infinite-Dimensional Formulation
Recently, Adcock and Hansen have addressed the above limitation by formulating an infinitedimensional theory of CS [1]. The measurements are the same as before, but the unknown signal is now a function s : R d → R. For the purpose of illustration, we take d = 1 and s ∈ BV(R) with the property that such a function admits the (unique) expansion s = n w n ψ n in the (properly normalized) Haar wavelet basis {ψ n }. It is known that the condition s ∈ BV(R) implies the inclusion of w = (w n ) in weak-1 (Z)-a space that is slightly larger than 1 (Z) [12]. Conversely, one can force the inclusion in BV(R) by imposing a bound on the 1 norm of these coefficients. If one further assumes that the signal is sparse in the Haar basis, one can recast the reconstruction problem as which is the infinite-dimensional counterpart of the synthesis formulation (1). The key question is to derive conditions on how to choose the ν m to guarantee recovery of wavelet coefficients up to a certain scale. This has been done in [1,2], which means that the issue of convergence is now reasonably well understood for the synthesis form of the recovery problem. In our framework, M D (R) is the space of functions on R of bounded (total) variation, which is slightly larger than BV(R) because it also includes constant signals. This allows us to close the circle by enforcing a regularization on the "true" total variation of the solution, which is associated with the derivative operator D = d dx . This results in the functional optimization problem which is the continuous-domain counterpart of (2). Now, the motivation for our present theory is that (5) corresponds to a special case of Theorem 1 with L = D and the closed compact convex set C ⊂ R M being specified by the inequality on the right-hand-side of (5); that is, The key is that the differentiation operator D is splineadmissible in the sense of Definition 1: Its causal Green's function is the Heaviside (or unit-step) function ρ D (x) = 1 + (x) whose rate of growth is n 0 = 0, while its null space N D = span{p 1 } with p 1 (x) = 1 is composed of all constant-valued signals. This implies that the extreme points of (5) necessarily take the form with K ≤ M . This corresponds to a piecewise-constant signal with jumps of size a k at the x k , as illustrated in Figure 1. The solution also happens to be a polynomial spline of degree 0 with knots at the x k with the property that D{s} = K k=1 a k δ(· − x k ) = w δ , which is a weighted sum of shifted Dirac impulses (the innovation of the spline), as shown on the bottom of Figure  1. In view of Definition 2, the solution (6) can also be described as a nonuniform L-spline with L = D. The remarkable aspect of this result is that the parametric form (6) is universal, in the sense that it does not dependent on the measurement functionals ν m . To the best of our knowledge, this is the first mathematical explanation of the well-known observation that TV regularization tends to enforce piecewise-constant solutions. The other interesting point is that one can interpret the solution as the best K-term representation of the signal within an infinitedimensional dictionary that consists of a constant signal p 1 plus a continuum of shifted Green's functions (i.e., {1 + (· − τ )} τ ∈R ), making the connection with the synthesis views (1) and (4) of the problem. Also, note that the described sparsifying effect is much more dramatic than that of the finite-dimensional setting since one is collapsing a continuum (integral representation) into a discrete and finite sum.
We shall now show that the mechanism at play is very general and transposable to a much broader class of regularization operators L and data-fidelity terms, as well as for the multidimensional setting. Figure 1: Prototypical solution of a linear inverse problem with total-variation regularization. The signal is piecewise-constant; in other words, it is a nonuniform L-spline with L = D (derivative operator). The application of D uncovers the innovation w δ : The Dirac impulses are located at the points of discontinuity (knots), while their height (weight) encodes the magnitude of the corresponding jump.

Road Map of the Paper
The remainder of this paper is organized as follows: After setting the notation, we present and discuss of our main representer theorem (Theorem 1) in Section 3. We also provide a refined version for the simpler interpolation scenario (Theorem 2). We then proceed with the review of primary applications in Section 4. The mathematical tools for proving our results are developed in the second half of the paper. The first enabling component is the tight connection between splines and operators, which is the topic of Section 5. In particular, we present an operator-based method to synthesize a spline from its innovation, which requires the construction of an appropriate right-inverse operator (Theorem 4). The existence of such inverse operators is fundamental to the characterization of the native spaces associated with our generalized total-variation criterion (gTV) (Theorem 5), as we show in Section 6. The actual proof of Theorems 1 and 2 is given in Section 7. It relies on a preparatory result (generalized Fisher-Jerome theorem) that establishes the impulsive form of the solutions of some abstract minimization problem over the space M(R d ) of bounded Borel measures.
We conclude the paper in Section 8 with a brief discussion of open issues.

Representer Theorems for Generalized Total Variation
Although we are considering a finite number of measurements, we are formulating the reconstruction problem in the continuous domain. This calls for a precise specification of the underlying functional setting.

Notation
The space of tempered distribution is denoted by S (R d ) where d gives the number of dimensions. This space is made of continuous linear functionals µ : ϕ → µ, ϕ acting on the Schwartz' space S(R d ) of smooth and rapidly decaying test functions on R d [29,33]. We shall primarily work with the space M(R d ) of regular, real-valued, countably additive Borel measures on R d , which is also known (by the Riesz-Markov theorem) to be the continuous dual of C 0 (R d ): the Banach space of continuous functions on R d that vanish at infinity equipped with the supremum norm · ∞ [42, Chap. 6]. Since S(R d ) is dense in C 0 (R d ), this allows us to define M(R d ) as and also to extend the space of test functions to ϕ ∈ C 0 (R d ). The action of w will be denoted by ϕ → w, ϕ = R d ϕ(x)w(x)dx where the right-hand side stands for the Lebesgue integral of ϕ with respect to the underlying measure 1 . The bottom line is that M(R d ) is the Banach space associated with the norm · M which returns the "total variation" of the measure that specifies w.
Two key observations in relation to our goal are: 1. the compatibility of the L 1 and total-variation norms with the former being stronger than the latter. Indeed,

the inclusion of Dirac impulses in
We shall monitor the algebraic rate of growth/decay of (ordinary) functions of the variable x ∈ R d by verifying their inclusion in the Banach space where for α 0 ≥ |m| = m 1 + · · · + m d . A linear operator whose output is a function is represented with a roman capital letter (e.g., L). The action of L on the signal s is denoted by s → L{s}, or Ls for short. Such an operator is said to be shift-invariant if it commutes with the shift operator s → s(· − x 0 ); that is, if L{s(· − x 0 )} = L{s}(· − x 0 ) for any admissible signal s and x 0 ∈ R d .

Main Result on the Optimality of Splines
Since the solution is regularized, the constrained minimization is performed over some native space M L (R d ) that is tied to some admissible differential operator L, such as D, D 2 (second derivative), or ∆ (Laplacian) for d > 1.
Definition 1 (Spline-admissible operator). A linear operator L : 3. the (growth-restricted) null space of L, It is largest function space for which the generalized total variation This also means that gTV is only a semi-norm on M L (R d ). However, it can be turned into a proper norm by factoring out the null space of L. We rely on this property and the finite dimensionality of N L to prove that M L (R d ) is a bona fide Banach space (see Theorem 5).
Having set the functional context, we now state our primary representer theorem.
Theorem 1 (gTV optimality of splines for linear inverse problems). Let us assume that the following conditions are met: 1. The regularization operator L : is spline-admissible in the sense of Definition 1.

The linear measurement operator
3. The recovery problem is well-posed over the null space of L: ν(q 1 ) = ν(q 2 ) ⇔ q 1 = q 2 , for any q 1 , q 2 ∈ N L .
Then, the extremal points of the general constrained minimization problem where C is any (feasible) convex compact subset of R M , are necessarily nonuniform L-splines of the form n=1 is a basis of N L and L{ρ L } = δ so that β = Ls M = K k=1 |a k | = a 1 . The full solution set of (10) is the convex hull of those extremal points.
Theorem 1 is a powerful existence result that points towards the universality of nonuniform L-spline solutions. The key property here is L{s} = K k=1 a k δ(· − x k ), which follows from Conditions 1-3 in Definition 1 and is consistent with the more detailed characterization of splines presented in Section 5. For the time being, it suffices to remark that these splines are smooth (i.e., infinitely differentiable) everywhere, except at their knot locations {x k }.
Although the extremal problem is defined over a continuum, the remarkable outcome is that the problem admits solutions that are intrinsically sparse, with the level of sparsity being measured by the minimum number K of required spline knots. In particular, this explains why the solution of a problem with a TV/L 1 -type constraint on the derivative (resp., the second derivative) is piecewise-constant (resp., piecewise linear when L = D 2 ) with breakpoints at x k . The other pleasing aspect is the direct connection between the functional concept of generalized TV and the 1 -norm of the expansion coefficients a.
We observe that the solution is made up of two components: an adaptive one that is specified by {x k } and a, and a linear regression term (with expansion coefficients b) that describes the component in the null space of the operator. Since b does not contribute to Ls M , the optimization tends to maximize the contribution of the null-space component. The main difficulty in finding the optimal solution is that K and (x k ) are problem-dependent and unknown a priori.
We have mentioned in Section 2.2 that the semi-norm Df M yields the classical total variation of a function in 1D. Unfortunately, there is no such direct connection for d > 1, the reason being that the multidimensional gradient ∇ is not spline-admissible because it is a vector operator. Instead, as a proxy for the popular total variation of Rudin and Osher [41], we suggest using the (fractional) Laplacian semi-norm (−∆) γ/2 f M with γ ≥ d, which is endowed with the same invariance and null-space properties. According to Theorem 1, such a γth-order regularization results in extremal points that are nonuniform polyharmonic splines [21,37].

Connection with Unconstrained Problem
The statement in Theorem 1 is remarkably general. In particular, it covers the generic regularized least-squares problem which is commonly used to formulate linear inverse/compressed-sensing problems [6,9,19,23,26]. The connection is obtained by taking which is a ball of diameter centered on the measurement vector y = (y 1 , . . . , y M ). Indeed, since the datafidelity term is (strictly) convex, the extreme points s of (10) saturate the inequality such that y − ν(s ) 2 2 = 2 and gTV is minimized with α = α( ) = Ls M . In the unconstrained form (12), the selection of a fixed λ ∈ R + results in a particular value of the data error y − ν(f λ ) 2 2 = (λ) with the optimal solution f λ = s having the same total variation as if we were looking at the primary problem (10) with C = B(y; ).
To get further insights on the optimization problem (12), we can look at two limit cases. When λ → ∞, the solution must take the form f ∞ = p ∈ N L so that Lf ∞ M = 0. It then follows that y − ν(f ∞ ) 2 2 ≤ y 2 < ∞. On the contrary, when λ → 0, the minimization will force the data term y−ν(f 0 ) 2 2 to vanish. Theorem 1 then ensures the existence of a nonuniform "interpolating" L-spline f 0 (x) with ν(f 0 ) = y and minimum gTV semi-norm.

Generalized Interpolation
In the latter interpolation scenario, the convex set C reduces to the single point C = {y ∈ R M }. This configuration is of special theoretical relevance because it enables us to refine our upper bound on the number K of spline knots.
Theorem 2 (Generalized spline interpolant). Under Assumptions 1-3 of Theorem 1, the extremal points of the (feasible) generalized interpolation problem arg min are nonuniform L-splines of the same form (11) as in Theorem 1, but with K ≤ (M − N 0 ).

Application Areas
We first briefly comment on the admissibility conditions in Theorem 1 and indicate that the restrictions are minimal. To the best of our knowledge, the continuity of the measurement operator ν is a necessary requirement for the mathematical analysis of any inverse problem. The difficulty here is that our native space is non-reflexive, which forces us to rely on the weak*-topology. The continuity requirement in Theorem 1 is therefore equivalent is laid out in Theorem 6. In particular, we refer to the norm inequality (25), which suggests that Condition 2 in Theorem 1 is met by picking ν m ∈ L 1,−n 0 (R d ) where the latter is the Banach space associated with the weighted L 1 -norm In fact, L 1,−n 0 (R d ) is the predual of the space L ∞,n 0 (R d ) defined by (8), which implies that The condition ν m ∈ L 1,−n 0 (R d ) is a mild algebraic decay requirement that turns out to be satisfied by the impulse response of most physical devices. As for the requirement that the inverse problem is well defined over the null space of L (Condition 3), it a prerequisite to the success of any regularization scheme. Otherwise, there is simply no hope of turning an ill-posed problem into a well-posed one. For instance, in the introductory example with classical total-variation regularization, the constraint is that ν should have at least one component ν m such that ν m , 1 = 0 which, again, is very mild requirement. Next, we discuss examples of signal recovery that are covered by Theorems 1 and 2. The standard setting is that one is given a set of noisy measurements y = ν(s) + "noise" of an unknown signal s and that one is trying to recover s from y based on the solution of (12), or some variant of the problem involving some other (convex) data term-the most favorable choice being the log likelihood of the measurement noise. We shall then close the discussion section by briefly making the connection with a class of inverse problems in measure space; that is, the case L = Identity.

Interpolation
The task here is to reconstruct a continuous-domain signal from its (possibly noisy) nonuniform samples {s(x m )} M m=1 , which is achieved by searching for the function s : R d → R that fits the samples while minimizing Ls M . This corresponds to the problem setting in Theorem 1 with ν m = δ(· − x m ) and C = B(y; ), where y denotes the measurement vector. Hence, the admis- where the boundedness is ensured by the stability condition in Theorem 4. The more technical continuity requirement is achieved when ρ L is continuous (Hölder exponent r 0 > 0). This happens when the order of the differential operator is greater than one, which seems to exclude 2 simple operators such as D (piecewise-constant approximation). This limitation notwithstanding, our theoretical results are directly applicable to the problems of adaptive regression splines [38] with L = D N , the construction of shape-preserving splines [35], as well as a whole range of variations including TV denoising.

Generalized Sampling
The setting is analogous to the previous one, except that the samples are now observed through a sampling aperture φ ∈ L 1,−n 0 (R d ) so that ν m = φ(· − x m ) [24,50]. The function φ may, for example, correspond to the point-spread function of a microscope. Then, the recovery problem is equivalent to a deconvolution [18]. Since the measurements are obtained by integration of s against an ordinary function ν m ∈ L 1,−n 0 (R d ), there is no requirement for the continuity of ρ L because of the implicit smoothing effect of φ. This means that essentially no restrictions apply.

Compressed Sensing
The result of Theorem 1 is highly relevant to compressed sensing, especially since the underlying L 1 /TV signal-recovery problem is formulated in the continuous domain. We like to view (11) as the prototypical form of a piecewise-smooth signal that is intrinsically sparse with sparsity K = a 0 . The model also conforms with the notion of a finite rate of innovation [56]. If we know that the unknown signal s has such a form, then Theorem 1 suggests that we can attempt to recover it from an M -dimensional linear measurement y = ν(s) by solving the optimization problem (10) with C = B(y; ), which is in agreement with the predominant paradigm in the field. While the theorem states that M ≥ K, common sense dictates that we should take M > N freedom , where N freedom = 2K + N 0 is the number of degrees of freedom of the underlying model. The difficulty, of course, is that a subset of those parameters (the spline knots x k ) induce a model dependency that is highly nonlinear.

Inverse Problems in the Space of Measures
Some of the theoretical results of this paper are also of direct relevance for inverse problems that are formulated in the space M(R d ) of measures [5]. The prototypical example is the recovery of the location (with super-resolution precision) of a series of Dirac impulses from noisy measurements, which may be achieved through the continuous-domain minimization of the total variation of the underlying measure [8,16,22,25]. The Fisher-Jerome theorem [27, Theorem 1] as well as our extension for the unbounded domain R d and arbitrary convex sets (Theorem 7) support this kind of algorithm, as they guarantee the existence of sparse solutions-understood as a sum of Dirac spikes-for this family of problems.

Splines and Operators
We now switch to the explanatory part of the presentation. The first important concept that is implicit in the statement of Theorems 1 and 2 is the powerful association between splines and operators, the idea being that the selection of an admissible operator L specifies a corresponding type of splines [46,47] [55,Chapter 6].
Sorted by increasing complexity, the three types of operators that are of relevance to us are: (i) ordinary differential operators, which are polynomials of the derivative operator D = d dx [13,46,52]; (ii) partial differential operators such as the Laplacian ∆ (or some polynomial thereof); and (iii) fractional derivatives such as D γ or (−∆) γ 2 with γ ∈ R + whose Fourier symbols are (jω) γ and ω γ , respectively [21,51,53]. It can be shown that all linear-shiftinvariant operators of Type (i) and all elliptic operators of Type (ii) are spline-admissible in the sense of Definition 1. This is also known to be true for fractional derivatives and fractional Laplacians with γ ≥ d [21,51].
Let us mention that the issue of making sure that the null space of the operator L is finitedimensional is often nontrivial for d > 1. It is a fundamental aspect that is addressed in the L 2 theory of radial basis functions and polyharmonic splines with the definition of the appropriate native spaces [58,Chapter 10]. Here, we have chosen to bypass some of these technicalities by including a growth restriction i.e., the condition that q ∈ L ∞,n 0 (R d ) in the definition of N L . A fundamental property in that respect is that the finite-dimensional null space of a LSI operator can only include exponential polynomial components of the form x m e j ω 0 ,x , which correspond to a zero of multiplicity at least |m| + 1 of the frequency response L(ω) at ω = ω 0 (see [55, Proposition 6.1 p. 118] and [32, Section 6]).
Once it is established that L is spline-admissible, one can rely on the following unifying distributional definition of a spline.
Definition 2 (Nonuniform L-spline). A function s : R d → R of slow growth (i.e., s ∈ L ∞,n 0 (R d ) with n 0 ≥ 0) is said to be a nonuniform L-spline if where (a k ) is a sequence of weights and the Dirac impulses are located at the spline knots {x k }.
The generalized function L{s} = w δ is called the innovation of the spline because it contains the crucial information for its description: the positions {x k } of the knots and the amplitudes (a k ) of the corresponding discontinuities.
For a constructive use of Definition 2, we also need to be able to re-synthesize the spline s from its innovation. In the case of our introductory example with L = D (see Figure 1), one simply integrates w δ , which yields (6) (up to the integration constant b 1 ) owing to the property that In principle, the same inversion procedure is applicable for the generic operator L and amounts to substituting the δ distribution in (15) by the Green's function ρ L . The only delicate part is the proper handling of the "integration constants" (the part of the solution that lies in the null space of the operator), which is achieved through the specification of N 0 linear boundary conditions of the form φ n , s = 0.
We now show that the underlying functionals φ = (φ 1 , . . . , φ N 0 ) can be incorporated in the specification of an appropriate right-inverse operator L −1 φ . Our construction requires that φ first be matched to a basis of N L such as to form a biorthogonal system. We note that this is always feasible as long as the φ n are linearly independent with respect to N L . (An explicit construction is given in the proof of Theorem 2.) n=1 is a basis of N L and the vector of "boundary" functionals φ = (φ 1 , . . . , φ N 0 ) with φ n ∈ N L satisfy the biorthogonality condition φ(p n ) = e n where e n is the nth element of the canonical basis.
The interest of such a system is that any q ∈ N L has a unique representation as q = N 0 n=1 φ n , q p n with associated norm φ(q) 2 . The fundamental requirement for our formulation is the stability/continuity of the inverse by construction, we can control stability by relying on Theorem 3 whose proof is given in Appendix A.

and only if its kernel g is measurable and
ess sup This allows us to characterize the desired operator in term of its Schwartz' kernel (or gener- Theorem 4 (Stable right-inverse of L). Let (φ n , p n ) N 0 n=1 be a biorthogonal system for N L ⊂ M L (R d ) ⊂ L ∞,n 0 (R d ). Then, there exists a unique operator L −1 φ : ϕ → L −1 φ ϕ = R d g φ (·, y)ϕ(y)dy such that for all ϕ ∈ S(R d ). The kernel of this operator is with ρ L such that Lρ L = δ and q n (y) = φ n , ρ L (· − y) . Moreover, if g φ satisfies the stability condition (16) with α 0 = n 0 , then L −1 φ admits a continuous extension M(R d ) → L ∞,n 0 (R d ) with (17) and (18) remaining valid for all ϕ ∈ M(R d ).
The proof of Theorem 4 is given in Appendix B. Since the choice of the N 0 linear boundary functionals φ n is essentially arbitrary, there is flexibility in defining admissible inverse operators. The important ingredient for our formulation is the existence of such inverses with the unconditional guarantee of their stability (see Theorem 5 below).
To put this result into context, we now provide some illustrative examples. For L = D N 0 , we have that n 0 = (N 0 − 1), ρ D N 0 (x) = x n 0 + n 0 ! , and p n (x) = x n−1 (n−1)! for n = 1, . . . , N 0 , where the polynomial basis is biorthogononal to φ with φ n (x) = (−1) (n−1) δ (n−1) (x). This (canonical) choice of boundary functionals then translates into the construction of an inverse operator L −1 φ that imposes the vanishing of the function and its derivatives at the origin. By applying (19) and recognizing the binomial expansion of (x − y) n 0 , we simplify the expression of the kernel of this operator as The crucial observation here is that the function g φ (x, ·) with x ∈ R fixed is compactly supported and bounded. Moreover, g φ (x, ·) ∞ = |g φ (x, 0)| = x n 0 n 0 ! so that g φ obviously satisfies the stability bound (16) with α 0 = n 0 . By contrast, the condition fails for the conventional shift-invariant inverse ϕ → ρ D N 0 * ϕ (n 0 -fold integrator), which stresses out the non-trivial stabilizing effect of the second correction term in (19). The other important consequence of the correction is the vanishing of g φ (x, y) as y → ±∞ for any fixed x ∈ R d , contrary to its leading term (x − y) n 0 + /n 0 ! which does not decay (and even grows) as y → −∞.
The primary usage of the inverse operators of Theorem 4 is the resolution of differential equations of the form for some w ∈ M(R d ). Indeed, by invoking the properties of L −1 φ and the biorthogonality of (φ, p), we readily show that (20) admits a unique solution in M L (R d ), which is given by For the particular case of the spline innovation w δ in Definition 2, we find that which, upon substitution of the kernel given by (19), results in a form that is the same as (11) in Theorem 1 modulo some adjustment of the constants b n .

Native or Generalized Beppo-Levi Spaces
The search for the solution of our optimization problem is performed over the native space M L (R d ) defined by (9), which is the largest space over which our gTG regularization functional is well defined. The delicate aspect is that M L (R d ) is specified in terms of a semi-norm, in analogy with the definition of the classical Beppo-Levi spaces of order n ∈ N and exponent p ≥ 1, written as B p,n (R d ) = {f ∈ S (R d ) : ∂ m f Lp < ∞ for all multi-indices |m| = n} [17,34]. Hence, in 1D, the proposed definition of M D n (R) is a slight extension of B 1,n (R). In higher dimensions, it can be shown 3 that B p,2n (R d ) = {f ∈ L ∞,2n−1 (R d ) : (−∆) n f Lp < ∞}, so that there also exists a close connection between B 1,2n (R d ) and M (−∆) n (R d ).
The crucial point for our formulation is that M L (R d ) also happens to be a complete normed (or Banach) space when equipped with the proper direct-sum topology. We shall now make this structure explicit with the help of the inverse operators defined in Theorem 4. Since the principle is similar to the characterization of the Beppo-Levi spaces, we shall also refer to M L (R d ) as a generalized Beppo-Levi space.
Theorem 5 (Banach-space structure of native space). Let L be a spline-admissible operator, M L (R d ) its native space defined by (9), and (φ, p) some biorthogonal system for its null space N L . Then, the following equivalent conditions hold: , while its kernel necessarily fullfills the stability condition 2. Every f ∈ M L (R d ) admits a unique representation as Proof. As preparation, we define a subset of M L (R d ) as Since the boundary conditions φ(f ) = 0 are linear, M L,φ (R d ) is clearly a vector space. We now show that it is a Banach space when equipped with the norm · L = L{·} M . By definition, · L is a semi-norm on M L (R d ), meaning that it fulfills the properties of a norm, except for the unicity condition. To establish the latter on M L,φ (R d ), we consider f ∈ M L,φ (R d ) such that f L = 0, which is equivalent to f ∈ N L . Since f ∈ M L,φ (R d ) (by hypothesis), we have that φ(f ) = 0, from which we deduce that f = N 0 n=1 φ n , f p n = 0, as expected. This proves that M L,φ (R d ) is isometrically isomorphic to M(R d ) and, hence, a Banach space. Alternatively, one can also view M L,φ (R d ) as a concrete transcription (or representative within the equivalence class) of the abstract quotient space M L (R d )/N L .

Existence and Stability of Inverse Operators.
We have just revealed that L is a bijective, norm-preserving mapping M L,φ (R d ) → M(R d ). This allows us to invoke the bounded-inverse theorem, which ensures the existence and boundedness (here, an isometry) of the inverse operator L 3. Identification of the Underlying Norm. Any element p ∈ N L is uniquely characterized by its expansion coefficients φ(p) in the basis p. The same holds true for q = Proj N L {f } ∈ N L with φ(q) = φ(f ) for any f ∈ M L (R d ). Since M L,φ (R d ) and N L are both Banach spaces, we can equip their direct sum M L (R d ) with the composite norm f L,φ = w M + φ(f ) 2 , with the guarantee that the Banach-space property is preserved.
For the converse implication, we simply identify M L,φ (R d ) as the closed subspace of M L (R d ) with the property that f L,φ = Lf M .
The connection with the L-spline s of Definition 2 is that s ∈ M L (R d ) if and only if the 1 -norm of its spline weights a = (a 1 , . . . , a K ) is finite. Indeed, we have that Ls M = w δ M = K k=1 |a k | = a 1 , owing to the property that δ(· − x k ) M = 1. We note that the choice of gTV is essential here since the simpler (and a priori only slightly more restrictive) L 1 -norm regularization Ls L 1 would exclude the spline solutions that are of interest to us because δ / ∈ L 1 (R d ).
Our final ingredient is the identification of the predual space of M L (R d ), which is denoted by C L (R d ).
The proof is given in Appendix C. The direct-sum decomposition in Theorem 6 is achieved by means of the operator Proj N L : f → q = N 0 n=1 p n , f φ n with q = p(f ) 2 = p(q) 2 , which relies on the biorthogonality of (φ, p) to project C L (R d ) onto N L (R d ). This also means that C L,p (R d ) can be defined as

Proof of Theorems 1 and 2
Our technique of proof will be to first establish the optimality of innovation-type solutions of the form that appear in Definition 2 for general linear inverse problems defined on M(R d ) (Theorem 7) and to then transfer the result to M L (R d ) with the help of the stable inverse operators specified in Theorem 4. The first step is achieved by generalizing an earlier result by Fisher and Jerome [27].
Let H be the direct sum of M(R d ) = C 0 (R d ) and a finite-dimensional space N equipped with some norm · N . The generic element of H is f = (w, p) with f H = w M + p N .
Theorem 7 (Generalized Fisher-Jerome theorem). Let F : H → R M with M ≥ N 0 = dim(N ) be a weak*-continuous linear map such that for some constant B > 0 and every p ∈ N . Let C be a convex compact subset of R M such that U = F −1 (C) = {(w, p) ∈ H : F (w, p) ∈ C} is nonempty (feasibility hypothesis). Then, is a nonempty, convex, weak*-compact subset of H with extremal points (w δ , p) of the form with K ≤ M and x k ∈ R d for k = 1, . . . , K, and min (w,p)∈U w M = K k=1 |a k |.
Theorem 7 is the most technical component of our formulation as it involves the weak* topology. The details of the proof are laid out in Appendix D together with a precise definition of the underlying concepts.
The essence of Theorem 7 is very similar to Fisher-Jerome's original result [27, Theorem 1], except for two crucial points: (i) the fact that they are only considering measures defined over a bounded domain Ω ⊂ R d (or, by extension, on a compact metric space), and (ii) the nature of the constraints which, in their case, is limited to coordinatewise inequalities of the form z 1,m ≤ [F (w, p)] m ≤ z 2,m . These differences are substantial enough to justify a new, self-contained proof. In particular, we believe that our extension for functions defined on R d (beyond the compact Hausdorff framework of [27]) is essential for covering nonlocal operators such as fractional derivatives, and for deploying Fourier-domain/signal-processing techniques.
Our primary constraint for the validity of Theorem 7 is the existence of the lower bound (26). We now show that this property is implicit in the statement of the hypotheses of Theorem 1. Proposition 1. Let (φ n , p n ) N 0 n=1 be a biorthogonal system of N L ⊂ M L (R d ) such that q = N 0 n=1 φ n , q p n for all q ∈ N L . Then, Condition 3 in Theorem 1 is equivalent to the existence of a constant 0 < B such that While there are softer ways of establishing this equivalence, we have chosen an explicit approach that also serves as background for the proof of Theorem 2.
Proof. Any q ∈ N L has a unique expansion q = N 0 n=1 c n p n with c = φ(q) and q N L = c 2 . The property that q is uniquely determined by its measurements b = ν(q) is therefore equivalent to c also being the solution of the overdetermined system Pc = b with It is well known that such a system is solvable if and only if (P T P) is invertible and that its (least-squares) solution is given by This characterization then yields the norm estimate where σ min (P) = σ min (P T ) and σ max (P) are the minimum and maximum singular values of P, respectively. Finally, the invertibility of (P T P) is equivalent to σ 2 min (P) = λ min (P T P) > 0, while the continuity assumption on ν ensures that σ max (P) < ∞. The constant is then given by B = σ 2 min (P)/σ max (P).
Proof of Theorem 1. Let (φ, p) be a biorthogonal system for N L . Then, by Theorem 5, any function f ∈ M L (R d ) has a unique decomposition as f = L −1 φ w + p with w = Lf ∈ M(R d ) and p ∈ N L . This allows us to interpret the measurement process f → ν(f ) = ν, f as the linear map F : H → R M such that We also know from Theorem 6 that Hence, the weak*continuity of ν : M L (R d ) → R M is equivalent to the weak*-continuity of F : H → R M . The complementary lower bound is given by Proposition 1 as With this new representation, the constrained minimization problem is equivalent to the one considered in Theorem 7 with N = N L , which ensures that all extreme points of the solution set are of the form (p, w δ ) with w δ = K k=1 a k δ(· − x k ), K ≤ M , and x k ∈ R d . Upon application of the (stable) right-inverse operator, this maps into s = L −1 φ w δ +p, where p is a suitable component that is in the null space of the operator. Finally, we use the explicit kernel formula (19) and the procedure outlined at the end of Section 5 to convert this representation into (11), which removes the artificial dependence upon φ.
Proof of Theorem 2. From the proof of Proposition 1, we know that the minimal singular value of the cross-product matrix P = [p 1 · · · p M ] T with p m = ν(p m ) ∈ R N 0 is non-vanishing. The geometric implication is that span{p} M m=1 = R N 0 . Since the corresponding system is redundant, we can always identify a subset of these row vectors that forms a basis of R N 0 . Without loss of generality, we now assume that this subset is {p m } N 0 m=1 and that the corresponding submatrix P 0 = [p 1 · · · p N 0 ] T is therefore invertible. In other words, we have identified a reduced vector of measurement functionals ν 0 = (ν 1 , . . . , ν N 0 ) that is linearly independent with respect to N L . This, in turn, allows us to construct the boundary functional φ 0 = P −1 0 ν 0 that meets the biorthogonal requirement φ 0 (p n ) = P −1 0 ν 0 (p n ) = P −1 0 p n = e n . In effect, this yields a biorthogonal system with the property that N L = span{ν n } N 0 n=1 ⊂ M L (R d ). Coming back to our interpolation problem, we define y = (y 0 , y 1 ) with y 0 = (y 1 , . . . , y N 0 ) ∈ R N 0 and y 1 ∈ R M −N 0 . Due to the biorthogonality of (φ 0 , p), the unique element q 0 ∈ N L such that ν 0 (q 0 ) = P 0 φ 0 (q 0 ) = y 0 is given by The other ingredient is Theorem 5, which ensures that any f ∈ M L (R d ) has a unique decomposition as f = L −1 φ 0 w + q with q ∈ N L and w ∈ M(R d ). Now, the crucial property is that the boundary conditions φ 0 (L −1 φ 0 w) = 0 imply that ν 0 (L −1 φ 0 w) = 0 for all w ∈ M L (R d ). This allows us rewrite the solution of our generalized interpolation problem as f = L −1 φ 0 w 1 + q 0 , where The result then follows from the continuity of L −1 φ 0 and the reduced version of Theorem 7 with N = {0}.

Further Theoretical and Computational Issues
The analogy with the finite-dimensional theory of compressed sensing raises the fundamental theoretical question: Is it possible to provide conditions on the measurement operator ν such that a perfect recovery is possible for certain classes of signals; in particular, splines with a given number of knots? This is an open topic that calls for further investigation. Because the problem is formulated in the continuum, we suspect that it is much more difficult-if not impossible-to identify conditions that ensure unicity.
While the reconstruction problem in Theorem 1 is formulated in analysis form (i.e., minimization of Ls M ), the interesting outcome is that the solution (11) is given in synthesis form, with the unusual twist that the underlying dictionary {ρ L (· − τ )} τ ∈R d of basis functions is infinitedimensional and not even countable. This interpretation suggests a natural discretization which is to select a finite subset of equally-spaced functions {ρ L (· − τ n )} N n=1 with N M and to rely on linear programming for = 0, or quadratic programming for > 0, or some other convex optimization technique to numerically solve the underlying 1 -minimization problem. We have preliminary evidence that this approach is feasible. In particular, we have considered the generalized interpolation scenario covered by Theorem 2 and observed that the simplex algorithm performs well in the sense that it always returns a nonuniform L-spline with a number of knots K ≤ (M − N 0 ). The key theoretical question now is to establish the convergence of such a scheme as the sampling step gets smaller.
Since the space that is spanned by the null-space components of L and the integer shifts of ρ L is the space of cardinal L-splines (see [52] for the generic case of an ordinary differential operator), one may also consider an alternative discretization that uses the corresponding B-spline basis functions, which are much better conditioned than Green's functions. This would bring us back to a numerical problem that is very similar to (3), with the advantage of maintaining a direct control over the discretization error. In the case of a pure denoising problem, another possible option is to run the taut-string algorithm [38,44] or some appropriate variation thereof.
At any rate, we believe that the issue of the proper discretization of the reconstruction problem (12) as well as the development of adequate numerical schemes are important research topics on their own right. For the cases where the solution is not unique, Theorem 1 also suggests a new computational challenge: the design of a minimization algorithm that systematically converges to an extremal point of the problem, the best solution being the spline that exhibits the minimal number of knots K = a 0 .

A Proof of Theorem 3
Proof. First, we establish the sufficiency of the stability condition by considering the signal f (x) = G{w}(x) = R d g(x, y)w(y)dy, where w ∈ M(R d ), and by constructing the estimate which implies that In doing so, we have shown that To prove necessity, we use the property that g(x, y) = G{δ(· − y)}(x), where the shifted Dirac impulse δ(· − y) is included in M(R d ) with δ(· − y) M = 1. We then observe that, for each y ∈ R d , G{δ(· − y)} ∞,α 0 = ess sup Moreover, G being bounded, we have that which means that ess sup As we already know that the inequality holds in the other direction as well, we obtain which concludes the proof.

B Proof of Theorem 4
Proof. We first establish the properties of the operator on Schwartz' space of smooth and rapidlydecreasing signals S(R d ) to avoid any technical problem related to splitting sums and interchanging integrals. We also rely on Schwartz's kernel theorem which states the equivalence between the continuous linear operators G : S(R d ) → S (R d ) and their Schwartz kernels (or general impulse response) g ∈ S (R d × R d ), meaning that two such operators are identical if and only if their kernels are equal-in the sense of distributions.
Using the explicit representation of g φ together with L{ρ L } = δ (Dirac distribution) and L{p m } = 0 for m = 1, . . . , N 0 , we then easily show that for all ϕ ∈ S(R d ). Next, we invoke the biorthogonality property φ m , p n = δ m−n (Kronecker delta) to evaluate the inner product of (19) with φ m as which proves that the boundary conditions are satisfied.
Let us now consider another operator L −1 that is also a right inverse of L. Clearly, the result of the action of the two operators can only differ by a component that is in the null space of L so that (L −1 ϕ − L −1 φ ϕ) = q ∈ N L . Since (φ, p) forms a biorthogonal system, q is uniquely determined by φ(q), which implies that the right-inverse operator that imposes the condition φ(L −1 ϕ) = 0 is unique.
Next, we define C = sup x,y∈R d (|g φ (x, y)| (1 + x ) −n 0 ) < ∞ and extract the generic continuity bound L −1 φ ϕ ∞,n 0 ≤ C ϕ M from the the proof of Theorem 3. This allows us to extend the domain of the operator from S(R d ) to M(R d ), by the Hahn-Banach theorem. Since S(R d ) is dense in M(R d ), we can do likewise for the right-inverse property and the boundary conditions by invoking the continuity of L −1 φ and φ. Alternatively, one can establish this extension indirectly by identifying a specific Banach space M L,φ (R d ) and then by showing that it is the bijective image of M(R d ) by L −1 φ (see proof of Statement 1 in Theorem 5, which also nicely settles the issue of stability).

C Proof of Theorem 6
Proof. First, we prove that C L,p (R d ) is isometrically isomorphic to C 0 (R d ). For any ϕ ∈ C 0 (R d ), L * ϕ = 0 implies that ϕ ∈ N L * ∩ C 0 (R d ) = {0} (since the basis functions of the null space do not vanish at infinity); i.e., ϕ = 0. L * is therefore injective, and hence bijective since it is surjective C 0 (R d ) → C L,p (R d ) by definition. In particular, this implies that the adjoint L −1 * φ of the operator L −1 φ defined by (19) is the inverse of L * from C L,p (R d ) to C 0 (R d ). Therefore, , then f = L * ϕ with ϕ ∈ C 0 (R d ) and f, p = ϕ, Lp = 0 for any p ∈ N L . The unique element of N L orthogonal to N L is 0 so that the sum C L,p (R d ) ⊕ N L is direct. N L is a (finite-dimensional) Banach space for the norm p(f ) 2 , implying that C L (R d ) = C L,p (R d ) ⊕ N L is a Banach space for (24).
Next, we recall that L −1 φ is continuous and bijective from M L,φ (R d ) to M(R d ) (Theorem 5), while we have just shown that its adjoint is continuous and bijective from C L,p (R d ) to C 0 (R d ).
Knowing that C 0 (R d ) = M(R d ), this implies that C L,p (R d ) = M L,φ (R d ). Finally, we have as expected.
To establish the weighted L 1 -norm inequality, we first observe that the hypotheses f ∈ L 1,−n 0 (R d ) and p n ∈ L ∞,n 0 (R d ) imply that | f, p n | ≤ p n ∞,n 0 f L 1,−n 0 (by the Hölder inequality). Likewise, using the stability bound (21), we get The desired result is then obtained from the summation of these individual bounds.

D Proof of Theorem 7
The proof follows the same steps as the original one of Fisher and Jerome [27,Theorem 1]. Yet, it differs in the assumptions and technicalities (i.e., the consideration of the non-compact domain R d and the use of explicit bounds). We have done our best to make it self-contained.
As preparation, we recall that the weak*-topology on M(R d ) = C 0 (R d ) is the locally convex topology associated with the family of semi-norms p ϕ (w) = | w, ϕ | for ϕ ∈ C 0 (R d ).
In particular, a sequence of elements w n ∈ M(R d ) converges to 0 for the weak*-topology if and only if w n , ϕ → 0 for every ϕ ∈ C 0 (R d ). A subset of M(R d ) is said to be weak*-closed (weak*-compact, respectively) if it is closed (compact, respectively) for the weak*-topology. We shall use Propositions 2 and 3, which are consequences of the Banach-Alaoglu theorem and its variations [43, p.68].
Proposition 2. Compactness in the weak*-topology of M(R d ).
• If (w n ) is a sequence in M(R d ), bounded for the TV-norm, then we can extract a subsequence that converges in M(R d ) for the weak*-topology.
The second point of Proposition 2 is valid because the space C 0 (R d ) is separable. These properties also carry over to the Banach space H = M(R d ) ⊕ N = C 0 (R d ) ⊕ N , which is endowed with the corresponding weak*-topology: A sequence (w n , p n ) in H vanishes for the weak*-topology if and only if w n vanishes for the weak*-topology of M(R d ) and p n N → 0.
Proposition 3. Compactness in the weak*-topology of H.
• If (w n , p n ) is a sequence in H such that w n M + p n N is bounded, then we can extract a subsequence that converges in H for the weak*-topology.
Proof of Theorem 7. The proof is divided in two parts. First, we show that V is a nonempty, convex, and weak*-compact subspace of H. This allows us to specify V by means of its extremal points via the Krein-Milman theorem. Second, we show that the extremal points have the announced form. We set β = inf (w,p)∈U w M .
Part I: V is nonempty, convex, and weak*-compact Since F is weak*-continuous, it is also continuous H → R M in the topology of H. Hence, there exists a constant A > 0 such that Let us consider a sequence (w n , p n ) n∈N in U such that w n M decreases to β. In particular, w n M is bounded above by w 0 M . We set M = max x∈C x . Using respectively (26), (30), and F (w n , p n ) 2 ≤ M (since (w n , p n ) ∈ U), we deduce the inequalities which shows that p n is bounded. We can then extract a sequence (w sn , p sn ) from (w n , p n ) that converges to (w ∞ , p ∞ ) ∈ H for the weak*-topology (by Proposition 3). Since w n → β and w sn is a subsequence, it must also converge to β.
On the other hand, the set U = F −1 (C) is weak*-closed in H, as the preimage of a closed set by a weak*-continuous function F . Consequently, (w ∞ , p ∞ ) is the weak*-limit of a sequence of elements in U. We therefore deduce that (w ∞ , p ∞ ) ∈ U, so that w ∞ M ≥ β. In light of the previous inequality, this yields w ∞ M = β, which proves that V is not empty.
In addition to being weak*-closed, the set U = F −1 (C) is convex because C is convex and where the set on the right-hand side is weak*-compact, due to Proposition 3. Since any weak*closed set included in a weak*-compact set is necessarily weak*-compact, this shows that V is weak*-compact.
We are now in the position to apply the Krein-Milman theorem [43, p. 75] to the convex weak*-compact set V ⊂ H, which tells us that "V is the closed convex hull of its extreme points in H endowed with the weak*-topology". This leads us to the final part of the proof, which is the characterization of those extreme points.
Part II: The extreme points of V are of the form (27) We shall prove that a necessary condition for (w, p) to be an extreme point of V is that there are no disjoint Borelian sets E 1 , . . . , E M +1 ⊂ R d such that w(E m ) = 0 for m = 1, . . . , M + 1. The only elements of M(R d ) satisfying this condition are precisely those described by (27).
We shall proceed by contradiction and assume that there exist disjoint sets E 1 , . . . , E M +1 such that w(E m ) = 0 for all m.
We denote the restriction of w to E m as w m = w1 Em . We also define E = R d \ m E m , and w = w1 E with w M = β. For m = 1, . . . , M + 1, we set y m = F (w m , p) ∈ R M . Since any collection of (M + 1) vectors in R M is linearly dependent, there exists (c m ) 1≤m≤M +1 = 0 such that M +1 m=1 c m y m = 0. Let µ = M +1 m=1 c m w m ∈ M(R d ) and ∈ (− max , max ) with max = 1/ max m |c m |, so that (1 + c m ) > 0 and (1 − c m ) > 0 for all m. By construction, we have that F (µ, p) =