Multi-Contrast MRI Reconstruction with Structure-Guided Total Variation

Magnetic resonance imaging (MRI) is a versatile imaging technique that allows different contrasts depending on the acquisition parameters. Many clinical imaging studies acquire MRI data for more than one of these contrasts---such as for instance T1 and T2 weighted images---which makes the overall scanning procedure very time consuming. As all of these images show the same underlying anatomy one can try to omit unnecessary measurements by taking the similarity into account during reconstruction. We will discuss two modifications of total variation---based on i) location and ii) direction---that take structural a priori knowledge into account and reduce to total variation in the degenerate case when no structural knowledge is available. We solve the resulting convex minimization problem with the alternating direction method of multipliers that separates the forward operator from the prior. For both priors the corresponding proximal operator can be implemented as an extension of the fast gradient projection method on the dual problem for total variation. We tested the priors on six data sets that are based on phantoms and real MRI images. In all test cases exploiting the structural information from the other contrast yields better results than separate reconstruction with total variation in terms of standard metrics like peak signal-to-noise ratio and structural similarity index. Furthermore, we found that exploiting the two dimensional directional information results in images with well defined edges, superior to those reconstructed solely using a priori information about the edge location.


Multi-Contrast Magnetic Resonance Imaging
Magnetic resonance imaging (MRI) is a well established imaging modality with numerous applications. One of its key advantages is versatility: depending on image acquisition protocol, images with very different contrast and informational content can be acquired [36,41]. Most common are images that are weighted by the relaxation times T 1 and T 2 but many more options are available. In clinical applications, often not one but several MRI images with different contrasts are acquired during one session. As an example, the UK Biobank 1 contains for each subject MRI data for images weighted not only by T 1 and T 2 but also for images that are fluid-suppressed (FLAIR), or show susceptibility, diffusion or function. All of these data have to be acquired sequentially one at a time, which makes the whole scanning procedure rather lengthy. Therefore, shortening the acquisition time would not only reduce patient's discomfort but would increase the patient throughput leading to more efficient use of the scanning facilities.

Magnetic Resonance Imaging and Compressed Sensing
To speed up the scanning procedure, it has been proposed almost a decade ago to apply compressed sensing [11,12,16,22] to MRI [38] which is still an active research topic [8,13,14,29,31,37,39,45,48,49,53]. One of the main  Figure 1: Ground truth T 1 and T 2 images with the side information that is exploited by weighted and directional total variation. ideas of compressed sensing is to acquire fewer measurements and to solve the reconstruction problem by exploiting a priori knowledge about the solution. Initially, the a priori knowledge has been sparsity in a wavelet basis and penalizing large total variations; the latter being related to sparsity of the image gradient. Over the years many other forms of a priori knowledge have been proposed for MRI reconstruction such as higher order total variation [31], sparsity in a self-learned dictionary [45] or regularization of dynamic sequences with the nuclear norm [37,49] to name just a few. In a multi-contrast MRI scan, the images have very different information content, but as they are acquired from the same patient anatomy we know a priori that they are likely to show very similar structures [8,29]. An example of a T 1 and T 2 weighted pair of MRI images of the same subject is shown on the right in figure 1. Parallel MRI [28,33,44,50] is another example of image reconstruction problem which can benefit from exploiting common information. In [14] joint reconstruction of different coil images is performed in the framework of compressed sensing.

Contributions
In this paper we aim to exploit the expected redundancy in a series of multi-contrast MRI images by extracting information about i) the location of edges and ii) the direction of edges from one contrast to aid the reconstruction of the other. We propose two priors that enable us to incorporate a priori structural knowledge into a total variation functional. In both cases the prior is convex such that we can use algorithms from convex optimization to solve the minimization problem. A double-split allows us to apply the alternating direction method of multipliers (ADMM) where all but one update are in closed-form. An extension of the fast gradient projection method first proposed for the standard total variation in [6] is used to efficiently compute proximal operator for both priors.

Related Work
In this work we propose extensions of total variation, based on i) location and ii) direction, that can exploit structural a-priori knowledge and apply it to the multi-contrast MRI setting where structural information is available from another contrast. In this context we group the related work into the four following classes: Total Variation with Local Weighting Extensions of total variation or similar edge-preserving priors with spatially varying regularization parameter have been used before for optical tomography [2] and image denoising [26,32]. While the weights are a priori defined by side information in [2], they are estimated based on local statistics in [26,32]. In that respect this contribution improves upon [2] as our algorithm can handle a non-smooth formulation of the prior.

Total Variation and Directional Information
It has been proposed to include directional information into the total variation functional either by rotating the coordinate system and using locally the 1 -norm [7] or by scaling preferred directions and applying the 2 -norm [4,23,27,35]. The directions are globally constant and predefined in [4] and based on the image content in [7,23,27,35]. Our approach for directional information in the total variation functional shares similarities with [23,27,35]. While [23,27,35] compute the directions and scaling from the structure tensor of the current image estimate or the noisy input image, we project the gradient in the total variation functional onto a predefined vector field given by the other contrast.

One-Sided Reconstruction
Incorporating structural information by a prior has been used in other settings such as combined positron emission tomography (PET) with computed tomography or MRI [9,34,52], optical tomography [2], remote sensing [24,43] or electric impedance tomography [30] but to the best of our knowledge has not been applied to multi-contrast MRI. In addition, only [2] and [30] share similarities with our approach. In [2] the authors propose to locally adapt the weight of the prior isotropically and used a smoothed penalty function to facilitate diffusion techniques for reconstruction. On the other hand while the prior in [30] is anisotropic, as it directionally dependent, it reduces to a quadratic prior when no edge information is available. In contrast, the here proposed priors reduce to total variation in absence of additional information.

Parallel Level Sets
The directional extension of total variation is related to the idea of measuring the difference in structure of two images by means of parallel level sets. A symmetric version of the latter has been used for joint reconstruction of PET-MRI [20,21] and colour image processing [19]. We will point out the similarities and differences in more detail in §3. Moreover, in all [19][20][21] the parallel level sets functional has been smoothed and the problem has been solved using gradient based optimization. In contrast, here we consider the non-smooth convex formulation and propose a convex optimization algorithm for its solution.

Problem Setting and Notation
Our derivation is carried out in a fully discrete setting where the object of interest u ∈ [0, ∞) N ⊂ R N is sampled from a planar / volumetric MRI image. We will use this notation independently of the contrast, i.e. u might represent a T 1 or T 2 weighted image. Moreover, we follow a standard assumption for many acquisition sequences of no or negligibly small phase so that we are effectively dealing with real-valued images. An extension to complexvalued images could be done by means of a non-linear forward operator [18,25,56] but this is out of scope of the present paper. Without phase, it is natural to assume that the MRI image u is non-negative which we will incorporate into the reconstruction. With the common assumption of additive Gaussian noise [40,42] a maximum a posteriori reconstruction with the prior proportional to exp(−αJ), with functional J : R N → R to be defined later, is equivalent to the minimization problem where A : R N → C M is the MRI forward operator and b the acquired data. Throughout the paper we use |x| 2 := x * x to denote the standard norm for complex vectors with x * being the Hermitian (complex conjugate transpose) of x.

Forward Operator for Magnetic Resonance Imaging
The forward model in MRI is commonly assumed to be the Fourier transform F [36]. As we model our image to be real-valued but the Fourier transform acts on complex images, we embed the real into the complex space by means of an operator Re * : . It is not difficult to show that Re * is the adjoint of the real part restriction operator Re : C N → R N , Re(x + iy) := x when we equip the complex space with the inner product x, y C N = Re(x * y). Moreover, let π ∈ {1, . . . , N } M defining a sequence of sample locations which mimics an arbitrary MRI acquisition protocol. Then we can define a general sampling operator where for better readability we denote the mth component of the vector π by π[m]. Here, we focus on the case of practical interest, M N , where the number of measurements is much smaller than the number of unknowns. With such defined operators, the MRI forward operator for our model can be expressed as their concatenation Due to the embedding Re * and the sampling S this operator is in general not invertible. For the reconstruction method proposed in section §4.2 we need the adjoint of A which is given as with F −1 denoting the inverse Fourier transform and S * the adjoint of the sampling operator. The latter is given by with the Kronecker delta δ m,n = 1 if m = n and 0 otherwise, see e.g. [18].

Discrete Gradient
The functional J in (1) encodes the a priori information in a way such that unlikely or undesirable solutions u result in a large value J(u). For images it is common to penalize changes between neighbouring pixel values which can be expressed by the discrete gradient operator. At every location n = 1, . . . , N we define a discrete gradient ∇u n ∈ G. In the numerical simulations, we will use forward differences in two dimensions such that G = R 2 but other choices are possible, too. In general, the discrete gradient operator ∇ : R N → G N should be a linear mapping from space of images to the space of gradients. We make use of the discrete divergence operator div : G N → R N defined as the negative adjoint of the gradient, i.e. for all p ∈ G N , u ∈ R N it holds div p, u R N = p, −∇u G N . For an approximation of the gradient with forward differences the matching approximation for the divergence corresponds to backward differences [3]. Moreover, let M := Lin(G) be the space of linear mappings from G to G. Then, we define the multiplication of a matrix-field D ∈ M N with a vector-field p ∈ G N pointwise as Dp ∈ G N with (Dp) n := D n p n , a matrix-vector multiplication at the particular location.

Total Variation
A popular regularization J in a variational formulation (1) is the total variation [47] which in our discrete setting reads with the discrete gradient operator as defined in the previous section. The total variation has many desirable properties: it is convex and it leads to edge-preserved denoising. However, the standard formulation does not allow to incorporate any extra a priory knowledge about the solution.

A Priori Information on Location of Edges
While the actual intensities of two MRI contrasts are very different, their structure in terms of edges is likely to be highly correlated. To incorporate the information about the location of edges extracted from one contrast, v, into the reconstruction of the other we propose to introduce weights w n into the total variation functional.

Definition 1 (Weighted Total Variation)
Let w ∈ [0, 1] N be a vector of weights. We define the weighted total variation as Remark 2 An option for the choice of such weights is w n = η/|∇v n | η , where |x| 2 η := |x| 2 + η 2 for some parameter η > 0. This choice results in 0 < w n ≤ 1, with the upper bound attained when there is no side information, i.e. v = const, hence |∇v n | = 0 and the lower bound approached asymptotically for |∇v n | → ∞. The parameter η controls what magnitude of an edge is considered to be 'large' and what is considered to be 'small'. While in general this parameter could be a spatial map, for simplicity here we assume that it is constant over space.

Remark 3
Obviously, for the choice of uninformative weights, i.e. w n = 1 for all n = 1, . . . , N , the weighted total variation functional wTV reduces to the standard total variation (5). Furthermore, 0 ≤ w n ≤ 1 implies that for all u ∈ R N it holds 0 ≤ wTV(u) ≤ TV(u).

A Priori Information on Direction of Edges
In the weighted total variation functional (6) we made use of the location of the edges by means of weights depending on the modulus of the gradient of the side information. However, it is reasonable to assume that these images do not only share the location but also the direction of edges modulo their sign. The latter is necessary as the actual intensity values are independent of one another such that in one image their might be a jump 'up' while in the other one there is a jump 'down'.

Definition 4 (Directional Total Variation)
Let ξ ∈ G N with 0 ≤ |ξ n | ≤ 1 be a vector-field and P ξn : the directional total variation.

Remark 5
In this paper we choose ξ ∈ G N , ξ n := ∇v n /|∇v n | η which captures the 'structure' of v with more degrees of freedom than in the case of weighted total variation, cf. figure 1. As in the previous case, we will make use of an edge parameter η that is related to the size of an edge. Similar to (6), we have 0 ≤ |ξ n | < 1 with the lower bound being attained for |∇v n | = 0 and the upper bound approached as |∇v n | → ∞. In the limit |ξ n | → 1, P ξn becomes the orthogonal projection onto the orthogonal complement of ξ n . Thus, in contrast to isotropic weighting of |∇u n | in (6), in the limit (7) penalizes only the component of ∇u n that is orthogonal to ξ n resulting in an anisotropic weighting.
To be more precise, it was proven in [18] that |P ξn ∇u n | = |∇u n | 2 − ∇u n ,ξ n 2 1/2 , which shows that directional total variation is a special case of asymmetric parallel level sets with a different normalization of the side informationξ n := (2 − |ξ n | 2 ) 1/2 ξ n . From (8) it can be seen that directional total variation favours parallel level sets. Indeed, on the one hand, (8) is minimal if and only if ∇u n is parallel to (in the span of) ξ n and hence parallel to ∇v n . On the other hand, as gradients are orthogonal to level sets, parallel gradients imply parallel level sets.

General Framework
Both functionals (6) and (7) can be uniformly written as where the matrix-field D ∈ M N depends on the structural knowledge derived from the image v. In the case of weighted total variation, the matrix-field is isotropic, i.e. it is directionally independent. On the other hand, for directional total variation, the matrix field is anisotropic as it has principle directions along and orthogonal to ξ n . As ξ n was defined to be the normalized gradient field of v these directions correspond to the normal and tangential direction to the level sets of v.

Algorithmic Approach
In order to numerically solve problem (1) we will reformulate the problem such that it can be efficiently solved with the alternating direction method of multipliers (ADMM), see [1,10] and references therein. As we model MRI images to be real-valued, it is efficient to perform two splits. Similar to total variation regularization, no closed-form proximal operator for priors of the form (9) exists, thus we revert to a variant of fast gradient projection algorithm [6].

Proximal Operator with Fast Gradient Projection
Evaluation of proximal operator for structural total variation (9), entails solution of the following convex minimization problem prox αJ+χ T (y) := argmin with the non-empty, closed and convex constraint set T ⊂ R N . Although we are primarily interested in nonnegativity constraints, i.e. T = [0, ∞) N , there is no need to be too specific at this point. Analogously to the case for usual total variation [6], structural total variation can be dualized as where the supremum is taken over the unit ball in the gradient space U := {x ∈ G N : |x n | ≤ 1}. Substituting (13) into (12) and exchanging the order of the minimum and supremum as the function is convex in u and concave in p Algorithm 1 Fast Gradient Projection Method for Structure-Guided Total Variation Input: α ≥ 0 regularization parameter y ∈ R N proximal point K ∈ N number of iterations D anisotropy (default = id) P T projection onto the set T (default = id) p 0 initial dual variable (default = 0) Output: u K approximation of minimizer (primal variable) p K dual variable 1: function FGP_J(α, y, K, D, P C , p 0 ) 2: for k = 1 : K do 4: g k ← αD∇P T (y + α div D * q k−1 ) compute gradient (17) 5: Nesterov two step update 8: u k ← P T (y + α div D * p k ) calculate final primal variable 9: return (u K , p K ) (see e.g. [46], Corollary 37.3.2) we obtain = sup where the inner minimization in (14) has the solution u (p) = P T (h p ), h p := y + α div D * p. Following [6], the function under supremization in (15) can be equivalently rewritten as and its gradient with respect to p is given by A variant of the fast projected gradient algorithm (with Nesterov acceleration) for solution of (15) and hence (12) is outlined in algorithm 1, where the orthogonal projection onto U is given by  (17), see [6] for details. For both regularizers in this paper it holds D ≤ 1. Moreover, we approximate the gradient with forward and the divergence with backward differences for which we have ∇ ≤ √ 8 in 2D space [6] such that in both cases of interest s = (8α 2 ) −1 is sufficient to guarantee convergence in function values.

Double-Split Alternating Direction Method of Multipliers
Recall that we want to solve (1) for k = 1 : K do update first block 4: apply algorithm 1 5: averaging step update Lagrange multipliers 7: [10] 9: return u K with J as in (9). To fully exploit the structure of our forward operator A =F • S,F := Re * •F, we recast the problem as a constraint optimization problem with the associated augmented Lagrangian and µ ∈ C N , ν ∈ R N are the scaled Lagrange multipliers. In order to make the algorithm as efficient as possible, u and x are associated with the first and z with the second block of ADMM [1,10]. Thus in every iteration we need to solve As the first minimization problem decouples in u and x, we obtain three update steps for ADMM, the first two of which can be performed in parallel In (23) we used thatFz = Fz for real z. It should be noted that both S * S and ρI are diagonal matrices, so that the matrix inversion in (23), (S * S + ρI) −1 , reduces to a component-wise scaling and is therefore computationally efficient. The final ADMM algorithm can be found in 2. In each iteration of the algorithm, we apply once the discrete Fourier transform and its inverse as well as the proximal operator via algorithm 1. After each iteration, if the primal and dual residual are too far apart, we adjust the parameter ρ according to the guidelines in [10].

Remark 8
In vector notation, the double split can be written as (u; x) = Gz where G := (I;F) has full column rank. If the we compute the proximal operator with sufficient accuracy, i.e. the errors are absolutely summable, and ρ is constant, then algorithm 2 converges to a solution of (1) [17]. Numerically, we observe convergence for both u and ρ. parameter η r e g . p a r a m . α Figure 2: Reconstruction quality in terms of SSIM using wTV and dTV in dependence of the parameter η and the regularization parameter α for the data set BrainWebA and radial sampling. In all cases η = 1e-2 yields the best results in terms of SSIM.

Data and Algorithms
We numerically test the two extensions for total variation to incorporate structural information with six datasets that are either based on the Shepp-Logan phantom, realistically simulated MRI from BrainWeb [15] and clinical MRI images from a patient, cf. figure 1. We simulate the MRI data by sampling from the discrete Fourier transform in a variety of ways including Cartesian sampling (equidistantly and randomly undersampled), radial sampling (equidistantly spaced radial spokes, golden angle [55]) and spiral sampling (variable density and phyllotaxis [51]). In all cases we added Gaussian noise to the complex-valued MRI data with standard deviation scaled such that for fully sampled data the expected 2 -norm of the noise is 5% of the 2 -norm of the noise-free data. Both algorithms 1 and 2 have been implemented in MATLAB R2015a. The algorithms and the datasets used in this paper are available as supplementary material.

Quality Measures and Parameter Selection
We evaluate the results in terms of the peak signal-to-noise (PSNR) and the structural similarity index (SSIM) [54] both available in the image processing toolbox in MATLAB R2015a.
The regularization parameter α and the edge parameter η are chosen to maximize the SSIM between the reconstructed result and the ground truth.

Parameter Estimation
Both proposed extensions of total variation have a parameter η that relates to the magnitude of the gradients in the side information. Figure 2 shows the SSIM of the reconstructions of T 1 and T 2 weighted images from radially sampled BrainWebA data using both structure enhancing regularizers in dependence of the regularization parameter α and the edge parameter η. In all cases the best results are obtained for η = 1e-2 which corresponds to approximately 1% of the maximal gradient magnitude. For a large value of η-in this example approximately 1-both regularizers perform the same and both coincide with total variation (not shown). Similar plots were obtained for the other data sets and sampling patterns and hence will be omitted for brevity. In what follows the edge parameter η is always chosen to be 1e-2.   Figure 14: Results for patientB with random Cartesian sampling.

Visual Assessment
figures, but probably most visibly in figure 9, incorporating structural knowledge from the other contrast does visually improve the reconstruction using either wTV or dTV. When comparing wTV and dTV, one notices that while wTV results in patchy images, dTV is able to recover smooth structures accurately. Moreover, including the directional information yields another level of improvement of fine details.

Quantitative Assessment
Quantitative analysis of the results is summarized in figures 15 and 16, and table 1. Figure 15 shows the reconstruction quality for all six test cases in dependence of the regularization parameter. Whenever more than one sampling scheme was used, the solid line corresponds to the mean performance with the worst and best performance indicated by shaded lines. For all test cases, but especially for T 1 -weighted SheppLogan and BrainWeb, wTV and dTV strongly outperforms the standard total variation. Moreover, the curves are layered which means that the results are not only better for one choice of regularization parameter but for all choices shown. Figure 16 shows the performance for the optimal value of the regularization parameter for all test cases (data sets and sampling schemes). Also here, the curves are layered, meaning that in every test case dTV outperforms all the other methods. The average performance can be read out from table 1, where again dTV consistently performs best with respect to all measures. The particular differences in performance between the methods vary strongly between the data sets, chosen samplings and contrasts but on average dTV improves on total variation by about 6dB in PSNR and 8% in SSIM for both contrasts, cf. table 1.

Discussion
The largest improvements were obtained for the T 1 -weighted SheppLogan and both contrasts from BrainWeb. We attribute this to the higher level of detail in T 1 than in T 2 version of SheppLogan, which in turn results in T 1 having higher total variation than T 2 . While the quantitative results for patient do not indicate much improvement, visual inspection corroborates the increase in image quality. This might be due to the fact that the noisy reconstructed MRI images have been taken as ground truth and in this case the similarity measure do not match human perception. 5e-4 5e-3 5e-2 5e-4 5e-3 5e-2 5e-4 5e-3 5e-2 5e-4 5e-3 5e-2 5e-4 5e-3 5e-2 regularization parameter α Figure 15: Quantitative analysis of the results for the quality measures PSNR (top) and SSIM (bottom) with respect to the regularization parameter α. Including structural knowledge into the reconstruction does not only improve the reconstruction for the optimal choice of regularization parameter but it also makes it more robust as the results are consistently better for all shown choices of regularization parameter.  Figure 16: Quantitative summary of all results: While the relative performance of the methods depends strongly on the data set and the sampling scheme, incorporating structural knowledge in most cases significantly improves the results. Moreover, the best results have consistently been obtained making use of directional edge information.

Conclusions
In this paper we extended total variation to accommodate the structural a priori information available from another contrast in MRI. The structural information can be either purely on the location or on the location and direction of edges. In both cases, the prior is convex so that we can use efficient methods from convex optimization to solve the problem. The numerical results with numerous test cases show that exploiting structural information is beneficial in the reconstruction of highly undersampled MRI. Moreover, utilizing directional information yields not only better defined images but also better reconstruction of smooth structures and fine details.
In the future, we will extend the proposed framework to more than a pair of contrasts and so to exploit the structural similarity of a whole sequence of MRI images. Moreover, we intend to extend our method to joint reconstruction of multiple contrasts.