Abstract.

We consider a class of optimization problems defined over trees with unary cost terms and shifted pairwise cost terms. These problems arise when considering block coordinate descent (BCD) approaches for solving inverse problems with total generalized variation (TGV) regularizers or their nonconvex generalizations. We introduce a linear-time reduction that transforms the shifted problems into their nonshifted counterparts. However, combining existing continuous dynamic programming (DP) algorithms with the reduction does not lead to BCD iterations that compute TGV-like solutions. This problem can be overcome by considering a box-constrained modification of the subproblems or smoothing the cost terms of the TGV regularized problem. The former leads to shifted and box-constrained subproblems, for which we propose a linear-time reduction to their unconstrained counterpart. The latter naturally leads to problems with smooth unary and pairwise cost terms. With this in mind, we propose two novel continuous DP algorithms that can solve (convex and nonconvex) problems with piecewise quadratic unary and pairwise cost terms. We prove that the algorithm for the convex case has quadratic worst-case time and memory complexity, while the algorithm for the nonconvex case has exponential time and memory complexity, but works well in practice for smooth truncated total variation pairwise costs. Finally, we demonstrate the applicability of the proposed algorithms for solving inverse problems with first-order and higher-order regularizers.

Keywords

  1. inverse problems
  2. total generalized variation
  3. infimal convolution
  4. continuous dynamic programming
  5. block coordinate descent
  6. higher-order regularization

MSC codes

  1. 68U10
  2. 90C25
  3. 90C39

1. Introduction.

Consider a class of continuous optimization problems of the form
\begin{align} \min_{x} f\left(x\right) := \sum_{i \in \mathcal{V}} f_i\left(x_i\right) + \sum_{ij \in \mathcal{E}} f_{ij}\left(x_j - x_i\right) \end{align}
(P)
defined over a tree \(\left(\mathcal{V}, \mathcal{E}\right)\) with \(n := |\mathcal{V}|\) nodes and \(m := |\mathcal{E}| = n - 1\) edges, where \(x \in \mathbb{R}^n\). The terms \(f_i : \mathbb{R} \to \mathbb{R}\) and \(f_{ij}: \mathbb{R} \to \mathbb{R}\) are known as the unary and pairwise cost functions, respectively. It is assumed that \(f\) is bounded from below and that the minimum in problem (P) is attained at some point.
There are exact continuous dynamic programming (DP) algorithms that can efficiently solve (P) for many different choices of unary and pairwise cost functions, like the algorithms proposed in the “Total Variation on a Tree” paper [21] and the references therein. These algorithms can be used as exact solvers for many (convex and nonconvex) total variation (TV) regularized [25] inverse problems, and in this paper we explore how to repurpose and extend these types of continuous DP algorithms so that they can be used to solve total generalized variation (TGV) regularized [4] inverse problems.
As an exemplary TGV regularized inverse problem, consider a TGV regularized scanline denoising problem defined over a chain of the form
\begin{align} \min_{x, v_1, v_2, \dots, v_k} \frac{1}{2} \left\|x - b\right\|^2 + \lambda_1 \left\|Dx - v_1\right\|_1 + \sum_{i=2}^k \lambda_i \left\|D v_{i - 1} - v_i\right\|_1 + \lambda_{k+1} \left\|D v_k\right\|_1, \end{align}
(1.1)
where \(b \in \mathbb{R}^n\) is the noisy input signal, \(\lambda_1, \ldots, \lambda_{k+1} \in \mathbb{R}_{++} := \left\{\lambda \in \mathbb{R} \mid \lambda \gt 0\right\}\) are adjustable parameters, and \(x \in \mathbb{R}^n\) and \(v_i \in \mathbb{R}^{n - i}\) for \(i=1, \ldots, k\) are the optimization variables. Here \(D\) represents a forward difference operator of the form \(Dz := \left(z_2 - z_1, z_3 - z_2, \ldots, z_l - z_{l - 1}\right) \in \mathbb{R}^{l - 1}\), defined for vectors \(z \in \mathbb{R}^l\) of arbitrary dimension \(l \gt 1\). We assume that \(n \gt k\), \(k \geq 1\) and with a slight abuse of notation that the sum term in (1.1) is not present for the special case when \(k=1\). We say that the TGV scanline denoising problem (1.1) contains a TGV regularizer of order \(k+1\). The TGV regularizer has the desirable property that it promotes piecewise affine solutions for \(k=1\), piecewise quadratic solutions for \(k=2\), and in general piecewise polynomial solutions where each piece is a polynomial of order \(k\).
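To make the notation concrete, the following minimal numerical sketch (an illustration only, not part of the proposed algorithms) evaluates the forward difference operator \(D\) and the objective in (1.1) for \(k=1\):
```python
# A minimal sketch: evaluating the TGV objective (1.1) for k = 1 on a small chain.
import numpy as np

def forward_diff(z):
    """Dz := (z_2 - z_1, ..., z_l - z_{l-1}) for a vector z of length l > 1."""
    return z[1:] - z[:-1]

def tgv2_objective(x, v1, b, lam1, lam2):
    """0.5*||x - b||^2 + lam1*||Dx - v1||_1 + lam2*||Dv1||_1, i.e., (1.1) with k = 1."""
    data_term = 0.5 * np.sum((x - b) ** 2)
    first_order = lam1 * np.sum(np.abs(forward_diff(x) - v1))
    second_order = lam2 * np.sum(np.abs(forward_diff(v1)))
    return data_term + first_order + second_order

# An affine signal incurs zero regularization cost when v1 = Dx, reflecting the
# piecewise affine solutions promoted by the regularizer for k = 1.
b = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x, v1 = b.copy(), forward_diff(b)
print(tgv2_objective(x, v1, b, lam1=1.0, lam2=1.0))  # -> 0.0
```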
A natural observation is that for \(k \geq 3\) we could split the \(k+2\) terms in problem (1.1) into the \(k+1\) subproblems
\begin{align} & \min_{x} \frac{1}{2} \left\|x - b\right\|^2 + \lambda_1 \left\|Dx - v_1\right\|_1, \end{align}
(1.2)
\begin{align} & \min_{v_1} \lambda_1 \left\|Dx - v_1\right\|_1 + \lambda_2 \left\|Dv_1 - v_2\right\|_1, \end{align}
(1.3)
\begin{align} & \min_{v_i} \lambda_i \left\|Dv_{i - 1} - v_i\right\|_1 + \lambda_{i+1} \left\|Dv_i - v_{i+1}\right\|_1 \quad (\text{for } i=2, \ldots, k - 1), \end{align}
(1.4)
\begin{align} & \min_{v_k} \lambda_k \left\|Dv_{k - 1} - v_k\right\|_1 + \lambda_{k+1} \left\|Dv_k\right\|_1, \end{align}
(1.5)
and try to use block coordinate descent (BCD) [36] for solving (1.1). Here subproblem (1.2) is a TV denoising problem [25] with a shifted regularizer, subproblems (1.3) and (1.4) are TV-\(\ell_1\) denoising problems [236] with shifted regularizers, and subproblem (1.5) is a standard TV-\(\ell_1\) denoising problem. Note that subproblem (1.4) does not appear for the special case when \(k=2\). Note also that \(v_2=0\) and that subproblems (1.4) and (1.5) do not appear for the special case when \(k=1\).
In view of these subproblems, it is an appealing research direction to extend algorithms for (P) to the shifting setting
\begin{align} \min_{x} f\left(x, v\right) := \sum_{i \in \mathcal{V}} f_i\left(x_i\right) + \sum_{ij \in \mathcal{E}} f_{ij}\left(x_j - x_i - v_{ij}\right), \end{align}
\(({\text{P}_{\rm s}})\)
where \(v \in \mathbb{R}^{m}\) is a constant shift vector, and use them as subroutines to build BCD approaches for TGV regularized scanline or tree denoising problems of arbitrary order. More general inverse problems that contain nonconvex versions of the TGV regularizer or nonconvex data terms could be addressed similarly. Moreover, these BCD approaches can be combined with Lagrangian decomposition schemes to address more general TGV regularized inverse problems defined over images.
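The intended BCD scheme can be sketched as follows (an illustration under the assumption that exact solvers for the shifted subproblems are available; `solve_l2_tv` and `solve_l1_tv` are hypothetical subroutines standing in for such solvers of (\(\text{P}_{\rm s}\)) on chains, and, as discussed immediately below, this naive scheme by itself does not yet produce TGV-like solutions):
```python
import numpy as np

def forward_diff(z):
    return z[1:] - z[:-1]

def bcd_tgv(b, lams, k, num_sweeps, solve_l2_tv, solve_l1_tv):
    """BCD sketch for (1.1).  `solve_l2_tv(b, shift, lam)` is assumed to return
    argmin_x 0.5*||x - b||^2 + lam*||Dx - shift||_1, and `solve_l1_tv(d, shift, lam_d, lam_r)`
    is assumed to return argmin_v lam_d*||d - v||_1 + lam_r*||Dv - shift||_1."""
    n = b.size
    v = [np.zeros(n - 1 - i) for i in range(k)]          # v[0] = v_1, ..., v[k-1] = v_k
    x = b.copy()
    for _ in range(num_sweeps):
        # (1.2): x-update, a shifted TV denoising problem
        x = solve_l2_tv(b, v[0], lams[0])
        # (1.3) and (1.4): v_1, ..., v_{k-1} updates, shifted TV-l1 problems
        for i in range(k - 1):
            d = forward_diff(x if i == 0 else v[i - 1])
            v[i] = solve_l1_tv(d, v[i + 1], lams[i], lams[i + 1])
        # (1.5): v_k update, a standard TV-l1 problem (zero shift)
        d = forward_diff(x if k == 1 else v[k - 2])
        v[k - 1] = solve_l1_tv(d, np.zeros(d.size - 1), lams[k - 1], lams[k])
    return x, v
```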
Unfortunately, simply extending existing continuous DP algorithms to the shifted setting and applying them to the BCD subproblems (1.2) to (1.5) does not lead to iterations that produce TGV-like solutions. This is illustrated via a simple numerical example in section SM1 of the supplementary materials. In particular, the example demonstrates that applying BCD to TGV2 scanline denoising degenerates to standard TV scanline denoising.
BCD iterations that produce TGV-like solutions (with provable guarantees) can be obtained by smoothing the TGV regularizer, which naturally leads to variants of (\(\text{P}_{\rm s}\)) with piecewise quadratic unary and pairwise cost functions. Alternatively, the earlier work [24] considered a simple box-constrained heuristic for solving TGV2 regularized inverse problems with nonconvex data terms, which leads to BCD iterations that produce piecewise affine solutions. This heuristic can be combined with continuous DP algorithms by having exact solvers for subproblems of the form
\begin{align} \min_{x, v} f\left(x, v\right) \; \text{s.t.} \; \left\|v - v^{\prime }\right\|_{\infty } \leq \varepsilon, \end{align}
\(({\text{P}_{\varepsilon }})\)
where \(v^{\prime } \in \mathbb{R}^m\) and \(\varepsilon \gt 0\), which are box-constrained generalizations of (\(\text{P}_{\rm s}\)).
To the best of our knowledge, continuous DP algorithms that exactly solve problems (\(\text{P}_{\rm s}\)) and (\(\text{P}_{\varepsilon }\)) with (convex and nonconvex) piecewise quadratic unary and pairwise cost functions have not been considered before in the state of the art. Similarly, they have also not been considered for the canonical problem (P).

1.1. Contributions.

We close this gap with the following two contributions.
Reductions. First, we show the following:
(\(\text{P}_{\rm s}\)) can be reduced to (P) in \(\mathcal{O}(n)\) time and memory for arbitrary unary and pairwise cost functions. This allows us to apply the many existing algorithms for solving special cases of (P) to solve the corresponding special cases of (\(\text{P}_{\rm s}\)), while inheriting their worst-case time and memory complexities.
(\(\text{P}_{\varepsilon }\)) can be reduced to (\(\text{P}_{\rm s}\)) for piecewise quadratic unary and pairwise costs if the pairwise costs \(f_{ij}\) are monotonically decreasing functions for arguments \(\leq 0\) and monotonically increasing functions for arguments \(\gt 0\). This reduction can be performed in \(\mathcal{O}(n)\) additional time and memory for problems with piecewise quadratic pairwise cost functions that contain \(\mathcal{O}\left(1\right)\) pieces. In that case, it is guaranteed that the pairwise costs of the reduced problem still contain only \(\mathcal{O}\left(1\right)\) pieces while preserving the “shape” of the pairwise cost function, in the sense that piecewise linear pairwise costs stay piecewise linear and piecewise quadratic costs stay piecewise quadratic. Consequently, by combining this reduction with the previous one, we can apply the many existing algorithms for solving special cases of (P) to solve the corresponding special cases of (\(\text{P}_{\varepsilon }\)) while inheriting their worst-case time and memory complexities.
DP algorithms. Second, we propose two novel continuous DP algorithms for solving (P) where the unary and pairwise cost functions are assumed to be (convex or nonconvex) piecewise quadratic functions with \(\mathcal{O}(1)\) pieces. The proposed algorithms and their worst-case time and memory complexities are summarized in Table 1. Despite the exponential worst-case time and memory complexities of the second algorithm, it is fast in practice for problems with truncated (and potentially smoothed) TV pairwise costs.
Table 1 Proposed DP algorithms and their worst-case time and memory complexities.
Polynomial runtime of Potts message passing. As a byproduct of the analysis for the nonconvex case, we consider the message passing algorithm for the Potts model with nonconvex piecewise quadratic unaries, which can be seen as a direct generalization of Johnson’s algorithm [16], which uses \(\ell_2^2\) unaries. We show that the generalized algorithm exhibits a worst-case time and memory complexity of \(\mathcal{O}(n^4)\), which is reduced to \(\mathcal{O}(n^3)\) under a weak condition. Since \(\ell_2^2\) unaries satisfy the condition, it follows that Johnson’s algorithm runs in \(\mathcal{O}(n^3)\) time and space in the worst case, whereas Johnson only proved an \(\Omega (n^2)\) worst-case lower bound.
This is in line with other well-known results like [12, 18, 28, 30, 33, 35] that show that the Potts model can be solved in \(\mathcal{O}(n^2)\) time for \(\ell_1\) and \(\ell^2_2\) unaries, and even in \(\mathcal{O}(n)\) time on average or in the worst case under some additional assumptions. Notice, however, that our runtime analysis of Johnson’s algorithm is not necessarily tight, and it might in fact match the faster asymptotic runtimes of the aforementioned references.

1.2. Related work.

Exact solvers for various special cases of (P) have already been proposed in the literature. In the discrete setting, the most noteworthy work is the efficient belief propagation paper [10] that has popularized fast distance transforms, which can be used to efficiently solve discretized versions of (P) with arbitrary unary costs and Potts, truncated linear, or truncated quadratic pairwise costs. They require \(\mathcal{O}\left(n L\right)\) time and memory in the worst case, where \(L\) denotes the number of discrete values that the variables \(x_i\) can take. These are also widely used together with loopy belief propagation [22] or tree-based approximations [9, 14, 20] to obtain very efficient approximate solvers for inverse problems defined over images (e.g., [10, 17, 19, 31, 32]).
Most papers in the continuous setting focused on solving special cases of (P) defined over chains. The simplest case is \(\ell^2_2\) unaries and pairwise costs, which leads to a tridiagonal linear system that can be solved in \(\mathcal{O}\left(n\right)\) worst-case time and memory via the well-known Thomas algorithm. Alternatively, the same worst-case complexities can be achieved when using Gaussian belief propagation [134], which even allows one to easily handle anisotropic weights for the unary and pairwise costs and elegantly extend the approach to the tree setting, while inheriting the same worst-case complexities as in the chain setting.
Another popular class of problems on chains in the continuous setting is the well-known Potts model. The authors of [35] have shown that the Potts model with \(\ell^2_2\) unaries can be solved in worst-case \(\mathcal{O}\left(n^2\right)\) time and \(\mathcal{O}\left(n\right)\) memory for a fixed regularization weight, and in worst-case \(\mathcal{O}\left(n^3\right)\) time and \(\mathcal{O}\left(n^2\right)\) memory for all possible values that the regularization weight can take. The latter has been refined in [12] to worst-case \(\mathcal{O}\left(n^2 m(y)\right)\) time and \(\mathcal{O}\left(n m(y)\right)\) memory where \(1 \leq m\left(y\right) \leq n\) denotes the number of breakpoints in the piecewise affine function generated by their \(\gamma\)-shooting algorithm. Despite the worst case being \(m\left(y\right) = \mathcal{O}(n)\), the authors argued that \(m\left(y\right) \ll n\) for typical problem instances. The authors of [33] have shown that the Potts model with \(\ell_1\) unary terms can be solved in worst-case \(\mathcal{O}\left(n^2\right)\) time and \(\mathcal{O}\left(n\right)\) memory, and in \(\mathcal{O}\left(n^3\right)\) time for weighted \(\ell_1\) unaries. By placing some additional assumptions on the weights, the latter can be accelerated to \(\mathcal{O}\left(n^2\right)\) time. Some of the aforementioned approaches for the Potts model can be accelerated to linear time on average or in the worst case using appropriate pruning strategies (see [18, 28, 30] for more details).
Last, TV regularized problems have also received attention in the continuous setting. For instance, the authors of [7, 8, 16] proposed \(\mathcal{O}\left(n\right)\) worst-case time and memory algorithms for solving TV regularized denoising problems with isotropic regularization weights on chains. The authors of [15] proposed an \(\mathcal{O}\left(n^3\right)\) worst-case time and \(\mathcal{O}\left(n\right)\) worst-case memory algorithm for solving truncated TV regularized denoising problems with isotropic regularization weights on chains. Most of this work was generalized by the “Total Variation on a Tree” paper [21] that extended all existing prior work to anisotropic regularizers, tree graphs, and more general piecewise unary and pairwise cost functions. Their algorithms match the worst-case time and memory complexities of the aforementioned TV references, except for the algorithm given in [15] as this case was not treated in [21]. Notice also that the second to last algorithm in Table 2 for solving truncated TV problems is fast in practice despite its predicted exponential worst-case time and memory complexities.
Table 2 DP algorithms proposed in [21] and their worst-case time and memory complexities.
Our novel DP algorithms can be considered an extension of the algorithms from [21] to piecewise quadratic unary and pairwise costs. There is, however, a notable difference in the way we derive our algorithms. Namely, we directly propagate Bellman functions in our DP algorithms instead of optimality conditions. In that sense, our DP algorithms can be seen as direct continuous extensions of the fast distance transform algorithms from the efficient belief propagation paper [10]. This means that our DP algorithms can be directly combined with loopy belief propagation algorithms or tree-based approximations like semiglobal matching (SGM) [14] and more-global matching [9] to address inverse problems defined over images.
Models based on segment penalties. Another interesting complementary class of algorithms for chain problems are models based on segment penalties that can also be exactly solved via DP. These are used to solve denoising and interpolation problems and their basic idea is to divide the input signal into segments and fit a piecewise solution to each segment. The aforementioned references for Potts and TV regularized denoising models can be interpreted in this way, as those models perform the segmentation implicitly and fit piecewise constant solutions to the resulting segments. A related approach is the cubic time DP algorithm proposed in [2] for solving reconstruction problems with \(\mathcal{H}_1\)-regularized segment penalties, which produces piecewise-smooth solutions. However, the most interesting instances from this class are exact algorithms that produce piecewise polynomial solutions as they serve as an alternative to the TGV scanline denoising model given by (1.1). One instance of this is the approach in [27] that, in \(\mathcal{O}(n^2)\) time and using \(\mathcal{O}(n)\) memory, generates piecewise polynomial denoising solutions by optimizing a higher-order Mumford–Shah model, which features segmentwise regularization of higher-order derivatives. Remarkably, this approach was partly generalized in [29] to solve a joint denoising and interpolation problem with cubic smoothing splines that allow discontinuities. The extended approach again requires only \(\mathcal{O}\left(n^2\right)\) time and \(\mathcal{O}\left(n\right)\) memory.
The solvers from [27, 29] can be used to efficiently produce piecewise polynomial solutions for denoising problems with additive Gaussian noise and isotropic regularization weights defined over chain graphs. For such problems, they are preferable to our approach of solving TGV regularized problems by combining BCD and DP. However, our approach can handle much more general unary cost terms (both convex and nonconvex), which allows us to incorporate different noise types or inverse problems like motion estimation; it directly supports anisotropic weighting (which is particularly important for motion estimation problems); and it can solve inverse problems defined over trees.

1.3. Applications.

We show some exemplary applications of our algorithms in section 6. Concretely, we consider exact solvers for convex scanline denoising problems with Huber smoothed TV pairwise costs and \(\ell_1\), \(\ell^2_2\) and Huber unary costs, exact solvers for nonconvex scanline denoising problems with smooth truncated TV pairwise costs and \(\ell^2_2\) and smoothed truncated \(\ell_1\) unary costs, (convex and nonconvex) TGV2-regularized scanline denoising, nonconvex (smoothed and truncated) TV regularized stereo via continuous SGM, and stereo with a nonconvex TGV2 regularizer.

1.4. Outline.

The rest of the paper is organized as follows. We introduce our notation and briefly recall important mathematical preliminaries in section 2. The reductions are derived in section 3. Our DP algorithm for the convex case is presented in section 4 and for the nonconvex case in section 5. Applications are shown in section 6, while concluding remarks are given in section 7.

2. Preliminaries.

We will briefly recall some important preliminaries and introduce some basic terminology and notation before proceeding with the algorithmic parts of the paper. Due to the aforementioned reductions, it suffices to consider optimization problems of the form (P) defined over a directed tree \((\mathcal{V}, \mathcal{E})\) whose edges are all oriented toward a root node \(r \in \mathcal{V}\). This tree has \(n := |\mathcal{V}|\) vertices and \(m := |\mathcal{E}| = n - 1\) edges. However, we will often consider the special case when the directed tree is actually a directed chain with \(n\) nodes \(\mathcal{V} := \{1, 2, \dots, n\}\) and \(m = n - 1\) edges \(\mathcal{E} = \{(i, i+1) \mid i=1, 2, \dots, n-1\}\), where the \(n\)th node represents the root node. We use the notation \(ij\) to denote a directed edge from node \(i\) to node \(j\).
Message passing. Since the underlying graph is a tree, it is well-known that a minimizer for this problem can be computed exactly via continuous DP (cf. [21, Proposition 1] for a comprehensive proof). This can be done by computing so-called messages between adjacent nodes \(i\) and \(j\) for every edge \(ij \in \mathcal{E}\). We distinguish between pairwise messages \(\varphi_{i \to j} : \mathbb{R} \to \mathbb{R}\) that are defined for every edge \(ij \in \mathcal{E}\) via the infimal convolution
\begin{align} \varphi_{i \to j}\left(x_j\right) &:= \min_{x_i} \varphi_i\left(x_i\right) + f_{ij}\left(x_j - x_i\right) \end{align}
(2.1)
and unary messages \(\varphi_j: \mathbb{R} \to \mathbb{R}\) that are defined via the summation
\begin{align} \varphi_j\left(x_j\right) := f_j\left(x_j\right) + \sum_{i : ij \in \mathcal{E}} \varphi_{i \to j}\left(x_j\right) \end{align}
(2.2)
for every node \(j \in \mathcal{V}\). The messages have to be propagated during a postorder traversal of the tree. In other words, the messages are propagated from the leaves toward the root. If the tree is a chain, then a forward pass through the nodes suffices to perform the message passing.
A minimizer of (P) can be computed by backtracking from the root node \(r\). To facilitate this process, we need to store the “back pointers” \(\beta_{ij} : \mathbb{R} \to \mathbb{R}\) defined as
\begin{align} \beta_{ij}(x_j) := \mathop{\operatorname{arg\,min}}_{x_i} \varphi_i\left(x_i\right) + f_{ij}\left(x_j - x_i\right) \end{align}
(2.3)
for every edge \(ij \in \mathcal{E}\). To compute a minimizer, we start at the root node \(r\) and find the label (i.e., value) \(x_r\) that minimizes \(\varphi_r\). If there are multiple such values, then we can pick an arbitrary one among those values. The backtracking proceeds then through the tree using a preorder traversal. More precisely, the backtracking is performed from the root node toward the leaves, and we set \(x_i := \beta_{ij}(x_j)\) for each edge \(ij \in \mathcal{E}\) that is encountered during the traversal. The backtracking simplifies to a backward pass through the nodes if the tree is a chain.
Message representation. The messages in our proposed algorithms are going to be piecewise quadratic functions. Thus, if a message contains \(N\) distinct intervals (or pieces), it will be represented by a collection of the form \(\left(\left(l_i, r_i, p_i\right)\right)_{i=1}^N\), where \(l_i \in \mathbb{R} \cup \left\{ -\infty \right\}\) represents the left boundary, \(r_i \in \mathbb{R} \cup \left\{ \infty \right\}\) (with \(l_i \lt r_i\)) represents the right boundary, and the function \(p_i: \mathbb{R} \to \mathbb{R}\) represents the message values of the \(i\)th interval. We assume that the domain of each message is \(\mathbb{R}\) and that each piece \(p_i\) is defined as \(p_i\left(z\right) := a_i z^2 + b_i z+c_i\) with \(\operatorname{dom} p_i = (l_i, r_i] \setminus \left\{ -\infty, \infty \right\}\).
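A possible in-memory realization of this convention is sketched below (an illustration of the representation only, not the paper's implementation), storing each piece as a tuple \((l_i, r_i, a_i, b_i, c_i)\):
```python
# Piecewise quadratic message: pieces (l, r, a, b, c) with p(z) = a*z^2 + b*z + c on (l, r].
import bisect
import math

class PiecewiseQuadratic:
    def __init__(self, pieces):
        # pieces: sorted left to right, covering R, with l_1 = -inf and r_N = +inf
        self.pieces = pieces
        self.rights = [r for (_, r, _, _, _) in pieces]

    def __call__(self, z):
        # locate the piece whose interval (l, r] contains z, then evaluate it
        idx = min(bisect.bisect_left(self.rights, z), len(self.pieces) - 1)
        _, _, a, b, c = self.pieces[idx]
        return a * z * z + b * z + c

# Huber cost with three pieces: linear / quadratic / linear
huber = PiecewiseQuadratic([
    (-math.inf, -1.0, 0.0, -1.0, -0.5),   # -z - 0.5   on (-inf, -1]
    (-1.0,       1.0, 0.5,  0.0,  0.0),   # 0.5 * z^2  on (-1, 1]
    ( 1.0,  math.inf, 0.0,  1.0, -0.5),   # z - 0.5    on (1, inf)
])
print(huber(-2.0), huber(0.0), huber(2.0))   # -> 1.5 0.0 1.5
```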
Remark 2.1.
In contrast to [21], we do not propagate optimality conditions but rather function values after partial minimization (which are known in the literature as Bellman functions) as described in this section. This results in slightly higher memory usage in comparison to [21], but has the added benefit that our messages can immediately be used with practical heuristics like loopy belief propagation [10], or SGM [13]. We will show this later in section 6, when we consider concrete applications.

3. Reductions.

We will now explain the reductions of (\(\text{P}_{\rm s}\)) to (P) and (\(\text{P}_{\varepsilon }\)) to (\(\text{P}_{\rm s}\)).

3.1. Reducing (\(\mathbf{P}_{\mathbf{s}}\)) to (P).

A reduction could be trivially shown by substituting the pairwise costs \(f_{ij}\) in (\(\text{P}_{\rm s}\)) with the pairwise costs \(\bar{f}_{ij} = f_{ij}\left(\cdot - v_{ij}\right)\). However, the problem with this approach is that the existing continuous DP algorithms for solving (P) from the related work subsection either do not directly support shifted pairwise costs (hence, they would need to be extended to the shifted setting and their complexity analysis would have to be redone, which is nontrivial in general) or they are designed for a broad class of (nonconvex) cost types and hence exhibit poor practical performance compared to dedicated solvers. The former is also generally true for algorithms that have not been developed yet, as researchers usually focus on solving (P) and not (\(\text{P}_{\rm s}\)), and thus there is no reason to believe that these future algorithms will support shifted pairwise costs.
However, there is a better reduction approach that can be applied to all algorithms for solving (\(\text{P}_{\rm s}\)) that are of practical interest (including all aforementioned DP algorithms). It is based on the idea that we construct an auxiliary vector \(x^{\prime } \in \mathbb{R}^n\) such that the equality \(x^{\prime }_j - x^{\prime }_i = v_{ij}\) holds for all edges \(ij \in \mathcal{E}\). Notice that there are infinitely many such vectors and an arbitrary one can be constructed in \(\mathcal{O}\left(n\right)\) time during a preorder traversal of the tree \(\left(\mathcal{V}, \mathcal{E}\right)\). Based on this, we can introduce the substitution \(x = \bar{x} + x^{\prime }\) and reduce (\(\text{P}_{\rm s}\)) to the equivalent optimization problem
\begin{align*} \min_{\bar{x}} \sum_{i \in \mathcal{V}} f_i\left(\bar{x}_i + x^{\prime }_i\right) + \sum_{ij \in \mathcal{E}} f_{ij}\left(\bar{x}_j - \bar{x}_i\right), \end{align*}
which trivially reduces to (P) by the simple fact that all algorithms that are of practical interest support shifts in the unary cost functions. For example, whenever the data terms measure a mismatch between the model estimate and the measured data points, we can simply absorb the shift into the data points. In the case of more general unary cost functions, our goal is to shift their representations. For example, we shift piecewise-defined unary cost functions by translating each individual piece by a fixed amount.
This reduction is summarized in Algorithm 3.1 and can be performed in \(\mathcal{O}\left(n\right)\) additional time and memory, and it implies two important things. First, (\(\text{P}_{\rm s}\)) is not harder to solve than (P). Second, we can use existing solvers for solving (P) to solve the corresponding variants of (\(\text{P}_{\rm s}\)) while inheriting the same worst-case time and memory complexities of the solver for (P). This is particularly convenient in situations when we have access to implementations of existing solvers for (P), as we do not need to modify these implementations but only perform the reduction in a preprocessing and a postprocessing step. Notice that we could further reduce the memory overhead of the reduction to \(\mathcal{O}\left(1\right)\) by using in-place operations. This, however, would not lead to any advantage in the overall memory complexity, as we already require \(\Theta \left(n\right)\) memory to describe our problem instances and solutions.
Algorithm 3.1. Reducing (\(\text{P}_{\rm s}\)) to (P).
Input: Tree \(\left(\mathcal{V}, \mathcal{E}\right)\), unary costs \(\left(f_i\right)_{i \in \mathcal{V}}\), pairwise costs \(\left(f_{ij}\right)_{ij \in \mathcal{E}}\), and shifts \(\left(v_{ij}\right)_{ij \in \mathcal{E}}\).
Output: A minimizer \(x^\star := (x^\star_1, x^\star_2, \dots, x^\star_{n})\) of (\(\text{P}_{\rm s}\)).
1:// Compute an auxiliary vector \(x^{\prime } \in \mathbb{R}^n\)
2:Initialize \(x^{\prime }_r\) to any scalar from \(\mathbb{R}\)
3:for \(ij \in \mathcal{E}_{\text{preorder}}\) do
4: \(x^{\prime }_i := x^{\prime }_j - v_{ij}\)
5:end for
6: 
7:// Compute a minimizer of (P)
8:\(\bar{x}^\star := \operatorname{arg\,min}_{\bar{x}} \sum_{i \in \mathcal{V}} f_i\left(\bar{x}_i + x^{\prime }_i\right) + \sum_{ij \in \mathcal{E}} f_{ij}\left(\bar{x}_j - \bar{x}_i\right)\)
9: 
10:// Reconstruct a minimizer of (\(\text{P}_{\rm s}\))
11:\(x^\star := \bar{x}^\star + x^{\prime }\)
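For a chain graph, Algorithm 3.1 amounts to a cumulative sum followed by one call to an existing solver, as in the following sketch (with a generic, hypothetical solver `solve_unshifted` for (P) whose unary costs are passed as callables):
```python
# Sketch of Algorithm 3.1 on a chain with edges (i, i+1) oriented toward node n.
import numpy as np

def solve_shifted_on_chain(unaries, pairwises, shifts, solve_unshifted):
    """unaries: n callables f_i; pairwises: n-1 callables f_{i,i+1};
    shifts: array of n-1 values v_{i,i+1}.  Returns a minimizer of (P_s)."""
    n = len(unaries)
    # lines 1-5: auxiliary vector x' with x'_{i+1} - x'_i = v_{i,i+1} (x'_1 := 0)
    x_prime = np.zeros(n)
    x_prime[1:] = np.cumsum(shifts)
    # lines 7-8: absorb the shift into the unary costs and solve (P)
    shifted_unaries = [(lambda z, f=f, s=s: f(z + s))
                       for f, s in zip(unaries, x_prime)]
    x_bar = solve_unshifted(shifted_unaries, pairwises)
    # lines 10-11: undo the substitution x = x_bar + x'
    return x_bar + x_prime
```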

3.2. Reducing (\(\mathbf{P}_{\boldsymbol{\varepsilon }}\)) to (\(\mathbf{P}_{\mathbf{s}}\)).

Notice that we can rewrite (\(\text{P}_{\varepsilon }\)) as
\begin{align*} \min_x \sum_{i \in \mathcal{V}} f_i\left(x_i\right) + \min_v \sum_{ij \in \mathcal{E}} f_{ij}\left(x_j - x_i - v_{ij}\right) \; \text{s.t.} \; \left\|v - v^{\prime }\right\|_{\infty } \leq \varepsilon \end{align*}
and that the inner optimization problem is separable for each \(v_{ij}\) and consequently decomposes into subproblems of the form
\begin{align} \min_{v_{ij}} f_{ij}\left(x_j - x_i - v_{ij}\right) \; \text{s.t.} \; \left|v_{ij} - v^{\prime }_{ij}\right| \leq \varepsilon. \end{align}
(3.1)
The following proposition characterizes the solution of (3.1) under the assumption that the pairwise cost term \(f_{ij}\) is monotonically decreasing for arguments \(\leq 0\) and monotonically increasing for arguments \(\gt 0\).
Proposition 3.1.
Let \(f_{ij}: \mathbb{R} \to \mathbb{R}\) be a function that is monotonically decreasing for arguments \(\leq 0\) and monotonically increasing for arguments \(\gt 0\). Let \(\varepsilon\) denote an arbitrary positive real scalar and let \(z\), \(v_{ij}\), and \(v^{\prime }_{ij}\) denote arbitrary real scalars. Then the following two statements are true:
a.
The equality
\begin{align} \min_{v_{ij}} f_{ij}\left(z - v_{ij}\right) \; \textit{s.t.} \; \left|v_{ij} - v^{\prime }_{ij}\right| \leq \varepsilon = \bar{f}_{ij} \left( z - v^{\prime }_{ij} \right) \end{align}
(3.2)
holds, where
\begin{align*} v^\star_{ij} = \begin{cases} v^{\prime }_{ij} - \varepsilon &\textit{for } z \leq v^{\prime }_{ij} - \varepsilon, \\ z &\textit{for } v^{\prime }_{ij} -\varepsilon \lt z \lt v^{\prime }_{ij} + \varepsilon, \\ v^{\prime }_{ij} + \varepsilon &\textit{for } z \geq v^{\prime }_{ij} + \varepsilon \end{cases} \end{align*}
is a minimizer of the left-hand side of (3.2) that leads to
\begin{align*} \bar{f}_{ij}(z - v^{\prime }_{ij}) := \begin{cases} f_{ij}(z - (v^{\prime }_{ij} - \varepsilon )) &\textit{for } z - v^{\prime }_{ij} \leq -\varepsilon, \\ f_{ij}(0) &\textit{for } -\varepsilon \lt z - v^{\prime }_{ij} \lt \varepsilon, \\ f_{ij}(z - (v^{\prime }_{ij} + \varepsilon )) &\textit{for } z - v^{\prime }_{ij} \geq \varepsilon. \end{cases} \end{align*}
b.
If \(f_{ij}\) is a convex function, then \(\bar{f}_{ij}\) is also a convex function.
Proof.
We prove the two claims separately.
a.
Let \(s, t \in \mathbb{R}\) and introduce the substitutions \(v_{ij} = s+v^{\prime }_{ij}\) and \(z = t+v^{\prime }_{ij}\). Based on these, we can replace the left-hand side of (3.2) with the equivalent optimization problem
\begin{align} \min_{s} f_{ij}\left(t - s\right) \; \text{s.t.} \; -\varepsilon \leq s \leq \varepsilon. \end{align}
(3.3)
This problem can be trivially solved as follows. Notice that a minimizer of \(f_{ij}\) is attained at 0 due to the assumed monotonicity properties on \(f_{ij}\). Furthermore, the optimization problem can be interpreted as follows. For a fixed value of \(t\), the variable \(s\) allows us to move the argument left or right from \(t\) by at most \(\varepsilon\) to improve the function value in comparison to \(f_{ij}\left(t\right)\). Positive values of \(s\) move us left of \(t\) and negative values of \(s\) move us right of \(t\). Hence, due to the monotonicity properties assumed on \(f_{ij}\), for \(t \geq 0\) a minimizer is obtained by moving as far left as possible while taking care that \(t - s \geq 0\) and \(s \le \varepsilon\), and for \(t \lt 0\) a minimizer is obtained by moving as far right as possible while taking care that \(t - s \leq 0\) and \(s \ge -\varepsilon\). Therefore, a minimizer of (3.3) is given by
\begin{align*} s^\star = \begin{cases} -\varepsilon &\text{for } t \leq -\varepsilon, \\ t &\text{for } -\varepsilon \lt t \lt \varepsilon, \\ \varepsilon &\text{for } t \geq \varepsilon, \end{cases} \end{align*}
which implies that
\begin{align*} \min_{s} f_{ij}\left(t - s\right) \; \text{s.t.} \; -\varepsilon \leq s \leq \varepsilon = \begin{cases} f_{ij}\left(t - (-\varepsilon )\right) &\text{for } t \leq -\varepsilon, \\ f_{ij}\left(0\right) &\text{for } -\varepsilon \lt t \lt \varepsilon, \\ f_{ij}\left(t - \varepsilon \right) &\text{for } t \geq \varepsilon \end{cases} \end{align*}
and consequently that the first part of the proposition holds after undoing the substitutions \(v_{ij} = s+v^{\prime }_{ij}\) and \(z = t+v^{\prime }_{ij}\) in the previous two expressions.
b.
We will assume that \(f_{ij}: \mathbb{R} \to \mathbb{R}\) is a convex function for this part of the proof. Let the function \(F_{ij}: \mathbb{R} \times \mathbb{R} \to \mathbb{R}\) be defined as \(F_{ij}\left(t, s\right) := f_{ij}\left(t - s\right)\) with domain \(\operatorname{dom} F_{ij} = \mathbb{R} \times \left[-\varepsilon, \varepsilon \right]\). Then \(F_{ij}\) is a convex function by construction, as its domain is a convex set and it is the composition of the convex function \(f_{ij}\) with the linear map \(\left(t, s\right) \mapsto t - s\). Furthermore, the optimization problem \(f^\star_{ij}\left(t\right) := \min_s F_{ij}\left(t, s\right)\) is equivalent to the optimization problem (3.3). Partial minimization preserves convexity, which means that \(f^\star_{ij}\) is a convex function in \(t\). Notice that by undoing the substitution \(z = t+v^{\prime }_{ij}\) it follows that \(\bar{f}_{ij}(z - v^{\prime }_{ij}) = f^\star_{ij}(z - v^{\prime }_{ij})\), which implies that \(\bar{f}_{ij}\) is a convex function in \(z\) and consequently proves the second part of the proposition.  ▪
The proposition tells us that transforming \(f_{ij}\) into \(\bar{f}_{ij}\) means that we cut \(f_{ij}\) into two pieces at 0, translate the left piece of the function to the left by \(\varepsilon\), translate the right piece of the function to the right by \(\varepsilon\), and insert a horizontal piece of height \(f_{ij}\left(0\right)\) into the interval \([-\varepsilon, \varepsilon ]\). This is an operation that can be trivially carried out for arbitrary pairwise cost functions \(f_{ij}\). The key problem, however, is that we might end up with modified pairwise cost functions \(\bar{f}_{ij}\) for which we do not have an exact solver for (\(\text{P}_{\rm s}\)), even though we have an exact solver for the original pairwise cost functions \(f_{ij}\). Fortunately, this is not an issue when the pairwise cost functions are piecewise quadratic functions. In that case, it trivially follows that the transformation preserves the “shape” of the pairwise cost, in the sense that piecewise linear pairwise costs are transformed into new piecewise linear pairwise costs and that piecewise quadratic pairwise costs are transformed into new piecewise quadratic pairwise costs.
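For example, for an absolute value pairwise cost the transformation produces the well-known \(\varepsilon\)-insensitive loss,
\begin{align*} f_{ij}\left(d\right) = \lambda \left|d\right| \quad \Longrightarrow \quad \bar{f}_{ij}\left(d\right) = \lambda \max \left(\left|d\right| - \varepsilon, 0\right) = \begin{cases} \lambda \left(-d - \varepsilon \right) &\text{for } d \leq -\varepsilon, \\ 0 &\text{for } -\varepsilon \lt d \lt \varepsilon, \\ \lambda \left(d - \varepsilon \right) &\text{for } d \geq \varepsilon, \end{cases} \end{align*}
which is again piecewise linear, in line with the shape-preservation property just described.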
Based on this, we can perform the reduction from (\(\text{P}_{\varepsilon }\)) to (\(\text{P}_{\rm s}\)) for piecewise quadratic pairwise costs as summarized in Algorithm 3.2. The algorithm just goes over all pairwise costs \(f_{ij}\) of problem (\(\text{P}_{\varepsilon }\)) and transforms them into new pairwise costs in accordance with the previous proposition. If the number of pieces in all pairwise cost terms \(f_{ij}\) is \(\mathcal{O}\left(1\right)\), then the reduction requires only an additional \(\mathcal{O}\left(n\right)\) time and memory in the worst case. Consequently, solving (\(\text{P}_{\varepsilon }\)) is no harder than solving (\(\text{P}_{\rm s}\)). Moreover, if we use a solver for (\(\text{P}_{\rm s}\)) to solve a corresponding problem (\(\text{P}_{\varepsilon }\)), then we inherit the same worst-case time and memory complexities of the solver. The same two conclusions also hold between (\(\text{P}_{\varepsilon }\)) and (P) by combining this reduction with the first one. Notice that we could again reduce the memory overhead of the reduction to \(\mathcal{O}\left(1\right)\) by using in-place operations, which again would not lead to any advantage in the overall memory complexity, as we already require \(\Theta \left(n\right)\) memory to describe our problem instances and solutions.
Algorithm 3.2. Reducing (\(\text{P}_{\varepsilon }\)) to (\(\text{P}_{\rm s}\)) for piecewise quadratic pairwise costs.
Input: Tree \(\left(\mathcal{V}, \mathcal{E}\right)\), unary costs \(\left(f_i\right)_{i \in \mathcal{V}}\), pairwise costs \(\left(f_{ij}\right)_{ij \in \mathcal{E}}\), reference shifts \((v^{\prime }_{ij})_{ij \in \mathcal{E}}\), and parameter \(\varepsilon \gt 0\). 
Output: A minimizer \(x^\star := (x^\star_1, x^\star_2, \dots, x^\star_{n})\) and \(v^\star := (v_{ij}^\star \text{ for } ij \in \mathcal{E})\) of (\(\text{P}_{\varepsilon }\)). 
1:// Transform each pairwise cost \(f_{ij}\) into \(\bar{f}_{ij}\) and compute a \(v\) minimizer of (\(\text{P}_{\varepsilon }\)) 
2:for \(ij \in \mathcal{E}\) do 
3: \(\bar{f}_{ij} := \emptyset\) 
4: for \(\left(l, r, p\right) \in f_{ij}\) do 
5:  if \(r \lt 0\) then\(\triangleright\) Transform left part of \(f_{ij}\)
6:   Insert \(\left(l - \varepsilon, r - \varepsilon, p\left(\cdot + \varepsilon \right)\right)\) into \(\bar{f}_{ij}\) 
7:  else if \(l \gt 0\) then\(\triangleright\) Transform right part of \(f_{ij}\)
8:   Insert \(\left(l + \varepsilon, r + \varepsilon, p\left(\cdot - \varepsilon \right)\right)\) into \(\bar{f}_{ij}\) 
9:  else\(\triangleright\) Transform the piece of \(f_{ij}\) that contains 0
10:   Insert \((l - \varepsilon, -\varepsilon, p\left(\cdot + \varepsilon \right))\) into \(\bar{f}_{ij}\) if \(l \neq 0\) 
11:   Insert \((-\varepsilon, \varepsilon, f_{ij}(0))\) into \(\bar{f}_{ij}\) if \(l \ne 0\) 
12:   Insert \((\varepsilon, r + \varepsilon, p\left(\cdot - \varepsilon \right))\) into \(\bar{f}_{ij}\) if \(r \neq 0\) 
13:  end if 
14: end for 
15:end for 
16: 
17:// Compute a minimizer of (\(\text{P}_{\rm s}\)), which is also an \(x\) minimizer of (\(\text{P}_{\varepsilon }\)) 
18:\(x^\star := \operatorname{arg\,min}_{x} \sum_{i \in \mathcal{V}} f_i\left(x_i\right) + \sum_{ij \in \mathcal{E}} \bar{f}_{ij}(x_j - x_i - v^{\prime }_{ij})\) 
19: 
20:// Compute a \(v\) minimizer of (\(\text{P}_{\varepsilon }\)) 
21:for \(ij \in \mathcal{E}\) do 
22: \(v_{ij}^\star := \text{proj}_{[v^{\prime }_{ij} - \varepsilon, v^{\prime }_{ij} + \varepsilon ]}(x^\star_j - x^\star_i)\) 
23:end for 
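For pieces stored as tuples \((l, r, a, b, c)\) as in section 2, the transformation step of Algorithm 3.2 (lines 2 to 15) can be sketched as follows (an illustration only):
```python
# Sketch of the pairwise-cost transformation f_ij -> bar{f}_ij from Algorithm 3.2.
import math

def shift_piece(piece, delta):
    """Translate the piece by delta: p(z - delta) on (l + delta, r + delta]."""
    l, r, a, b, c = piece
    return (l + delta, r + delta, a, b - 2 * a * delta, a * delta ** 2 - b * delta + c)

def transform_pairwise(pieces, eps):
    out = []
    for piece in pieces:
        l, r, a, b, c = piece
        if r < 0:                                    # left part of f_ij
            out.append(shift_piece(piece, -eps))
        elif l > 0:                                  # right part of f_ij
            out.append(shift_piece(piece, +eps))
        else:                                        # piece that contains 0
            if l != 0:
                out.append(shift_piece((l, 0.0, a, b, c), -eps))
                out.append((-eps, eps, 0.0, 0.0, c))   # flat piece of height f_ij(0)
            if r != 0:
                out.append(shift_piece((0.0, r, a, b, c), +eps))
    return out

# |d| = {-d on (-inf, 0], d on (0, inf)} is mapped to max(|d| - eps, 0)
absval = [(-math.inf, 0.0, 0.0, -1.0, 0.0), (0.0, math.inf, 0.0, 1.0, 0.0)]
print(transform_pairwise(absval, eps=0.5))
```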

4. Convex case.

We will assume in this section that the unary cost functions \(f_{i}\) and pairwise cost functions \(f_{ij}\) are convex piecewise quadratic functions with \(\mathcal{O}\left(1\right)\) pieces.
The key observation for developing the algorithm is that the messages in this setting are also convex and piecewise quadratic functions. This is formally stated and proved in the following lemma.
Lemma 4.1.
Suppose that we perform message passing. Let the pairwise messages \(\varphi_{i \to j} : \mathbb{R} \to \mathbb{R}\) be defined by (2.1) and the unary messages \(\varphi_j: \mathbb{R} \to \mathbb{R}\) be defined by (2.2). Suppose that all unary cost terms \(f_i\) are convex piecewise quadratic functions, that all pairwise cost terms \(f_{ij}\) are convex piecewise quadratic functions, and that a minimizer for all pairwise messages \(\varphi_{i \to j}\) in the infimal convolution (2.1) exists for any value of \(x_j\). Then the messages \(\varphi_{i \to j}\) and \(\varphi_j\) will also be convex piecewise quadratic functions during the message passing. Furthermore, \(\varphi_{i \to j}\) is continuously differentiable if the pairwise cost terms \(f_{ij}\) are continuously differentiable.
Proof.
The convexity of the messages \(\varphi_{i \to j}\) and \(\varphi_j\) directly follows from the assumption that the cost functions \(f_i\) and \(f_{ij}\) are convex, utilizing the property that convexity is preserved under addition and partial minimization. The second part of the lemma claims that the node and edge messages are piecewise quadratic functions. We will prove this by induction. For the base case, we have \(\varphi_{l} := f_{l}\) for all leaf nodes, whereby the unary cost terms \(f_l\) are assumed to be piecewise quadratic.
In the inductive step, we assume that \(\varphi_i\) is piecewise quadratic. The edge messages \(\varphi_{i \to j}\left(x_j\right)\) are defined by the infimal convolution (2.1), whose stationarity condition \(\partial f_{ij}\left(x_j - x_i\right) \cap \partial \varphi_{i}\left(x_i\right) \ne \emptyset\) can be separated into three cases. For the first case, we only consider values of \(x_j\) that lead to a minimizer \(x_i\) that lies in the differentiable part of \(\varphi_i\), with \(x_j - x_i\) lying in the interior of a quadratic piece of \(f_{ij}\). In that case, the optimality condition simplifies to \(\varphi^{\prime }_{i}\left(x_i\right) = f^{\prime }_{ij}\left(x_j - x_i\right)\), from which we can extract a linear relationship of the form \(m x_j + q=x_i\). By substituting this equality into the infimal convolution (2.1), we have \(\varphi_{i \to j}\left(x_j\right) = \varphi_i(m x_j + q) + f_{ij}\left(\left(1 - m\right) x_j - q\right)\).
For the second case, we consider all contact points \(z_t\) at which \(\varphi_i\) is nondifferentiable. Since \(x_i = z_t\), it holds for all points \(x_j\) with \(f^{\prime }_{ij}\left(x_j - z_t\right) \in \partial \varphi_{i}\left(z_t\right)\) that \(\varphi_{i \to j}\left(x_j\right) = \varphi_i\left(z_t\right) + f_{ij}\left(x_j - z_t\right)\). Similarly, for the third case, we consider all contact points \(z_k\) at which \(f_{ij}\) is nondifferentiable. Since \(x_j - x_i = z_k\), it holds for all points \(x_j\) with \(\partial f_{ij}\left(z_k\right) \ni \varphi^{\prime }_{i}\left(x_j - z_k\right)\) that \(\varphi_{i \to j}\left(x_j\right) = \varphi_i\left(x_j - z_k\right) + f_{ij}\left(z_k\right)\). The case where both \(f_{ij}\) and \(\varphi_{i}\) are nondifferentiable can be ignored, since the interaction between a contact point and an appropriate segment limit point yields the same result.
The three cases are exhaustive, so \(\varphi_{i \to j}\) is equal to a quadratic function at any \(x_j\). By the convexity of \(\varphi_i\) and \(f_{ij}\), the three scenarios alternate between one another finitely many times, and each interaction defines a disjoint interval paired with a unique quadratic function. Therefore, the message \(\varphi_{i \to j}\) is a piecewise quadratic function defined over these intervals.
To establish the same result for \(\varphi_i\), note that the sum of piecewise quadratic functions is again a piecewise quadratic function. Using the definition of the node infimal convolution (2.2) immediately yields the desired result. This establishes the inductive invariant, which proves the second part of the lemma.
In the third part, we assume that the pairwise cost functions \(f_{ij}\) are all continuously differentiable. For the sake of contradiction, assume \(\varphi_{i \to j}\) is not continuously differentiable. This implies that there is a nondifferentiable contact point \(x_j \in \mathbb{R}\), at which \(\varphi^{\prime }_{i \to j}(x_j^-) \lt \varphi^{\prime }_{i \to j}(x_j^+)\), with \(x_j^{\pm } := x_j \pm 0_+\). Let \(\hat{x}_i(x_j) := \operatorname{arg\,min}_{x_i} \varphi_i\left(x_i\right) + f_{ij}\left(x_j - x_i\right)\). Clearly \(\hat{x}_i(x_j^-) \ne \hat{x}_i(x_j^+)\), since \(f_{ij}\) is smooth, so the nondifferentiable contact point would not exist otherwise.
Since \(\varphi_{i \to j}\) is the lower envelope over all functions \(\varphi_i\left(x_i\right) + f_{ij}\left(\cdot - x_i\right)\), it holds that \(\varphi_i(\hat{x}_i(x_j^-)) + f_{ij}(x - \hat{x}_i(x_j^-)) \ge \varphi_{i \to j}(x)\) for any \(x \in \mathbb{R}\). Combining this inequality with the continuity of \(\varphi_{i \to j}\) and \(f_{ij}\) (implied by convexity), it follows that the entry angle of \(f_{ij}\) at \(x_j\) is at most the entry angle of \(\varphi_{i \to j}\), or in other words \(f_{ij}^{\prime }(x_j - \hat{x}_i(x_j^-)) \le \varphi^{\prime }_{i \to j}(x_j^-)\). From \(\varphi^{\prime }_{i \to j}(x_j^-) \lt \varphi^{\prime }_{i \to j}(x_j^+)\) and the continuity of \(f_{ij}^{\prime }\) it then follows that, inside a sufficiently small interval \((x_j, x_j + \varepsilon )\), we have \(f_{ij}^{\prime }(x - \hat{x}_i(x_j^-)) \lt \varphi^{\prime }_{i \to j}(x)\) and consequently that
\begin{align*} \varphi_{i \to j}(x_j) + \int_{x_j}^{x_j + \varepsilon } f_{ij}^{\prime }(x - \hat{x}_i(x_j^-)) \,dx &\lt \varphi_{i \to j}(x_j) + \int_{x_j}^{x_j + \varepsilon } \varphi^{\prime }_{i \to j}(x) \,dx \\ \Leftrightarrow \varphi_{i \to j}(x_j) + \int_{x_j}^{x_j + \varepsilon } f_{ij}^{\prime }(x - \hat{x}_i(x_j^-)) \,dx &\lt \varphi_{i \to j}(x_j + \varepsilon ). \end{align*}
By the fundamental theorem of calculus, the definition of \(\varphi_{i \to j}\), and the continuity of \(\varphi_{i \to j}\), the left-hand side of the inequality can be rewritten as
\begin{align*} & \varphi_{i \to j}(x_j) + \int_{x_j}^{x_j + \varepsilon } f_{ij}^{\prime }(x - \hat{x}_i(x_j^-)) \, dx \\ &\quad = \varphi_{i \to j}(x_j^-) + \left(f_{ij}(x_j + \varepsilon - \hat{x}_i(x_j^-)) - f_{ij}(x_j - \hat{x}_i(x_j^-))\right) \\ &\quad = \varphi_i(\hat{x}_i(x_j^-)) + f_{ij}(x_j - \hat{x}_i(x_j^-)) + \left(f_{ij}(x_j + \varepsilon - \hat{x}_i(x_j^-)) - f_{ij}(x_j - \hat{x}_i(x_j^-))\right) \\ &\quad =\varphi_i(\hat{x}_i(x_j^-)) + f_{ij}(x_j + \varepsilon - \hat{x}_i(x_j^-)) \lt \varphi_{i \to j}(x_j + \varepsilon ). \end{align*}
We see that \(\hat{x}_i(x_j^-)\) is a better backtracking choice for \(x_j + \varepsilon\) than \(\hat{x}_i(x_j + \varepsilon )\), which contradicts the definition of \(\hat{x}_i(x_j + \varepsilon )\). Hence \(\varphi_{i \to j}\) must be continuously differentiable, which concludes the proof.  ▪
Remark 4.2.
Note that an almost identical proof can be constructed to show that the messages stay (nonconvex) piecewise quadratic functions in the nonconvex case under the assumption that a minimizer for every infimal convolution exists during the message passing.
Based on Lemma 4.1, we can introduce Algorithm 4.1, which combines the three cases for stationarity with message passing. The efficiency of the algorithm stems from the order in which the stationarity conditions are considered, whereby we exploit the convexity of the node messages \(\varphi_i\) and pairwise cost terms \(f_{ij}\). Namely, as the subgradient \(\partial \varphi_i\) increases in value, the subgradient \(\partial f_{ij}\) must decrease in value to satisfy the stationarity condition. This results in a two-sided pointer walk algorithm, where increases in the first pointer are compensated with decreases in the second one, which we can compute in \(\mathcal{O}(1)\) amortized time.
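For concreteness, the affine backpointer relation computed in line 16 of Algorithm 4.1 follows from an elementary calculation: if the active pieces are \(\pi_t\left(x_i\right) = a_1 x_i^2 + b_1 x_i + c_1\) and \(p_k\left(z\right) = a_2 z^2 + b_2 z + c_2\) with \(a_1 + a_2 \gt 0\) (the case of two linear pieces is excluded in line 12), then the stationarity condition \(\pi^{\prime }_t\left(x_i\right) = p^{\prime }_k\left(x_j - x_i\right)\) reads
\begin{align*} 2 a_1 x_i + b_1 = 2 a_2 \left(x_j - x_i\right) + b_2 \quad \Longleftrightarrow \quad x_i = \underbrace{\frac{a_2}{a_1 + a_2}}_{=:\, m_{ij}} x_j + \underbrace{\frac{b_2 - b_1}{2\left(a_1 + a_2\right)}}_{=:\, q_{ij}}, \end{align*}
so the minimizer depends affinely on \(x_j\), and substituting it back into (2.1) yields one quadratic piece of \(\varphi_{i \to j}\) together with its backpointer.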
Algorithm 4.1. DP for convex piecewise quadratic + convex piecewise quadratic.
Input: Tree \(\left(\mathcal{V}, \mathcal{E}\right)\), unary costs \(\left(f_i\right)_{i \in \mathcal{V}}\), and pairwise costs \(\left(f_{ij}\right)_{ij \in \mathcal{E}}\).
Output: A minimizer \(x^\star := (x^\star_1, x^\star_2, \dots, x^\star_{n})\).
1:// Message passing
2:Initialize unary messages as \(\varphi_i := f_i\) for all nodes \(i \in \mathcal{V}\)
3:for \(ij \in \mathcal{E}_{\text{postorder}}\) do
4: \(\varphi_{i \to j} := \emptyset\), \(\beta_{ij} := \emptyset\)
5: \(R_0 := 0\)
6: 
7: for \(\vartheta_k \in f_{ij}\) in reverse order do
8:  \(L_k := R_{k - 1}\)
9:  if \(\vartheta_k = (l_k, r_k, p_k)\) then
10:   Find largest \(R_k\) such that \(\partial \varphi_i\left(\left[L_k, R_k - 0_+\right]\right) \subseteq p_k^{\prime }\left(\left[l_k, r_k\right]\right)\)
11:   Gather the (partial) segments and nondifferentiable breakpoints of \(\varphi_i\) that lie
12:   in \(\left[L_k, R_k\right]\) into \(\varphi_i\mid{[L_k, R_k]}\) but exclude linear segments if \(p_k\) is linear
13: 
14:   for \(\xi_t \in \varphi_i\mid \left[L_k, R_k\right]\) do
15:    if \(\xi_t = (l_t, r_t, \pi_t)\) then
16:     Compute \(\beta (x_j) := m_{ij} x_j + q_{ij} = x_i\) from condition \(\pi^{\prime }_t\left(x_i\right) = p^{\prime }_k\left(x_j - x_i\right)\)
17:     \(p\left(x_j\right) := p_k\left(\left(1 - m_{ij}\right)x_j - q_{ij}\right) + \pi_t\left(m_{ij}x_j + q_{ij}\right)\)
18:     \(\left[l, r\right] := \beta^{-1}\left(\left[l_t, r_t\right]\right)\)
19:     Insert \(\left(l, r, p\right)\) into \(\varphi_{i \to j}\) and \(\left(l, r, \beta \right)\) into \(\beta_{ij}\)
20:    else //\(\xi_t = (z_t, \varphi_i(z_t))\)
21:     Find the largest \([L, R] \subseteq [l_k, r_k]\) such that \(p_k^{\prime }([L, R]) \subseteq \partial \varphi_i(z_t)\)
22:     \(\left[l, r\right] := \left[z_t + L, z_t + R\right]\)
23:     Insert \(\left(l, r, \varphi_i\left(z_t\right) + f_{ij}\left(x_j - z_t\right)\right)\) into \(\varphi_{i \to j}\) and \(\left(l, r, z_t\right)\) into \(\beta_{ij}\)
24:    end if
25:   end for
26:  else //\(\vartheta_k = (z_k, f_{ij}(z_k))\)
27:   Find largest \(R_k\) such that \(\partial \varphi_i([L_k, R_k - 0_+]) \subseteq \partial f_{ij}(z_k)\)
28:   Gather the (partial) segments of \(\varphi_i\) that belong to the interval \(\left[L_k, R_k\right]\) into
29:   \(\varphi_i\mid{[L_k, R_k]}\)
30: 
31:   for \((l_t, r_t, \pi_t) \in \varphi_i\mid{[L_k, R_k]}\) do
32:    \([l, r] := [z_k + l_t, z_k + r_t]\)
33:     Insert \((l, r, \varphi_i(x_j - z_k) + f_{ij}(z_k))\) into \(\varphi_{i \to j}\) and \((l, r, x_j - z_k)\) into \(\beta_{ij}\)
34:   end for
35:  end if
36: end for
37: 
38: if \(\varphi_{i \to j}\) for all \(i : ij \in \mathcal{E}\) have been computed then
39:  Let \(k\) correspond to largest subtree among the child nodes \(i : ij \in \mathcal{E}\)
40:  \(\varphi_j := \varphi_j + \varphi_{k \to j} + \sum_{i : ij \in \mathcal{E}, i \ne k}\varphi_{i\to j}\)
41: end if
42:end for
43: 
44:// Backtracking
45:\(x^\star_r := \operatorname{arg\,min}_x \varphi_r\left(x\right)\)
46:for \(ij \in \mathcal{E}_{\text{preorder}}\) do
47: Find the segment \(\left(l, r, \beta \right) \in \beta_{ij}\) that contains \(x^\star_j\)
48: \(x^\star_i := \beta (x^\star_j)\)
49:end for
Remark 4.3.
We use the notation \(\partial h\left(\left[L, R\right]\right)\) to denote the set \(\cup_{z \in \left[L, R\right]} \partial h\left(z\right)\), which is defined for any convex function \(h: \mathbb{R} \to \mathbb{R}\), and any scalars \(L, R \in \left[-\infty, \infty \right]\). We simplify the notation accordingly to \(h^{\prime }\left(\left[L, R\right]\right)\) whenever \(h\) is a continuously differentiable function. Furthermore, we implicitly assume in all expressions that the set \(\partial h \left(\left[L, R\right]\right)\) is ordered in the same way as the set \(\left[L, R\right]\).
Worst-case time and memory complexities for Algorithm 4.1 are proven below.
Lemma 4.4.
The worst-case time and memory complexities of Algorithm 4.1 are \(\mathcal{O}\left(n^2\right)\).
Proof.
Given that the pairwise costs \(f_{ij}\) consist of at most \(\mathcal{O}\left(1\right)\) intervals, it is evident that each edge message pass (2.1) performs work linear in the number of intervals of the corresponding node message \(\varphi_i\). The same is true for the backtracking iterations, which scan over each interval once. We will later show that the node message pass (2.2), if implemented properly, has amortized time complexity \(\mathcal{O}(n + \log^2 n)\). Consequently, the total time complexity is \(\mathcal{O}(n \mathcal{I} + n^2)\), where \(\mathcal{I}\) is the maximum number of intervals among the messages. On the memory end, things are even more straightforward. We maintain two backtracking coefficients per interval for each edge message, resulting in a total memory complexity of \(\mathcal{O}\left(n \mathcal{I}\right)\). The piecewise quadratic representations of the node and edge messages are not needed for the backtracking, and can therefore be discarded once the corresponding nodes or edges have been processed. As such, they are irrelevant for an asymptotic analysis of the memory usage. In conclusion, to prove the lemma, it suffices to show that \(\mathcal{I} = \mathcal{O}\left(n\right)\).
To facilitate the proof, we introduce a helper quantity \(\mathcal{P}\), which denotes the maximum number of intervals and/or nondifferentiable contact points among the messages. Clearly \(\mathcal{I} \le \mathcal{P}\), so \(\mathcal{P} = \mathcal{O}(n) \implies \mathcal{I} = \mathcal{O}(n)\). Note that the only operations which can modify the number of intervals and/or nondifferentiable contact points are the infimal convolution (2.1), as well as the unary cost addition and node message aggregation (2.2).
Infimal convolution. The infimal convolution (2.1) is considered for two cases: one for interior intervals of \(f_{ij}\) and one for nondifferentiable contact points of \(f_{ij}\). In either case, the stationarity condition yields a restriction of the message function \(\varphi_i \mid \left[L_k, R_k\right]\) corresponding to the range of subgradients over the interval or in the contact point. The domains of these restrictions are disjoint, and they can overlap with the intervals of \(\varphi_i\) either fully or partially, where a partial overlap occurs when the endpoint of the restricted domain lies between interval bounds. Similarly to a partial interval overlap, the subgradient of a contact point in \(\varphi_i\) may not correspond in its entirety to a single interval in \(f_{ij}\), and may be shared between various intervals. Like before, interactions between contact points in \(f_{ij}\) and contact points in \(\varphi_i\) are ignored, as they only generate single points in \(\varphi_{i\to j}\) which can be inferred by continuity.
The partial overlaps, whether for intervals or contact points in \(\varphi_i\), can only occur at the endpoints of the restricted domains, so there can be at most two partial overlaps per restriction. Given that \(f_{ij}\) has a constant number of intervals and contact points, the number of partial overlaps is also constant.
It is not hard to see that, by a slight generalization of the argument in the proof of Lemma 4.1, the quadratic intervals of \(f_{ij}\) yield continuously differentiable parts of \(\varphi_{i \to j}\). Furthermore, when infimally convolving with intervals of \(f_{ij}\), the intervals and nondifferentiable contact points of \(\varphi_i \mid \left[L_k, R_k\right]\) are mapped into the same number of intervals in \(\varphi_{i \to j}\). On the other hand, the nondifferentiable contact points of \(f_{ij}\) simply shift and offset parts of \(\varphi_i\), leaving the number of intervals and nondifferentiable contact points in the said part unchanged in \(\varphi_{i \to j}\). In either case, a single interval or nondifferentiable contact point is spawned for each interval or nondifferentiable contact point that overlaps with \(\varphi_i \mid \left[L_k, R_k\right]\). Consequently, full overlaps do not affect the number of intervals and/or nondifferentiable contact points.
The only way for the number of intervals to increase in \(\varphi_i\) is through the interaction of \(f_{ij}\) with partial overlaps. Intervals in \(f_{ij}\) spawn a single interval for each partially overlapped interval or nondifferentiable contact point in \(\varphi_i\). Conversely, nondifferentiable contact points in \(f_{ij}\) spawn a single interval for each partially overlapped interval in \(\varphi_i\), keeping in mind that interactions between contact points are trivial and hence ignored. In conclusion, the number of surplus intervals in \(\varphi_{i \to j}\) spawned by each interval or nondifferentiable contact point of \(\varphi_{i}\) is one less than the number of nontrivial partial overlaps in said interval or nondifferentiable contact point. Since the total number of partial overlaps is constant, then the total number of intervals is increased by at most \(\mathcal{O}(1)\).
Similarly, the only way for the number of nondifferentiable contact points to increase is for new ones to appear at the stitching points of the parts of \(\varphi_{i \to j}\), with each part spawned by the interaction of some interval or nondifferentiable contact point in \(f_{ij}\) with \(\varphi_i\). The number of stitching points is one less than the number of parts, which is \(\mathcal{O}(1)\) by assumption, so the number of nondifferentiable contact points is increased by at most \(\mathcal{O}(1)\).
Unary cost addition. The unary cost consists of \(\mathcal{O}\left(1\right)\) intervals by assumption, so adding it to the edge message function in (2.2) increases the number of intervals and/or nondifferentiable contact points by at most \(\mathcal{O}(1)\).
Node message aggregation. Finally, we consider the step of adding up the edge message costs in (2.2). The maximum possible number of intervals and/or nondifferentiable contact points \(\mathcal{P}(j)\) at node \(j \in \mathcal{V}\) satisfies the recurrence relation \(\mathcal{P}(j) = \sum_{i : ij \in \mathcal{E}} \{\mathcal{P}(i) + \mathcal{O}\left(1\right)\} + \mathcal{O}(1)\), whereby the constant terms within the sum come from the infimal convolution step, and the constant term outside the sum comes from the unary cost addition step. The base case is \(\mathcal{P}(i) = \mathcal{O}(1)\) for all leaf nodes \(i \in \mathcal{V}\). Solving this recurrence yields the bound \(\mathcal{P}(j) = \mathcal{O}(|\mathcal{T}_{j}|) = \mathcal{O}\left(n\right)\), where \(\mathcal{T}_j\) denotes the subtree rooted at node \(j\). Likewise, for all edges \(ij \in \mathcal{E}\) we have \(\mathcal{P}(i, j) = \mathcal{P}(i) + \mathcal{O}(1) = \mathcal{O}(n)\). The linear upper bound holds for every node and edge in the tree, so the result \(\mathcal{P} = \mathcal{O}(n) \implies \mathcal{I} = \mathcal{O}(n)\) immediately follows.
Efficient node message aggregation. To finalize the proof, we need to show that the node message aggregation \(\sum_{i : ij \in \mathcal{E}} \varphi_{i \to j}\) from (2.2) can be computed in \(\mathcal{O}(n + \log^2 n)\) amortized time. To motivate the need for an efficient implementation, consider the naive variant. The naive algorithm sorts the intervals of the edge messages \(\varphi_{i \to j}\) and then performs a single pass through the intervals. The need to perform a comparison-based sort at each node raises the time complexity of the entire algorithm to \(\mathcal{O}(n^2 \log n)\), which is undesirable.
In order to make the aggregation step efficient, we employ the following strategy: When computing the sum \(\sum_{i : ij \in \mathcal{E}} \varphi_{i \to j}\), start with the child \(k\) with the largest subtree \(\mathcal{T}_{k}\). To add the rest, we first sort all of the intervals in the remaining edge messages (i.e., over all children \(i \ne k\) with \(ij \in \mathcal{E}\)) in increasing order of the left interval endpoints. The sum can then be computed using a simple pointer walk scheme, akin to the merge step in merge sort. The time complexity of this procedure is \(\mathcal{O}(|\mathcal{T}_{j}| + N_j \log N_j)\), where \(N_j := |\mathcal{T}_j| - |\mathcal{T}_{k}|\). At first glance, this does not appear to have a substantial impact on the runtime.
However, note that for all \(i \ne k\), the number of nodes in \(\mathcal{T}_j\) is at least twice the number of nodes in \(\mathcal{T}_i\), or more formally \(|\mathcal{T}_k| \ge |\mathcal{T}_i| \implies |\mathcal{T}_j| \ge 2 |\mathcal{T}_i|\). Keeping in mind that the number of nodes in the entire tree is \(n\), we conclude that on the path from any node \(i \in \mathcal{V}\) toward the root, the number of times an edge message flows from a nonlargest subtree is at most \(\log_2 n\).
The next step is to construct an injective map from intervals in \(\varphi_i\) to intervals in \(\varphi_{i \to j}\). We have already discussed that the infimal convolution (2.1) spawns a single interval for each fully overlapped interval in \(\varphi_i\), and two or more intervals for each partially overlapped interval. With this, we are able to construct a simple injective mapping, where fully overlapped intervals are mapped into their resulting intervals in \(\varphi_{i \to j}\), and where intervals with partial overlaps are mapped into the leftmost interval in \(\varphi_{i \to j}\) among those that result from the partially overlapped parts. The rest of the intervals in \(\varphi_{i \to j}\), resulting from the other partial overlaps, are considered as newly spawned intervals.
For the node message pass (2.2) we construct a surjective map instead. To ignore newly spawned intervals, we consider the function \(\bar{\varphi }_j = \varphi_j - f_j\). Note that the number of intervals in \(\bar{\varphi }_j\) is at most equal to the total number of intervals over all ingoing edge messages \(\varphi_{i \to j}\). To see this, note that the contact points of \(\bar{\varphi }_j\) result from the union of the contact points in the edge messages \(\varphi_{i \to j}\). Based on this remark, we consider the union of intervals over all edge messages \(\varphi_{i \to j}\), and scan through them in any order we desire, mapping each entry to intervals in \(\bar{\varphi }_j\), which are also arbitrarily ordered. Although it is arbitrary, a natural way to define the orderings is to sort the intervals in increasing order of left endpoints, breaking the comparison ties using right endpoints.
Even though the map we defined does not have any useful meaning, it is nonetheless a valid surjective map, since the argument set is at least as large as the codomain. Unfortunately, defining a more meaningful map for an interval-based message representation is markedly more difficult. Fortunately, the proof only requires a surjective mapping and is agnostic in regard to its definition.
By augmenting the domain and codomain of the injective and surjective maps, we can make both bijective. For the injective map, we simply remove the newly spawned intervals from the codomain. For the surjective mapping, we exclude the intervals in the argument set whose indices in the chosen order exceed the size of the codomain.
With the two bijective mappings, we may define equivalence classes of intervals, which stretch from the first occurrence of each interval, along the path toward the root, up until the root is reached or the equivalence chain is stopped. The equivalence chain stops if the interval is excluded in the domain of the node message mapping. In the edge message pass, all of the existing equivalence classes are extended, and \(\mathcal{O}(1)\) new equivalence classes are started. In the node message pass, a subset of equivalence classes is extended, and another \(\mathcal{O}(1)\) new equivalence classes are started. Hence, the number of equivalence classes in the subtree rooted at \(j \in \mathcal{V}\) satisfies the recurrence \(\mathcal{C}(j) = \sum_{i : ij \in \mathcal{E}} \{\mathcal{C}(i) + \mathcal{O}\left(1\right)\} + \mathcal{O}(1)\), with the base case \(\mathcal{C}(i) = \mathcal{O}(1)\) for all leaf nodes \(i \in \mathcal{V}\). Solving the recurrence yields \(\mathcal{C}(r) = \mathcal{O}(n)\), where \(r \in \mathcal{V}\) denotes the root node. Hence, there are at most \(\mathcal{O}(n)\) equivalence classes in the whole tree.
Finally, since each equivalence class lives on some path toward the root, the number of times an equivalence class is a part of the nonlargest subtree is at most \(\log_2 n\). Hence, the number of sorting operations the class participates in is also at most \(\log_2 n\). Since every interval is part of some equivalence class, it follows that the total number of intervals over all sorting operations is \(\mathcal{O}(n \log n)\). The total time spent on sorting is thus bounded by the optimal value of the optimization problem
\begin{align*} \max_{N} \mathcal{T}(N) := \sum_i \mathcal{O}(N_i \log N_i) \text{ s.t. } \sum_i N_i = \mathcal{O}(n \log n). \end{align*}
By definition of the big O notation, there exists an \(L \in \mathbb{R}_{++}\) such that \(\mathcal{T}(N) \le L \sum_i N_i \log N_i\) and \(\sum_i N_i \le L n \log n\). The optimization problem becomes
\begin{align*} \max_{N} \mathcal{T}(N) &:= L \sum_i N_i \log N_i \text{ s.t. } \sum_i N_i \le Ln \log n\\ \Longleftrightarrow \max_{N} \mathcal{T}(N) &:= L \sum_i N_i \log N_i \text{ s.t. } \sum_i N_i = Ln \log n, \end{align*}
where the inequality constraint was tightened to equality due to monotonicity of the \(N_i \log N_i\) terms. Let \(M := L n \log n\). We introduce the variable substitution \(\alpha_i = \frac{N_i}{M}\), which yields
\begin{align*} \max_{\alpha } \mathcal{T}(\alpha ) &:= LM \sum_i \alpha_i \log (\alpha_i M) \text{ s.t. } \sum_i \alpha_i=1\\ \Longleftrightarrow \max_{\alpha } \mathcal{T}(\alpha ) &:= LM \log M+LM \sum_i \alpha_i \log (\alpha_i) \text{ s.t. } \sum_i \alpha_i=1. \end{align*}
By the concavity of the \(\log\) function and Jensen’s inequality, we have \(\log (1) \ge \sum_i \alpha_i \log (\alpha_i)\), so the maximizer is obtained when \(\alpha\) is a unit basis vector, and the resulting criterion value is \(\mathcal{T}(\alpha^\star ) = LM \log M=L^2 n \log n \log (L n \log n) = L^2 n \log n (\log n + \log \log n + \mathcal{O}(1)) = \mathcal{O}(n \log^2 n)\). Since there are \(n\) sorting operations, their amortized time complexity is \(\mathcal{O}(\log^2 n)\). Hence, the node update can be computed in \(\mathcal{O}(n + \log^2 n)\) amortized time, where the \(\mathcal{O}(n)\) contribution comes from the merge step.  ▪
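To make the small-to-large aggregation strategy from the last part of the proof concrete, the following Python sketch merges the sorted breakpoint lists of the incoming edge messages: the list of the child with the largest subtree is reused as-is, only the remaining breakpoints are sorted, and the two sorted sequences are combined by a pointer walk as in the merge step of merge sort. The list-based representation and the function name are illustrative assumptions and not part of our implementation.

```python
def aggregate_breakpoints(child_breakpoints):
    """Sketch of the small-to-large aggregation (hypothetical representation).

    child_breakpoints[i] is the sorted list of interval endpoints of the edge
    message coming from child i.  The child with the most endpoints stands in
    for the largest subtree: its list is reused as-is, while the remaining
    endpoints are sorted explicitly and then merged in with a pointer walk."""
    largest = max(range(len(child_breakpoints)),
                  key=lambda i: len(child_breakpoints[i]))
    base = child_breakpoints[largest]
    rest = sorted(x for i, lst in enumerate(child_breakpoints)
                  if i != largest for x in lst)
    merged, ia, ib = [], 0, 0
    while ia < len(base) or ib < len(rest):   # merge step as in merge sort
        if ib == len(rest) or (ia < len(base) and base[ia] <= rest[ib]):
            merged.append(base[ia]); ia += 1
        else:
            merged.append(rest[ib]); ib += 1
    return merged
```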

5. Nonconvex case.

We will assume in this section that the unary cost terms \(f_{i}\) and the pairwise cost terms \(f_{ij}\) in (\(\text{P}_{\rm s}\)) are arbitrary piecewise quadratic functions with \(\mathcal{O}\left(1\right)\) pieces. Based on this, we will introduce an algorithm that generalizes Algorithm 4.1 from section 4 to the nonconvex setting.
This general DP algorithm is summarized in Algorithm 5.1. The algorithm follows the same derivative matching idea between the functions \(\varphi_i\) and \(f_{ij}\) as Algorithm 4.1, but it requires more work due to the nonconvex nature of both functions. There are essentially only three cases of interest, which are all covered by the three double loops in Algorithm 5.1. The first double loop matches differentiable points of \(\varphi_i\) with differentiable points of \(f_{ij}\). The second double loop matches the nondifferentiable and convex segment contact points of \(\varphi_i\) with differentiable points of \(f_{ij}\). The third double loop matches the nondifferentiable and convex segment contact points of \(f_{ij}\) with differentiable points of \(\varphi_i\). Notice that we do not need to consider nondifferentiable and concave segment contact points of either \(\varphi_i\) or \(f_{ij}\) because a minimizer for those cases can always be replaced by a minimizer that lies in one of the two neighboring smooth regions that define the contact point. Similarly, we do not need to separately consider the case when nondifferentiable and convex contact points of \(\varphi_i\) match with nondifferentiable and convex contact points of \(f_{ij}\) because the latter two double loops already cover this case. Finally, note that whenever a contact point is discontinuous, we cannot apply a standard stationarity condition. Instead, we compute a proposal \(\varphi_i\left(x_i\right) + f_{ij}\left(x_j - x_i\right)\) with full domain, where either \(x_i\) or \(x_j - x_i\) is held constant.
Algorithm 5.1. DP for piecewise quadratic + piecewise quadratic.
Input: Tree \(\left(\mathcal{V}, \mathcal{E}\right)\), unary costs \(\left(f_i\right)_{i \in \mathcal{V}}\), and pairwise costs \((f_{ij})_{ij \in \mathcal{E}}\).
Output: A minimizer \(x^\star := (x^\star_1, x^\star_2, \dots, x^\star_{n})\).
1:// Message passing
2:Initialize unary messages as \(\varphi_i := f_i\) for all nodes \(i \in \mathcal{V}\)
3:for \(ij \in \mathcal{E}_{\text{postorder}}\) do
4: \(S := \emptyset\)
5: for \(\left(l_k, r_k, p_k\right) \in f_{ij}\) do
6:  for \(\left(l_t, r_t, \pi_t\right) \in \varphi_i\) do
7:   if \(\pi_t\) and \(p_k\) are both linear functions then continue
8:   Compute \(\beta (x_j) := m_{ij} x_j + q_{ij} = x_i\) from stat. condition \(\pi^{\prime }_t\left(x_i\right) = p^{\prime }_k\left(x_j - x_i\right)\)
9:   \(p(x_j) := \pi_t\left(m_{ij} x_j + q_{ij}\right) + p_k\left(\left(1 - m_{ij}\right) x_j - q_{ij}\right)\)
10:   Compute constraint: \(x_j - x_i \in \left[l_k, r_k\right] \Leftrightarrow l_k \leq \left(1 - m_{ij}\right) x_j - q_{ij} \leq r_k\)
11:   Compute constraint: \(x_i \in \left[l_t, r_t\right] \Leftrightarrow l_t \leq m_{ij} x_j + q_{ij} \leq r_t\)
12:   Intersect the constraints: \(l \leq x_j \leq r\)
13:   if \(l \leq r\) then insert \(\left(l, r, p, \beta \right)\) into \(S\)
14:  end for
15: end for
16: 
17: Gather the nondifferentiable (and convex) and discontinuous contact points of \(\varphi_i\) into \(\mathcal{K}_i\)
18: for \(z_t \in \mathcal{K}_i\) do
19:  for \(\left(l_k, r_k, p_k\right) \in f_{ij}\) do
20:   if \(\varphi_i\) is continuous at \(z_t\) then
21:    Find the largest interval \(\left[L, R\right] \subseteq \left[ l_k, r_k \right]\) such that \(p_k^{\prime }\left(\left[L, R\right]\right) \subseteq \partial \varphi_i\left(z_t\right)\)
22:   else
23:    \([L, R] := [-\infty, \infty ]\)
24:   end if
25:   if \(\left[L, R\right] \neq \emptyset\) then insert \(\left(z_t + L, z_t + R, \varphi_i\left(z_t\right) + p_k\left(x_j - z_t\right), z_t\right)\) into \(S\)
26:  end for
27: end for
28: 
29: Gather the nondifferentiable (and convex) and discontinuous contact points of \(f_{ij}\) into \(\mathcal{K}_{ij}\)
30: for \(z_k \in \mathcal{K}_{ij}\) do
31:  for \(\left(l_t, r_t, \pi_t\right) \in \varphi_i\) do
32:   if \(f_{ij}\) is continuous at \(z_k\) then
33:    Find the largest interval \(\left[L, R\right] \subseteq \left[ l_t, r_t \right]\) such that \(\pi_t^{\prime }\left(\left[L, R\right]\right) \subseteq \partial f_{ij}\left(z_k\right)\)
34:   else
35:    \([L, R] := [-\infty, \infty ]\)
36:   end if
37:   if \(\left[L, R\right] \neq \emptyset\) then insert \((z_k + L, z_k + R, \pi_t(x_j - z_k) + f_{ij}(z_k), x_j - z_k)\) into \(S\)
38:  end for
39: end for
40: 
41: \(\varphi_{i \to j} := \text{lower envelope of all pieces in } S\)
42: \(\varphi_j := \varphi_j + \varphi_{i\to j}\)
43:end for
44: 
45:// Backtracking
46:\(x^\star_r := \operatorname{arg\,min}_x \varphi_r\left(x\right)\)
47:for \(ij \in \mathcal{E}_{\text{preorder}}\) do
48: Find the segment \(\left(l, r, p, \beta \right) \in \varphi_{i \to j}\) that contains \(x^\star_j\)
49: \(x^\star_i := \beta (x^\star_j)\)
50:end for
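To illustrate the core derivative matching step of Algorithm 5.1 (lines 7–13), the following Python sketch processes a single pair of quadratic pieces in the first double loop. The coefficient-triple representation, the function name, and the explicit guard for the degenerate case \(a_1 + a_2 = 0\) are illustrative assumptions; a full implementation additionally handles the two contact point loops and assembles the lower envelope.

```python
import numpy as np

def match_quadratic_pieces(piece_phi, piece_f):
    """piece_phi = (l_t, r_t, (a1, b1, c1)): pi_t(x_i) = a1*x_i**2 + b1*x_i + c1 on [l_t, r_t].
    piece_f   = (l_k, r_k, (a2, b2, c2)): p_k(z) = a2*z**2 + b2*z + c2 in z = x_j - x_i.
    Returns (l, r, (m, q)) with x_i = m*x_j + q valid for x_j in [l, r], or None."""
    (l_t, r_t, (a1, b1, _)), (l_k, r_k, (a2, b2, _)) = piece_phi, piece_f
    if a1 + a2 == 0.0:
        return None  # both pieces linear, or a degenerate nonconvex pairing
    # Stationarity: 2*a1*x_i + b1 = 2*a2*(x_j - x_i) + b2  =>  x_i = m*x_j + q.
    m = a2 / (a1 + a2)
    q = (b2 - b1) / (2.0 * (a1 + a2))

    def preimage(lo, hi, slope, offset):
        # x_j-interval on which lo <= slope*x_j + offset <= hi holds.
        if slope == 0.0:
            return (-np.inf, np.inf) if lo <= offset <= hi else (np.inf, -np.inf)
        a, b = (lo - offset) / slope, (hi - offset) / slope
        return (min(a, b), max(a, b))

    lo1, hi1 = preimage(l_t, r_t, m, q)          # constraint x_i in [l_t, r_t]
    lo2, hi2 = preimage(l_k, r_k, 1.0 - m, -q)   # constraint x_j - x_i in [l_k, r_k]
    l, r = max(lo1, lo2), min(hi1, hi2)
    return (l, r, (m, q)) if l <= r else None
```

The quadratic cost piece \(p(x_j)\) of line 9 then follows by composing \(\pi_t\) and \(p_k\) with the returned affine map, exactly as written in the algorithm.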
Similar to the DP algorithm for solving truncated TV problems in [21], Algorithm 5.1 also exhibits exponential worst-case time and memory complexities but works well in practice for pairwise costs that are truncated, and possibly smoothed, TV terms. We conjecture that this is partly due to the truncation keeping the number of pieces in each message tractable. To argue this more formally, consider a scanline label estimation problem of the form
\begin{align} \min_{x} \sum_{i \in \mathcal{V}} f_i(x_i) + P \left\|Dx\right\|_0 \end{align}
(5.1)
that contains a continuous version of the Potts prior and (possibly nonconvex) piecewise quadratic unary cost terms \(f_i\) consisting of \(\mathcal{O}(1)\) pieces. This problem can be interpreted as a limit case of a truncated TV label estimation problem, because each term of the truncated TV regularizer is of the form \(\min \left\{w \cdot |z|, P\right\}\), and if we let \(w \to \infty\), then we obtain the continuous version of the Potts prior. Informally speaking, this is the most aggressive truncation strategy that we could employ for truncated TV label estimation. Similar to its widely popular discrete counterpart, the infimal convolution for this problem can be computed as
\begin{align} \varphi_{i\to i+1}\left(x_{i+1}\right) := \min \left\{\varphi_i\left(x_{i+1}\right), P + \min_z \varphi_i\left(z\right)\right\}. \end{align}
(5.2)
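As a sketch of how the Potts update (5.2) acts on a piecewise quadratic message, the following Python snippet computes the truncation threshold \(P + \min_z \varphi_i(z)\) and splits every piece at the points where it crosses that threshold; the list-based piece representation and the assumption of a bounded domain are illustrative and not tied to our implementation. For backtracking, one would additionally record for each output piece whether it resulted from truncation.

```python
import numpy as np

def potts_message(pieces, P):
    """Sketch of (5.2) for a message given as pieces (l, r, (a, b, c)) on a
    bounded domain; the representation and piece handling are assumptions."""
    def piece_min(l, r, a, b, c):
        xs = [l, r]
        if a != 0.0:
            x_star = -b / (2.0 * a)
            if l <= x_star <= r:
                xs.append(x_star)
        return min(a * x * x + b * x + c for x in xs)

    thresh = P + min(piece_min(l, r, *coef) for l, r, coef in pieces)
    out = []
    for l, r, (a, b, c) in pieces:
        # Split the piece at the (at most two) points where it crosses the
        # threshold and replace the parts lying above by the constant threshold.
        roots = []
        if a != 0.0:
            disc = b * b - 4.0 * a * (c - thresh)
            if disc > 0.0:
                roots = sorted([(-b - np.sqrt(disc)) / (2.0 * a),
                                (-b + np.sqrt(disc)) / (2.0 * a)])
        elif b != 0.0:
            roots = [(thresh - c) / b]
        cuts = [l] + [x for x in roots if l < x < r] + [r]
        for lo, hi in zip(cuts[:-1], cuts[1:]):
            mid = 0.5 * (lo + hi)
            if a * mid * mid + b * mid + c <= thresh:
                out.append((lo, hi, (a, b, c)))            # kept piece
            else:
                out.append((lo, hi, (0.0, 0.0, thresh)))   # truncated piece
    return out
```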
We prove in Corollary 5.1 that the worst-case time complexity of the message passing algorithm is \(\mathcal{O}(n^4)\) for the general case. However, if we assume that the set of all interval endpoints over the unary cost terms \(f_i\) is of constant size, the worst-case time complexity goes down to \(\mathcal{O}(n^3)\). This assumption is quite lax, as most practical instances of \(f_i\) either are simple functions with consistent breakpoints or are obtained by interpolating a discrete label cost volume, in which case all \(f_i\) share the same set of interval endpoints.
Polynomial runtime of Johnson’s algorithm. A continuous message passing algorithm for solving (5.1) with quadratic unary cost terms was introduced in [16], where the author established an \(\Omega \left(n^2\right)\) lower bound on the worst-case time complexity of the algorithm and left open the question of whether the algorithm runs in polynomial time. Given that quadratic unary terms clearly satisfy the assumptions of Corollary 5.1, including the additional assumption, we confirm that the algorithm has worst-case time complexity \(\mathcal{O}(n^3)\).
Corollary 5.1.
Suppose that the unary cost terms \(f_i\) are (possibly nonconvex) piecewise quadratic functions consisting of \(\mathcal{O}(1)\) pieces. Let \(\chi\) denote the set of all interval endpoints over the unary cost terms \(f_i\), i.e., \(\chi := \cup_{i \in \mathcal{V}} \cup_{(l, r, p) \in f_i} \{l, r\}\). Then, the message passing algorithm based on the infimal convolution (5.2) for solving problem (5.1) has a worst-case time and memory complexity of
\(\mathcal{O}(n^4)\) in the general case,
\(\mathcal{O}(n^3)\) under the assumption \(|\chi | = \mathcal{O}(1)\).
Proof.
First of all, note that each unary cost term consists of \(\mathcal{O}(1)\) pieces, so \(|\chi | = \mathcal{O}(n)\) follows by definition. Consider the interval endpoints in \(\chi\) in ascending order, and let \(\chi_i, \chi_{i+1}\) denote a pair of neighboring values. Let us collect all neighboring pairs to form intervals \(b_i := (\chi_i, \chi_{i+1}]\), and let \(\mathcal{B}\) be the set \(\cup_i b_i = \cup_i (\chi_i, \chi_{i+1}]\). From this point on, we will refer to the intervals in \(\mathcal{B}\) as buckets. Clearly, each bucket \(b_i \in \mathcal{B}\) is atomic with respect to the intervals of every unary cost term. In other words, each interval of any unary cost term can be formed as a union of appropriate buckets. In general, from \(|\chi | = \mathcal{O}(n)\) it follows by construction that \(|\mathcal{B}| = \mathcal{O}(n)\). Similarly, if the assumption \(|\chi | = \mathcal{O}(1)\) holds, then \(|\mathcal{B}| = \mathcal{O}(1)\) also holds. To establish the runtime bound, we analyze the worst-case computational cost for each individual bucket. Multiplying this by the total number of buckets gives us a bound on the total runtime of the algorithm.
Observe that the infimal convolution (5.2) truncates all function values above a certain threshold. By the coercivity of the unary cost terms \(f_i\), and hence the coercivity of the node messages, the truncation creates at least two cut-off regions for any message and may remove some (possibly zero) existing message pieces. The cut-off regions are constant pieces with value \(P + \min_z \varphi_i\left(z\right)\), so all truncated pieces share the same quadratic coefficients.
To simplify the ensuing analysis, we partition the intervals of the node message \(\varphi_i\) into buckets. If an interval overlaps a bucket only partially, a new endpoint is placed so that the overlapped part lies within the bucket. Since we are only modifying the representation of the message, and the underlying function remains the same, the infimal convolution will give the same result. Also, the computational cost can only go up with an increase in the number of intervals, which is acceptable given that we are proving an upper bound.
Consider what happens to the set of truncated pieces within a bucket, after adding a new unary cost term in the node message pass (2.2). Since the buckets are atomic, the unary cost on the bucket is just a quadratic function, so all truncated pieces will still share the same quadratic coefficients after the unary cost has been added. This conclusion also holds for pieces that have previously shared the same set of quadratic coefficients.
Let us consider the number of distinct quadratic coefficients in a bucket over the iterations. In the beginning we start with a single interval per bucket, so the number of distinct coefficients is one. In each infimal convolution (5.2), the truncation creates a single new distinct set of quadratic coefficients, and the old sets of coefficients are either kept or deleted. Hence, after iteration \(k\) there are at most \(k+1\) unique sets of quadratic coefficients per bucket.
If we consider the full domain quadratic spawned by a unique set of coefficients, the infimal convolution would yield at most two truncated pieces. Out of the quadratic pieces within a bucket that correspond to this unique set of coefficients, at most two can be partially truncated. The rest are either fully truncated or not truncated at all. The partially truncated pieces are split into a truncated and a nontruncated part. This increases the number of intervals in the bucket by at most two per unique set of quadratic coefficients: either a single partially truncated piece is split into three pieces, or two partially truncated pieces are each split into two. Thus, the number of intervals in the bucket can increase by at most \(2k\) in iteration \(k\). Consequently, the maximum possible number of intervals over all buckets after iteration \(k\) satisfies the recurrence relation
\begin{align*} \mathcal{I}(k) = \mathcal{I}(k - 1) + \sum_{b \in \mathcal{B}} \mathcal{O}(k) = \mathcal{I}(k - 1) + \mathcal{O}(k |\mathcal{B}|) \end{align*}
with base case \(\mathcal{I}(1) = \mathcal{O}(1)\). Solving this recurrence yields \(\mathcal{I}(k) = \mathcal{O}(k^2 |\mathcal{B}|)\). The forward and backward passes both take \(\mathcal{O}(\mathcal{I}(k)) = \mathcal{O}(k^2 |\mathcal{B}|)\) time to process node \(k\), so the total number of operations \(\mathcal{T}\left(n\right)\) taken by the algorithm is
\begin{align*} \mathcal{T}\left(n\right) &= \sum_{k=1}^{n} \mathcal{O}(\mathcal{I}(k)) = \sum_{k=1}^{n} \mathcal{O}(k^2 |\mathcal{B}|) = \mathcal{O}(n^3 |\mathcal{B}|). \end{align*}
In the general case we have \(|\mathcal{B}| = \mathcal{O}(n)\), which establishes a worst-case time complexity of \(\mathcal{O}(n^4)\). If we combine this with the additional assumption \(|\chi | = \mathcal{O}(1)\), then \(|\mathcal{B}| = \mathcal{O}(1)\) and we obtain an \(\mathcal{O}(n^3)\) worst-case time complexity. For the purpose of reconstructing the solution via backtracking, we need to know for each edge message interval whether or not it resulted from truncation. This means that we utilize \(\sum_k \mathcal{O}(\mathcal{I}(k))\) memory in total, which is \(\mathcal{O}(n^4)\) in the general case, and \(\mathcal{O}(n^3)\) under the additional assumption. This concludes the proof.  ▪
Informally speaking, this result suggests that the more aggressively we truncate, the easier the truncated TV label estimation problem becomes. Moreover, while the established upper bound might be loose, it nonetheless proves that the algorithm from [16] runs in polynomial time.

6. Applications.

Within this section we will demonstrate a few example applications for the proposed algorithms. We focus mainly on denoising and stereo applications, but similar approaches can be designed for other inverse problems. All experiments were run on a 16-core (32-thread) AMD Ryzen 9 5950X desktop processor with 64 GB of DDR4 memory, and all image experiments have been parallelized to run on 32 CPU threads.

6.1. Scanline runtime experiments.

Our DP algorithms allow us to exactly solve some special cases of (P) for which, to the best of our knowledge, no exact solvers have been presented before. For instance, Algorithm 4.1 can be used to exactly solve convex scanline denoising problems with Huber smoothed TV pairwise costs and \(\ell_1\), \(\ell^2_2\) and Huber unary costs. Recall that the Huber function \(\left| \cdot \right|_\varepsilon : \mathbb{R} \to \mathbb{R}\) with parameter \(\varepsilon \gt 0\) is defined by
\begin{align*} \left| t \right|_\varepsilon := \begin{cases} \frac{1}{2 \varepsilon } \cdot t^2 & \text{for } |t| \leq \varepsilon, \\ |t| - \frac{1}{2} \varepsilon &\text{otherwise}. \end{cases} \end{align*}
Similarly, Algorithm 5.1 can be used to exactly solve nonconvex scanline denoising problems with smooth truncated TV pairwise costs and \(\ell^2_2\) and smooth truncated \(\ell_1\) unary costs. Both the smooth truncated TV pairwise costs and smooth truncated \(\ell_1\) unary costs are realized via the smooth truncated absolute value function \(|\cdot |_{\varepsilon }^C\) defined by
\begin{align*} |t|_{\varepsilon }^C := \begin{cases} C& \text{for } t \le -C - \varepsilon, \\ C - \frac{1}{2\varepsilon } (t+C + \varepsilon )^2& \text{for } -C - \varepsilon \lt t \le -C,\\ -t - \frac{1}{2} \varepsilon& \text{for } -C \lt t \le -\varepsilon, \\ \frac{1}{2 \varepsilon } t^2& \text{for } -\varepsilon \lt t \le 0,\\ |-t|_{\varepsilon }^C& \text{otherwise} \end{cases} \end{align*}
with parameters \(\varepsilon \gt 0\) and \(C \gt 0\). Figure 1 illustrates how the smooth truncated absolute value function is obtained by smoothing the nondifferentiable cusps of a truncated absolute value function.
Figure 1. Smoothed truncated absolute value function.
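For reference, a minimal NumPy sketch of the two functions defined above is given below; the vectorized case analysis mirrors the definitions and is only meant to make the breakpoint structure explicit.

```python
import numpy as np

def huber(t, eps):
    """Huber function |t|_eps."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= eps, t * t / (2.0 * eps), np.abs(t) - 0.5 * eps)

def smooth_trunc_abs(t, eps, C):
    """Smooth truncated absolute value |t|_eps^C (assumes C > eps > 0)."""
    a = np.abs(np.asarray(t, dtype=float))
    quad_in = a * a / (2.0 * eps)                     # |t| <= eps
    lin = a - 0.5 * eps                               # eps < |t| <= C
    quad_out = C - (C + eps - a) ** 2 / (2.0 * eps)   # C < |t| <= C + eps
    return np.where(a <= eps, quad_in,
           np.where(a <= C, lin,
           np.where(a <= C + eps, quad_out, C)))
```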
The runtimes versus problem size for the aforementioned scanline problems are shown in Figure 2. To construct the experiment, we have extracted scanlines from a piecewise affine test image and stitched the scanlines together so that we can easily vary the problem size. Furthermore, we have fixed the weight of the pairwise costs for all problems. Notice that the runtimes for all problems scale approximately linearly with problem size despite the predicted worst-case quadratic and exponential runtimes of Algorithms 4.1 and 5.1, respectively. Furthermore, notice that the convex problems can be solved in under 1 s for problem sizes of up to \(4 \cdot 10^6\) and that the nonconvex problems can be solved in under 1 s for problem sizes of up to \(10^5\).
Figure 2. Runtime versus problem size for the considered scanline denoising problems.
Remark 6.1.
We have also tried the nonconvex problems on various test signals and different weight parameter settings and have still observed the same linear runtime scaling.

6.2. Convex and nonconvex TGV2 scanline denoising.

As discussed in section 1, we can rely on a smooth approximation or use the coupling heuristic to construct converging BCD approaches that make use of our proposed continuous DP algorithms to solve TGV2-regularized scanline denoising problems. However, it turns out that in this particular case, we can even design two highly efficient primal-dual algorithms that rely on continuous DP and that severely outperform the standard primal-dual algorithm that is typically used for solving this problem. We will briefly discuss those two approaches first, before we turn to the smoothing and coupling experiments.
The standard way for solving TGV2 scanline denoising problems of the form
\begin{align} \min_{x, v} \frac{1}{2} \left\|x - b\right\|^2 + \lambda_1 \left\|Dx - v\right\|_1 + \lambda_2 \left\|Dv\right\|_1 \end{align}
(6.1)
is to dualize the second and third terms and obtain the equivalent saddle-point problem
\begin{align*} \max_{y, z} \min_{x, v} \frac{1}{2} \left\|x - b\right\|^2 + \left\langle Dx - v, y\right\rangle + \left\langle Dv, z \right\rangle \text{ s.t. } \left\|y\right\|_{\infty } \le \lambda_1 \text{ and } \left\|z\right\|_{\infty } \le \lambda_2 \end{align*}
that can be solved via the primal-dual iterations (cf. [5]) of the form
\begin{align} \begin{cases} x^{k+1} := \left(x^k - \tau D^T y^k + \tau b\right)/ (1 + \tau ), \\ v^{k+1} := v^k - \tau \left(D^T z^k - y^k\right), \\ \bar{x}^{k+1} := x^{k+1} + \theta \left(x^{k+1} - x^k\right), \\ \bar{v}^{k+1} := v^{k+1} + \theta \left(v^{k+1} - v^k\right), \\ y^{k+1} := \text{proj}_{\left\|\cdot \right\|_\infty \leq \lambda_1}\left(y^k + \sigma \left(D \bar{x}^{k+1} - \bar{v}^{k+1}\right)\right), \\ z^{k+1} := \text{proj}_{\left\|\cdot \right\|_\infty \leq \lambda_2}\left(z^k + \sigma D \bar{v}^{k+1}\right), \end{cases} \end{align}
(6.2)
where \(\tau, \sigma, \theta\) are positive step size parameters satisfying \(\tau \sigma \lt 1/ 2.562^2\) and \(\theta \in (0, 1]\). These iterations come with a theoretically optimal worst-case convergence rate of \(\mathcal{O}(1/ k)\), where \(k\) in this context denotes the number of primal-dual iterations that are performed. Note also that the per iteration time and memory complexity of this approach is \(\mathcal{O}(n)\).
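A minimal NumPy sketch of the baseline iterations (6.2) is given below; the forward difference operator, its adjoint, and the particular choice of step sizes (chosen to satisfy the stated bound) are spelled out explicitly, while the initialization and the fixed iteration count are simplifying assumptions.

```python
import numpy as np

def D(z):
    """Forward differences, mapping R^l to R^(l-1)."""
    return z[1:] - z[:-1]

def DT(y):
    """Adjoint of the forward difference operator D."""
    out = np.zeros(len(y) + 1)
    out[:-1] -= y
    out[1:] += y
    return out

def tgv2_primal_dual(b, lam1, lam2, iters=500, theta=1.0):
    """Baseline primal-dual iterations (6.2) for the TGV2 scanline model (6.1)."""
    n = len(b)
    tau = sigma = 0.99 / 2.562            # tau * sigma < 1 / 2.562**2
    x, v = b.astype(float), np.zeros(n - 1)
    y, z = np.zeros(n - 1), np.zeros(n - 2)
    for _ in range(iters):
        x_new = (x - tau * DT(y) + tau * b) / (1.0 + tau)
        v_new = v - tau * (DT(z) - y)
        x_bar = x_new + theta * (x_new - x)
        v_bar = v_new + theta * (v_new - v)
        y = np.clip(y + sigma * (D(x_bar) - v_bar), -lam1, lam1)  # proj onto ||.||_inf <= lam1
        z = np.clip(z + sigma * D(v_bar), -lam2, lam2)            # proj onto ||.||_inf <= lam2
        x, v = x_new, v_new
    return x, v
```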
A faster primal-dual algorithm for solving (6.1) can be obtained by only dualizing the second term, which leads to the equivalent saddle-point problem
\begin{align} \max_{y} \min_{x, v} \frac{1}{2} \left\|x - b\right\|^2 + \left\langle Dx - v, y\right\rangle + \lambda_2 \left\|Dv\right\|_1 \text{ s.t. } \left\|y\right\|_{\infty } \le \lambda_1 \end{align}
(6.3)
that can be solved by the primal-dual iterations
\begin{align} \begin{cases} x^{k+1} := \left(x^k - \tau D^T y^k + \tau b\right)/ (1 + \tau ), \\ v^{k+1} := \text{prox}_{\tau \left(\lambda_2 \left\|D \cdot \right\|_1\right)}\left(v^k + \tau y^k\right), \\ \bar{x}^{k+1} := x^{k+1} + \theta \left(x^{k+1} - x^k\right), \\ \bar{v}^{k+1} := v^{k+1} + \theta \left(v^{k+1} - v^k\right), \\ y^{k+1} := \text{proj}_{\left\|\cdot \right\|_\infty \leq \lambda_1}\left(y^k + \sigma \left(D \bar{x}^{k+1} - \bar{v}^{k+1}\right)\right), \end{cases} \end{align}
(6.4)
where \(\tau, \sigma, \theta\) are positive step size parameters satisfying \(\tau \sigma \lt 1/ 5\) and \(\theta \in (0, 1]\). The key observation here is that the proximal operator in \(v\) reduces to a standard TV scanline denoising problem that can be solved directly in linear time and memory by some of the algorithms proposed in [21] and the references therein. Consequently, the per iteration time and memory complexity of this approach is again \(\mathcal{O}(n)\).
A considerably faster primal-dual algorithm for solving (6.1) can be obtained by observing that the saddle-point problem (6.3) is strictly convex in \(x\), which allows us to explicitly minimize over \(x\) to obtain the equivalent saddle-point problem
\begin{align*} \max_{y} \min_{v} -\frac{1}{2} \left\|D^T y\right\|^2 + \left\langle b, D^T y \right\rangle - \left\langle v, y\right\rangle + \lambda_2 \left\|Dv\right\|_1 \text{ s.t. } \left\|y\right\|_{\infty } \le \lambda_1 \end{align*}
that can be solved by the primal-dual iterations
\begin{align} \begin{cases} v^{k+1} := \text{prox}_{\tau \left(\lambda_2 \left\|D \cdot \right\|_1\right)}\left(v^k + \tau y^k\right), \\ y^{k+1} := \text{prox}_{\sigma \left(\frac{1}{2} \left\|D^T \cdot \right\|^2 - \left\langle b, D^T \cdot \right\rangle + \delta_{\left\|\cdot \right\|_\infty \leq \lambda_1}\right)}\left(y^k - \sigma \left(v^{k+1} + \theta \left(v^{k+1} - v^k\right)\right)\right), \end{cases} \end{align}
(6.5)
where \(\tau, \sigma, \theta\) are positive step size parameters satisfying \(\tau \sigma \lt 1\) and \(\theta \in (0, 1]\). Here \(\delta_{\left\|\cdot \right\|_\infty \leq \lambda_1}\) denotes the indicator function for the box constraint \(\left\|\cdot \right\|_\infty \leq \lambda_1\) defined as
\begin{align*} \delta_{\left\|\cdot \right\|_\infty \leq \lambda_1}\left(y\right) := \begin{cases} 0 &\text{for } \left\|y\right\|_\infty \leq \lambda_1, \\ \infty &\text{otherwise}. \end{cases} \end{align*}
The proximal operator in \(v\) here is again a standard TV denoising problem that can be directly evaluated in linear time and memory, and the obvious question at this point is how to evaluate the proximal operator in \(y\) efficiently. It can be shown that (see section SM3 of the supplementary materials for details)
\begin{align*} \text{prox}_{\sigma \left(\frac{1}{2} \left\|D^T \cdot \right\|^2 - \left\langle b, D^T \cdot \right\rangle + \delta_{\left\|\cdot \right\|_\infty \leq \lambda_1}\right)}\left(\bar{y}\right) = \text{proj}_{\left\|\cdot \right\|_\infty \leq \lambda_1}\left(\bar{y} + \sigma D x^\star \right), \end{align*}
where
\begin{align*} x^\star := \mathop{\operatorname{arg\,min}}_{x} \frac{1}{2} \left\|x - b\right\|^2 + \lambda_1 \left\|Dx - \left(-\bar{y}/ \sigma \right)\right\|_{1, \varepsilon } \end{align*}
with \(\varepsilon = \lambda_1/ \sigma\) and where \(\left\|\cdot \right\|_{1, \varepsilon }\) is a smoothed version of the \(\ell_1\) norm defined as \(\left\| z \right\|_{1, \varepsilon } := \sum_{i=1}^l \left|z_i\right|_\varepsilon\) for all vectors \(z \in \mathbb{R}^l\) of arbitrary dimension \(l \geq 1\). In other words, evaluating the proximal operator in \(y\) in the primal-dual iterations (6.5) is no harder than solving a shifted and Huber smoothed TV scanline denoising problem, which can be solved directly in quadratic time by Algorithm 4.1 and our first reduction. Consequently, the per iteration time and memory complexity of this approach is \(\mathcal{O}(n^2)\). However, we observed in the scanline experiments from the previous subsection that solving a TV-Huber scanline denoising problem scales roughly linearly in time with problem size when applying Algorithm 4.1. Thus, we expect runtimes comparable to the previous two primal-dual iterations for the same number of iterations.
Experimental convergence rate results that compare these three approaches are shown in Figure 3, and the experimental setup is the same as in section SM1 for the best performing set of TGV2 parameters with respect to PSNR value. Here primal-dual refers to the iterations (6.2), DP primal-dual to the iterations (6.4), and faster DP primal-dual to the iterations (6.5). All three approaches were run for 500 iterations, which resulted in (relatively comparable) running times of 48, 74, and 132 ms, respectively, for the considered test signal of length \(n=512\). A ground truth minimizer was computed to evaluate the optimality gap by performing 10000 iterations of the faster DP primal-dual approach. To make the comparison as fair as possible, we have manually tuned the step size parameters for each approach to obtain the fastest convergence rate possible. Evidently, both DP-based primal-dual approaches severely outperform the baseline primal-dual approach (6.2). Notice that it takes 30000 iterations of the baseline primal-dual approach to compute an approximate minimizer of energy comparable to what the slower DP primal-dual approach computed in only 500 iterations. Furthermore, the faster DP primal-dual approach computes a solution that is visually indistinguishable from the ground truth minimizer in only 10 iterations.
Figure 3. Optimality gap versus iterations for the three different primal-dual variants.
Remark 6.2.
Notice that all three approaches can be trivially extended to tree graphs, spatially varying regularization weights, and higher-order TGV regularizers, which we omitted for the sake of simplicity. Furthermore, these extensions would exhibit the same per iteration time and memory complexities as described above.
However, the key problem with these DP primal-dual approaches is that they do not generalize to the nonconvex setting as they rely on strong duality. Approaches relying on smooth approximations or the coupling heuristic from [24] do not suffer from these shortcomings. They are also arguably easier to tune, as they lead to BCD algorithms where no step size tuning is necessary. Hence, we now turn our attention to those approaches.
To construct a smooth approximation, we can replace the TGV2 scanline denoising problem with
\begin{align*} \min_{x, v} \frac{1}{2} \left\|x - b\right\|^2 + \lambda_1 \left\|Dx - v\right\|_{1, \varepsilon } + \lambda_2 \left\|Dv\right\|_1 \end{align*}
and consider BCD subproblems of the form
\begin{align*} \min_x \frac{1}{2} \left\|x - b\right\|^2 + \lambda_1 \left\|Dx - v\right\|_{1, \varepsilon } \quad \text{and} \quad \min_v \lambda_1 \left\|Dx - v\right\|_{1, \varepsilon } + \lambda_2 \left\|Dv\right\|_1 \end{align*}
that can be exactly solved by combining Algorithm 4.1 with our first reduction. Consequently, the per iteration time and memory complexity for BCD, in this case, is \(\mathcal{O}(n^2)\). Notice that the second BCD subproblem could be solved faster by one of the continuous DP algorithms proposed in [21], but the per iteration complexities would still be dominated by the first subproblem. Furthermore, our Algorithm 4.1 empirically exhibits a linear runtime for both subproblems and consequently is quite fast in practice.
Remark 6.3.
Notice that the smoothing approach comes with many approximation, convergence, and convergence rate guarantees (for both the convex and the nonconvex case) that are summarized in section SM2 of the supplementary materials.
Similarly, to apply the coupling heuristic from [24], we can consider BCD subproblems of the form
\begin{align*} \min_{x, v} \frac{1}{2} \left\|x - b\right\|^2 + \lambda_1 \left\|Dx - v\right\|_1 \text{ s.t. } \left\|v - \tilde{v}\right\|_\infty \leq \varepsilon \quad \text{and} \quad \min_v \lambda_1 \left\|Dx - v\right\|_1 + \lambda_2 \left\|Dv\right\|_1. \end{align*}
The first subproblem can be solved in \(\mathcal{O}\left(n^2\right)\) worst-case time and memory by combining Algorithm 4.1 with our two reductions. The second subproblem can be solved in worst-case \(\mathcal{O}\left(n \log \log n\right)\) time and \(\mathcal{O}\left(n\right)\) memory by combining one of the continuous DP algorithms proposed in [21] with our two reductions. The resulting BCD per iteration time and memory complexity, in this case, is \(\mathcal{O}(n^2)\), but in practice again scales approximately linearly with problem size.
Experimental TGV2 scanline denoising results are shown in Figure 4, and the experimental setup is again the same as in section SM1. The parameters \(\lambda_1\) and \(\lambda_2\) were chosen via grid search to obtain the best performing TGV2 denoising result with respect to PSNR value. The upper part of the figure shows that the smoothed approach converges in a few BCD iterations to a minimizer of the smoothed problem. The bottom left part shows that the box-constrained approach needs considerably more iterations to converge to an approximate minimizer of the original problem. The bottom right part shows the obtained denoising results, and the important observation here is that both BCD approaches are able to provide piecewise affine solutions. With respect to the ground truth signal \(x^{\text{gt}}\), the solution for the smoothed variant attains a PSNR value of \(40.68\) dB, the box-constrained variant attains a PSNR value of \(38.82\) dB, while the computed TGV2 minimizer attains a PSNR value of \(41.03\) dB. Note also that even though the smoothed variant solves an approximate problem, in most of our experiments, we actually observed that it outperforms standard TGV2 denoising with respect to PSNR value if the \(\lambda\) parameters are tuned with respect to the smoothed BCD approach. The running time for performing 20 iterations was 7 ms for both BCD approaches.
Figure 4. TGV2 denoising results for \(\lambda_1=0.182\) and \(\lambda_2=7.197\).
An important conclusion that we can draw here is that for the same computational and memory budget, the smoothing approach as proposed in this paper considerably outperforms (both visually and with respect to PSNR value) the coupling heuristic from [24] on this simple convex scanline denoising problem. This suggests that focus should be put on the smoothing approach for more difficult problems like nonconvex extensions or inverse problems defined over images.
We will demonstrate the former by considering a (nonconvex) truncated TGV2 scanline denoising problem
\begin{align*} \min_{x, v} \frac{1}{2} \left\|x - b\right\|^2 + \lambda_1 \left\|Dx - v\right\|^{C_1}_{1, \varepsilon } + \lambda_2 \left\|Dv\right\|^{C_2}_1 \end{align*}
that can be approximately solved via BCD by considering subproblems of the form
\begin{align*} \min_x \frac{1}{2} \left\|x - b\right\|^2 + \lambda_1 \left\|Dx - v\right\|^{C_1}_{1, \varepsilon } \quad \text{and} \quad \min_v \lambda_1 \left\|Dx - v\right\|^{C_1}_{1, \varepsilon } + \lambda_2 \left\|Dv\right\|^{C_2}_1, \end{align*}
both of which can be exactly solved by Algorithm 5.1, although with a predicted exponential worst-case time and memory complexity per BCD iteration. Note also that, strictly speaking, this formulation does not satisfy the requirements of Theorem SM2.7 from the supplementary materials, since the third term is nonsmooth. However, we have always observed monotonically decreasing objective values for this BCD approach on all of our considered test examples. In fact, this approach usually computes slightly better solutions than the alternative BCD approach where the third term is also smoothed (which is an approach that does satisfy the requirements of Theorem SM2.7).
Figure 5 illustrates the performance that can be achieved when we apply the nonconvex truncated TGV2 denoising approach to the same example. The computed solution clearly qualitatively outperforms the convex approaches shown in Figure 4 and also attains a PSNR value of \(43.62\) dB when compared against the ground truth signal \(x^{\text{gt}}\). This means that this solution outperforms the TGV2 minimizer by more than \(2.5\) dB. This is quite notable because the parameters for this approach were manually chosen with minimal effort. Again, despite the predicted exponential worst-case complexities, the running time of this approach for 20 BCD iterations was only 131 ms with a tractable memory footprint. Furthermore, on the left plot in Figure 5, we can see that an approximate minimizer for the problem was computed in roughly 10 iterations, which makes the approach very useful in practice.
Figure 5. Truncated TGV2 denoising results for \((\lambda_1, \lambda_2, C_1, C_2, \varepsilon ) = (0.45, 4, 0.015, 0.7, 10^{-3})\).
Remark 6.4.
The purpose of this BCD example was to show that the smoothing approach as proposed in this paper outperforms the box-constrained heuristic from [24]. Better denoising performance for this particular example can likely be obtained by the approaches [27–29] that are based on segment penalties.

6.3. TV-regularized stereo via continuous semiglobal matching.

Consider a nonconvex optimization problem of the form
\begin{align} \min_{x \in \mathbb{R}^{mn}} \sum_{i \in \mathcal{V}} f_i\left(x_i\right) + \text{TV}^C_{h, \varepsilon }\left(x\right) + \text{TV}^C_{v, \varepsilon }\left(x\right), \end{align}
(6.6)
where the unary cost terms \(f_i\) are nonconvex piecewise quadratic functions, and \(\text{TV}^C_{h, \varepsilon }\) and \(\text{TV}^C_{v, \varepsilon }\) denote smoothed and truncated total variation terms defined as
\begin{align} \text{TV}^C_{d, \varepsilon }\left(x\right) := \sum_{ij \in \mathcal{E}_d} w_{ij} \cdot \left|x_j - x_i\right|^C_\varepsilon, \quad d \in \left\{h, v\right\} =: \mathcal{D}. \end{align}
(6.7)
Here the obvious assumption is that the underlying graph is a 4-connected grid graph, which in turn can be decomposed into horizontal edges \(\mathcal{E}_h\) and vertical edges \(\mathcal{E}_v\) (i.e., \(\mathcal{E} := \mathcal{E}_h \cup \mathcal{E}_v\)).
Problems of this form arise naturally in the context of stereo vision. A very popular and efficient DP-based heuristic to solve problems of this form over discrete value spaces is the SGM algorithm introduced in [13]. As hinted earlier in sections 1 and 2, we can also use SGM to solve problems of this form defined over continuous value spaces as our messages are computed via partial minimization. To do so, we can just compute the messages \(\varphi_{i \to j}\) (via Algorithm 5.1) over scanlines from left to right, right to left, top to bottom, and bottom to top and apply the SGM fusion step
\begin{align*} x^\star_j = \mathop{\operatorname{arg\,min}}_{x_j} f_j\left(x_j\right) + \sum_{i \in \mathcal{N}\left(j\right)} \varphi_{i \to j}\left(x_j\right) \end{align*}
to compute an approximate minimizer for each pixel \(j\). Here \(\mathcal{N}\left(j\right)\) denotes pixels that belong to the 4-connected neighborhood around pixel \(j\).
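The following Python fragment illustrates the fusion step on a discretized value grid: the unary costs and the four directional messages are assumed to have been evaluated on a common set of candidate values, which is an approximation made purely for illustration, since the continuous algorithm keeps exact piecewise quadratic messages instead.

```python
import numpy as np

def sgm_fusion(unary_cost, dir_messages, grid):
    """unary_cost[p, s] is f_p(grid[s]); each entry of dir_messages has the same
    shape and holds one directional message evaluated on the grid."""
    total = unary_cost + sum(dir_messages)   # f_j + sum of incoming messages
    best = np.argmin(total, axis=1)          # per-pixel minimizing grid index
    return grid[best]
```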
We will consider the motorcycle stereo sample (as shown in Figure 6) from the Middlebury stereo dataset [26] to demonstrate continuous SGM in this setting. We obtained the unary cost terms \(f_i\) by computing a quadratic interpolation of negative zero-normalized cross-correlation matching costs for 140 candidate disparity values. This results in piecewise quadratic unary cost functions with 140 contact points. We used the well-known contrast guided heuristic from [3] to set up the weights \(w_{ij}\). The obtained SGM stereo results are shown in Figure 7, from which we can see that the discontinuities of the ground truth disparity map were preserved (except for a few outliers that can be removed via standard stereo postprocessing techniques), and that the approach inherently produces subpixel accurate disparity estimates. The result was computed in roughly \(2.5\) minutes, and the SGM algorithm had a maximal memory usage of 33 GB, which demonstrates again that Algorithm 5.1 is fast in practice despite its theoretically predicted exponential worst-case time and memory complexities.
Figure 6. Motorcycle stereo sample.
Figure 7. Continuous SGM stereo results.

6.4. Stereo with convex and nonconvex TGV2 regularization.

Consider a nonconvex stereo problem of the form
\begin{align} \min_{x, v_h, v_v} \sum_{i \in \mathcal{V}} f_i\left(x_i\right) + \text{TGV}^{C_1, C_2}_{2, \varepsilon }\left(x, v_h, v_v\right), \end{align}
(6.8)
where the unary cost terms \(f_i\) are smooth nonconvex piecewise quadratic functions, \(x \in \mathbb{R}^{MN}\), \(v_h \in \mathbb{R}^{M(N - 1)}\) and \(v_v \in \mathbb{R}^{(M - 1) N}\). Here \(\text{TGV}^{C_1, C_2}_{2, \varepsilon }\) denotes a second-order smoothed truncated TGV2 regularizer of the form
\begin{align*} \text{TGV}^{C_1, C_2}_{2, \varepsilon }(x, v_h, v_v) & = \widehat{\text{TV}}^{C_1}_{h, \varepsilon }\left(x, v_h\right) + \widehat{\text{TV}}^{C_1}_{v, \varepsilon }\left(x, v_v\right) + \sum_{d \in \mathcal{D}} \left(\text{TV}^{C_2}_{h}\left(v_d\right) + \text{TV}^{C_2}_{v}\left(v_d\right)\right). \end{align*}
The truncated \(\text{TV}\) terms are given by (6.7), while \(\widehat{\text{TV}}\) denotes the shifted and smoothed truncated total variation terms, defined as
\begin{align*} \widehat{\text{TV}}^C_{d, \varepsilon }\left(x, v_d\right) := \sum_{ij \in \mathcal{E}_d} w_{d, ij} \cdot \left|x_j - x_i - v_{d, ij}\right|^C_\varepsilon \text{ for } d \in \mathcal{D}. \end{align*}
Note that each individual total variation term in the TGV regularizer has its own positive edge weight vector \(w_d\) that contains the positive entries \(w_{d, ij}\). Also, for \(C_1 = C_2 = \infty\), the TGV2 regularizer becomes convex (i.e., we remove the truncation). We will not handle the convex case separately but rather consider it as a special case of the nonconvex variant.
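For concreteness, the following NumPy sketch evaluates the regularizer on an \(M \times N\) grid; it reuses the smooth_trunc_abs sketch from subsection 6.1, replaces the per-edge weights \(w_{d, ij}\) by a single scalar weight, and, purely for brevity, also smooths the noncoupled TV terms (which are not smoothed in the formulation above).

```python
import numpy as np
# Assumes smooth_trunc_abs from the sketch in subsection 6.1 is in scope.

def tgv2_energy(x, v_h, v_v, w, eps, C1, C2):
    """x: (M, N) disparities, v_h: (M, N-1), v_v: (M-1, N) auxiliary variables."""
    dxh = x[:, 1:] - x[:, :-1]               # horizontal differences
    dxv = x[1:, :] - x[:-1, :]               # vertical differences
    coupled = (smooth_trunc_abs(dxh - v_h, eps, C1).sum()
               + smooth_trunc_abs(dxv - v_v, eps, C1).sum())
    second = 0.0
    for v in (v_h, v_v):                     # TV terms on the auxiliary variables
        second += smooth_trunc_abs(v[:, 1:] - v[:, :-1], eps, C2).sum()
        second += smooth_trunc_abs(v[1:, :] - v[:-1, :], eps, C2).sum()
    return w * (coupled + second)
```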
This problem can be approximately solved via alternating minimization by considering subproblems of the form
\begin{align} \min_{x} \sum_{i \in \mathcal{V}} f_i\left(x_i\right) + \widehat{\text{TV}}^{C_1}_{h, \varepsilon }\left(x, v_h\right) + \widehat{\text{TV}}^{C_1}_{v, \varepsilon }\left(x, v_v\right) \end{align}
(6.9)
and
\begin{align} \min_{v_d} \widehat{\text{TV}}^{C_1}_{d, \varepsilon }\left(x, v_d\right) + \text{TV}^{C_2}_{h}\left(v_d\right) +\text{TV}^{C_2}_{v}\left(v_d\right) \text{ for } d \in \mathcal{D}. \end{align}
(6.10)
Notice that problem (6.9) is a shifted and smoothed nonconvex total variation problem, while both subproblems in (6.10) are smooth truncated TV-\(\ell_1\) problems. As discussed previously, the regularizer can be either convex or nonconvex depending on the values of \(C_1\) and \(C_2\). Thus, the subproblems in (6.10) simplify to convex smooth TV-\(\ell_1\) problems for \(C_1 = C_2 = \infty\). However, the same is not true for (6.9), since the unary cost functions \(f_i\) are inherently nonconvex.
Adopting the approach of [21, subsection 5.3] yields primal-dual iterations for computing a stationary point of (6.9),
\begin{align} \begin{cases} x^{k+1}_h := \text{prox}_{\tau_k \left(\frac{1}{2} \sum_{i \in \mathcal{V}} f_i + \widehat{\text{TV}}^{C_1}_{h, \varepsilon }\left(\cdot, v_h\right) \right)}\left(x^k_h - \tau_k \bar{y}^k\right), \\ x^{k+1}_v := \text{prox}_{\tau_k \left(\frac{1}{2} \sum_{i \in \mathcal{V}} f_i + \widehat{\text{TV}}^{C_1}_{v, \varepsilon }\left(\cdot, v_v\right) \right)}\left(x^k_v + \tau_k \bar{y}^k\right), \\ y^{k+1} := y^k + \sigma_k \left(x^{k+1}_h - x^{k+1}_v\right),\\ \bar{y}^{k+1} := y^{k+1} + \theta^k \left(y^{k+1} - y^k\right), \end{cases} \end{align}
(6.11)
and primal-dual iterations for computing a stationary point of (6.10),
\begin{align} \begin{cases} v^{k+1}_{d, h} := \text{prox}_{\tau_k \left(\frac{1}{2} \widehat{\text{TV}}^{C_1}_{d, \varepsilon }\left(x, \cdot \right) + \text{TV}^{C_2}_{h} \right)}\left(v^k_{d, h} - \tau_k \bar{w}^k_d\right), \\[2pt] v^{k+1}_{d, v} := \text{prox}_{\tau_k \left(\frac{1}{2} \widehat{\text{TV}}^{C_1}_{d, \varepsilon }\left(x, \cdot \right) + \text{TV}^{C_2}_{v} \right)}\left(v^k_{d, v} + \tau_k \bar{w}^k_d\right), \\[2pt] w^{k+1}_d := w^k_d + \sigma_k \left(v^{k+1}_{d, h} - v^{k+1}_{d, v}\right),\\[2pt] \bar{w}^{k+1}_d := w^{k+1}_d + \theta^k \left(w^{k+1}_d - w^k_d\right), \end{cases} \quad \text{ for } d \in \mathcal{D}, \end{align}
(6.12)
where \(\tau_k, \sigma_k, \theta_k\) are positive step size parameters satisfying \(\tau_k \sigma_k \lt 1/2\) and \(\theta_k \in (0, 1]\). Notice again that in comparison to the nonconvex experiment in [21, subsection 5.3], we are able to compute exact proximal operators in the arising iterations (6.11) and (6.12) via Algorithm 5.1.
We will use the playtable sample (as shown in Figure 8) from the Middlebury stereo dataset [26] to demonstrate this approach. The reason for choosing this particular image is that it contains many slanted surfaces that should be recovered by the truncated TGV2 regularizer. We obtained the unary cost terms \(f_i\) by computing a quadratic interpolation of negative zero-normalized cross-correlation matching costs for 69 candidate disparity values. This results in piecewise quadratic unary cost functions with 69 contact points. We again used the contrast guided heuristic from [3] to set up the weights \(w_{d, ij}\). The weights of the noncoupled TV terms were additionally scaled by a factor of 10. The TV truncation values were set to \(C_1 = \infty\) and \(C_2=100\). Experimental truncated TGV2 stereo results are shown in Figure 9. Figure 9(a) shows the winner-takes-all stereo solution. Figure 9(b) shows the estimated piecewise affine disparity map. Figure 9(c) shows the energy for the BCD approach over the 20 iterations that were computed, and within those BCD iterations we have used 100 primal-dual iterations (6.11) to solve subproblem (6.9) and 100 primal-dual iterations (6.12) to solve subproblem (6.10). These BCD iterations required a CPU runtime of around 66 minutes with a memory usage of around 2.6 gigabytes, which again demonstrates that Algorithm 5.1 is fast and tractable in practice despite the theoretically predicted exponential worst-case time and memory complexities. Figure 10 further zooms into the estimated disparity map and shows the worst, median, and best estimated vertical scanlines with respect to mean squared error.
Figure 8. Playtable stereo sample.
Figure 9. Truncated TGV2 stereo results.
Figure 10. Truncated TGV2 stereo vertical scanline results.
Remark 6.5.
The purpose of this example was to show how stereo vision problems can be addressed with our continuous DP algorithms and not to produce state-of-the-art results in stereo vision. Another interesting state-of-the-art approach for piecewise affine motion estimation is the one from [11], which uses an ADMM splitting scheme in which the inner problems are based on segment penalties.

7. Conclusion.

We have explored continuous DP approaches and how they can be combined with BCD to tackle TGV regularized inverse problems defined over tree graphs. We have shown via a simple numerical example that BCD pathologically misbehaves for TGV regularized problems. This pathological behavior can be overcome by modifying the resulting BCD subproblems by either smoothing the TGV regularizer or using a box-constrained heuristic. This naturally leads to considering continuous optimization problems with shifted pairwise costs and their box-constrained generalization. We have shown that, under some mild conditions, these resulting continuous optimization problems can be reduced to their nonshifted counterparts in linear time and memory, which allows us to employ existing state-of-the-art continuous DP algorithms to exactly solve many of the resulting BCD subproblems while inheriting their worst-case memory and time complexities. The BCD subproblems of the smoothing approach, however, contain piecewise quadratic unary and pairwise costs, for which we proposed two general continuous DP algorithms that can exactly solve the resulting BCD subproblems in both the convex and the nonconvex case.
Our continuous DP algorithm for the convex case has quadratic worst-case time and memory complexities, and our continuous DP algorithm for the nonconvex case has exponential worst-case time and memory complexities but works well in practice for smooth truncated TV pairwise costs. These algorithms are rather general and can be used as exact solvers for a wide range of problems defined over chains and trees. Notice also that as a byproduct of our analysis for the nonconvex case, we have shown that the message passing algorithm for the Potts model with nonconvex piecewise quadratic unaries runs in polynomial time on chains. Moreover, as a special case of this approach, we have shown that Johnson’s message passing algorithm [16] for solving Potts denoising problems with \(\ell^2_2\) unaries on chains runs in polynomial time.
Finally, we have demonstrated the applicability of our continuous DP algorithms in the context of solving different denoising and stereo vision problems. In particular, we have shown that our DP algorithms can be used to exactly and efficiently solve various scanline denoising problems (empirically in linear time despite the aforementioned worst-case complexities) to construct efficient solvers for TGV regularized denoising problems, to compute subpixel accurate stereo results by combining our algorithms with popular heuristics like SGM, or to solve TGV2-regularized stereo vision problems by combining our algorithms with Lagrangian decomposition schemes.
Future work will be focused on trying to design direct solvers for TGV regularized inverse problems defined over tree graphs based on continuous DP algorithms, and their application to inverse problems defined over images.

Acknowledgments.

The publication was written at Virtual Vehicle Research GmbH in Graz, Austria. The authors would like to acknowledge the financial support within the COMET K2 Competence Centers for Excellent Technologies from the Austrian Federal Ministry for Climate Action (BMK), the Austrian Federal Ministry for Labour and Economy (BMAW), the Province of Styria (Dept. 12), and the Styrian Business Promotion Agency (SFG). The Austrian Research Promotion Agency (FFG) has been authorized for the program management.
We would like to thank the anonymous reviewers for their constructive comments and additional references, which helped to improve the quality of our manuscript. We would also like to thank our colleague Martin Zach for the many fruitful discussions and helpful comments during the writing phase of the manuscript.

Footnote

*
Muhamed Kuric and Jan Ahmetspahic contributed equally to this work.

Supplementary Materials

Index of Supplementary Materials
Title of paper: Total Generalized Variation on a Tree
Authors: Muhamed Kuric, Jan Ahmetspahic and Thomas Pock
File: M155691_01.pdf
Type: PDF
Contents: The supplementary contains a numerical example for the pathological BCD behavior when applied to TGV-regularized problems, summarized theoretical results for the smoothing approach, and the derivation of the proximal operator for one of the considered primal-dual variants in the paper.

References

1.
D. Bickson, Gaussian Belief Propagation: Theory and Application, preprint, arXiv:0811.2518, 2008.
2.
A. Blake, Comparison of the efficiency of deterministic and stochastic algorithms for visual reconstruction, IEEE Trans. Pattern Anal. Mach. Intell., 11 (1989), pp. 2–12, https://doi.org/10.1109/34.23109.
3.
Y. Y. Boykov and M.-P. Jolly, Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2001, pp. 105–112, https://doi.org/10.1109/ICCV.2001.937505.
4.
K. Bredies, K. Kunisch, and T. Pock, Total generalized variation, SIAM J. Imaging Sci., 3 (2010), pp. 492–526, https://doi.org/10.1137/090769521.
5.
A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vision, 40 (2011), pp. 120–145, https://doi.org/10.1007/s10851-010-0251-1.
6.
T. F. Chan and S. Esedoglu, Aspects of total variation regularized \(L_1\) function approximation, SIAM J. Appl. Math., 65 (2005), pp. 1817–1837, https://doi.org/10.1137/040604297.
7.
P. L. Davies and A. Kovac, Local extremes, runs, strings and multiresolution, Ann. Statist., 29 (2001), pp. 1–65, https://doi.org/10.1214/aos/996986501.
8.
L. Dümbgen and A. Kovac, Extensions of smoothing via taut strings, Electron. J. Stat., 3 (2009), pp. 41–75, https://doi.org/10.1214/08-EJS216.
9.
G. Facciolo, C. de Franchis, and E. Meinhardt, MGM: A significantly more global matching for stereovision, in Proceedings of the British Machine Vision Conference (BMVC), 2015.
10.
P. F. Felzenszwalb and D. P. Huttenlocher, Efficient belief propagation for early vision, Int. J. Comput. Vis., 70 (2006), pp. 41–54, https://doi.org/10.1007/s11263-006-7899-4.
11.
D. Fortun, M. Storath, D. Rickert, A. Weinmann, and M. Unser, Fast piecewise-affine motion estimation without segmentation, IEEE Trans. Image Process., 27 (2018), pp. 5612–5624, https://doi.org/10.1109/TIP.2018.2856399.
12.
F. Friedrich, A. Kempe, V. Liebscher, and G. Winkler, Complexity penalized M-estimation, J. Comput. Graph. Statist., 17 (2008), pp. 201–224, https://doi.org/10.1198/106186008X285591.
13.
H. Hirschmüller, Stereo processing by semiglobal matching and mutual information, IEEE Trans. Pattern Anal. Mach. Intell., 30 (2008), pp. 328–341, https://doi.org/10.1109/TPAMI.2007.1166.
14.
H. Hirschmüller and D. Scharstein, Evaluation of cost functions for stereo matching, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8, https://doi.org/10.1109/CVPR.2007.383248.
15.
K. Hohm, M. Storath, and A. Weinmann, An algorithmic framework for Mumford-Shah regularization of inverse problems in imaging, Inverse Problems, 31 (2015), 115011, https://doi.org/10.1088/0266-5611/31/11/115011.
16.
N. A. Johnson, A dynamic programming algorithm for the fused lasso and \(L_0\)-segmentation, J. Comput. Graph. Statist., 22 (2013), pp. 246–260, https://doi.org/10.1080/10618600.2012.681238.
17.
J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, J. Lellmann, N. Komodakis, and C. Rother, A comparative study of modern inference techniques for discrete energy minimization problems, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 1328–1335, https://doi.org/10.1109/CVPR.2013.175.
18.
R. Killick, P. Fearnhead, and I. A. Eckley, Optimal detection of changepoints with a linear computational cost, J. Amer. Statist. Assoc., 107 (2012), pp. 1590–1598, https://doi.org/10.1080/01621459.2012.737745.
19.
P. Knöbelreiter, C. Sormann, A. Shekhovtsov, F. Fraundorfer, and T. Pock, Belief propagation reloaded: Learning BP-layers for labeling problems, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 7897–7906, https://doi.org/10.1109/CVPR42600.2020.00792.
20.
V. Kolmogorov, Convergent tree-reweighted message passing for energy minimization, IEEE Trans. Pattern Anal. Mach. Intell., 28 (2006), pp. 1568–1583, https://doi.org/10.1109/TPAMI.2006.200.
21.
V. Kolmogorov, T. Pock, and M. Rolinek, Total variation on a tree, SIAM J. Imaging Sci., 9 (2016), pp. 605–636, https://doi.org/10.1137/15M1010257.
22.
K. P. Murphy, Y. Weiss, and M. I. Jordan, Loopy belief propagation for approximate inference: An empirical study, in Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 1999, pp. 467–475.
23.
M. Nikolova, A variational approach to remove outliers and impulse noise, J. Math. Imaging Vision, 20 (2004), pp. 99–120, https://doi.org/10.1023/B:JMIV.0000011326.88682.e5.
24.
R. Ranftl, T. Pock, and H. Bischof, Minimizing TGV-based variational models with non-convex data terms, in Proceedings of the International Conference on Scale Space and Variational Methods in Computer Vision (SSVM), 2013, pp. 282–293, https://doi.org/10.1007/978-3-642-38267-3_24.
25.
L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Phys. D, 60 (1992), pp. 259–268, https://doi.org/10.1016/0167-2789(92)90242-F.
26.
D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nešić, X. Wang, and P. Westling, High-resolution stereo datasets with subpixel-accurate ground truth, in Proceedings of the German Conference on Pattern Recognition (GCPR), 2014, pp. 31–42, https://doi.org/10.1007/978-3-319-11752-2_3.
27.
M. Storath, L. Kiefer, and A. Weinmann, Smoothing for signals with discontinuities using higher order Mumford-Shah models, Numer. Math., 143 (2019), pp. 423–460, https://doi.org/10.1007/s00211-019-01052-8.
28.
M. Storath and A. Weinmann, Fast partitioning of vector-valued images, SIAM J. Imaging Sci., 7 (2014), pp. 1826–1852, https://doi.org/10.1137/130950367.
29.
M. Storath and A. Weinmann, Smoothing Splines for Discontinuous Signals, preprint, arXiv:2211.12785, 2022.
30.
M. Storath, A. Weinmann, and M. Unser, Jump-penalized least absolute values estimation of scalar or circle-valued signals, Inf. Inference, 6 (2017), pp. 225–245, https://doi.org/10.1093/imaiai/iaw022.
31.
J. Sun, N.-N. Zheng, and H.-Y. Shum, Stereo matching using belief propagation, IEEE Trans. Pattern Anal. Mach. Intell., 25 (2003), pp. 787–800, https://doi.org/10.1109/TPAMI.2003.1206509.
32.
M. Tappen and W. T. Freeman, Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Vol. 2, 2003, pp. 900–906, https://doi.org/10.1109/ICCV.2003.1238444.
33.
A. Weinmann, M. Storath, and L. Demaret, The \(L^1\)-Potts functional for robust jump-sparse reconstruction, SIAM J. Numer. Anal., 53 (2015), pp. 644–673, https://doi.org/10.1137/120896256.
34.
Y. Weiss and W. T. Freeman, Correctness of belief propagation in Gaussian graphical models of arbitrary topology, Neural Comput., 13 (2001), pp. 2173–2200, https://doi.org/10.1162/089976601750541769.
35.
G. Winkler and V. Liebscher, Smoothers for discontinuous signals, J. Nonparametr. Stat., 14 (2002), pp. 203–222, https://doi.org/10.1080/10485250211388.
36.
S. J. Wright, Coordinate descent algorithms, Math. Program., 151 (2015), pp. 3–34, https://doi.org/10.1007/s10107-015-0892-3.

Published In

SIAM Journal on Imaging Sciences
Pages: 1040–1077
ISSN (online): 1936-4954

History

Submitted: 6 March 2023
Accepted: 17 November 2023
Published online: 30 May 2024

Authors

Affiliations

Muhamed Kuric
Virtual Vehicle Research GmbH, 8010 Graz, Austria.
Jan Ahmetspahic
Virtual Vehicle Research GmbH, 8010 Graz, Austria.
Thomas Pock
Institute of Computer Graphics and Vision, Graz University of Technology, 8010 Graz, Austria.

Funding Information

Funding: The work of the authors was partially supported by the ADACORSA project, which received funding from the ECSEL Joint Undertaking (JU) under grant agreement 876019. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Germany, Netherlands, Austria, Romania, France, Sweden, Cyprus, Greece, Lithuania, Portugal, Italy, Finland, Turkey. In Austria the project was also funded by the program “IKT der Zukunft” of the Austrian Federal Ministry for Climate Action (BMK). The third author was partially supported by the EIC Pathfinder project Next Generation Molecular Data Storage (NEO).
