Dual-Domain Filtering

We propose dual-domain filtering, an image processing paradigm that couples spatial domain with frequency domain filtering. Our filter, defined in both domains, removes artifacts such as the residual noise of other image denoising methods and compression artifacts. Moreover, iterating the filter achieves state-of-the-art image denoising results, but with a much simpler algorithm than competing approaches. The simplicity and versatility of the dual-domain filter make it an attractive tool for image processing.


Introduction.
Image enhancement and reconstruction are important tasks in image processing. Images may be degraded by additive white Gaussian noise, by arbitrary method noise, or by compression artifacts. Specialized tools are often developed for each type of degradation, but some general image processing tools are effective across several of these problems. The bilateral filter (BF) [32] and its variant, the joint-bilateral filter [27], have become popular due to their simplicity and effectiveness in removing such artifacts. For example, bilateral filtering can be used for denoising images contaminated with weak noise or for removing unwanted details. Adaptive bilateral filtering has also proven effective for JPEG deblocking, as proposed by Zhang and Gunturk [36] and Nath, Hazarika, and Mahanta [25].
However, due to its spatial definition, the BF makes a trade-off between removal of noise and loss of contrast and detail. Details are typically better preserved by transform domain methods, which is why some JPEG deblocking methods inspect the discrete cosine transform (DCT) coefficients of the blocks. Sophisticated image denoising methods operate in both the spatial and frequency domains. Moreover, it is common practice to cast artifact removal as a denoising problem by simply applying existing denoising methods [15,31,3,16,9,8]. However, the best denoising methods are complex to implement and are not part of every image processing engineer's toolbox the way the common BF is.
In this work, we introduce a simple but powerful image processing filter that we call the dual-domain filter (DDF). We build on dual-domain image denoising (DDID), which was recently introduced by Knaus and Zwicker [20] as a simple, but equally powerful, alternative to more complex image denoising methods. Kernel regression methods [31] combine regression-based denoising with data-adapted, anisotropic kernel functions that are steered to align with image edges. Because their parametric kernel functions are prone to corruption by the noise in the input, they use an iterative approach to denoise and re-estimate the kernel parameters in several steps. Bouboulis, Slavakis, and Theodoridis [2] propose a different way to exploit kernels: they formulate denoising as a projection of the noisy input onto a reproducing kernel Hilbert space (RKHS), enriched with a semiparametric model that can explicitly represent sharp edges. Our approach is more closely related to kernel regression [31]. We also use an iterative approach to re-estimate a kernel in several steps. Instead of a parametric kernel, however, we use a nonparametric bilateral kernel, and instead of denoising by regression, we denoise using a form of wavelet shrinkage, that is, transform domain filtering.
The nonlocal means (NLM) filter generalizes the BF by considering the differences between pairs of small patches around a neighbor pixel and the center pixel, instead of just the pixel differences. Comparing patches instead of pixels leads to weights that are much more robust to noise in the input, and NLM is significantly more effective than bilateral filtering for noise removal. NLM was first proposed by Buades, Coll, and Morel [3], and the basic idea has been refined and extended in many ways. An important problem is to estimate the parameters of the algorithm in a data-adaptive manner. Kervrann and Boulanger [19] developed a technique to locally adapt the size of the neighborhood window. Van De Ville and Kocher [33,34] estimate parameters of NLM using Stein's unbiased risk estimate (SURE). The computation of NLM can be accelerated by preselecting contributing patches based on various properties [23,11,26]. Denoising performance can be improved by combining NLM with kernel regression [6], clustering and principal component analysis [7], or spectral analysis [28], and by using more sophisticated patch similarity metrics, such as those based on principal component analysis [1] or exploiting rotational invariance [17].
The BM3D algorithm of Dabov et al. [9] combines the advantages of patch-based techniques like NLM with transform domain filtering. Instead of simply averaging pixels (or patches), the key idea in BM3D is to perform transform domain filtering on three-dimensional (3D) blocks of similar patches. The denoising quality of BM3D is still considered state-of-the-art today. Its success inspired many variations of the basic scheme of collecting and jointly denoising similar patches. Several approaches are based on building statistical models of the collected patches. For example, Dabov et al. [10] use shape-adaptive principal component analysis (SAPCA), Chatterjee and Milanfar [8] propose the patch-based locally optimal Wiener filter (PLOW), and Lebrun, Buades, and Morel [22] use a nonlocal Bayes (NLB) approach that assumes a Gaussian distribution of patches and applies maximum a posteriori (MAP) estimation to obtain denoised patches. While interesting from a theoretical perspective, in practice these extensions often provide only modest gains over the original BM3D algorithm. In contrast to these approaches, our algorithm is not patch-based: we rely neither on collecting similar patches nor on evaluating patch similarities. Instead, we operate directly on entire 2D neighborhood windows.
Classical transform domain methods rely on image representations using sets of suitable basis functions that are chosen such that the signal can be represented accurately by few coefficients. That is, the image representation in the transform domain is sparse, and noise corrupts mostly the small coefficients. Denoising in the transform domain is then the problem of estimating the basis coefficients of the denoised image, where one can exploit the sparsity of the representation. The most popular transforms are the DCT [35] and wavelets [29] and their many variations. Our approach is related to these techniques since it includes a transform domain filtering step based on local, windowed Fourier transforms. We combine this, however, with a bilateral kernel to avoid ringing artifacts, which otherwise often hamper pure transform domain approaches that rely on simple, data-independent transforms like the Fourier transform. Our approach is also related to denoising using shape-adaptive DCT (SA-DCT) by Foi, Katkovnik, and Egiazarian [16]. Key differences are that they use binary masks restricted to simple polygonal shapes, while we use bilateral kernels with continuous weights and arbitrary support; we directly apply the DFT to the masked data instead of using SA-DCT; and we iteratively refine the bilateral kernels and the denoising filters in several steps. (© 2015 SIAM. Published by SIAM under the terms of the Creative Commons 4.0 license.)
Learning-based approaches have become popular more recently. Burger, Schuler, and Harmeling [4] train a multilayer perceptron (MLP) for denoising. A disadvantage of this approach is that the perceptron has to be trained individually for each noise level. Dictionary learning-based methods construct patch-based representations by training overcomplete patch dictionaries from natural images. A classical approach is denoising with a dictionary learned using the K-SVD algorithm [15]. Learned dictionaries can also be combined with nonlocal techniques. The idea is to ensure that similar image patches are restored simultaneously using similar dictionary elements [24], which is called learned simultaneous sparse coding (LSSC). Dong, Shi, and Li [12] further build on this approach using low-rank techniques, and they propose spatially adaptive iterative singular-value thresholding (SAIST) for image denoising. Our approach achieves similar denoising performance with a much simpler algorithm that does not require any learning stage.
The problem of compression and denoising artifact removal is highly related to image denoising and often addressed with similar algorithms. For example, adaptive bilateral filtering has been proven effective for JPEG deblocking [36,25]. It is common to simply cast artifact removal as a denoising problem and use existing denoising methods as discussed above to solve it. Similarly, we will show that our DDF is highly effective for addressing these problems too.

The dual-domain filter (DDF).
We formulate DDF as a robust noise estimator in two domains, the spatial and the frequency domain. Typical image denoising filters estimate a signal x directly from a noisy input y, attempting a decomposition y = x + n, where n is the noise. In contrast, our filter first estimates the noise n, which is then subtracted from the noisy signal y to obtain x. This seemingly subtle difference allows us to express noise estimation in both domains in an analogous fashion using robust kernels. While we make no assumptions about the signal, we assume the noise statistics are known. The noise statistics are used to robustly estimate the noise, first in the spatial domain and then in the frequency domain.
For every pixel p, DDF estimates the noise n̄_p in two steps. DDF first uses a BF in the spatial domain to obtain an intermediate noise estimate n̂_p in the pixel value y_p. The BF is defined over a square neighborhood of pixels q ∈ N_p, where N_p is a filter window centered around pixel p and limited by radius r. DDF then re-estimates the noise n̄_p in the frequency domain using the frequencies f ∈ F_p, where F_p is the set of frequencies of the DFT over the window N_p. We describe the two steps in subsections 3.1 and 3.2. In addition, in subsection 3.3 we show a way to let DDF be guided by a second guide image. We will leverage guided filtering for our applications of DDF to denoising artifact removal (section 4.1) and iterative image denoising (section 4.3).

Noise estimation in the spatial domain.
The spatial domain filter is a reformulation of the BF, now designed to estimate the noise n̂_p. We first subtract the pixel value y_p from the neighbor pixel values y_q, q ∈ N_p, forming the differences d_q as

    d_q = y_q − y_p.    (3.1)

In the following, all symbols with subscripts q are 2D arrays over q ∈ N_p. We next introduce a bilateral kernel function k(|d_q|², |q − p|²) based on the squared norms of the differences d_q and the distances q − p between pixels. We assume that k(·, ·) includes a robust range kernel that normalizes the squared differences according to the known noise statistics (for concrete examples see section 4); its purpose is to reject large values in |d_q|² as outliers (that is, signal) and to retain small values as our noise estimates. Using the bilateral kernel function, we compute the discrete bilateral kernel

    k_q = k(|d_q|², |q − p|²),    (3.2)

which is a 2D array over all pixels q ∈ N_p. Finally, we obtain the intermediate noise estimate n̂_p by locally convolving the differences d_q with the normalized, discrete bilateral kernel,

    n̂_p = a (Σ_{q ∈ N_p} k_q d_q) / (Σ_{q ∈ N_p} k_q),    (3.3)

where we additionally introduce a confidence factor a ranging from 0 to 1.
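The spatial step above can be sketched in NumPy. This is an illustrative, non-vectorized per-pixel sketch, not the paper's appendix MATLAB code; the function name is mine, and the range and spatial kernels are passed in as callables rather than fixed to any particular choice:

```python
import numpy as np

def spatial_noise_estimate(y, p, r, range_kernel, spatial_kernel, a=1.0):
    """Intermediate noise estimate at center pixel p = (row, col).

    range_kernel acts on squared differences |d_q|^2, spatial_kernel on
    squared distances |q - p|^2; concrete choices appear in section 4.
    """
    i, j = p
    win = y[i - r:i + r + 1, j - r:j + r + 1]   # square window N_p
    d = win - y[i, j]                           # differences d_q = y_q - y_p
    ax = np.arange(-r, r + 1)
    dist2 = ax[:, None] ** 2 + ax[None, :] ** 2  # squared distances |q - p|^2
    k = range_kernel(d ** 2) * spatial_kernel(dist2)  # discrete bilateral kernel
    n_hat = a * np.sum(k * d) / np.sum(k)       # weighted average of differences
    return n_hat, d, k
```

On a constant window the differences vanish, so the estimated noise is zero, and the kernel weight at the center pixel is 1 by construction.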

Noise re-estimation in the frequency domain.
The frequency domain filter is designed to obtain our final noise estimate n̄_p by exploiting the intermediate results from the previous section. We apply a reformulation of wavelet shrinkage based on the discrete Fourier transform (DFT) for this purpose. First, we leverage the intermediate noise estimate n̂_p and the discrete bilateral kernel k_q to avoid bias when re-estimating the noise in the frequency domain. We start by subtracting the spatially estimated noise n̂_p from the differences d_q. Intuitively, all differences in d_q are biased by the noise in the center pixel y_p; by subtracting the estimated noise n̂_p, we remove this bias. Then we mask the resulting signal using the discrete bilateral kernel k_q to remove large differences in d_q, which correspond to high-contrast edges. They would otherwise bias the spectrum by introducing low-amplitude ringing at high frequencies, which would be confused with noise. Now we are ready to perform noise estimation using the DFT. We first obtain the DFT by computing inner products of the preprocessed signal (d_q − n̂_p) k_q with the Fourier basis functions, yielding the Fourier coefficients D_f as

    D_f = Σ_{q ∈ N_p} (d_q − n̂_p) k_q e^{−i 2π f · (q − p) / (2r + 1)},    (3.4)

with frequencies f ∈ F_p. As before, the subscript f in D_f denotes that the symbol is a 2D array over all frequencies f ∈ F_p in the neighborhood window N_p.
Next, we introduce the range kernel K(·) in the frequency domain, which is a function of a frequency coefficient, that is, a complex amplitude. We assume this to be a robust kernel that rejects large-amplitude signals, normalized by the energy according to the known noise statistics, and retains small values as our noise estimates (for concrete examples see section 4). Hence, K(·) serves the same purpose in the frequency domain as the bilateral kernel function k(·, ·) does in the spatial domain. Evaluating K(·) for all frequencies f ∈ F_p leads to a discrete frequency domain kernel,

    K_f = K(|D_f|² / Σ_{q ∈ N_p} k_q²).    (3.5)

Here we normalize the energy of the Fourier coefficients by the energy of the bilateral kernel k_q, since the variance of a scaled signal is proportional to the squared scale factors. The pointwise product D_f K_f, f ∈ F_p, of the Fourier coefficients and the discrete frequency domain kernel is now devoid of high-amplitude coefficients, and it retains the low-amplitude coefficients as the desired noise estimates.
Finally, we reconstruct the center pixel noise n̄_p by applying an inverse DFT to the noise estimates D_f K_f. This amounts to taking the dot product in the frequency domain between the Fourier coefficients D_f and the frequency kernel K_f as

    n̄_p = DDF_p(y) = n̂_p + A / (2r + 1)² Σ_{f ∈ F_p} D_f K_f,    (3.6)

which we define as the output of DDF at pixel p. The normalization factor 1/(2r + 1)² corrects for the fact that the DFT is nonunitary. The parameter A is another confidence factor between 0 and 1. Now we have the final noise estimate n̄_p, and we can subtract it from the noisy pixel y_p to get the estimate

    x̂_p = y_p − n̄_p.    (3.7)

Durand and Dorsey [13] have made the connection of the BF to robust statistics and explored replacing the Gaussian kernels with other robust estimators, such as the Lorentzian and Tukey estimators. In addition, Elad [14] showed that the BF is the first step in an iterative minimization of a local cost defined by the robust error norm corresponding to the robust kernel. The same design space is available for DDF when defining the bilateral kernel function k(·, ·) and the new range kernel K(·) in the frequency domain. Concrete examples of these kernel functions are provided in section 4.
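The frequency-domain step described above can also be sketched in NumPy. This is an illustrative re-implementation, not the paper's code: the function name is mine, the shrinkage kernel is passed in as a callable, and the inverse DFT at the center pixel is computed as the mean of the shrunk coefficients after shifting the window center to DFT index 0:

```python
import numpy as np

def frequency_noise_re_estimate(d, k, n_hat, shrink, A=1.0):
    """Final noise estimate from the spatial-step outputs.

    d, k : (2r+1) x (2r+1) arrays of differences and bilateral weights;
    n_hat: intermediate spatial noise estimate; shrink: range kernel K(.)
    applied to noise-normalized coefficient energies.
    """
    # Unbias by n_hat, mask by the bilateral kernel, and move the window
    # center to index 0 so the inverse transform lands on the center pixel.
    D = np.fft.fft2(np.fft.ifftshift((d - n_hat) * k))
    # Normalize coefficient energies by the energy of the bilateral kernel.
    K = shrink(np.abs(D) ** 2 / np.sum(k ** 2))
    # Inverse DFT at the center pixel = mean over all shrunk coefficients.
    return n_hat + A * np.real(np.mean(D * K))
```

As a sanity check, with no shrinkage (K ≡ 1) the frequency step exactly reproduces the value of the unbiased, masked signal at the center pixel.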

Guided DDF.
We can also formulate DDF as a "guided filter," similar to the joint-bilateral filter [27] and the guided image filter [18]. Instead of having a single input image, we have an additional guide image g that defines the filter, which is then applied to the noisy input image y. Since the same computations are performed on both images, we can use a trick: the complex substitution y → g + i y. We only have to make minor adaptations. First, we extract the real part as the guide to define the bilateral kernel, and (3.2) becomes

    k_q = k(|Re d_q|², |q − p|²).

Second, the Fourier transform now computes two real Fourier transforms simultaneously, one for the guide image g and another for the noisy image y. We extract the Fourier coefficients G_f of the real part as the guide with

    K_f = K(|G_f|² / Σ_{q ∈ N_p} k_q²).

Finally, the estimated noise is in the imaginary part of the output of guided DDF. Hence we write the noise estimate as n̄_p = Im DDF_p(g + i y), and the estimate of the denoised pixel as x̂_p = y_p − Im DDF_p(g + i y). The MATLAB implementation of DDF given by Algorithm 1 (see the appendix) works for both guided and unguided DDF.
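The complex substitution exploits the classic "two real FFTs for the price of one complex FFT" identity: the spectra of g and y can be separated from the spectrum Z of g + i y via the Hermitian symmetry of real-signal spectra. A minimal NumPy demonstration (the separation formulas are standard; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.normal(size=(8, 8))   # guide image
y = rng.normal(size=(8, 8))   # noisy image

# One complex FFT of g + i*y computes both real FFTs at once.
Z = np.fft.fft2(g + 1j * y)
# conj(Z[-f]), built by index reversal: Z[(N-u) % N, (N-v) % N]
Zr = np.conj(np.roll(Z[::-1, ::-1], (1, 1), axis=(0, 1)))
G = (Z + Zr) / 2        # Fourier coefficients of the guide g
Y = (Z - Zr) / 2j       # Fourier coefficients of the noisy image y
```

Here `G` matches `np.fft.fft2(g)` and `Y` matches `np.fft.fft2(y)` exactly, which is why a single transform of the complex image suffices for guided DDF.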

Applications.
We demonstrate three applications of DDF. In subsection 4.1, we remove residual noise from common image denoising algorithms. In subsection 4.2, we perform deblocking of JPEG images. Finally, in subsection 4.3, we iterate the DDF to perform high-quality image denoising. The code for artifact removal and image denoising of grayscale images is given by Algorithms 2 and 3 in the appendix and uses the DDF implementation of Algorithm 1. For all three applications we follow the adaptations made by DDID [20] to process color images (see Algorithms 4, 5, and 6 in the appendix): we perform a color-space transformation using the DCT, and the range kernel in the BF relies on normalized Euclidean distances.

Removal of denoising artifacts.
To remove the artifacts of a denoising method, we postprocess the denoised output g with DDF. Specifically, we use g as the guide image to filter the original, noisy input image y. Hence, our output x is

    x_p = y_p − Im DDF_p(g + i y).    (4.1)

We configure DDF using the following confidence factors and kernels:

    a = A = 1,    (4.2)
    k(|d_q|², |q − p|²) = exp(−|d_q|² / (γ_r σ²)) exp(−|q − p|² / (2 σ_s²)),    (4.3)
    K(x) = max(0, 1 − x / (γ_f σ²)).    (4.4)

Here, σ² is the noise variance in the noisy input y. The bilateral kernel function k(·, ·) in the spatial domain is the ordinary bilateral kernel with a range parameter γ_r and a scale parameter σ_s. For the range kernel in the frequency domain, K(·), we choose the Epanechnikov estimator, introducing the range parameter γ_f. Both range kernels normalize their input by dividing it by the noise variance σ². We set the window radius for DDF as r = 15, the spatial scale as σ_s = 7, the spatial range as γ_r = 0.7, and the frequency range as γ_f = 2.3 for all denoising methods and independent of the input noise level σ².
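The kernel configuration above can be written out as follows. This is a hedged transcription of the prose (Gaussian range and spatial kernels, Epanechnikov shrinkage), with the parameter values quoted in the text; the factory function and its name are mine:

```python
import numpy as np

# Parameters quoted in the text for artifact removal.
sigma_s, gamma_r, gamma_f = 7.0, 0.7, 2.3

def make_kernels(sigma2):
    """Return (range, spatial, frequency) kernels for noise variance sigma2."""
    range_k = lambda d2: np.exp(-d2 / (gamma_r * sigma2))          # Gaussian range
    spatial_k = lambda s2: np.exp(-s2 / (2.0 * sigma_s ** 2))      # Gaussian spatial
    shrink = lambda e: np.maximum(0.0, 1.0 - e / (gamma_f * sigma2))  # Epanechnikov
    return range_k, spatial_k, shrink
```

Note the different outlier behavior of the two range kernels: the Gaussian decays smoothly with the squared difference, while the Epanechnikov kernel cuts off coefficient energies above γ_f σ² entirely.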
The top four rows of Figure 1 give visual examples of removing denoising artifacts. Low-frequency noise of K-SVD, graininess of NLM, outliers of NLB, and wavy patterns of LSSC: all these artifacts are reduced or removed by DDF. Tables 1 and 2 show that the grayscale output of nearly all methods can be numerically improved by postprocessing with DDF. Table 3 shows the same for color images. The more smooth regions an image has, the larger the gain in PSNR. This is not surprising, since most methods excel at denoising natural images but have difficulties with synthetic images, where smooth regions dominate. Images denoised by SAIST show almost no artifacts and can be improved only in high-noise situations, where the signal is more homogeneous. For grayscale and color images with noise sigma σ = 40, nearly all images show improvement.

JPEG artifact removal.
For JPEG deblocking, we have only the artifact-contaminated image, so we use the same image for both the guide image g and the noisy image y. Otherwise, we use the same kernel functions as in the previous section to define the DDF. We compressed grayscale and color images using three quality settings in MATLAB, Q ∈ {30, 20, 10}, and used empirically determined corresponding noise sigmas, σ ∈ {20, 25, 40}. For grayscale images, we use the parameters r = 15, σ_s = 7, γ_r = 1.7, and γ_f = 1.1. For color images, we change these to γ_r = 2.8 and γ_f = 4.2. We compare our results against SA-DCT, a state-of-the-art JPEG deblocker by Foi, Katkovnik, and Egiazarian [16]. We also compare against the bilaterally filtered image, using the same parameters as for DDF.
The last row in Figure 1 shows the deblocking of a JPEG image. The removed artifacts are the typical block patterns. The image improves almost everywhere. Table 4 numerically summarizes the results for deblocking JPEG images. For grayscale images, DDF approaches the quality of SA-DCT. For color images, the results are nearly identical.

Image denoising.
Here we formulate an iterative image denoiser with DDF using N iteration steps. We initialize our estimate x_N with the noisy input y and then perform the guided iteration

    x_{n−1, p} = y_p − Im DDF_p(x_n + i y),    (4.5)

where n counts down from N to 1. A key idea is that we parameterize both the bilateral kernel function k_n(·, ·) in the spatial domain and the range kernel K_n(·) in the frequency domain depending on the iteration number n. Specifically, for iterative denoising we define them as

    k_n(|d_q|², |q − p|²) = [cos(min(π/2, |d_q| / √(n T_n)))]^n exp(−|q − p|² / S_n),    (4.6)
    K_n(x) = [cos(min(π/2, √(x / (n V))))]^n.    (4.7)

The bilateral kernel function k_n(·, ·) uses a clamped cosine raised to the nth power as its range kernel and a Gaussian as its spatial kernel. It has a scale S_n and a range parameter T_n that depend on the iteration step n as

    S_n = 2 σ_s² α^{1−n},    (4.8)
    T_n = γ_r σ² α^{n−1},    (4.9)

where σ² again is the noise variance. Intuitively, as n counts down from N to 1, the scale S_n becomes larger; that is, the bilateral kernel function considers a larger and larger neighborhood of pixels. On the other hand, the range parameter T_n becomes smaller, which means that as the support of the bilateral kernel function grows, it becomes more sensitive to pixel differences. We visualize this behavior in Figure 2 on the left. The definition implies that at the end of the iteration the scale is S_1 = 2σ_s² and the range is T_1 = γ_r σ², where σ_s and γ_r are parameters that will need to be specified. The base α controls the initial values T_N and S_N. The range kernel in the frequency domain K_n(·) is also defined as a raised cosine; its range V, however, is constant over the iterations and defined as V = γ_f σ², where the parameter γ_f will need to be specified. By raising the cosines to the nth power, we change the shape of the range kernels over the iterations. We found that Gaussians work better in the beginning, and functions with strong outlier rejection like the clamped cosine kernel work better at the end of the iterations. Since the Gaussian can be approximated by powers of cosines, we use this relationship to dynamically adjust the shapes of our range kernels in both domains.
We also scale the cosines by a factor 1/√n such that the exponent n affects mostly the shapes of the functions but not their overall widths. As visualized in Figure 2 in the middle and on the right, the range kernels in both the spatial and frequency domains start as approximate Gaussians, become steeper over time, and end as clamped cosines. The range kernel in the spatial domain additionally becomes narrower as T_n gets smaller over the iterations, while the width of the range kernel in the frequency domain stays practically constant.
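The relationship behind this schedule is the pointwise limit cos(x/√n)^n → exp(−x²/2): with the 1/√n argument scaling, raising the cosine to higher powers approaches a Gaussian of fixed width. A short numeric check (variable names are mine):

```python
import numpy as np

# cos(x / sqrt(n))**n converges to exp(-x**2 / 2) as n grows, so the
# clamped-cosine range kernels are near-Gaussian for large n and turn
# into hard outlier rejectors by n = 1.
x = np.linspace(0.0, 1.2, 200)
gauss = np.exp(-x ** 2 / 2)

def max_error(n):
    cosn = np.cos(np.minimum(np.pi / 2, x / np.sqrt(n))) ** n
    return float(np.max(np.abs(cosn - gauss)))

errors = [max_error(n) for n in (1, 4, 16, 64)]
```

The maximum deviation from the Gaussian shrinks steadily as n increases, which is exactly why the kernels start out approximately Gaussian at n = N and sharpen toward clamped cosines at n = 1.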
We also specify the confidence factors a and A dynamically. In the beginning, due to the small scale parameters, the spatially estimated noise cannot be trusted. Over time, the noise estimate becomes more accurate until, in the end, it can be fully trusted. The same applies to the estimated noise in the frequency domain. We therefore specify that the confidence factors a_n and A_n follow a sine ramp from 0 to 1, expressed as

    a_n = A_n = cos((n − 1) / N · π/2).

Finally, for improved computational performance, we dynamically adapt the window radius. We define the window radius r_n to be twice the spatial standard deviation √(S_n/2) and at least 4 pixels large:

    r_n = max(4, round(2 √(S_n / 2))).

Figure 3 visualizes the bilateral kernels k_n (from (4.6)) and the overall DDF kernels for three center pixels. In the leftmost column we show the corresponding neighborhood windows (see also the red squares in the top left image) with the center pixels marked white, before and after denoising. The bilateral weights are always between 0 and 1, visualized using black and white, respectively. The overall DDF kernels consist of filter weights that correspond to (4.5). We compute these weights by expressing the window around a center pixel as a vector (by unrolling the window) and writing the application of DDF on this window as a sequence of matrix multiplications: applying k_n (from (4.6)) and K_n (from (4.7)) are multiplications with diagonal matrices, and taking the DFT and its inverse can also be expressed as matrices. We extract the filter weights for the center pixel as the central row of the resulting matrix. The DDF weights can be negative; hence, in the visualizations, the gray background represents 0, brighter values are positive, and darker ones are negative. The kernels are normalized such that the largest magnitude appears as white (or black).
The visualizations show how, over the DDID2 iterations, the DDF kernels more and more accurately detect and follow the structures in the input image.
We show similar visualizations of modified versions of DDID2 in Figures 4 and 5 to provide more intuition. In Figure 4 we show DDID2 using DDF without the bilateral masking step; that is, we set k_n(·, ·) ≡ 1 and ignore the spatial noise estimate by setting n̂_p = 0. This means that denoising relies only on robust noise estimation in the Fourier domain. Without bilateral masking, however, ringing artifacts appear, and the resulting kernels do not adapt well to structures in the image. On the other hand, Figure 5 illustrates DDID2 using DDF without noise estimation in the Fourier domain; that is, we rely only on the spatial domain noise estimate n̂_p (from (3.3)). In this case DDID2 is a form of iterative joint-bilateral filtering, similar to the "rolling guidance filter" recently proposed by Zhang et al. [37]. While the filter adapts to and preserves strong edges, we lose most low-contrast image detail. The combination of both steps, however, is surprisingly effective (Figure 3). Similar denoising filters have been proposed previously, namely DDID [20] and the more recent PID [21]. The differences between their approaches and ours are the following. DDID relies on three guided DDF steps with somewhat less effective kernels than DDID2: it uses a fixed scale σ_s for the spatial Gaussian, and Gaussians instead of raised cosines with dynamically changing shapes for the range kernels. In DDID, the spatial and frequency range parameters γ_r and γ_f are set manually for each of the three iterations, whereas DDID2 computes its range parameters as a function of the iteration number. A key improvement of DDID2 over DDID is that DDID2 allows an arbitrary choice of the number of iteration steps N, and using more than three iteration steps leads to significant improvements, as can be seen in Figure 6. PID uses a bilateral kernel with range and scale parameters similar to (4.8) and (4.9) due to deterministic annealing.
However, PID does not change the shape of the range kernels as we propose in (4.6) and (4.7). More importantly, the PID formulation is not guided and needs at least 30 iterations and an additional guided DDID step for high-quality results. Our formulation requires only 8 guided iterations and is as simple as DDID, but it achieves consistently higher quality results. In addition, neither of these previous works investigated the effectiveness of the DDF framework for other tasks such as compression and denoising artifact removal.

Results.
We achieved the best results using the constants N = 8 and σ_s = 13, with the remaining parameters held fixed independent of the noise level σ. These parameters lead to a sequence {4, 4, 4, 4, 6, 10, 16, 26} of window radii r_n for n = 8, . . . , 1. In Figure 6 we illustrate the robustness of our method, which we call DDID2, with respect to parameter changes. The plots show PSNR values either as functions of a perturbation value β that modifies the default parameter values or as a function of the number N of iteration steps. For the analysis, we used the Cameraman image and fixed the noise sigma to σ = 25. The plots for other images and noise levels are similar. Changing the number of iterations N or the frequency range parameter γ_f has the biggest influence on the PSNR. The other parameters are robust against change, and PSNR values remain within roughly 0.1 dB of the optimum.
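The quoted window-radius sequence can be reproduced from the scale schedule and the radius rule. Note the base α = e is my assumption, chosen because it is the value that reproduces the sequence {4, 4, 4, 4, 6, 10, 16, 26} for σ_s = 13 and N = 8; the published constant may be stated differently:

```python
import math

# Assumed schedule: S_n = 2 * sigma_s**2 * alpha**(1 - n), with alpha = e
# (my assumption; it matches the radius sequence quoted in the text).
N, sigma_s, alpha = 8, 13.0, math.e

def radius(n):
    S_n = 2 * sigma_s ** 2 * alpha ** (1 - n)       # spatial scale S_n
    return max(4, round(2 * math.sqrt(S_n / 2)))    # twice the std, at least 4

radii = [radius(n) for n in range(N, 0, -1)]
print(radii)   # [4, 4, 4, 4, 6, 10, 16, 26]
```

The small early radii are what make the first iterations cheap: the window grows only as the kernel scale grows.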
We provide MATLAB code for DDF and its applications to artifact removal and DDID2 denoising, for both color and grayscale images, in the appendix. The MATLAB implementation of DDID2 takes about 100 seconds to process a 512 × 512 grayscale image using twelve threads on a Xeon E5-2630 CPU at 2.3 GHz, independent of the noise level. Figures 7 and 8 visually compare our new denoiser, DDID2, against other state-of-the-art denoising methods that support color images. DDID2 and PID produce the cleanest and smoothest results. Table 5 compares the methods numerically, including NLB [22], and DDID2 has the best median rank. Some algorithms, like BM3D-SAPCA [10], however, are not available for color image denoising. It is interesting to compare the results on denoising artifact removal using DDF from section 4.1 with the denoising quality of DDID2. In some cases, denoising using one of the third-party methods followed by a DDF artifact-removal step outperforms DDID2. More often this is the case when the third-party method on its own outperforms DDID2, but in rare cases adding DDF helps a third-party method that is slightly inferior on its own to overtake DDID2. The differences in PSNR values are typically small, however, and it is hard to predict systematically when this happens.

Conclusions.
We have introduced dual-domain filtering (DDF), a generalization of the spatial bilateral filter (BF) that includes a frequency domain filter. A key idea in DDF is to estimate noise in both the spatial and frequency domains using robust kernels. We have demonstrated that DDF can improve most images degraded by denoising and compression artifacts. By applying DDF iteratively, we also obtained a new state-of-the-art image denoiser, DDID2. The simplicity and quality of DDF suggest that it has the potential to become a universal tool for image enhancement and restoration, like the BF.