A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion

Abstract

This paper considers regularized block multiconvex optimization, where the feasible set and objective function are generally nonconvex but convex in each block of variables. It also accepts nonconvex blocks, provided that these blocks are updated by proximal minimization. We review some interesting applications and propose a generalized block coordinate descent method. Under certain conditions, we show that any limit point satisfies the Nash equilibrium conditions. Furthermore, we establish global convergence and estimate the asymptotic convergence rate of the method by assuming a property based on the Kurdyka--Łojasiewicz inequality. The proposed algorithms are tested on nonnegative matrix and tensor factorization, as well as matrix and tensor recovery from incomplete observations. The tests include synthetic data and hyperspectral data, as well as image sets from the CBCL and ORL databases. Compared to the existing state-of-the-art algorithms, the proposed algorithms demonstrate superior performance in both speed and solution quality. The MATLAB code for nonnegative matrix/tensor decomposition and completion, along with a few demos, is accessible from the authors' homepages.
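
As a concrete illustration of the block coordinate descent scheme with prox-linear updates described above, the following minimal Python sketch applies one projected-gradient step per block to nonnegative matrix factorization, minimize 0.5*||M - XY||_F^2 subject to X >= 0, Y >= 0. This is not the authors' released MATLAB code; the function name nmf_bcd, the fixed iteration count, and the omission of the paper's extrapolation weights are illustrative assumptions.

    # A minimal sketch (assumed names; not the authors' released MATLAB code) of
    # block coordinate descent with prox-linear updates, illustrated on
    # nonnegative matrix factorization:
    #     minimize 0.5 * ||M - X Y||_F^2   subject to   X >= 0, Y >= 0.
    # Each block is updated by one projected-gradient step with a Lipschitz-based
    # step size; the extrapolation used in the paper is omitted for brevity.
    import numpy as np

    def nmf_bcd(M, r, iters=500, seed=0):
        rng = np.random.default_rng(seed)
        m, n = M.shape
        X = rng.random((m, r))                            # block 1
        Y = rng.random((r, n))                            # block 2
        for _ in range(iters):
            # Update X with Y fixed; grad_X f = (X Y - M) Y^T.
            Lx = max(np.linalg.norm(Y @ Y.T, 2), 1e-12)   # Lipschitz constant of grad_X
            X = np.maximum(X - ((X @ Y - M) @ Y.T) / Lx, 0.0)   # prox = projection onto X >= 0
            # Update Y with X fixed; grad_Y f = X^T (X Y - M).
            Ly = max(np.linalg.norm(X.T @ X, 2), 1e-12)
            Y = np.maximum(Y - (X.T @ (X @ Y - M)) / Ly, 0.0)
        return X, Y

    # Example on synthetic nonnegative low-rank data.
    rng = np.random.default_rng(1)
    M = rng.random((60, 5)) @ rng.random((5, 40))
    X, Y = nmf_bcd(M, r=5)
    print("relative error:", np.linalg.norm(M - X @ Y) / np.linalg.norm(M))

Because each block subproblem here is convex (a nonnegativity-constrained least-squares problem), the cheap prox-linear step fits the convergence framework sketched in the abstract; more elaborate per-block solvers could be substituted without changing the overall structure.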

Keywords

  1. block multiconvex
  2. block coordinate descent
  3. Kurdyka--Łojasiewicz inequality
  4. Nash equilibrium
  5. nonnegative matrix and tensor factorization
  6. matrix completion
  7. tensor completion
  8. proximal gradient method

MSC codes

  1. 49M20
  2. 65B05
  3. 90C26
  4. 90C30
  5. 90C52


Published In

SIAM Journal on Imaging Sciences
Pages: 1758 - 1789
ISSN (online): 1936-4954

History

Submitted: 13 August 2012
Accepted: 20 May 2013
Published online: 24 September 2013
