Abstract.

The nonconvexity of the artificial neural network (ANN) training landscape brings optimization difficulties. While the traditional back-propagation stochastic gradient descent algorithm and its variants are effective in certain cases, they can become stuck at spurious local minima and are sensitive to initializations and hyperparameters. Recent work has shown that the training of a ReLU-activated ANN can be reformulated as a convex program, offering hope for globally optimizing interpretable ANNs. However, naively solving the convex training formulation has exponential complexity, and even an approximation heuristic requires cubic time. In this work, we characterize the quality of this approximation and develop two efficient algorithms that train ANNs with global convergence guarantees. The first algorithm is based on the alternating direction method of multipliers (ADMM). It can solve both the exact convex formulation and the approximate counterpart, and it generalizes to a family of convex training formulations. Linear global convergence is achieved, and the initial several iterations often yield a solution with high prediction accuracy. When solving the approximate formulation, the per-iteration time complexity is quadratic. The second algorithm, based on the "sampled convex programs" theory, is simpler to implement. It solves unconstrained convex formulations and converges to an approximately globally optimal classifier. The nonconvexity of the ANN training landscape is exacerbated when adversarial training is considered. We apply robust convex optimization theory to convex training and develop convex formulations that train ANNs robust to adversarial inputs. Our analysis explicitly focuses on one-hidden-layer fully connected ANNs, but can be extended to more sophisticated architectures.
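
To make the setup concrete, below is a minimal sketch, written with the CVXPY modeling language, of the approximate convex training formulation for a one-hidden-layer ReLU network with squared loss and weight-decay regularization. The exact convex program enumerates every hyperplane arrangement of the training data, which is exponential in the worst case; the sketch instead samples a subset of arrangement patterns, in the spirit of the approximation heuristic discussed above. The problem dimensions, the number of sampled patterns P, and the regularization weight beta are illustrative assumptions, not the paper's settings or code.

```python
import numpy as np
import cvxpy as cp

# Sketch of the approximate (sampled-arrangement) convex training program for a
# one-hidden-layer ReLU network.  All sizes below are illustrative assumptions.
rng = np.random.default_rng(0)
n, d, P, beta = 50, 10, 20, 1e-3
X = rng.standard_normal((n, d))      # training inputs
y = rng.standard_normal(n)           # training targets

# Sample diagonal activation patterns D_i = diag(1[X u_i >= 0]) from random
# Gaussian directions u_i; the exact formulation would enumerate all patterns.
U = rng.standard_normal((d, P))
D = (X @ U >= 0).astype(float)       # n x P matrix of 0/1 activation patterns

V = cp.Variable((d, P))              # "positive" neuron weights v_i (columns)
W = cp.Variable((d, P))              # "negative" neuron weights w_i (columns)

# Network output: sum_i D_i X (v_i - w_i), computed column-wise.
pred = cp.sum(cp.multiply(D, X @ (V - W)), axis=1)

# Group-sparse (per-neuron) regularization.
reg = cp.sum(cp.norm(V, axis=0)) + cp.sum(cp.norm(W, axis=0))
objective = cp.Minimize(0.5 * cp.sum_squares(pred - y) + beta * reg)

# Cone constraints keeping each weight consistent with its activation pattern:
# (2 D_i - I) X v_i >= 0 and (2 D_i - I) X w_i >= 0 for every sampled pattern.
constraints = [
    cp.multiply(2 * D - 1, X @ V) >= 0,
    cp.multiply(2 * D - 1, X @ W) >= 0,
]

problem = cp.Problem(objective, constraints)
problem.solve()
print("optimal objective:", problem.value)
```

Solving this program with a generic conic solver scales poorly as the data size and the number of sampled patterns grow, which is what motivates the ADMM-based algorithm (quadratic per-iteration cost on the approximate formulation) and the unconstrained sampled-convex-program approach developed in the paper.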

Keywords

  1. robust optimization
  2. convex optimization
  3. adversarial training
  4. neural networks

MSC codes

  1. 68Q25
  2. 82C32
  3. 49M29
  4. 46N10
  5. 62M45

Supplementary Materials

PLEASE NOTE: These supplementary files have not been peer-reviewed.

Index of Supplementary Materials

Title of paper: Efficient Global Optimization of Two-layer ReLU Networks: Quadratic-time Algorithms and Adversarial Training
Authors: Yatong Bai, Tanmay Gautam, and Somayeh Sojoudi
File: Supplement.pdf
Type: PDF
Contents: Additional analysis, additional experiments, detailed experiment settings, extensions, and proofs.

Information & Authors

Published In

SIAM Journal on Mathematics of Data Science
Pages: 446 - 474
ISSN (online): 2577-0187

History

Submitted: 6 January 2022
Accepted: 24 October 2022
Published online: 1 June 2023

Authors

Affiliations

Yatong Bai
Department of Mechanical Engineering, University of California, Berkeley, Berkeley, CA 94720 USA.
Tanmay Gautam
Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94720 USA.
Somayeh Sojoudi
Department of Mechanical Engineering and Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94720 USA.

Funding Information

Funding: This work was supported by grants from ONR and NSF.

Metrics & Citations

Metrics

Citations

If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. Simply select your manager software from the list below and click Download.

Cited By

There are no citations for this item

Media

Figures

Other

Tables

Share

Share

Copy the content Link

Share with email

Email a colleague

Share on social media

The SIAM Publications Library now uses SIAM Single Sign-On for individuals. If you do not have existing SIAM credentials, create your SIAM account https://my.siam.org.