Understanding Nonsmooth Nonconvex Optimization part3(Machine Learning optimization 2024)

Monodeep Mukherjee
2 min read · Apr 17, 2024

1. Proximal Dogleg Opportunistic Majorization for Nonconvex and Nonsmooth Optimization (arXiv)

Authors: Yiming Zhou, Wei Dai

Abstract: We consider minimizing a function consisting of a quadratic term and a proximable term which is possibly nonconvex and nonsmooth. This problem is also known as the scaled proximal operator. Despite its simple form, existing methods suffer from slow convergence, high implementation complexity, or both. To overcome these limitations, we develop a fast and user-friendly second-order proximal algorithm. The key innovation involves building and solving a series of opportunistically majorized problems along a hybrid Newton direction. The approach directly uses the exact Hessian of the quadratic term and calculates its inverse only once, eliminating the iterative numerical approximation of the Hessian that is common practice in quasi-Newton methods. The algorithm’s convergence to a critical point is established, and the local convergence rate is derived based on the Kurdyka-Lojasiewicz property of the objective function. Numerical comparisons are conducted on well-known optimization problems. The results demonstrate that the proposed algorithm not only achieves faster convergence but also tends to converge to a better local optimum compared to benchmark algorithms.
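The problem class in this abstract is concrete enough to sketch. Below is a minimal Python illustration of the setup (a quadratic term plus a proximable regularizer) solved with a plain majorization-minimization, i.e. proximal-gradient, loop. This is not the authors' proximal dogleg algorithm; the ℓ1 regularizer, the isotropic majorizer L·I, and all numerical values are assumptions chosen purely for illustration.

```python
# Minimal sketch of the problem class: minimize 0.5 x^T Q x + c^T x + g(x),
# where g is proximable (here: lam * ||x||_1 as an example).
# The solver is a generic majorization-minimization / proximal-gradient loop,
# NOT the paper's proximal dogleg opportunistic majorization method.
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding), used as an example g."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def scaled_prox_mm(Q, c, lam, x0, n_iter=200):
    """Majorize the quadratic term with L*I (L = largest eigenvalue of Q,
    assumed symmetric PSD here) and take prox steps on the majorized subproblem."""
    L = np.linalg.eigvalsh(Q).max()           # gradient Lipschitz constant of the quadratic
    x = x0.copy()
    for _ in range(n_iter):
        grad = Q @ x + c                      # exact gradient of the quadratic term
        x = prox_l1(x - grad / L, lam / L)    # prox step solves the majorized subproblem
    return x

# Tiny usage example with synthetic data (values are illustrative only)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
Q, c = A.T @ A, -A.T @ rng.standard_normal(20)
x_hat = scaled_prox_mm(Q, c, lam=0.1, x0=np.zeros(10))
```

The paper's contribution is precisely to replace the isotropic majorizer above with majorizers built along a hybrid Newton direction using the exact Hessian of the quadratic term, which the sketch does not attempt to reproduce.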

2. Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization (arXiv)

Authors: Luke Marrinan, Uday V. Shanbhag, Farzad Yousefian

Abstract: We consider the minimization of a Lipschitz continuous and expectation-valued function defined as f(x) ≜ E[f̃(x, ξ)] over a closed and convex set. Our focus lies on obtaining both asymptotic guarantees as well as rate and complexity guarantees for computing an approximate stationary point (in the Clarke sense) via zeroth-order schemes. We adopt a smoothing-based approach reliant on minimizing f_η, where f_η(x) = E_u[f(x + ηu)], u is a random variable defined on the unit sphere, and η > 0. It has been observed that a stationary point of the η-smoothed problem is a 2η-stationary point for the original problem in the Clarke sense. In such a setting, we develop two sets of schemes with promising empirical behavior. (I) We develop a smoothing-enabled variance-reduced zeroth-order gradient framework (VRG-ZO) and make two sets of contributions for the sequence generated by the proposed zeroth-order gradient scheme. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, allowing us to make guarantees for η-Clarke stationary solutions of the original problem; (b) computing an x for which the expected norm of the residual of the η-smoothed problem is within ε requires no more than O(η⁻¹ε⁻²) projection steps and O(η⁻²ε⁻⁴) function evaluations. (II) Our second scheme is a zeroth-order stochastic quasi-Newton scheme (VRSQN-ZO) reliant on a combination of randomized and Moreau smoothing; the corresponding iteration and sample complexities for this scheme are O(η⁻⁵ε⁻²) and O(η⁻⁷ε⁻⁴), respectively.
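To make the smoothing idea tangible, here is a minimal sketch of a two-point zeroth-order gradient estimator for f_η together with a projected descent loop. It is a generic randomized-smoothing scheme, not the paper's VRG-ZO or VRSQN-ZO methods; the Euclidean-ball constraint set, step sizes, and test function are assumptions made only for illustration.

```python
# Sketch of randomized-sphere smoothing f_eta(x) = E_u[f(x + eta*u)] with a
# two-point zeroth-order gradient estimate and projection onto a convex set.
# Generic illustration only; not the paper's variance-reduced schemes.
import numpy as np

def zo_gradient(f, x, eta, rng):
    """Two-point zeroth-order estimate of grad f_eta at x, using a direction u
    drawn uniformly from the unit sphere."""
    u = rng.standard_normal(x.size)
    u /= np.linalg.norm(u)
    return (x.size / (2.0 * eta)) * (f(x + eta * u) - f(x - eta * u)) * u

def project_ball(x, radius=1.0):
    """Euclidean projection onto {x : ||x|| <= radius}, an example convex set."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def zo_projected_descent(f, x0, eta=0.1, step=0.01, n_iter=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iter):
        g = zo_gradient(f, x, eta, rng)   # stochastic gradient of the smoothed problem
        x = project_ball(x - step * g)    # projected step onto the constraint set
    return x

# Usage with a nonsmooth, nonconvex test function (illustrative only)
f = lambda z: np.abs(z).sum() + 0.5 * np.sin(3 * z).sum()
x_star = zo_projected_descent(f, x0=np.ones(5))
```

The point of the sketch is only the interplay the abstract describes: function evaluations drive the smoothed-gradient estimate, and projection steps keep iterates in the convex feasible set; the paper's complexity bounds count exactly these two quantities.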

Monodeep Mukherjee

Universe enthusiast. Writes about computer science, AI, physics, neuroscience and technology, front-end and back-end development.