Understanding Saddle Point Problems, Part 1 (Optimization)

Monodeep Mukherjee
3 min read · Sep 22, 2022

1. On Scaled Methods for Saddle Point Problems (arXiv)

Authors: Aleksandr Beznosikov, Aibek Alanov, Dmitry Kovalev, Martin Takáč, Alexander Gasnikov

Abstract: Methods with adaptive scaling of different features play a key role in solving saddle point problems, primarily due to Adam’s popularity for solving adversarial machine learning problems, including GAN training. This paper carries out a theoretical analysis of the following scaling techniques for solving SPPs: the well-known Adam and RMSProp scaling, and the newer AdaHessian and OASIS, which are based on the Hutchinson approximation. We use Extra Gradient and its improved version with negative momentum as the base method. Experimental studies on GANs show good applicability not only of Adam but also of other, less popular methods.
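
To make the base method concrete, here is a minimal NumPy sketch of the Extra Gradient update combined with an RMSProp-style diagonal scaling on a toy bilinear saddle point problem. The objective, step size, and scaling rule are illustrative assumptions, not the exact AdaHessian/OASIS preconditioners analyzed in the paper.

```python
import numpy as np

def scaled_extragradient(A, x0, y0, lr=0.01, beta=0.9, eps=1e-8, iters=2000):
    """Extra Gradient with an RMSProp-style diagonal scaling on the toy
    bilinear saddle point problem  min_x max_y  x^T A y  (illustrative only)."""
    x, y = x0.copy(), y0.copy()
    vx, vy = np.zeros_like(x), np.zeros_like(y)

    def grads(x, y):
        # gradient of x^T A y w.r.t. x is A y; w.r.t. y it is A^T x
        return A @ y, A.T @ x

    for _ in range(iters):
        gx, gy = grads(x, y)
        # running second-moment estimates (RMSProp-style scaling)
        vx = beta * vx + (1 - beta) * gx**2
        vy = beta * vy + (1 - beta) * gy**2
        sx, sy = np.sqrt(vx) + eps, np.sqrt(vy) + eps
        # extrapolation ("look-ahead") step
        x_half = x - lr * gx / sx
        y_half = y + lr * gy / sy
        # update step uses gradients evaluated at the extrapolated point
        gx_h, gy_h = grads(x_half, y_half)
        x, y = x - lr * gx_h / sx, y + lr * gy_h / sy
    return x, y

A = np.array([[1.0, 0.5], [0.0, 1.0]])
x, y = scaled_extragradient(A, np.ones(2), np.ones(2))
print(x, y)  # both vectors should drift toward the saddle point at the origin
```

The extrapolation step is what distinguishes Extra Gradient from plain gradient descent-ascent: the actual update uses gradients at the "look-ahead" point, which damps the rotation that bilinear coupling induces.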

2. Decentralized Saddle-Point Problems with Different Constants of Strong Convexity and Strong Concavity (arXiv)

Authors: Dmitriy Metelev, Alexander Rogozin, Alexander Gasnikov, Dmitry Kovalev

Abstract: Large-scale saddle-point problems arise in machine learning tasks such as GANs and linear models with affine constraints. In this paper, we study distributed saddle-point problems (SPP) with strongly-convex-strongly-concave smooth objectives that have different strong convexity and strong concavity parameters for the composite terms, which correspond to the min and max variables, and a bilinear saddle-point part. Two types of oracles are considered at each node: deterministic (gradient) and stochastic (unbiased stochastic gradient). For the deterministic oracle, a linear rate of convergence is also obtained in the case of a non-strongly-convex-concave SPP.
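
As a rough illustration of the setting, the sketch below runs plain decentralized gradient descent-ascent on a toy strongly-convex-strongly-concave problem with different constants mu_x and mu_y and a bilinear coupling term, using a ring gossip matrix. The topology, local data, and step size are assumptions for illustration; this is a simple baseline, not the method proposed in the paper.

```python
import numpy as np

np.random.seed(0)
n_nodes, d = 4, 3
mu_x, mu_y, lr, iters = 2.0, 0.5, 0.02, 3000

# doubly stochastic gossip (mixing) matrix for a 4-node ring topology
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

# local data defining f_i(x, y) = mu_x/2 ||x - a_i||^2 + x^T B_i y - mu_y/2 ||y - c_i||^2
a = np.random.randn(n_nodes, d)
c = np.random.randn(n_nodes, d)
B = 0.1 * np.random.randn(n_nodes, d, d)

X = np.zeros((n_nodes, d))  # row i: node i's copy of the min variable x
Y = np.zeros((n_nodes, d))  # row i: node i's copy of the max variable y

for _ in range(iters):
    Gx = mu_x * (X - a) + np.einsum('nij,nj->ni', B, Y)   # local grad_x f_i
    Gy = np.einsum('nji,nj->ni', B, X) - mu_y * (Y - c)   # local grad_y f_i
    # descend in x, ascend in y, then average with neighbours (gossip step)
    X = W @ (X - lr * Gx)
    Y = W @ (Y + lr * Gy)

# network averages approximate the saddle point of the averaged objective
print(X.mean(axis=0), Y.mean(axis=0))
```

Each node only uses its own gradient plus a weighted average with its ring neighbours, which is the communication pattern the decentralized SPP setting formalizes.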

3. Generalized Optimistic Methods for Convex-Concave Saddle Point Problems (arXiv)

Authors: Ruichen Jiang, Aryan Mokhtari

Abstract: The optimistic gradient method has seen increasing popularity as an efficient first-order method for solving convex-concave saddle point problems. To analyze its iteration complexity, a recent work [arXiv:1901.08511] proposed an interesting perspective that interprets the optimistic gradient method as an approximation to the proximal point method. In this paper, we follow this approach and distill the underlying idea of optimism to propose a generalized optimistic method, which encompasses the optimistic gradient method as a special case. Our general framework can handle constrained saddle point problems with composite objective functions and can work with arbitrary norms with compatible Bregman distances. Moreover, we also develop an adaptive line search scheme to select the stepsizes without knowledge of the smoothness coefficients. We instantiate our method with first-order, second-order and higher-order oracles and give sharp global iteration complexity bounds. When the objective function is convex-concave, we show that the averaged iterates of our p-th-order method (p ≥ 1) converge at a rate of O(1/N^((p+1)/2)). When the objective function is further strongly-convex-strongly-concave, we prove a complexity bound of O((L_1/μ) log(1/ε)) for our first-order method and a bound of O((L_p D^((p−1)/2)/μ)^(2/(p+1)) + log log(1/ε)) for our p-th-order method (p ≥ 2), respectively, where L_p (p ≥ 1) is the Lipschitz constant of the p-th-order derivative, μ is the strong convexity parameter, and D is the initial Bregman distance to the saddle point. Moreover, our line search scheme provably requires only an almost constant number of calls to a subproblem solver per iteration on average, making our first-order and second-order methods particularly amenable to implementation.
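
For reference, the classic first-order optimistic gradient update that this framework generalizes fits in a few lines. The toy quadratic objective, initial point, and step size below are assumptions, and the code shows only the Euclidean, first-order special case rather than the generalized Bregman or higher-order method.

```python
import numpy as np

def optimistic_gradient(F, z0, lr=0.1, iters=1000):
    """Optimistic gradient update z_{k+1} = z_k - lr * (2 F(z_k) - F(z_{k-1})),
    where F is the saddle-point operator (grad_x f, -grad_y f)."""
    z = z0.copy()
    F_prev = F(z)
    for _ in range(iters):
        F_curr = F(z)
        z = z - lr * (2 * F_curr - F_prev)
        F_prev = F_curr
    return z

# toy convex-concave objective f(x, y) = 0.5*x^2 + x*y - 0.5*y^2, with z = (x, y)
def operator(z):
    x, y = z
    return np.array([x + y,      # grad_x f
                     -(x - y)])  # -grad_y f

z_star = optimistic_gradient(operator, np.array([3.0, -2.0]))
print(z_star)  # should be close to the saddle point (0, 0)
```

Reusing the previous operator evaluation F(z_{k-1}) is what makes the update "optimistic": it mimics the implicit proximal point step while requiring only one new gradient evaluation per iteration.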


Monodeep Mukherjee

Universe Enthusiast. Writes about Computer Science, AI, Physics, Neuroscience, Technology, and Front End and Backend Development.