Convex Optimization in Deep Learning

Neil Wu
LSC PSD · Jan 23, 2020

NeurIPS is indeed one of the most important conferences in the development of Deep Learning. At this year's NeurIPS 2019, 32 of the accepted papers were related to convex optimization.
Compared to past editions of NeurIPS, convex optimization is clearly becoming a trend. Therefore, I'll talk about convexity in a less mathematical way, and discuss a NeurIPS 2019 paper, "Differentiable Convex Optimization Layers", which implements convex optimization in PyTorch and TensorFlow and makes it available to Deep Learning.

What is convex?

In general, convex optimization means that the objective function's

Local Minimum = Global Minimum

It's an ideal class of problems to solve, since once you find a minimum, you can say with confidence that it is the global minimum.
Convex optimization means optimizing a problem described by a convex function, and a convex function is, by definition, one that satisfies

f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y), for all x, y in the domain and all θ ∈ [0, 1].

If the math description didn't ring a bell, let this picture (and my awful handwriting) translate it for you:

Visualizing a convex function (source)
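To make the definition concrete, here's a small numerical sketch (my own illustration, assuming NumPy; the helper name violates_convexity is made up) that randomly tests the defining inequality on a convex function and a non-convex one:

import numpy as np

# Randomly test the defining inequality
#   f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)
def violates_convexity(f, lo=-5.0, hi=5.0, trials=10_000):
    rng = np.random.default_rng(0)
    x, y = rng.uniform(lo, hi, size=(2, trials))
    t = rng.uniform(0.0, 1.0, size=trials)
    lhs = f(t * x + (1 - t) * y)
    rhs = t * f(x) + (1 - t) * f(y)
    return bool(np.any(lhs > rhs + 1e-9))

print(violates_convexity(np.square))  # False: x**2 is convex
print(violates_convexity(np.sin))     # True: sin is not convex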

For a detailed introduction to convex functions, I strongly recommend the notes by Miriam Huang; her articles and handwritten notes helped me a lot. Although her articles are written in Chinese, the notes she wrote (linked after "附上檔案:", i.e. "attached file:") are in pure English, so language shouldn't be a problem.

Optimization problems are roughly divided into two categories: convex and non-convex. Compared to a non-convex problem, a convex problem only needs to find a minimum once, so it is less computationally intensive and yields a stable, exact solution, as the toy example below shows.
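As an illustration of that claim (my own sketch in plain Python, not from the paper): gradient descent on a convex one-dimensional function lands on the same global minimum no matter where it starts:

# f(x) = (x - 3)**2 is convex, so any local minimum is the global one.
def grad(x):                      # derivative of (x - 3)**2
    return 2.0 * (x - 3.0)

for x0 in (-10.0, 0.0, 25.0):     # very different starting points
    x = x0
    for _ in range(200):          # plain gradient descent, step size 0.1
        x -= 0.1 * grad(x)
    print(f"start = {x0:6.1f} -> ends at x = {x:.4f}")
# every run converges to x = 3.0, the unique global minimum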

Are Neural Network Convex?

The answer is no.
You might argue that convex optimization shouldn't be that interesting for machine learning, since we often encounter loss surfaces like the one in the image below, which are far from convex.

Loss surface of VGG-56 (source)

But convex optimization is faster, simpler, and less computationally intensive, so it is often worth trying to "convexify" a problem.
Moreover, neural networks are differentiable by construction. Being differentiable means that parts of the computation can be decomposed into a number of convex subproblems. That brings us to the paper mentioned earlier, "Differentiable Convex Optimization Layers".

CVXPY

CVXPY is a Python package, published by the Stanford convex optimization group, that solves convex problems in a few easy steps. This paper is practically an upgrade of the CVXPY package. Whenever there's something "convex" and famous, there's Stephen Boyd: the library comes from his group, and he is also a co-author of the paper.

As the paper describes, a problem is expressed as a disciplined convex program, canonicalized, handed to a conic solver, and then differentiated through: an affine-solver-affine (ASA) form.
The novelty of this paper is that previous works had to manually rewrite problems into canonical form, whereas the latest CVXPY finishes this process automatically. Here's an example of CVXPY:

One-step convex optimization in Python (source)
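Since the original screenshot isn't reproduced here, below is a minimal CVXPY sketch in the same spirit: a non-negative least-squares problem (the data is random, purely for illustration):

import cvxpy as cp
import numpy as np

# Non-negative least squares: minimize ||Ax - b||^2 subject to x >= 0
np.random.seed(0)
A = np.random.randn(20, 10)
b = np.random.randn(20)

x = cp.Variable(10)
prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])

print(prob.is_dcp())  # True: CVXPY confirms this is a disciplined convex program
prob.solve()          # canonicalize and hand off to a conic solver
print(prob.value)     # optimal objective value
print(x.value)        # the minimizer

Note how the modeling code reads almost like the math; the canonicalization to solver form happens entirely behind the scenes.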

For practical use in Deep Learning, they also provide PyTorch and TensorFlow interfaces to CVXPY, called cvxpylayers. CVXPY also provides a built-in safeguard that detects whether the objective function is DCP (a disciplined convex program), to catch mis-specified problems.

PyTorch implementation via cvxpylayers (source)
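In place of that screenshot, here is a sketch along the lines of the example in the cvxpylayers README: build a DPP-compliant problem in CVXPY, wrap it as a CvxpyLayer, and backpropagate through its solution with PyTorch:

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, [x >= 0])
assert problem.is_dpp()        # the layer requires a disciplined parametrized program

layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
A_t = torch.randn(m, n, requires_grad=True)
b_t = torch.randn(m, requires_grad=True)

solution, = layer(A_t, b_t)    # forward pass solves the convex problem
solution.sum().backward()      # gradients w.r.t. A_t and b_t flow through the solver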

However, the objective function still needs to be predefined. Even though CVXPY simplifies the workflow of bringing convex optimization into deep learning, there's still a huge wall ahead.

What Now?

"Efficiency" is the most important word in recent machine learning research. Since a model that costs hundreds of GPUs and weeks of training is out of reach for most, developing low-cost trainable models is important. On that point, stochastic gradient descent is NOT the best method for training a neural network. Convex problems, where possible, would be one of the best alternatives.

However, convex optimization in neural networks is still under development, given that neural networks are non-convex by nature. CVXPY still needs a predefined objective function to solve, and the cost functions currently in use aren't suitable for it. I'm looking forward to more convexified approaches in neural networks.

if you_like(this_article):
    please(CLAPS)
    follow(LSC_PSD)
    # Thanks :)
