Solving an optimization problem

--

Optimization in Python | Using SciPy | Optimization techniques

[Figure: plot of the feature and output space, showing local minima. Caption: finding the minima]

Optimization refers to the process of making the best or most effective use of a situation or resource. In Mathematics and in Machine Learning, this corresponds to selecting the best elements (parameters) from a given space, subject to various limitations (constraints).

Optimization problems are of the following form: Find the largest (or smallest) value of f(x) when a≤x≤b. Sometimes a or b are infinite, but frequently the real world imposes some constraint on the values that x may have.

In this article, we are going to discuss Mathematical optimization and how it affects Machine Learning. Additionally, we shall explore some of its applications and how to apply algorithmic tools and libraries (here, SciPy) to solve optimization problems.

So what are the various types of Optimization problems?

Before understanding that, we need to first understand the notion of variables and parameters.

Let us consider an example.

y = mx + c

equation of a straight line

Here, ‘x’ is the independent variable and ‘y’ is the dependent variable; y also represents the function f(x). Meanwhile, ‘m’ and ‘c’ are the parameters of this function.

Variables are entities that take their values from the input space or denote the outcome of a function, while parameters are the entities whose values alter or define the function itself.

Another related term is hyperparameter, which denotes a value pre-decided by the programmer to set the scope of the function. Examples include the value of ‘k’ in the k-nearest neighbor algorithm or the max depth of the tree in a decision tree algorithm.

In any optimization, the values of the hyperparameters are already known, while we estimate the values of the parameters based on the input space.

The parameters of a function are chosen based on the decision to either maximize or minimize the function output against the given objective. For example, a manufacturing company whose revenue is denoted by the function f would have the objective of maximizing revenue. The optimization process would thus try to find the parameters of the function that maximize the objective.

Now, coming to the various types of optimization problems:

  • depending on whether the variables are continuous or discrete: continuous optimization | discrete optimization
  • depending on the scope of the search: local optimization | global optimization
  • depending on the convexity of the function: convex optimization | non-convex optimization
  • based on constraints: constrained optimization | unconstrained optimization
  • based on the approach to optimization: gradient-based optimization | gradient-free optimization

We shall now explore some of the popular optimization techniques under these categories and understand the approach each one takes.

Nelder-Mead

Assume that you have no knowledge of calculus or algebra. Now, given a function, your interviewer asks you to find its lowest value. What would be your approach?

Well, the simplest approach could be to sample two points relatively near each other and repeatedly take a step away from the one with the larger outcome. Interesting, isn’t it?

However, it runs into a problem: the step size is fixed. The closest it can get to the lowest outcome is one step size away from it, while keeping a very small step size would mean spending too much time inching towards the solution.
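A minimal sketch of this naive idea, on a hypothetical 1-D function:

def naive_descent(f, x=0.0, step=0.1, n_steps=100):
    # sample two points near x and repeatedly step away from the larger outcome
    for _ in range(n_steps):
        if f(x - step) < f(x + step):
            x -= step
        else:
            x += step
    return x

# example: the minimum of (x - 3)^2 is at x = 3
print(naive_descent(lambda x: (x - 3) ** 2))  # stalls about one step size away from 3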

The Nelder-Mead algorithm comes with a solution: it dynamically adjusts the step size based on the loss at the new point. If the newer point is better than the previous one, the step size is enlarged to accelerate towards the solution.

The Nelder-Mead algorithm is a gradient-free, pattern-search approach and works with non-differentiable functions as well.

One of the disadvantages of Nelder-Mead is that it struggles with higher dimensionality. To learn more about Nelder-Mead, refer here
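In SciPy, Nelder-Mead is available through scipy.optimize.minimize. A minimal sketch, with a simple bowl-shaped objective chosen purely for illustration:

import numpy as np
from scipy.optimize import minimize

# illustrative objective with its minimum at (1.0, -2.5)
def objective(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.5) ** 2

result = minimize(objective, x0=np.array([0.0, 0.0]), method='Nelder-Mead')
print(result.x)    # approximately [1.0, -2.5]
print(result.fun)  # approximately 0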

Particle Swarm Optimization

According to sociologists, “a school of fish or a flock of birds that moves in a group can profit from the experience of all other members.”

Each bird (particle) helps us search the high-dimensional space, and the best solution found by the flock is taken as the best solution in the space. It is, however, a heuristic, as reaching the global optimum can never be proved.

To learn more about PSO, refer here
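SciPy does not ship a PSO solver, so below is a minimal NumPy sketch of the idea; the inertia weight w and the cognitive/social coefficients c1 and c2 are illustrative choices, not tuned values:

import numpy as np

def pso(objective, low, high, n_particles=30, n_iters=200, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    dim = len(low)
    pos = rng.uniform(low, high, size=(n_particles, dim))  # the "flock"
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()               # best seen by the flock
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # each particle is pulled towards its own best and the flock's best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, low, high)
        vals = np.array([objective(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# example: sphere function, minimum at the origin
print(pso(lambda x: float(np.sum(x ** 2)),
          low=np.array([-5.0, -5.0]), high=np.array([5.0, 5.0])))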

BFGS (Broyden-Fletcher-Goldfarb-Shanno)

BFGS is an iterative algorithm for solving unconstrained non-linear optimization problems. It decides the descent direction by preconditioning the gradient with curvature information. [to learn more, refer here]
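A minimal sketch using SciPy's minimize with BFGS on the Rosenbrock function, whose implementation and analytic gradient both ship with SciPy:

from scipy.optimize import minimize, rosen, rosen_der

# supplying the analytic gradient (jac) lets BFGS build its curvature estimate
result = minimize(rosen, x0=[1.3, 0.7, 0.8], method='BFGS', jac=rosen_der)
print(result.x)  # approximately [1, 1, 1], the Rosenbrock minimum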

Basin Hopping

It is a global optimization algorithm developed for use in chemical physics. The algorithm involves cycling between two steps: a perturbation of good candidate solutions and the application of local search to the perturbed solution.

The perturbation allows the search algorithm to jump to new regions of the search space and potentially locate a new basin leading to a different optimum. The local search then allows the algorithm to traverse the new basin to its optimum. The decision to keep the new solution is controlled by a stochastic decision function with a “temperature” variable, as in simulated annealing.

The algorithm is much like an iterated local search with different starting points. [to learn more, refer here]
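A minimal sketch with scipy.optimize.basinhopping; the bumpy 1-D objective below is an illustrative choice with many local minima:

import numpy as np
from scipy.optimize import basinhopping

# many local minima from the sine term, one global basin from the quadratic
def objective(x):
    return np.sin(5.0 * x[0]) + 0.1 * x[0] ** 2

result = basinhopping(objective, x0=[2.0], niter=100, seed=1)
print(result.x, result.fun)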

Having a primary understanding of various optimization techniques, let us now understand how to choose the right one.

  • BFGS and L-BFGS are preferred when we do not have knowledge of the gradient, since the gradient can be approximated numerically. However, for well-conditioned problems (where the change in the output is comparable to the change in the inputs), gradient-free methods such as Powell and Nelder-Mead also perform well

Scipy.org provides guidelines to choose the appropriate optimization technique

[Figure: “Choosing the right optimization” decision chart]

Note: Newton methods calculate the Hessian matrix from scratch at each iteration of the algorithm, either exactly or by finite differences of the gradient at that iteration. Quasi-Newton methods build up an approximation of the Hessian matrix by using the gradient differences across iterations.
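As a sketch of the difference in SciPy: the Newton-CG method accepts an explicit Hessian, while BFGS approximates curvature from gradient differences. The quadratic below is an illustrative objective:

import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0] ** 2 + 4.0 * x[1] ** 2

def grad(x):
    return np.array([2.0 * x[0], 8.0 * x[1]])

def hess(x):
    return np.array([[2.0, 0.0], [0.0, 8.0]])

# Newton-type: uses the exact Hessian at each iteration
res_newton = minimize(f, x0=[3.0, -1.0], method='Newton-CG', jac=grad, hess=hess)

# quasi-Newton: builds a Hessian approximation from gradient differences
res_bfgs = minimize(f, x0=[3.0, -1.0], method='BFGS', jac=grad)

print(res_newton.x, res_bfgs.x)  # both approximately [0, 0]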

What are some of the applications of optimization in the real world?

  • Portfolio Optimization: Allocate funds to stocks to minimize risk for a target rate of return
  • Fleet routing: Determine which aircraft to fly on each route, and the sequence of segments flown by each aircraft
  • Employee Scheduling: Schedule park employees for weekly shifts to minimize payroll costs while meeting varying demand for each day of the week

Any optimization problem has 3 main parts: an objective function, decision variables, and constraints. [Fact: The field of study of optimization problems is known as “Operations Research”]. When one talks about formulating an optimization problem, it means translating a “real-world” problem into mathematical equations and variables.

The objective function, often denoted as f or z, reflects a single quantity to be either maximized or minimized. Examples from the transportation industry include “minimize congestion”, “maximize safety”, and “minimize cost”. [Note: A problem could have one or more objectives to fulfil]

A decision variable (often denoted by x) is chosen based on whether it influences the objective function directly, or influences another decision variable that affects the objective function. Learn more

Let us now understand the approach to solving an optimization problem

We shall consider an example for the same

The ideal way to approach any optimization is to first understand it in totality and define it (objective, variables and constraints) in simple language before moving on to notations and equations.

You have 60 feet of fence available, and wish to enclose the largest rectangular area possible. What dimensions should you choose for the fenced-off area?

The objective is clear from the problem: you need to maximize the area. The decision variables can be inferred to be length (L) and width (W), as the area must be rectangular.

There are no indirect variables as L and W are sufficient to control the objective.

What constraints do you have? The total length of the fence available, which constrains the perimeter (P) of the rectangle.

P = 2L + 2W ≤ 60

Additionally, the length and width cannot be negative: L≥0, W≥0

Mathematically, the problem is:

Maximize A = L × W
subject to: 2L + 2W ≤ 60, L ≥ 0, W ≥ 0
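Solving analytically (substituting L = 30 − W into A = L × W and maximizing) gives a square: L = W = 15 feet, enclosing 225 square feet. As a cross-check, here is a minimal sketch using SciPy's SLSQP solver, maximizing the area by minimizing its negative:

from scipy.optimize import minimize

# maximize L*W by minimizing -(L*W)
def neg_area(v):
    L, W = v
    return -(L * W)

# perimeter constraint 2L + 2W <= 60, expressed as g(v) >= 0
constraints = [{'type': 'ineq', 'fun': lambda v: 60.0 - 2.0 * v[0] - 2.0 * v[1]}]
bounds = [(0.0, None), (0.0, None)]  # L >= 0, W >= 0

result = minimize(neg_area, x0=[1.0, 1.0], method='SLSQP',
                  bounds=bounds, constraints=constraints)
print(result.x)     # approximately [15, 15]
print(-result.fun)  # approximately 225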

Hope you now have a better understanding of the approach to optimization. Here are a few more examples for reference

Solving an optimization problem using SciPy

SciPy optimize provides functions for minimizing (or maximizing) objective functions, possibly subject to constraints. It includes solvers for nonlinear problems (for both local and global optimization), linear programming, constrained and non-linear least squares, root finding and curve fitting.

Some methods under optimize module include:

  • minimize_scalar(): minimization of a scalar function of one variable
  • minimize(): local (multivariate) optimization with methods such as ‘Nelder-Mead’, ‘Powell’, ‘BFGS’, etc.
  • basinhopping(): global optimization using the basin-hopping algorithm

[refer to the link here for detailed description]

Sample code snippet for optimizing using SciPy

[Figure: code screenshot; source: machinelearningmastery]
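Since the snippet above survives only as an image, here is a comparable minimal example (a sketch, not the exact code from the screenshot), using minimize_scalar on the sample function from the SciPy documentation:

from scipy.optimize import minimize_scalar

# scalar objective with a local minimum near x = 1.28
def f(x):
    return (x - 2) * x * (x + 2) ** 2

result = minimize_scalar(f)  # uses Brent's method by default
print(result.x, result.fun)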

Conclusion

Thank you for reading till the end. Hope you learnt something new :)

In this story we learnt about mathematical optimization along with various categories of optimization. We further explored how to choose an optimization algorithm.

Finally, we learnt the approach to solving an optimization problem through a worked example.

If you found the story useful, don’t forget to show your appreciation ❤
