KneeFinder: simple tool to find the Knee / Elbow point.

Vincenzo Lavorini
3 min readNov 1, 2022

This post describe KneeFinder, a tool that you can use to find the knee / elbow point.

Update, 31 Mai 2023: a 2-dimensional version of this tool exist: HyperKnee finder. The new tool optimize two parameters at the same time.

If you landed here, probably you are trying to tune some parameter of an algorithm.

You also read somewhere that, in order to tune that parameter(s), the best way is through the “knee” or “elbow” method: run your algorithm with different values of the parameter you want to tune, take the outcomes, plot them and find the point where “the slope changes the most”.

A better definition (from Satopää, Albrecht, Irwin, and Raghavan, 2011, p.1) states the knee point as the point where the

“relative costs to increase [or decrease, NdC] some tunable parameter is no longer worth the corresponding performance benefit”

While the reason for which using the knee/elbow method is clear, the methodologies used to deploy it change between scopes and algorithms. What they have in common is the final part: having the plot outcome VS parameter, find the optimal value for that parameter.

This post explain how to find it in a straightforward way, with a simple and robust tool.

The example

Say we have the following data:

Figure 1: cost as function of the parameter value.

What one usually do in such situation is to take as optimal parameter p= 2 or p=3, just following their sense. The problem is that the sense of one can be different from the sense of another, and both of them can’t be automatized.

KneeFinder

Here we propose an empirical definition of the knee point:

The knee point is the point farther from the line passing by the first and the last point of the data

While this definition is theoretically weak, empirically is very strong: if your parameter can be tuned by the knee method, the data will be such that this definition coincide with the point you want to find.

Note that other and more sophisticated methods exists (like Kneed), this one is easier to use (no hyper-parameters to define) and so it is more robust and easy to insert in an automated pipeline.

Let’s develop our definition in a graphical way:

  • draw a segment passing between the first and the last point of the data
  • draw the distances between the data points and the segment
  • take the biggest of those distances
Figure 2: graphical explanation of the proposed algorithm.

Here it is: the optimal value for your parameter is two.

You can find the KneeFinder at this link. Note that it’s still in testing phase, so please fill a issue on GitHub if you find any.

Inside the algorithm

Here some technicality inside the code, for your pleasure.

In order to be efficient, instead of looping between all the data points and calculate the distances you see in Figure 2, the algorithm we wrote calculates the cross product between the vector given by the difference between the first and the last point (the orange segment of Figure 2, if you want), and the vectors which define the data points between which we have to choose. All of them at once, thanks to Numpy vectorization capabilities.

The result is an array of non-normalized lengths, and the biggest of them define our knee point.

Thank you for reading!

--

--