Detecting knee/elbow points in the graph of a function

Making use of the “Kneedle” algorithm implemented in the Python package “kneed”

Daniel Kleine
Towards Data Science


Photo by Lucaxx Freire on Unsplash [1]

Theory

When working with data, it is sometimes important to know where a data point’s “relative costs to increase some tunable parameter is no longer worth the corresponding performance benefit” (Satopää, Albrecht, Irwin, and Raghavan, 2011, [2], p. 1). The “Kneedle” algorithm detects those beneficial data points showing the best balance of inherent tradeoffs — called “knees” (curves that have negative concavity) or sometimes “elbows” (curves that have positive concavity) — in discrete data sets, based on the mathematical definition of curvature for continuous functions. With this article, I want to summarize the steps “Kneedle” goes through, demonstrate the benefits of this algorithm, and show applications with the Python package “kneed”.

The “Kneedle” algorithm was published by Satopää, Albrecht, Irwin, and Raghavan (2011, [2]) and uses the concept of curvature as a mathematical measure of how much a function differs from a straight line. Satopää et al. (2011, [2], p. 2) conclude that “as a result, the maximum curvature captures the leveling off effects operators use to identify knees”.

For continuous functions, the curvature is described as shown here:
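K_f(x) = f″(x) / (1 + f′(x)²)^(3/2)

This is the standard curvature of a twice-differentiable function f, as used by Satopää et al. (2011, [2]).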

Let’s create for instance N = 10,000 random standard normal distributed data points and display them in a histogram:
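A minimal sketch of this setup, assuming NumPy, seaborn, and Matplotlib (the random seed is arbitrary):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# draw N = 10,000 samples from a standard normal distribution
rng = np.random.default_rng(42)
data = rng.standard_normal(10_000)

# histogram with a kernel density estimate (KDE) overlaid
sns.histplot(data, kde=True)
plt.show()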

Random standard normal distributed data plotted in a histogram with Kernel Density Estimation (Image by author)

Now we can sort the values in an ascending order to get a cumulative distribution function (CDF) showing each x-value on the x-axis and its probability of occurrence on the y-axis:
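Continuing the sketch above, the empirical CDF can be built by sorting the samples and pairing each with its cumulative probability:

# sort the samples in ascending order
data_sorted = np.sort(data)
# empirical cumulative probability for each sorted sample
cdf = np.arange(1, len(data_sorted) + 1) / len(data_sorted)

plt.plot(data_sorted, cdf)
plt.xlabel('x')
plt.ylabel('P(X <= x)')
plt.show()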

Sorted data displayed as CDF with maximum curvature indicated at x = 1.16 (Image by author, inspired by Figure 1 in Satopää et al., 2011 [2])

Satopää et al. (2011, [2], p. 2) point out that “curvature is well-defined for continuous functions”, but there is no clear definition of curvature for discrete data. In principle, a continuous function can be fitted to the discrete data to compute its curvature — but when the data is noisy, this fitting becomes even more difficult. This is where the “Kneedle” algorithm comes into play.

Let’s take a deeper look at how this algorithm works. For this, I have used the original discrete data from Figure 2 in the conference paper published by Satopää et al. (2011, [2]). It’s provided by the Python package “kneed”:

import kneed

# load the sample data from Figure 2 of the paper
x, y = kneed.DataGenerator.figure2()

This is the raw data being plotted:

Raw data (Image by author)

Let’s get through the steps the “Kneedle” algorithm follows:

1. Smoothing

Using a spline to “smooth” the shape of the raw data.
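One way to approximate this step is a SciPy smoothing spline (the paper proposes smoothing with a Gaussian kernel; the spline here is an illustrative stand-in, and the smoothing factor s is hand-picked):

import numpy as np
from scipy.interpolate import UnivariateSpline

# fit a smoothing spline to the raw Figure 2 data from above
spline = UnivariateSpline(x, y, s=0.1)
y_smooth = spline(x)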

Smoothed data (Image by author)

2. Normalizing to the unit square

The ranges of the x- and y-values are normalized to [0, 1] (see the axes).
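This is a plain min-max scaling of both axes, continuing the sketch above:

# min-max normalization of both axes to the unit square
x_norm = (x - x.min()) / (x.max() - x.min())
y_norm = (y_smooth - y_smooth.min()) / (y_smooth.max() - y_smooth.min())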

Normalized data (Image by author)

3. Calculating the difference curve

Calculating perpendicular distances (Euclidean distances from the data points to the diagonal line drawn from the first to the last data point)…

Perpendicular lines indicating perpendicular distances (Image by author, inspired by Figure 2a in Satopää et al., 2011 [2])

… and rotating them by 45 degrees counterclockwise. The magnitudes of these rotated perpendicular lines indicate the difference values.
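Numerically, this rotation boils down to subtracting the normalized x-values from the normalized y-values, so the difference curve is simply:

# difference curve: how far each normalized point lies above the diagonal
y_diff = y_norm - x_norm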

Perpendicular lines rotated by 45 degrees representing difference values (Image by author, inspired by Figure 2b in Satopää et al., 2011 [2])

4. Calculating the local maxima of the difference curve

Candidate “knee” points lie at the local maxima of the difference curve computed from the normalized data (for detecting elbow points, the graph is inverted first).
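Such local maxima can be located with a standard peak search, for example via scipy.signal.argrelextrema (a sketch, not necessarily how “kneed” implements it internally):

from scipy.signal import argrelextrema

# indices of local maxima in the difference curve
maxima_idx = argrelextrema(y_diff, np.greater)[0]
knee_candidates = x_norm[maxima_idx]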

Maximum curvature at x = 0.22 (Image by author, inspired by Figure 2c in Satopää et al., 2011 [2])

5. Calculating a threshold for each local maximum in the difference curve

For each local maximum in the difference curve, there is a unique threshold value that is based on the average difference between consecutive x-values and a sensitivity parameter (see Satopää et al., 2011 [2], p. 4 for details on how this threshold is computed). The sensitivity parameter specifies how many “flat” points are expected to be seen in the original, unmodified data before declaring a “knee” and can be adjusted: a small sensitivity value detects knees more quickly, whereas a larger one is more conservative.
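Concretely, the paper defines the threshold for a local maximum at (x_lmx, y_lmx) of the difference curve as

T_lmx = y_lmx − S · ( Σ (x_{i+1} − x_i) ) / (n − 1)

that is, the local maximum’s y-value reduced by S times the average spacing between consecutive normalized x-values.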

Threshold indicated at y = 0.43 with a sensitivity of 1 (Image by author, inspired by Figure 2c in Satopää et al., 2011 [2])

6. Comparing each difference value with the threshold

If a difference value drops below the threshold before the next local maximum is reached, the algorithm declares a “knee” at the preceding local maximum. Otherwise, the threshold value is reset to zero and the search continues with the next local maximum.
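Putting steps 4–6 together, a condensed sketch of the detection loop might look like this (my own reading of the paper, not the actual “kneed” implementation; it reuses x_norm and y_diff from above):

S = 1.0
knees = []
threshold, current_lmx = 0.0, None
avg_dx = np.diff(x_norm).mean()

for i in range(1, len(y_diff) - 1):
    if y_diff[i] > y_diff[i - 1] and y_diff[i] > y_diff[i + 1]:
        # new local maximum: remember it and compute its threshold
        current_lmx = x_norm[i]
        threshold = y_diff[i] - S * avg_dx
    elif current_lmx is not None and y_diff[i] < threshold:
        # dropped below the threshold: declare a knee at the last local maximum
        knees.append(current_lmx)
        current_lmx, threshold = None, 0.0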

kneed

Now let’s take a deeper look into the Python package “kneed”, written by Kevin Arvai, which implements the “Kneedle” algorithm.

With this, we can for instance directly plot the normalized difference curve including the “knee” point:
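A minimal sketch using the Figure 2 data from above (the parameter values match this particular concave, increasing curve):

import kneed

x, y = kneed.DataGenerator.figure2()
kneedle = kneed.KneeLocator(x, y, S=1.0, curve='concave', direction='increasing')

# plot the normalized curve, the difference curve, and the detected knee
kneedle.plot_knee_normalized()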

Figure 2 from Satopää et al., 2011 [2] plotted with “kneed” (Image by author)

We can use the class KneeLocator() to find a knee/elbow in our data:

# calculate and show the knee (or elbow)
kneedle = kneed.KneeLocator(…)
knee_point = kneedle.knee  # elbow_point = kneedle.elbow
print('Knee: ', knee_point)  # print('Elbow: ', elbow_point)
kneedle.plot_knee()

These are the most important parameters for KneeLocator() (a complete example follows the list):

  • curve: “concave” for a knee, “convex” for an elbow — based on the negative/positive concavity of the function
  • direction: “increasing” for a positive slope, “decreasing” for a negative slope of the function
  • online: if False, the first knee/elbow detected is kept (offline); if True, “old” knee/elbow values are corrected as new data points come in
  • S: sensitivity for the knee/elbow detection (S ≥ 0); Satopää et al. [2] state that with perfect information (an offline setting) a sensitivity of 0 works best, whereas in online settings a sensitivity of 1 shows the best overall performance, although this can vary with the data points received
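Putting these parameters together for the Figure 2 data (a sketch; the values fit this particular concave, increasing curve):

x, y = kneed.DataGenerator.figure2()

kneedle = kneed.KneeLocator(x, y, S=1.0, curve='concave',
                            direction='increasing', online=False)
print('Knee x: ', kneedle.knee)    # x-value of the detected knee
print('Knee y: ', kneedle.knee_y)  # corresponding y-value
kneedle.plot_knee()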

Applications

Clustering

Detecting elbows with the “Kneedle” algorithm becomes super handy for clustering algorithms:

a) K-Means

With the elbow method, you can calculate the within-cluster sum of squares (WCSS) as a measure of the within-cluster variance and detect the best value for the number of clusters to form (k):
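A sketch of this, assuming scikit-learn and toy data from make_blobs (the number of blobs and all other values are arbitrary):

import kneed
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# hypothetical example data with 4 clusters
X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

# within-cluster sum of squares (inertia) for a range of k
k_values = list(range(1, 11))
wcss = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
        for k in k_values]

# the elbow sits on a convex, decreasing curve
kl = kneed.KneeLocator(k_values, wcss, curve='convex', direction='decreasing')
print('Best k: ', kl.elbow)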

Elbow Method (Image by author)
Elbow detection with Elbow Method with “kneed” (Image by author)

b) DBSCAN

When calculating a distance metric (usually the Euclidean distance to the k-th nearest neighbor) for each data point and sorting the distances in ascending order, they can be plotted in a k-distances graph to find a threshold value for defining a cluster:
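A sketch, again with scikit-learn and toy data (k = 5 is an arbitrary choice; note that a point counts as its own first neighbor here):

import numpy as np
import kneed
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

# distance of each point to its 5th neighbor, sorted ascending
distances, _ = NearestNeighbors(n_neighbors=5).fit(X).kneighbors(X)
k_distances = np.sort(distances[:, -1])

# the k-distances curve is convex and increasing; its knee suggests eps for DBSCAN
kl = kneed.KneeLocator(np.arange(len(k_distances)), k_distances,
                       curve='convex', direction='increasing')
print('Suggested eps: ', k_distances[kl.knee])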

k-distances graph (Image by author)
Knee detection in k-distances graph with “kneed” (Image by author)

Classification

Even for classification, “Kneedle” becomes extremely useful when looking at the True Positive Rate (TPR) as well as the False Positive Rate (FPR). The trade-off between the two is represented in a receiver operating characteristic (ROC) curve. The “knee” in a ROC curve can provide a good measure for a threshold value, as at this point the TPR’s increase slows down while the FPR starts increasing faster:
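A sketch with scikit-learn (classifier, data set, and split are arbitrary stand-ins):

import kneed
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# predicted probabilities for the positive class
probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, _ = roc_curve(y_te, probs)

# the ROC curve is concave and increasing; its knee balances TPR against FPR
kl = kneed.KneeLocator(fpr, tpr, curve='concave', direction='increasing')
print('Knee at FPR = ', kl.knee)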

Receiver operating characteristic curve (Image by author)
Knee detection for receiver operating characteristic curve with “kneed” (Image by author)

Final remarks

“kneed” also provides an interactive app called “ikneed” on Streamlit, where you can test (small) data sets for “knees” and “elbows”. There you can also play around with the parameters (see the link to observe their impact directly):

https://share.streamlit.io/arvkevi/ikneed/main/ikneed.py

In my opinion, “kneed” is currently the most powerful Python package for knee detection as it provides a lot of useful functions. Nevertheless, there are other Python packages for finding the knee of a curve, such as “kneebow”.

References

[1] Photo by Lucaxx Freire on Unsplash

[2] V. Satopää, J. Albrecht, D. Irwin and B. Raghavan, “Finding a “Kneedle” in a Haystack: Detecting Knee Points in System Behavior” (2011), 31st International Conference on Distributed Computing Systems Workshops, 2011, pp. 166–171, doi: 10.1109/ICDCSW.2011.20.
