How long should your dog leash be?

Published in

tb.lx insider

7 min readJan 28, 2020

Calculating the Discrete Fréchet Distance between curves.

Measuring trajectory similarity is an essential task in the study of moving objects. It is at the core of hurricane path prediction, the study of animal migrations, the location of your mobile phone in the next hour, and, of course, the study of vehicle behavior. With a measure of similarity between trajectories, we can also compare time series and predict their future fluctuations. Applications in map-matching, where we seek a fit between a vehicle trajectory and a known road map, are also apparent. A trajectory similarity measure is also of immediate use in one of the most commonly-used machine learning algorithms: clustering.

Here we consider the case for discrete curves or poly-lines. Loosely speaking, we define a poly-line as an ordered set of points in the target space. In this article, we address the R² plane for which we specify an appropriate point-wise distance function. This function merely measures the distance between arbitrary points in the space. For R², we use the euclidean distance.

There are many curve similarity measurements out there. Arguably, the best-known ones are Dynamic Time Warping, the Hausdorff distance, and the Fréchet distance, this article’s subject.

Walking the Dog

While explaining the concept behind the Fréchet distance, it is quite common to use the following analogy.

A man is walking a dog on a leash: the man can move on one curve, the dog on the other; both may vary their speed, but backtracking is not allowed. What is the length of the shortest leash that is sufficient for traversing both curves?¹

The discrete Fréchet distance is a restriction of the continuous case described above. A similar analogy for that would require that both the man and his dog would jump like frogs. They would do so among a set of predetermined points along their trajectories, like stepping stones. These represent the poly-line points.

Developing the Intuition

To better understand how to calculate the discrete Fréchet distance, let us build some intuition graphically. In the picture below, there are two very simple poly-lines: AB and CDE. What is the Fréchet distance between the two?

To answer this question, we must calculate all possible couplings between the two poly-lines. A coupling is an arrangement of matches between the points of both lines. One of the imposed restrictions is that the ends of both poly-lines must always match. In this particular case, A must always match C. B must always match E. All other arrangements are possible as long as you don’t go back. For this simple example, there are only two possible couplings:

AC, BD, BE
AC, AD, BE

We can now measure the lengths of each couple for both:

AC = 1, BD = 1, BE = 2
AC = 1, AD = √2/2, BE = 2

The Fréchet distance is the smallest of the maximum pairwise distances. In this case, both maxima are two, so the minimum of both is also 2.

Calculating the Distance

The classical implementation of the discrete Fréchet distance calculation provided by Eiter and Mannila¹ uses dynamic programming. It’s a recursive algorithm that uses at its core an array to store intermediate distance calculations. When the process ends, we can look at the array’s contents as a diagram from where we can read the final distance value. It is the so-called free-space diagram.

To illustrate how the free-space diagram works, let us build a slightly more elaborate pair of poly-line and calculate their distance:

p = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
q = [[0.0, 1.0], [1.0, 1.1], [2.0, 1.2], [3.0, 1.1], [4.0, 1.0]]

The distance calculation uses a rectangular array to store intermediate values. Matrix dimensions depend on the number of points each poly-line has. For a distance measurement between two poly-lines P and Q, the matrix dimensions are p x q. The value p is the number of points in P, and q is the number of points in Q.

We calculate each entry (i, j) as the maximum of the corresponding pairwise distance, and the minimum of the calculated values at (i-1, j), (i-1, j-1), and (i, j-1). The same calculation procedure applies to these three points when they exist. Hence the recursion. Once calculated, the value we want is in the lower right-hand corner of the array.

To calculate the discrete Fréchet distance between the two poly-lines, you need to apply the above calculation to the entry (p, q) of the matrix and recursively iterate. Following this procedure, let us see how the internal calculation array is laid out in the end:

[[1.        , 1.48660687, 2.33238076, 3.19530906, 4.12310563],
 [1.41421356, 1.1       , 1.56204994, 2.28254244, 3.16227766],
 [2.23606798, 1.48660687, 1.2       , 1.48660687, 2.23606798],
 [3.16227766, 2.28254244, 1.56204994, 1.2       , 1.41421356],
 [4.12310563, 3.19530906, 2.33238076, 1.48660687, 1.2       ]]

We calculate the distance between both poly-lines using an intriguing concept. Think of the set of numbers in the matrix that are less than or equal to a value ∂. What is the required minimum ∂ that connects the upper-left corner with the lower-right? In this case, it is 1.2, and the collection we just designed defines the free-space diagram.

The final distance is the value in entry (p, q) of the calculated matrix.

Performance Issues

This simple implementation of the discrete Fréchet distance calculation has two problems. When applied to very large poly-lines, the algorithm’s performance can degrade significantly. Using the “Big-O” notation, we could say that its complexity is in the order of O(nm). For each additional point in the first poly-line, the algorithm complexity grows by the number of points in the second poly-line. Not only does this add up to the required array memory, but it also adds pressure to the call stack. The recursive nature of this algorithm makes this a big concern.

Fortunately, while the distance array grows quadratically, the call stack requirements grow linearly. The most exacting call occurs when the algorithm traverses from the lower-right corner of the distance array to its upper-left one. Moreover, its depth is the sum of points in both poly-lines minus one. You can follow the execution of the recursive algorithm in the accompanying GitHub repository. The required file is

recursive-vs-linear.ipynb

Following the implementation of this recursive code offers an exciting insight. As you print the return order of the values, you can see that they traverse the matrix in an orderly and sequential fashion. Here is a sample from the above file (reformatted for legibility):

(0 0), (0 1), (0 2), (0 3), (0 4), (0 5), (0 6)
(1 0), (1 1), (1 2), (1 3), (1 4), (1 5), (1 6)
...
(7 0), (7 1), (7 2), (7 3), (7 4), (7 5), (7 6)

The code is merely calculating the array in a row-wise order. It works on the first row, from left to right, and then proceeds to the next. If this is so, then we can make the algorithm work linearly, instead of recursively, eliminating the use of the call stack. Refactoring the code to use two nested loops that traverse the array column-first, we get the same result.

This insight is applied to the non-recursive function aptly named:

linear_frechet

The above function uses two nested loops to traverse the matrix, forgoing recursion entirely. Not only do we get better stack usage, but we also get better performance as you can test for yourself by uncommenting the timing instructions.

Can We Do Better?

Converting the recursive algorithm into a linear one was only part of the solution. When calculating distances between very long poly-lines, the algorithm will need to create big matrices to store the distance data. As a consequence of a larger array, we will have more memory consumption and values to calculate.

There are two approaches to solving this issue. The first solution reduces the amount of data and calculations by simplifying the poly-lines by using algorithms such as Ramer-Douglas-Peucker. The purpose of these algorithms is to create a more straightforward representation of the poly-line by removing points that do not add valuable information.

A second approach to solving this problem involves calculating only the values required for drawing the free-space diagram. Most free-space diagrams have their lowest distances running along the main diagonal, with larger values spreading in the orthogonal directions. If somehow it is possible to infer that beyond specific indices, the calculation is useless, then we can do away with it.

But this will be the theme for another article.

Acknowledgments

This article was revised by my dear colleagues Carlos Azevedo and Vasco Fernandes, to whom I am deeply indebted. All errors are my responsibility.

References

[1] Eiter, T. and Mannila, H. Computing the Discrete Fréchet Distance. Christian Doppler Labor für Expertensyteme, 1994

[2] Thomas Devogele, Maxence Esnault, Laurent Etienne. Distance discrète de Fréchet optimisée. Spatial Analysis and Geomatics (SAGEO), Nov. 2016, Nice, France.

[3] GitHub repository

João Paulo Figueira works as a Data Scientist for tb.lx in Lisbon, Portugal.