CODEX
GPS Trip Matching and Strava API
Problem Solving
Many endurance sports require the participant to follow a pre-set course and achieve specific control points by a specific time. Confirmation of a participants adherence to the pre-set course and control guidelines has been accomplished through the use of paper systems, photographs and manned control points.
The use of a participants GPS file, generated from the participants bike computer, sport watch or smartphone is gaining adoption. However that requires the event coordinator to manually gather, compare and manage the participants GPS file for verification purposes. This can be a complicated, and time consuming activity for event coordinators who have limited technical expertise.
So we could automate this task…
Scenario
We have an event organizer who will input the race track in the form of GPS coordinates. We assume the GPS coordinates form a non closing path. There is a starting point and an ending point.
A typical GPS point object may look like this:
Measurement Latest Values Units
------------- ---------------------------------- -------
Latitude 42.30 degrees
Longitude -71.35 degrees
Altitude 40.80 m
We would have an array of these, where order matters:
const arrayGPSPoints = [startingPoint, ..., endingPoint]
Task #1
So given the race track, we will also have the GPS data of the racer. Their GPS object may look like this:
Measurement Latest Values Units
------------- ---------------------------------- -------
Acceleration 12.38 4.80 2.66 m/s^2
Latitude 42.30 degrees
Longitude -71.35 degrees
Altitude 40.80 m
Speed 2.00 m/s
Timestamp 1611634411 s
Our task is to form a path with the race track, then find the nearest point on the track from the current iteration of the racer’s GPS object.

As you can see once we know which two points the current iteration of the racer’s GPS object is, its a simple matter of finding the perpendicular line joining the two closest race track points.

Subtask #1A
We have the current GPS point and an array of coordinates. So we have to find the 2 closest coordinates.
To do this we have to find the distance between the current GPS point and each individual track coordinate. This is simple enough, with this handy function. The only thing is this formula assumes, perfectly round globe, I’m sorry Mark Sargeant. Jokes aside, we will do a more sophisticated altitude adjustment later on.
function degreesToRadians(degrees) {
return degrees * Math.PI / 180;
}--------------------------------------------------------------------function distanceInKmBetweenEarthCoordinates(lat1, lon1, lat2, lon2) {
var earthRadiusKm = 6371;var dLat = degreesToRadians(lat2-lat1);
var dLon = degreesToRadians(lon2-lon1);lat1 = degreesToRadians(lat1);
lat2 = degreesToRadians(lat2);var a = Math.sin(dLat/2) * Math.sin(dLat/2) +
Math.sin(dLon/2) * Math.sin(dLon/2) * Math.cos(lat1) * Math.cos(lat2);
var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
return earthRadiusKm * c;
}
Then sort by smallest to largest. And take the top 2 coordinates.
Can we apply heuristics to reduce computation?
Well, from the first point to the last, as we calculate the distances. We can imagine something like the following result:
- Really big distance
- Big distance
- Medium distance
- Close distance
- Really close distance
- Really close distance
- Close distance
- Medium distance
- …
Basically after finding a minimum, the distance does seem to get bigger and bigger as we go through the array. So why should we check the entire array, because the race track could be thousands of points, shouldn’t we do an early stopping strategy and be pretty confident that it’s the subsection.
The problem with that is that the race track could be loopy and winding. And actually come even closer later on.

Furthermore, it could be the case that the closest 2 points are actually not the points we’re looking for.
Because chronologically speaking, green point just came after another green point from the other direction.

So in essence, even though the purple points are the two shortest distance points, what we are actually looking for is the green points.
To get the right closest two points, we introduce a biasing parameter into our distance calculation. Graphically this would be:

Since we don’t know exactly how much weight we should give the biasing parameter. I think, the best approach would be to pretty much let the biasing parameter decide which pair gets chosen from the lets say top 3 pairs. Because I’m assuming at most it will be surrounded on 3 sides.
At this point, we have the right pair of coordinates.
Subtask #1B
So given 2 points, we can create a line joining them.
With a line and a point, we can create another line joining those. This is our end goal, in the sense that we need to check if we should consider this GPS point acceptable.
That’s the basic math. But of course, we have to consider tolerances and sampling rates. Further more, we have so far assumed that the path joining two coordinates on the race track is a line. Which of course is not true, in an ideal world, we’d have smoothed curves. But for short distance, it’s a good compromise to significantly speed up calculation, I’m not a mathematician but we talking polynomial time savings, ya bish.
- Sampling rate of the GPS maybe 5 to 10 seconds on the low end and every second on the high. As the cyclist gets slower, the dots get closer and assuming a line between them is not an issue. However on the high end, a cyclist can go about 10 m/s. With a slow GPS that samples every 10 seconds. Our cyclist could be 100m away.
- Given that a city block can be about 100m by 300m. Our cyclist could be at the next intersection. So the tolerance should be around 100m and maybe the coordinates could be 100m apart at the complicated parts and if it’s a straight line, no limitation. Meaning if the race track is from point A to point B and no other midway check points, it would be a straight line on the surface of the earth. And a typical race of hundreds of kilometers, we might have to use the haversine formula.
- The next consideration is the accuracy of GPSs themselves. We can reasonably expect about 5m accuracy under normal conditions and maybe 30m under really bad conditions.
With all this information, and with consideration for all of the worst conditions and most complicated paths. We can start to build our solution.
Haversine
formula:a = sin²(Δφ/2) + cos φ1 ⋅ cos φ2 ⋅ sin²(Δλ/2)
c = 2 ⋅ atan2( √a, √(1−a) )
d = R ⋅ cwhereφ is latitude, λ is longitude, R is earth’s radius (mean radius = 6,371km);
note that angles need to be in radians to pass to trig functions!JavaScript:const R = 6371e3; // metres
const φ1 = lat1 * Math.PI/180; // φ, λ in radians
const φ2 = lat2 * Math.PI/180;
const Δφ = (lat2-lat1) * Math.PI/180;
const Δλ = (lon2-lon1) * Math.PI/180;
const a = Math.sin(Δφ/2) * Math.sin(Δφ/2) +
Math.cos(φ1) * Math.cos(φ2) *
Math.sin(Δλ/2) * Math.sin(Δλ/2);
const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
const d = R * c; // in metres
The haversine distance is curved, which is more accurate but eventually we do have to find the distance from the point to the line, using this formula:

where Ax + By + C = 0 is the line equation and (m, n) is the point. This stuff assumes cartesian plane. So we will have to convert our longitude and latitude (λ, φ) into (x, y).
You can simply use the horizontal axis x to denote longitude λ, the vertical axis y to denote latitude φ. The ratio between these should not be 1:1, though. Instead you should use cos(φ0) as the aspect ratio, where φ0 denotes a latitude close to the center of your map. And r is the radius of the Earth (6,371km).
- x = r λ cos(φ0)
- y = r φ
Now we need a φ0, which is really just a latitude of the centroid between the 3 data points in the iteration.

So now basically you would convert the yellow points into x and y coordinates. Then turn it into a line equation, and write in Ax + By + C = 0 form.
Then turn the green point latitude and longitude into x and y. Plug it into:

And see if the distance is less than than our tolerance, which I think should be 50m to 100m.
All that’s left is writing the code to loop through every thing.
Strava API
The strava API has an object called segment which looks like this:
{
"id": 229781,
"resource_state": 3,
"name": "Hawk Hill",
"activity_type": "Ride",
"distance": 2684.82,
"average_grade": 5.7,
"maximum_grade": 14.2,
"elevation_high": 245.3,
"elevation_low": 92.4,
"start_latlng": [
37.8331119,
-122.4834356
],
"end_latlng": [
37.8280722,
-122.4981393
],
"climb_category": 1,
"city": "San Francisco",
"state": "CA",
"country": "United States",
"private": false,
"hazardous": false,
"starred": false,
"created_at": "2009-09-21T20:29:41Z",
"updated_at": "2018-02-15T09:04:18Z",
"total_elevation_gain": 155.733,
"map": {
"id": "s229781",
"polyline": "}g|eFnpqjVl@En@Md@HbAd@d@^h@Xx@VbARjBDh@OPQf@w@d@k@XKXDFPH\\EbGT`AV`@v@|@NTNb@?XOb@cAxAWLuE@eAFMBoAv@eBt@q@b@}@tAeAt@i@dAC`AFZj@dB?~@[h@MbAVn@b@b@\\d@Eh@Qb@_@d@eB|@c@h@WfBK|AMpA?VF\\\\t@f@t@h@j@|@b@hCb@b@XTd@Bl@GtA?jAL`ALp@Tr@RXd@Rx@Pn@^Zh@Tx@Zf@`@FTCzDy@f@Yx@m@n@Op@VJr@",
"resource_state": 3
},
"effort_count": 309974,
"athlete_count": 30623,
"star_count": 2428,
"athlete_segment_stats": {
"pr_elapsed_time": 553,
"pr_date": "1993-04-03",
"effort_count": 2
}
}
Specifically the polyline element which is an encoded polyline that in this example is:
“}g|eFnpqjVl@En@Md@HbAd@d@^h@Xx@VbARjBDh@OPQf@w@d@k@XKXDFPH\\EbGT`AV`@v@|@NTNb@?XOb@cAxAWLuE@eAFMBoAv@eBt@q@b@}@tAeAt@i@dAC`AFZj@dB?~@[h@MbAVn@b@b@\\d@Eh@Qb@_@d@eB|@c@h@WfBK|AMpA?VF\\\\t@f@t@h@j@|@b@hCb@b@XTd@Bl@GtA?jAL`ALp@Tr@RXd@Rx@Pn@^Zh@Tx@Zf@`@FTCzDy@f@Yx@m@n@Op@VJr@”
Which looks like:

from:
Which we can decode into an array of coordinates. Therefore making this whole problem solvable.
One Additional Consideration
There could be a situation where the race takes place over multiple days. And there could be multiple segments. But that’s still very easy to check. From several segments, we can check for partial matching on the race track. Obviously with way more lenient tolerances.