Photo by Jamie Street on Unsplash

The best way to locate, in MySQL 8

MySQL 8 has brought a lot of awesome new features, as I’m sure you all know. I will not go into detail about all of these here, there are many others who can do and have done this much better than I can. (Gabriela D'Ávila Ferrara made a nice roundup that I can most certainly recommend!)

Instead I will focus on geolocation, which is one of my favorite improvements in MySQL 8 as I’ve had to deal with this complexity far too many times:

SELECT ( 6371 * acos( cos( radians(some_latitude) ) * cos( radians( lat ) ) * cos( radians( lng ) - radians(some_longitude) ) + sin( radians(some_latitude) ) * sin(radians(lat)) ) ) AS distance  FROM     locations  HAVING     distance < 25  ORDER BY     distance;

Law of Haversine

If you don’t recognize this, consider yourself lucky. But those who have had a need to calculate distances before will recognize this as the Haversine formula.

The Haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes. Important in navigation, it is a special case of a more general formula in spherical trigonometry, the Law of Haversine, that relates the sides and angles of spherical triangles.

If math is neither your hobby nor your job, likely you will have trouble understanding how this works exactly. Using this formula, debugging incorrect distance calculation will be hell.

Spatial data types

To understand the changes related to geolocation, you first need to know about spatial data types. Let’s look at the official explanation according to Microsoft:

Spatial data represents information about the physical location and shape of geometric objects. These objects can be point locations or more complex objects such as countries, roads, or lakes.

In other terms it’s a representation of geometric shapes within a spatial reference system. MySQL 8 supports multiple spatial types but we will concentrate on the Point type as that’s the most important type for geolocation.

CREATE TABLE locations (
location POINT NOT NULL
);

A Point is a collection of two floating point values. One stands for X and one for Y. Within a spherical reference they are called Latitude and Longitude.

Hey that sounds familiar!

So you have been paying attention? Very good. Yes indeed we are now speaking a type of date we more commonly refer to as a GPS coordinate.

In the old days we would create two columns for this like so:

CREATE TABLE locations (
lat FLOAT( 10, 6 ) NOT NULL,
lng FLOAT( 10, 6 ) NOT NULL

);

If you’re lucky, any existing system you need to jump into will have that. But for some reason I’ve often seen people use strings for this. Oh dear…

The Point column is a definite improvement over this, but it’s still less than perfect without an SRID.

What’s a SRID and what do we need it for?

A SRID is a numerical ID that refers to a Spatial reference system. I mentioned spatial/spherical references above, but I haven’t explained what I meant with that.

Having something to represent a coordinate available natively in the database is nice and all, but we need a correct spherical reference to ensure the coordinates we calculate are correct, or transferring these to a map and calculating distances will not work.

We need a spatial system that matches the dimensions we need as a basis for all our calculations. Now where would we find that ?

Oh you mean like earth?

Again, very astute! However it’s not as simple as that. There are quite a few systems used all over the globe and we need the one that’s relevant to our needs.

This brings us to 1984. Instead of the Orwellian dystopia, that year actually brought forth quite some good things. And one of these is WGS84:

The World Geodetic System (WGS) is a standard for use in cartography, geodesy, and satellite navigation including GPS. It comprises a standard coordinate system for the Earth, a standard spheroidal reference surface (the datum or reference ellipsoid) for raw altitude data, and a gravitational equipotential surface (the geoid) that defines the nominal sea level.

Before there were web standards, there were mapping standards, and this lovely system is something I’ve personally had a lot of fun with when GPS devices became readily available and affordable for the public. And now we can use this exact system in our database! All we need is the matching SRID and link it to our column.

SRID coupling in the database

Coming back to the Point type. By default, creating one of these columns will automatically assume an SRID of 0.

By default, MySQL uses SRID 0, which represents an “infinite flat Cartesian plane with no units assigned to its axes”.
- Logan Henson Tighten.co

Doing calculations with that SRID will give vastly different results. See the example below to see the difference. In the same database I created two tables with Point columns. One has a SRID of 0 and the other has SRID 4326, which refers to the WGS84 system:

mysql> SELECT ST_distance_sphere( 
( SELECT location FROM locations_without_srsid LIMIT 1),
(SELECT location FROM locations_without_srsid LIMIT 1, 1)
)/1000 as distance_in_km;
+--------------------+
| distance_in_km |
+--------------------+
| 10.980785022158566 |
+--------------------+
1 row in set (0,00 sec)

mysql> SELECT ST_distance_sphere(
( SELECT location FROM locations_with_srsid LIMIT 1),
(SELECT location FROM locations_with_srsid LIMIT 1, 1)
)/1000 as distance_in_km;
+--------------------+
| distance_in_km |
+--------------------+
| 7.221216762563222 |
+--------------------+
1 row in set (0,00 sec)

The actual distance is indeed 7.22km as you can see here:

So imagine you have a system based on this, but didn’t set a correct SRID, what the impact would be if this was set incorrectly!

So let me show you how to set the SRID correctly when creating the column:

CREATE TABLE locations (
location POINT SRID 4326 NOT NULL
);

If you are using Laravel, like we do at Maatwebsite, then as of Laravel 5.6, it’s really easy to do this in a migration as well:

Schema::create('locations', function (Blueprint $table) {
$table->point('location', 4326);
$table->spatialIndex('location');
});

A sharp eye will have noticed the spatialIndex method called above. 
Yes indeed, spatial columns can be indexed!

Location based searches in MySQL in the past

Now we’ve seen how we can create the needed columns and we know what an SRID is. Let’s quickly look at some examples on how to use this to calculate some distances. I’ve secretly already shown both ways before, but we’ll look at them again in more detail in this section and the next.

First we have the Haversine method

SELECT 
(
6371 * acos(
cos(
radians(some_latitude)
) * cos(
radians(lat)
) * cos(
radians(lng) - radians(some_longitude)
) + sin(
radians(some_latitude)
) * sin(
radians(lat)
)
)
) AS distance
FROM
locations;

Even formatting this query doesn’t make it more readable. But let’s take it one at a time.

Coordinates conversion

We have two coordinates here we’re trying to calculate the distance between. There is “lat,lng” and there is “some_lat,some_lng”. In practice the latter would be entered in your query as the search parameters, where the former represents the value already in the database.

We use radians to convert all coordinates in Radians format. If you’ve had a calculator in school that could switch between RAD and DEG, you know how important it is to use the right one! Below we assume the values have already been converted.

Calculating with SOHCAHTOA

SELECT 
(
6371 * acos(
cos(some_lat) * cos(lat) *
cos(lng - some_lng) +
sin(some_lat) *
sin(lat)
)
) AS distance
FROM
locations;

So the next step is to do some cosine and sine calculations on our converted values. You see that it’s not simply a matter of X-Y=D. And quite frankly, I assume most will be quite lost at this point. Most people I know who’ve used this function just copied it from somewhere and assume it’s correct.

The exact inner workings of the formula itself are not important for the comparison I’m making here. However if you are interested in how this works, there is a wikipedia article about it and its mathematical background.

Correcting for your respective metric

That brings us to the last step, multiplying the results of the formula by 6371 is used to make sure the results are in meters. (Multiplying by 3959 gets you miles).

As you can see, there are quite a few steps involved, and we haven’t even delved into the full depth of the formula. For someone without a Maths background, this is nothing more than a black box. Luckily, there is a better way now!

The better way

With MySQL 8, it all changed into this:

SELECT 
ST_distance_sphere(
location_a,
location_b
) as distance
FROM
locations;

Well that’s hard to misunderstand, isn’t it? Do you want to calculate the distance between two points on a sphere? Use ST_distance_sphere. Easy as that.

But to make it a bit harder, let’s not assume locations that are already in a database, but locations entered in the query directly. That’s a use case you’ll encounter quite often.

SELECT 
ST_distance_sphere(
point(lat, lng),
point(some_lat, some_lng)
) as distance;
But how does it know what SRID to use?

That’s a good question! And the answer is, it doesn’t! Or rather it will assume SRID 0.

SELECT 
ST_distance_sphere(
ST_GeomFromText('POINT(lat lng)', 4326),
ST_GeomFromText('POINT(some_lat, some_lng)', 4326)
) as distance;

There are other ways to get the same results, but this is the easiest I think. 
ST_GeomFromText converts a textual representation of a geometry into the matching geometry format for the database. It can receive an extra parameter that then sets the SRID.

And there you have it, distance calculation with MySQL 8.

Things to consider

While MySQL implements OpenGis compatible formats like PostgreSQL PostGIS for example. It’s the ordering of arguments that does not match. MySQL uses POINT(lat lng) but many others, like PostgreSQL PostGIS use POINT(lng lat).

You’ll not get an error when entering it the wrong way, but the location will then of course be on the other side of the planet.

This is especially clear when fetching the separate lat and lng from a point column.

SELECT 
ST_X(location) as lat,
ST_Y(location) as lng
FROM
locations;

Intuitively this would be how you’d expect to get those respective values. Assuming POINT(lat lng) is the same as POINT(X Y).

In practice however, ST_Y gives you the latitude where ST_X gives you the longitude. Which is, of course, correct, as latitude is the Y axis, where as longitude is the X axis.

SELECT 
ST_X(location) as lng,
ST_Y(location) as lat
FROM
locations;

So the above is the correct way to fetch the respective values.

Conclusion

So working with geolocations in MySQL has improved greatly in version 8. Apart from it being a lot simpler to understand, it’s also index-able and a whole lot faster. I plan to do a comprehensive benchmark in the near future to demonstrate the speed difference.

Other services have jumped in, in the past, to enable fast and easy geo calculations and they are still very viable (Especially combined with a NoSql solution for fast and fuzzy searching). PostgreSQL PostGIS offers much of the same and has been for a while. But from the get-go MySQL 8 has everything you need, and it’s damn fast!

It even offers NoSql, but that’s a topic for a different day…

Sources


Written by Jochem Fuchs, Software engineer at Maatwebsite

Do you need help working with geolocation in your app or do you need help building a Laravel application? At Maatwebsite, we are there to help you on a commercial basis. Contact us via info@maatwebsite.nl or via phone +31 (0)10 744 9312 to discuss the possibilities.