There is something rotten in the state of geohash

or being confused by geohash on Node.js and Redis

Kyle
3 min read · Mar 30, 2016

This is part 21 of my Node / Redis series. The previous part was Redis Static Charge.

I’m working on an upcoming tutorial about the new GEO features for Redis and how to use them in Node. This article won’t go into detail on how to actually use GEO with Redis; I suggest you read up on the commands in the official documentation until my tutorial goes up, then use that (you read everything I write, correct?). Here we’re going to talk about something a little frustrating: tiny differences.

Let’s take a known location. I’m going to use Elsinore, Denmark (now Helsingør), the setting for Hamlet. I’m getting my coordinates from geohack. First, put that location into Redis (note that GEOADD takes longitude first, then latitude):

> GEOADD castles 12.608333 56.036111 helsingør
(integer) 1
> GEOPOS castles helsingør
1) 1) "12.608332335948944"
   2) "56.036111033657463"
> GEOHASH castles helsingør
1) "u3byycj4520"
> ZSCORE castles helsingør
"3686712747426526"

You might notice that the coordinates you put in are slightly different from the coordinates you get out. As far as I can tell, this is because the underlying algorithm Redis uses is grid based, so you’re getting a slight approximation: the input coordinates fall inside a bounding box, and the output from GEOPOS is the center of that bounding box.
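To make the grid idea concrete, here is a minimal sketch of how a coordinate pair could be snapped to a 52-bit cell and read back as that cell's center. It assumes Redis splits each axis into 2^26 steps, limits latitude to ±85.05112878 (the range the GEOADD documentation gives), and interleaves the two cell indexes with latitude in the even bits and longitude in the odd bits. The function names are mine, not anything Redis exposes.

```javascript
// Assumed Redis conventions: 26 bits per axis, latitude limited to ±85.05112878.
const STEPS = 2 ** 26;
const LON = { min: -180, max: 180 };
const LAT = { min: -85.05112878, max: 85.05112878 };

// Index of the grid cell that contains `value` along one axis.
function cellIndex(value, range) {
  return Math.floor(((value - range.min) / (range.max - range.min)) * STEPS);
}

// Midpoint of cell `index` along one axis.
function cellCenter(index, range) {
  const size = (range.max - range.min) / STEPS;
  return range.min + (index + 0.5) * size;
}

// Interleave the two 26-bit indexes into one 52-bit score:
// latitude in the even bits, longitude in the odd bits (an assumption).
function toScore(lonIndex, latIndex) {
  let score = 0n;
  for (let i = 0n; i < 26n; i++) {
    score |= ((BigInt(latIndex) >> i) & 1n) << (2n * i);
    score |= ((BigInt(lonIndex) >> i) & 1n) << (2n * i + 1n);
  }
  return score;
}

// Snap a coordinate pair to its cell; report the score and the cell center.
function snapToGrid(lon, lat) {
  const lonIndex = cellIndex(lon, LON);
  const latIndex = cellIndex(lat, LAT);
  return {
    score: toScore(lonIndex, latIndex),
    lon: cellCenter(lonIndex, LON),
    lat: cellCenter(latIndex, LAT),
  };
}

console.log(snapToGrid(12.608333, 56.036111));
```

If those assumptions hold, the Helsingør input should land on the same score and center that ZSCORE and GEOPOS returned in the session above.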

Next up is the “geohash” value. A geohash is an alphanumeric representation of coordinates. The neat thing about the geohash string is that you can remove characters from the right side of the value and it still works; it just gets less accurate. Now, if you take the Redis GEOHASH return value and put it into the geohash.org website, you get the coordinates:
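The truncation property falls out of how the string is built. Below is a minimal sketch of the textbook geohash encoding (alternating bisection of longitude and latitude, five bits per base-32 character). It is not the code from Redis or from any of the npm modules, and `encodeGeohash` and `cellSize` are names I made up for illustration.

```javascript
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Textbook geohash: bisect longitude and latitude alternately,
// emitting one base-32 character for every five bits.
function encodeGeohash(lat, lon, length) {
  const lonRange = [-180, 180];
  const latRange = [-90, 90];
  let hash = '';
  let bits = 0;
  let bitCount = 0;
  let useLon = true; // the first bit always describes longitude

  while (hash.length < length) {
    const range = useLon ? lonRange : latRange;
    const value = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      bits = bits * 2 + 1; // upper half of the current range
      range[0] = mid;
    } else {
      bits = bits * 2; // lower half of the current range
      range[1] = mid;
    }
    useLon = !useLon;
    bitCount += 1;
    if (bitCount === 5) {
      hash += BASE32[bits];
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

// Width and height, in degrees, of the cell a geohash of this length names.
function cellSize(length) {
  const totalBits = length * 5;
  const lonBits = Math.ceil(totalBits / 2);
  const latBits = Math.floor(totalBits / 2);
  return { lon: 360 / 2 ** lonBits, lat: 180 / 2 ** latBits };
}

const full = encodeGeohash(56.036111, 12.608333, 11);
for (const length of [11, 9, 7, 5]) {
  console.log(full.slice(0, length), cellSize(length));
}
```

Each character only narrows the box chosen by the characters before it, so chopping characters off the right leaves a valid hash for a bigger box around the same spot.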

12.60832
56.03611

Hmm, the longitude is different again and the latitude is just less precise. Okay.

The inverse is also possible: if you take the original coordinates and put them into the geohash website, you get the geohash string value of:

u3byycj452nq

So, this value varies as well, but it’s no big deal; it’s just slightly less accurate. Zooming all the way in with Google Maps on geohash.org and switching to satellite (it’s a lovely car park), the difference looks to be smaller than the distance between the front and back seats of a family sedan. I can live with this variance.

Doing some more research, I found that the geohash algorithm has been implemented several times in JavaScript. I wondered whether any of them match Redis precisely, so I built this small script to compare the output of three different modules:

In this script, the pre-defined values are what Redis spits out from the original value. ngeohash can also take a 52-bit integer (which is how Redis stores geo values), so I tried that too.

The first set takes a geohash value and outputs the coordinates. The second takes the coordinates and spits out the hash, and the last one shows how the alphanumeric hash is converted into the integer hash representation.

Here are the results:

redis     56.03611103365746 12.608332335948944
ngeohash 56.03611059486866 12.608324959874153
ngeohash 59.29668501019478 12.608332335948944 (int)
latlon 56.0361106 12.608325
geohasher 56.03611059486866 12.608324959874153
redis u3byycj4520
ngeohash u3byycj45
latlon u3byycj452jy
geohasher u3byycj452jy
redis 3686712747426526
ngeohash 3673876150239882

Notice something? Almost nothing matches anything else! Looking at these, most are in-the-ballpark, save for the ngeohash integer functions — those are just wrong.
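One possible explanation for the odd ngeohash latitude is a difference in range conventions: the GEOADD documentation limits latitude to ±85.05112878, while a textbook geohash spans ±90. The sketch below decodes the Redis score under both ranges. It assumes latitude sits in the even bits and longitude in the odd bits of the 52-bit integer, and `decodeScore` is a hypothetical helper, not part of Redis or ngeohash.

```javascript
// Pull every second bit out of a 52-bit score, starting at bit `offset`.
function everySecondBit(score, offset) {
  let index = 0;
  for (let i = 0; i < 26; i++) {
    const bit = Number((score >> BigInt(2 * i + offset)) & 1n);
    index += bit * 2 ** i;
  }
  return index;
}

// Center of the cell named by `score`, for a given latitude limit.
function decodeScore(score, latLimit) {
  const steps = 2 ** 26;
  const latIndex = everySecondBit(score, 0); // assumed: latitude in even bits
  const lonIndex = everySecondBit(score, 1); // assumed: longitude in odd bits
  return {
    lat: -latLimit + ((latIndex + 0.5) / steps) * 2 * latLimit,
    lon: -180 + ((lonIndex + 0.5) / steps) * 360,
  };
}

const score = 3686712747426526n; // ZSCORE reply from the Redis session above
console.log('±85.05112878', decodeScore(score, 85.05112878));
console.log('±90', decodeScore(score, 90));
```

If the assumptions hold, the ±90 decode lands near the 59.29668501019478 that ngeohash printed while the longitude stays the same, which would suggest the integer functions follow a different latitude convention from Redis rather than a broken algorithm.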

The implication here is that the alphanumeric hash strings can’t be fully trusted to match exactly. Two people standing on the exact same spot but using different implementations may get different geohash results. Since this is all mathematical, I’m guessing there are subtle differences in how the Node.js, geohash.org, and Redis implementations round floating point numbers, or in how they average the bounding box into a final coordinate set or hash.

As for which implementation is “correct,” I don’t know. I trust Antirez more than random authors of npm packages, so my gut feeling is that Redis (or perhaps geohash.org) is what I’d use as a canonical implementation. If I were writing something that had to seamlessly use two different implementations, I might consider using a 7–9 character version of the geohash string; at this precision it seems stable among all the implementations.
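As a rough illustration of that idea, the sketch below takes the hash strings reported above and compares them on a shared prefix. The helper names (`commonPrefix`, `samePlace`) and the default length of 9 are my own choices for the example, not an established convention.

```javascript
// Hashes reported above for the same spot by the different implementations.
const hashes = {
  redis: 'u3byycj4520',
  ngeohash: 'u3byycj45',
  latlon: 'u3byycj452jy',
  geohasher: 'u3byycj452jy',
  'geohash.org': 'u3byycj452nq',
};

// Longest prefix shared by every string in the list.
function commonPrefix(strings) {
  let prefix = strings[0] ?? '';
  for (const s of strings) {
    while (!s.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  return prefix;
}

// Treat two hashes as the same place if they agree on the first `length` characters.
function samePlace(a, b, length = 9) {
  return a.length >= length && b.length >= length && a.slice(0, length) === b.slice(0, length);
}

console.log(commonPrefix(Object.values(hashes))); // u3byycj45
console.log(samePlace(hashes.redis, hashes['geohash.org'])); // true
```

Keep in mind that a prefix comparison only says two hashes fall in the same cell; two nearby points on either side of a cell edge will still get different prefixes.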

Kyle

Developer of things. Node.js + all the frontend jazz. Also, not from Stockholm, don’t do UX. Long story.