Mapping Google Location History

John Bencina
Published in Data Insights · 6 min read · Jul 8, 2017

Over the last few months, I’ve been noticing more and more notifications from Google Maps asking me to rate locations I’ve visited. It got me wondering: how much could you actually infer from my location history? I’ve been opted into Google Location History for some time now, so I was able to download my data from Google Takeout. I’m not sure to what extent this works if you never opted in to location history.

Wrangling data

With Takeout, I was able to download my history from November 2012 to present. The first problem was that the download was a single JSON file just over 200MB. I wanted to read it straight into Pandas, but Python was not having it on my computer. Next I thought about putting it into BigQuery, but BQ only takes newline-delimited JSON files, and the Takeout output is not newline-delimited. Why isn’t Google Takeout integrated with BigQuery?

I thought about streaming the file line by line, but that would require work to reassemble each JSON object. A little research brought me to the ijson Python library, which streams one JSON object at a time from a large file. Using the following code, I was able to read the large file and convert it to a newline-delimited JSON file.

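import json
import ijson

def convert_data():
    # Stream one location record at a time instead of loading the whole file
    with open('location_history.json', 'rb') as json_data:
        d = ijson.items(json_data, 'locations.item')
        with open('location_fixed.json', 'w') as json_out:
            for t in d:
                # Keep only the fields needed for mapping
                clean = {
                    'accuracy': t.get('accuracy'),
                    'latitudeE7': t.get('latitudeE7'),
                    'longitudeE7': t.get('longitudeE7'),
                    'timestampMs': t.get('timestampMs')
                }
                # Write one JSON object per line (newline-delimited JSON)
                out = json.dumps(clean)
                json_out.write(out + '\n')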

I was ready to upload to BigQuery, but thought I’d give Pandas another shot with this format. It turns out Pandas can read this format reasonably fast, so I could do everything from my laptop.

import pandas as pd

def read_data():
    data = pd.read_json('location_fixed.json', lines=True)
    # E7 values are degrees multiplied by 10^7; convert to decimal degrees
    data['latitudeE7'] = data['latitudeE7'] / 10000000
    data['longitudeE7'] = data['longitudeE7'] / 10000000
    return data
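To make the E7 convention concrete, here is a minimal sketch for a single record. The record values are made up, and it assumes timestampMs holds milliseconds since the Unix epoch, which is what the field name suggests; the helper names are hypothetical.

```python
from datetime import datetime, timezone

def e7_to_degrees(value_e7):
    """Convert an E7 integer (degrees * 10**7) to decimal degrees."""
    return value_e7 / 10**7

def ms_to_utc(timestamp_ms):
    """Convert a millisecond epoch value (string or int) to a UTC datetime.

    Assumption: timestampMs is milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(int(timestamp_ms) / 1000, tz=timezone.utc)

# Hypothetical record, shaped like one line of location_fixed.json
record = {
    "latitudeE7": 407127760,
    "longitudeE7": -740059740,
    "timestampMs": "1499472000000",
}

lat = e7_to_degrees(record["latitudeE7"])
lon = e7_to_degrees(record["longitudeE7"])
when = ms_to_utc(record["timestampMs"])
print(lat, lon, when.isoformat())
```

The timestamp is not needed for the heatmap itself, but converting it makes it easy to sanity-check that the export really covers the date range you expect.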

Selecting the right data

I chose to use ESRI ArcGIS to do the mapping. You could try plotting in Python, but I was having some performance issues. Also, the WGS84 coordinates are geographic coordinates, not projected coordinates (nitpick). To get this data into ArcGIS, I needed a CSV file of just X, Y coordinate pairs. I noticed some points had very large (bad) accuracy values, and a cutoff of 683 happened to capture most of the valid points. The bounding box filters the data down to the area you’re interested in mapping. Enter the coordinates for the lower-left and upper-right corners, respectively.

def clean_data(data):
    # Drop points with a large (poor) accuracy value
    data = data.loc[data['accuracy'] <= 683]

    # Fill in x1, y1 (lower-left lon, lat) and x2, y2 (upper-right lon, lat)
    bounding_box = [x1, y1, x2, y2]
    data = data[
        ((data['longitudeE7'] >= bounding_box[0]) & (data['longitudeE7'] <= bounding_box[2]))
        & ((data['latitudeE7'] >= bounding_box[1]) & (data['latitudeE7'] <= bounding_box[3]))
    ]
    return data

data = read_data()
data_clean = clean_data(data)
data_clean[['longitudeE7','latitudeE7']].to_csv('xy_pairs.csv', index=False)
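If you want to pick your own accuracy cutoff rather than reuse 683, one rough way is to check what share of points each candidate keeps. This is only a sketch: `share_within` is a hypothetical helper and the sample values are invented for illustration, not taken from my data.

```python
def share_within(accuracies, threshold):
    """Return the fraction of accuracy values at or below threshold."""
    values = [a for a in accuracies if a is not None]
    if not values:
        return 0.0
    return sum(1 for a in values if a <= threshold) / len(values)

# Made-up accuracy values for illustration only
sample = [10, 20, 35, 65, 120, 400, 683, 1500, 5000, 20000]

for cutoff in (100, 683, 2000):
    print(cutoff, share_within(sample, cutoff))
```

On real data you would pass in the accuracy column (for example `data['accuracy'].tolist()`) and look for the point where raising the cutoff stops adding many points.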

Mapping in ESRI ArcMap

In ArcMap we need to do a few things. First, convert the points from geographic coordinates to projected coordinates. Second, run a kernel density estimate (KDE) on the points. Third, unscientifically tinker with the color ramp and clipping to create a pretty heatmap. Finally, export the image to Photoshop for a few last tweaks.

To add the data, choose Add XY Data from the File menu and select the CSV. Under coordinate system, define it as WGS84.

Next we’ll do our projection. The map I originally made was for the Long Island area, which is at the southernmost point of NY state. The state plane systems are great for mapping specific regions: each one maps a very localized area with as little distortion as possible, which allows for accurate computation of area, distance, etc.
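ArcMap does the actual projection here. To see why projected units matter, the sketch below converts longitude/latitude to approximate x, y offsets in feet using a simple equirectangular approximation around an origin you choose. This is a stand-in for illustration, not a state plane projection, and the function name, constants, and origin are my own assumptions.

```python
import math

EARTH_RADIUS_M = 6371008.8     # mean Earth radius in meters
FEET_PER_METER = 3937 / 1200   # US survey feet per meter

def to_local_feet(lon, lat, origin_lon, origin_lat):
    """Approximate x, y offsets in feet from an origin.

    Equirectangular approximation: reasonable only for small areas,
    and NOT equivalent to a state plane projection.
    """
    x_m = (math.radians(lon - origin_lon)
           * EARTH_RADIUS_M * math.cos(math.radians(origin_lat)))
    y_m = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    return x_m * FEET_PER_METER, y_m * FEET_PER_METER

# Hypothetical origin; one point 0.01 degrees north, one 0.01 degrees east
origin = (-73.5, 40.7)
north = to_local_feet(-73.5, 40.71, *origin)
east = to_local_feet(-73.49, 40.7, *origin)
print(north, east)
```

The same 0.01 degree step covers fewer feet east-west than north-south at this latitude, which is exactly why distances and search radii should be computed in projected units rather than raw degrees.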

The projection is saved to a new layer, which we’ll run our KDE against. The KDE asks for two numeric parameters. The first is the cell size, which is the resolution of the heatmap. The units depend on the projection; the state plane system I chose has base units of feet, so 20 gives us 20x20 foot boxes as our map resolution. The second is the search radius: for each 20x20 foot box, I am using a 1,000 foot radius to search for nearby points.

A smaller cell size gives you “sharper” maps at the cost of longer computation. A larger search radius gives you smoother heatmaps because you are taking more data into consideration. These numbers were a bit of trial and error.
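To show how the two parameters interact, here is a minimal pure-Python sketch of a grid-based kernel density. It is not ArcMap's implementation: the quartic-style kernel, the lack of normalization, the function signature, and the sample points are all assumptions made for illustration.

```python
def kernel_density(points, cell_size, radius, x_min, y_min, n_cols, n_rows):
    """Return an n_rows x n_cols grid of unnormalized density values.

    Each cell center sums a quartic-style kernel weight for every point
    closer than `radius`; points farther away contribute nothing.
    """
    r2 = radius ** 2
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell_size      # cell center, y
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size  # cell center, x
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < r2:
                    total += (1 - d2 / r2) ** 2
            grid[row][col] = total
    return grid

# Hypothetical projected points, in feet
points = [(1200.0, 1200.0), (1210.0, 1200.0), (200.0, 2200.0)]

# 20 foot cells, 1,000 foot search radius, 2,400 x 2,400 foot extent
grid = kernel_density(points, cell_size=20, radius=1000,
                      x_min=0, y_min=0, n_cols=120, n_rows=120)
print(max(max(row) for row in grid))
```

Halving the cell size quadruples the number of cells to evaluate, which is where the longer computation comes from, while a larger radius lets each point spread its weight over more cells.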

After the KDE runs, you’ll see a blank, blue layer and think everything failed. The issue is that the default color method is a discrete scheme. Open the layer properties and switch to Stretched, and you’ll start to see the points. I changed the background value (0) to no color, which removes places you haven’t been to. Under type, I chose Standard Deviations with n=0.25 and a Gamma Stretch of 3. The reason is that you get so many location points for places like home or work that the KDE spikes there and totally drowns out other places and driving. Again, I kept unscientifically playing with these numbers until I got a result that looked acceptable. These small values will give you road-level heatmaps. If you upped the radius to a mile or two, you would get bigger blobs.
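Roughly, the stretch clips the density values to a narrow band around the mean and then brightens the low end. The sketch below is my approximation of that idea, not ArcMap's exact formula: the `stretch` function, the clamping to the data range, and the `1 / gamma` exponent convention are all assumptions, and the sample values are invented.

```python
import statistics

def stretch(values, n_std=0.25, gamma=3.0):
    """Clip values to mean +/- n_std standard deviations, rescale to 0..1,
    then apply a power-law (gamma) curve that lifts low values.

    Assumption: gamma > 1 brightens, implemented as scaled ** (1 / gamma).
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    lo = max(min(values), mean - n_std * std)
    hi = min(max(values), mean + n_std * std)
    if hi <= lo:
        return [0.0 for _ in values]
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)
        scaled = (clipped - lo) / (hi - lo)
        out.append(scaled ** (1.0 / gamma))
    return out

# Made-up densities: mostly near-empty cells, a few roads, one "home" spike
densities = [0.1] * 50 + [1.0, 2.0, 3.0, 5.0] + [400.0]
stretched = stretch(densities)
print(stretched[50:])
```

With a plain linear stretch the road cells would sit almost at zero next to the spike; clipping at a fraction of a standard deviation and applying the gamma curve pushes them into the visible part of the color ramp.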

One other issue is that the color ramp typically goes from white to a color. If you add a base map, like Light Gray Canvas, your points will have white borders. What I did here was change Color 1 to something close to the average basemap color. This makes the blending much cleaner. I also upped the transparency a bit on the basemap to give it a more faded look.

After some tinkering you should be all done! Next, I exported the map as a 300dpi JPEG file and opened it in Photoshop. There I played around with the Curves setting to adjust the contrast to something that I thought looked nicer. And there you have it! You should have some really dark points for home/work/school. Faster driving spaces out the points, so roads appear more faded unless you drive over the same one many times. Also, areas with low reception have lower location coverage and density.

One interesting thing you can do is export the raster heatmap to KML from ArcGIS and open it in Google Earth. The overlay lets you fly around and see exactly where you’ve been.
