Tutorial for Cleaning Twitter JSON Data

Angela Li
Data Mining the City
Oct 12, 2017

Hey guys, if you’ve downloaded your Twitter JSON data and tried to ‘traverse’ it following the tutorial, you might have run into an error.

Don’t panic! You did nothing wrong. Blame Twitter for it.

You saw this error because the JSON file you exported from Twitter is actually ‘fake’, which means it isn’t organized like a normal JSON file. Instead of being one single JSON document, each line of the file is its own JSON object, so it has to be read line by line.
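To see why a file like this trips up a normal JSON parser, here is a minimal sketch. It assumes the export holds one JSON object per line, which is what the script in this tutorial relies on. The two sample tweets and the names `sample_text`, `whole_file_parsed`, and `tweets` are made up for illustration.

```python
import json

# Two hypothetical tweets, one JSON object per line, like the exported file.
sample_text = (
    '{"id": 1, "geo": null}\n'
    '{"id": 2, "geo": {"type": "Point", "coordinates": [40.7, -74.0]}}\n'
)

# Parsing the whole text in one go fails, because it holds more than one JSON document.
try:
    json.loads(sample_text)
    whole_file_parsed = True
except ValueError:
    whole_file_parsed = False

# Parsing one line at a time works, because each line is valid JSON on its own.
tweets = [json.loads(line) for line in sample_text.splitlines() if line.strip()]

print(whole_file_parsed)
print(len(tweets))
```

The same idea applies to a real file: loop over its lines and call `json.loads` on each one.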

The Best Part: To solve the problem and traverse it successfully, just follow the script below:

import json

def setup():
    tweetsWithGeoData = []
    # put your own json file name
    # put the json file in the data folder under this sketch folder
    with open('json_file_name.json') as file:
        for row in file:
            if not row.strip():
                continue  # skip blank lines, which json.loads cannot parse
            data = json.loads(row)
            # we use 'geo' here because that's the attribute name for coordinates
            # if you want to extract something else, open the json file and find the attribute you need
            geo = data.get('geo')
            if geo is not None:
                print(geo)
                tweetsWithGeoData.append(data)

    print(len(tweetsWithGeoData))
    # now you should have everything

Credit for the script above goes to William!
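If you want to pull out an attribute other than ‘geo’, the same loop can be wrapped in a small helper. This is only a sketch: `extract_field` is a hypothetical name, not part of any library, and the sample lines are invented. It also skips blank or broken lines instead of stopping, which is a design choice you may or may not want.

```python
import json

def extract_field(lines, field):
    """Collect the non-empty values of `field` from line-by-line JSON."""
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip lines that are valid JSON but not an object
        value = record.get(field)
        if value is not None:
            values.append(value)
    return values

# Hypothetical sample lines standing in for the exported file.
sample_lines = [
    '{"text": "hello", "geo": null}',
    '',
    '{"text": "from NYC", "geo": {"type": "Point", "coordinates": [40.7, -74.0]}}',
    'not json',
]

texts = extract_field(sample_lines, 'text')
geos = extract_field(sample_lines, 'geo')
print(texts)
print(len(geos))
```

With a real file you could pass the open file object itself, for example `extract_field(file, 'geo')` inside the `with open(...)` block.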

Bonus:

The coordinates we just printed have the latitude and longitude together in one list. To add markers in slippymapper, you need the latitude and longitude separately.

Follow this script to make that happen:

import json
import spatialpixel.mapping.slippymapper as slippymapper
import spatialpixel.data.geojson as geojson

def setup():
    tweetsWithGeoData = []
    # put your own json file name
    # put the json file in the data folder under the sketch folder
    with open('json_file_name.json') as file:
        for row in file:
            if not row.strip():
                continue  # skip blank lines, which json.loads cannot parse
            data = json.loads(row)
            # we use 'geo' here because that's the attribute name for coordinates
            # if you want to extract something else, open the json file and find the attribute you need
            geo = data.get('geo')
            if geo is not None:
                print(geo)
                tweetsWithGeoData.append(data)

    print(len(tweetsWithGeoData))

    for data in tweetsWithGeoData:
        # this step assigns the latitude and longitude to 'lat' and 'lon' respectively
        lat, lon = data['geo']['coordinates']
        print(lat, lon)
    # now you should be good to go

Credit for the script above goes to Gloria and William!
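Here is a small standalone sketch of just the splitting step, in case you want to try it without the map. It assumes, like the script above, that each tweet’s ‘geo’ attribute holds its coordinates as a two-item list in latitude, longitude order. The function name `split_coordinates` and the sample tweets are made up for illustration.

```python
def split_coordinates(tweets):
    """Return a list of (lat, lon) pairs for the tweets that have geo data."""
    pairs = []
    for tweet in tweets:
        geo = tweet.get('geo')
        if geo is None:
            continue  # this tweet has no location attached
        # Assumption: the list is [latitude, longitude], in that order.
        lat, lon = geo['coordinates']
        pairs.append((lat, lon))
    return pairs

# Hypothetical tweets, already parsed from JSON into dictionaries.
sample_tweets = [
    {'text': 'no location', 'geo': None},
    {'text': 'New York', 'geo': {'type': 'Point', 'coordinates': [40.7, -74.0]}},
    {'text': 'London', 'geo': {'type': 'Point', 'coordinates': [51.5, -0.1]}},
]

pairs = split_coordinates(sample_tweets)
lats = [lat for lat, lon in pairs]
lons = [lon for lat, lon in pairs]
print(lats)
print(lons)
```

Keeping the pairs in one list makes it easy to loop over them later and hand each ‘lat’ and ‘lon’ to whatever draws the markers.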

If you want to load slippymapper as a base map and plot the coordinates from a Twitter JSON file on it, you can now just replace the coordinates with ‘lat’ and ‘lon’. (Refer to William’s JSON tutorial for details.)

As you read through the comments and customize the code to your own needs, make sure you change the file name and attribute names accordingly.

You should now be able to traverse the ‘fake’ Twitter JSON data and show it on slippymapper! Enjoy playing with the data!

Thanks, William, for helping us! Thanks, Gloria, for offering the script!
