Merge Multiple JSON Files into a DataFrame

Parthvi Shah
2 min read · Apr 21, 2020

As someone who had never worked with JSON files before, I ran into a lot of issues collecting multiple JSON files and then combining them.

Just to give you a little background on why I am merging tweets: given the current situation as of May 2020, I am interested in the political discourse of US governors around the ongoing pandemic. I would like to analyse how the two parties, Republican and Democratic, reacted to COVID-19. What were their main goals at this time? Who focused more on what? What did they care about the most?

We have collected all the tweets from various governors, from both public and private profiles, into multiple JSON files.

I hope this makes it easier for you.

Before starting, don’t forget to import the libraries.

import json
import numpy as np
import pandas as pd

Stepwise:

1. Add a path to your files.

# Path to the JSON files
directory_path_democrat = "/content/drive/My Drive/Dataset/tweets/democrat/"
directory_path_republican = "/content/drive/My Drive/Dataset/tweets/republican/"

2. Add all the file names to a list.

# Adding all the file names.
# Number_of_files_demo and Number_of_files_republican must be set beforehand
# to the number of files in each folder (named 0.json, 1.json, ...).
file_path_dem = []
file_path_republican = []
for ind in range(Number_of_files_demo):
    file_path_dem.append(directory_path_democrat + str(ind) + ".json")
for ind in range(Number_of_files_republican):
    file_path_republican.append(directory_path_republican + str(ind) + ".json")
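If the files are not numbered consecutively, or you would rather not count them by hand, the list of paths can also be built by scanning the folder. This is only a sketch, assuming every `.json` file in the directory is one you want to load; `list_json_files` is a hypothetical helper name, not part of the code above.

```python
from pathlib import Path


def list_json_files(directory):
    """Return the paths of all .json files directly inside `directory`, sorted by name."""
    # glob("*.json") does not descend into subfolders.
    return sorted(str(p) for p in Path(directory).glob("*.json"))
```

Calling `list_json_files(directory_path_democrat)` would then stand in for the first loop. Note that the sort is alphabetical, so `10.json` comes before `2.json`; that only matters if the order of the files is important to you.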

3. Loading all the JSON files

# Loading all the JSON files using the file names
dem_tweets_file = []
for ind in range(Number_of_files_demo):
    with open(file_path_dem[ind]) as f:
        dem_tweets_file.append(json.load(f))

rep_tweets_file = []
for ind in range(Number_of_files_republican):
    with open(file_path_republican[ind]) as f:
        rep_tweets_file.append(json.load(f))

Now dem_tweets_file and rep_tweets_file hold the parsed contents of all the Democratic and Republican JSON files, respectively.
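Since the two loading loops differ only in their variable names, they could be folded into one small function. The sketch below is one way to do that; `load_json_files` is a hypothetical name, and the explicit `encoding="utf-8"` is an assumption about how the tweet files were saved.

```python
import json


def load_json_files(paths):
    """Parse every file in `paths` and return the results in the same order."""
    loaded = []
    for path in paths:
        # Assumption: the files are UTF-8 encoded.
        with open(path, encoding="utf-8") as f:
            loaded.append(json.load(f))
    return loaded
```

With this in place, `load_json_files(file_path_dem)` would give the same list as the first loop, and likewise for the Republican paths.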

4. Convert all the JSON files to a DataFrame

# Converting the JSON files to DataFrames
# Normalize each file into its own frame, then concatenate them in one go.
dem_tweets = pd.concat(
    [pd.json_normalize(tweets) for tweets in dem_tweets_file],
    ignore_index=True,
)
dem_tweets['Party'] = 'Democrat'

rep_tweets = pd.concat(
    [pd.json_normalize(tweets) for tweets in rep_tweets_file],
    ignore_index=True,
)
rep_tweets['Party'] = 'Republican'

dem_tweets and rep_tweets are the final DataFrames, each combining all of that party’s JSON files. I have also added the target variable as the ‘Party’ column.
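To see what the conversion does, and how the two parties could end up in a single DataFrame, here is a small self-contained sketch. The records are made up for illustration and are much simpler than real tweet JSON; `files_to_frame`, `dem_files`, `rep_files`, and `all_tweets` are hypothetical names.

```python
import pandas as pd

# Made-up records standing in for the contents of the loaded JSON files.
dem_files = [[{"id": 1, "text": "hello", "user": {"name": "gov_a"}}]]
rep_files = [[{"id": 2, "text": "world", "user": {"name": "gov_b"}}]]


def files_to_frame(files, party):
    """Flatten each file's records, stack them, and label them with `party`."""
    frame = pd.concat([pd.json_normalize(f) for f in files], ignore_index=True)
    frame["Party"] = party
    return frame


# Stack both parties into one frame, with 'Party' as the target column.
all_tweets = pd.concat(
    [files_to_frame(dem_files, "Democrat"), files_to_frame(rep_files, "Republican")],
    ignore_index=True,
)
print(all_tweets)
```

Nested fields are flattened into dotted column names, so the `user` dictionary here becomes a `user.name` column.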

Thank you!
