How to Extract Data from Facebook using Graph API

The Graph API is the primary way to get data into and out of the Facebook platform. It’s an HTTP-based API that apps can use to programmatically query data, post new stories, manage ads, upload photos, and perform a wide variety of other tasks.

In today’s post, I will walk you through the steps to extract data from Facebook using the Graph API. We will be using Python to get this done. Let’s jump right in.

Step 1: App Registration

To get started, we will first register as a Facebook developer by going to Facebook for Developers and clicking Get Started. After registering, we will create our first app from the developer dashboard.

The next thing is to choose our app type, app name, and email. Once we’ve completed the app creation flow, our app will be loaded in the App Dashboard. We can always return to the dashboard and adjust settings as needed.

Step 2: Authentication

From our dashboard, a client ID and a client secret were automatically generated for us when we created the app. To authenticate, we need to generate an access token to interact with the API.

To do that, go to the Graph API Explorer tool, which lets us construct and run Graph API queries and inspect their responses for any of our apps. In the Graph API Explorer, we click Generate Access Token, after which we can add permissions. In this case, we will add the permissions related to reading data from our Facebook Page.

We will then click Generate Access Token once again in the Graph API Explorer, and go further to enhance the longevity of the token by clicking the question mark (?) beside it. This opens a new page where we can extend how long the token stays valid.
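The same token extension can also be done programmatically: the Graph API exposes an oauth/access_token endpoint that exchanges a short-lived user token for a long-lived one. A minimal sketch, assuming placeholder app credentials (the function names here are my own, not part of the API):

```python
import requests

def build_exchange_params(app_id, app_secret, short_lived_token):
    """Build the query parameters for the long-lived token exchange."""
    return {
        'grant_type': 'fb_exchange_token',
        'client_id': app_id,
        'client_secret': app_secret,
        'fb_exchange_token': short_lived_token,
    }

def exchange_token(app_id, app_secret, short_lived_token):
    # GET https://graph.facebook.com/oauth/access_token with the params above
    response = requests.get(
        'https://graph.facebook.com/oauth/access_token',
        params=build_exchange_params(app_id, app_secret, short_lived_token),
    )
    # The long-lived token comes back in the JSON body
    return response.json().get('access_token')
```

Either route works; the Explorer is just the point-and-click version of this call.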

Step 3: Generate Page Access Token

After we have the token, we need to generate a Page access token before we can get any data from our Pages. The access token we got earlier is passed in as the Authorization header.

The lines of code below generate the Page access token.

# Import the libraries we want to work with
import requests
import json
import pandas as pd

# A function which uses a GET request to retrieve the Page access token
def pageToken():
    # The URL to pass to our request
    the_url = 'https://graph.facebook.com/me/accounts'
    response = requests.get(
        the_url,
        headers={
            'Authorization': 'Bearer ' + 'xxxxxxxxx',  # the user access token from Step 2
            'Content-Type': 'application/json'
        }
    )
    # Load the response body as JSON
    response_data = json.loads(response.text)
    # Loop through the result to navigate down to where our data is
    for i in response_data['data']:
        # Extract the Page access token here
        pageAccessToken = i['access_token']
    return pageAccessToken

With this Page access token, we can authorize all the requests below and start getting the data we need.
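Since every request that follows sends the same two headers, a small helper keeps that in one place (the names `build_headers` and `graph_get` are my own, introduced just for this sketch):

```python
import requests

def build_headers(token):
    """Assemble the headers every Graph API request below uses."""
    return {
        'Authorization': 'Bearer ' + token,
        'Content-Type': 'application/json',
    }

def graph_get(url, token):
    """Send a GET to the Graph API with the bearer token attached."""
    return requests.get(url, headers=build_headers(token))
```

Each request below spells the headers out explicitly instead, which is equivalent.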

The data we are interested in getting from Facebook is the following: Insights data (total reactions from users, number of fans, number of fans by country), daily total reach, daily number of likes, etc.

Daily Total Reaction Data

The daily total reaction data is a metric under the Insights endpoint, as are several of the others below, so we get it by passing it as a metric to that endpoint. When calling the endpoint, we can also pass a time period (since & until) for the data we want to get.
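Rather than concatenating the metric and dates into the URL by hand, `requests` can assemble the query string from a dict. A sketch with the same placeholder Page ID (`build_insights_url` is a name of my own; it only prepares the URL, without sending anything):

```python
import requests

def build_insights_url(page_id, metric, since, until):
    """Prepare the Insights request URL without sending it."""
    req = requests.Request(
        'GET',
        f'https://graph.facebook.com/{page_id}/insights',
        params={'metric': metric, 'since': since, 'until': until},
    )
    return req.prepare().url

url = build_insights_url(
    'xxxxxxx', 'page_actions_post_reactions_total', '2019-12-16', '2020-03-15'
)
```

The hand-built URLs below do the same thing; the dict form just avoids typos in the query string.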

We also need to know that the Graph API limits how much data a single API call can return.
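When a result set exceeds that limit, the response includes a `paging.next` URL pointing at the next batch. A hedged sketch of a cursor-following loop; the `fetch` callable stands in for `requests.get(url, headers=...).json()`:

```python
def fetch_all_pages(first_url, fetch):
    """Follow Graph API `paging.next` links until the data runs out."""
    records = []
    url = first_url
    while url:
        page = fetch(url)  # e.g. lambda u: requests.get(u, headers=...).json()
        records.extend(page.get('data', []))
        # 'paging.next' is absent on the last page, which ends the loop
        url = page.get('paging', {}).get('next')
    return records
```

The single-request functions below assume the chosen time windows fit in one response.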

https://graph.facebook.com/xxxxxxxxx/insights (the exact Page ID is expected to be passed here, not the user ID).

# We create an empty list here to append our result to
post_reactions_2019_12 = []

# We create a function here to make our request
def total_reaction_2020_03():
    # We pass our metric and time period along with the Insights endpoint
    # and send a request using the GET method
    the_url = 'https://graph.facebook.com/xxxxxxx/insights?metric=page_actions_post_reactions_total&since=2019-12-16&until=2020-03-15'
    r_response = requests.get(
        the_url,
        headers={
            'Authorization': 'Bearer ' + pageToken(),
            'Content-Type': 'application/json'
        }
    )
    # We load the response text as JSON
    reaction_response_data = json.loads(r_response.text)
    # We then append it to the empty list we opened earlier
    post_reactions_2019_12.append(reaction_response_data)

Data cleaning for Total Reaction Data

The data we appended to the empty list will now be unpacked, cleaned, and converted to a DataFrame.

# We open an empty list here
a_bucket = []
# We loop through our response data, navigate down to the data we need,
# and append it to the list
for i in post_reactions_2019_12:
    for k in i['data']:
        a_bucket.append(k)
a_size = len(a_bucket)
values = []
Endtime = []
# We loop through the bucket, extract each field we need,
# and send each to its own list
for i in a_bucket:
    for j in i['values']:
        try:
            values.append(j['value']['like'])
        except:
            values.append('No like')
        try:
            Endtime.append(j['end_time'])
        except:
            Endtime.append('Null')
import pandas as pd
# We pass the lists of data to a DataFrame
datafram_2019_12 = {'Total_No_of_likes': values, 'Time': Endtime}
df_2019_12 = pd.DataFrame(datafram_2019_12)

Total Daily Number Of Fans Data

We can get the total number of new fans per day on our Pages by passing page_fans as a metric to the Insights endpoint.

# We open an empty list
page_fans_2020_09 = []
# We create a function which uses a GET request to extract our data
def page_fans_2020_12():
    # The endpoint URL we'll be using, along with our metric and time period
    the_url = 'https://graph.facebook.com/xxxxxxx/insights?metric=page_fans&since=2020-09-15&until=2020-12-15'
    page_fans_response = requests.get(
        the_url,
        headers={
            'Authorization': 'Bearer ' + pageToken(),
            'Content-Type': 'application/json'
        }
    )
    # We load the data and append it to the empty list
    page_fans_response_data = json.loads(page_fans_response.text)
    page_fans_2020_09.append(page_fans_response_data)

Data Cleaning for Total Number of Fans Data

The next thing is to clean our data and separate out each column we need. We then convert it to a DataFrame.

No_of_Fans = []
Time = []
for i in page_fans_2020_09:
    for k in i['data']:
        for s in k['values']:
            No_of_Fans.append(s['value'])
            Time.append(s['end_time'])
import pandas as pd
datafram_2020_09 = {'No_of_Fans': No_of_Fans, 'Time': Time}
df_2020_09 = pd.DataFrame(datafram_2020_09)

Total Reach Data (Page Engagement, Page Unique Impressions)

We will also get our daily engagement, impressions, and total post reaction data from the Insights endpoint. We pass each of these as a metric along with the time period of interest.

# We create an empty list here
page_post_reached_2019_012 = []
# We create a function which uses a GET request to extract our data
def post_2020_03_data():
    # The endpoint URL we'll be using, along with our metrics and time period
    the_url = 'https://graph.facebook.com/xxxxxxx/insights?metric=page_impressions_unique,page_engaged_users,post_reactions_like_total&since=2019-12-16&until=2020-03-15'
    page_post_reached_response = requests.get(
        the_url,
        headers={
            'Authorization': 'Bearer ' + pageToken(),
            'Content-Type': 'application/json'
        }
    )
    # We load the data and append it to the empty list
    page_post_reached_response_data = json.loads(page_post_reached_response.text)
    page_post_reached_2019_012.append(page_post_reached_response_data)

Data Cleaning for Total Reach Data

We will clean this data to generate all the columns we need and then convert it to a DataFrame.

# We create an empty list
a_bucket = []
# We loop through the data dictionaries and append the exact data we need
for i in page_post_reached_2019_012:
    for k in i['data']:
        a_bucket.append(k)
# We take the length of the bucket and open more empty lists
a_size = len(a_bucket)
name = []
title = []
values = []
Endtime = []
# We loop through using the size of the bucket in order to reach each record,
# then append the fields to the lists
for i in range(0, a_size):
    name.append(a_bucket[i]['name'])
    title.append(a_bucket[i]['title'])
    values.append([m['value'] for m in a_bucket[i]['values']])
    Endtime.append([m['end_time'] for m in a_bucket[i]['values']])
# We convert the data to a DataFrame
datafram_2019_012 = {'name': name, 'title': title, 'Daily_values': values, 'Time': Endtime}
df_2019_012 = pd.DataFrame(datafram_2019_012)
# All the values are in list-valued rows, so we explode each value onto its own line
rows = []
# We use a lambda to loop through the Daily_values lists and append each row to a new list
_df = df_2019_012.apply(lambda row: [rows.append([row['name'], row['title'], nn])
                                     for nn in row.Daily_values], axis=1)

# We pass the accumulated rows to a DataFrame
df_new_1 = pd.DataFrame(rows, columns=['name', 'title', 'No_of_People'])
rowstwo = []
# We do the same thing for the Time column
_dftwo = df_2019_012.apply(lambda row: [rowstwo.append([row['name'], row['title'], nn])
                                        for nn in row.Time], axis=1)
# We pass the accumulated rows to a DataFrame
df_new_2 = pd.DataFrame(rowstwo, columns=['name', 'title', 'Time'])
# We then pass the Time column from DataFrame 2 to 1, so we can discard the second DataFrame
df_new_1['Time'] = df_new_2['Time']
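The two lambda/append passes above can, on newer pandas (the multi-column form needs pandas ≥ 1.3), be collapsed into a single `explode` call. A sketch on sample data shaped like `df_2019_012`:

```python
import pandas as pd

# Same shape as df_2019_012: list-valued columns paired element by element
df = pd.DataFrame({
    'name': ['page_impressions_unique'],
    'title': ['Daily Total Reach'],
    'Daily_values': [[10, 12]],
    'Time': [['2019-12-16T08:00:00+0000', '2019-12-17T08:00:00+0000']],
})
# Explode both list columns together so each day becomes its own row
df_flat = df.explode(['Daily_values', 'Time'], ignore_index=True)
df_flat = df_flat.rename(columns={'Daily_values': 'No_of_People'})
```

The apply-based version above reaches the same result and also works on older pandas releases.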

Fans by Gender

The daily fans-by-gender data is also extracted from the Insights endpoint. We will pass "page_fans_gender_age" as the metric.

# We open an empty list here
page_fans_gender_2020_09 = []
# We create a function to send a request to the endpoint URL
def page_fans_gender_2020_12():
    # The endpoint URL along with metric and time period
    the_url = 'https://graph.facebook.com/xxxxx/insights?metric=page_fans_gender_age&since=2020-09-15&until=2020-12-15'
    page_fans_response = requests.get(
        the_url,
        headers={
            'Authorization': 'Bearer ' + pageToken(),
            'Content-Type': 'application/json'
        }
    )
    # We load the data and append it to the empty list
    page_fans_response_data = json.loads(page_fans_response.text)
    page_fans_gender_2020_09.append(page_fans_response_data)

Data Cleaning for Fans by Gender

We then pass our data here for cleaning. Afterward, we convert the data to a DataFrame.

# We create an empty list here
bucket = []
# We loop through our data here and append to the bucket
for i in page_fans_gender_2020_09:
    for k in i['data']:
        for s in k['values']:
            bucket.append(s)
# We create an empty list for each of the columns we need
No_of_fans = []
Fans_per_gender_age = []
End_time = []
# We loop through the bucket we opened earlier
for i in bucket:
    # We append the data using the dict keys and values
    Fans_per_gender_age.append(i['value'].keys())
    No_of_fans.append(i['value'].values())
    End_time.append(i['end_time'])
# We pass each list to a dictionary and convert to a DataFrame
datafram_2020_09 = {'Fans_per_gender_age': Fans_per_gender_age, 'No_of_fans': No_of_fans, 'Time': End_time}
df_2020_09 = pd.DataFrame(datafram_2020_09)
# The values are in list-valued rows, so we explode each onto its own line
rows = []
# We use a lambda to loop through each Fans_per_gender_age entry and append it,
# with its time, to a new list
_df1 = df_2020_09.apply(lambda row: [rows.append([row['Time'], nn])
                                     for nn in row.Fans_per_gender_age], axis=1)
# We convert the rows to a DataFrame
df_new_1 = pd.DataFrame(rows, columns=['Time', 'Fans_per_gender_age'])
rowstwo = []
# We do the same for the No_of_fans column
_df2 = df_2020_09.apply(lambda row: [rowstwo.append([row['Time'], nn])
                                     for nn in row.No_of_fans], axis=1)
# We convert the rows to a DataFrame
df_new_2 = pd.DataFrame(rowstwo, columns=['Time', 'No_of_fans'])
# We then pass the No_of_fans column from DataFrame 2 to 1, so we can discard the second DataFrame
df_new_1['No_of_fans'] = df_new_2['No_of_fans']

Number Of Fans By Country

We will get the daily number of fans by country from the API through the Insights endpoint, passing page_fans_country as the metric along with the time period (since & until).

# We create an empty list here
page_fans_country_2020_09 = []
# We create a function to send a request to the endpoint URL
def page_fans_country_2020_12():
    # The endpoint URL along with metric and time period
    the_url = 'https://graph.facebook.com/xxxxxx/insights?metric=page_fans_country&since=2020-09-15&until=2020-12-15'
    page_fans_response = requests.get(
        the_url,
        headers={
            'Authorization': 'Bearer ' + pageToken(),
            'Content-Type': 'application/json'
        }
    )
    # We load the data and append it to the empty list
    page_fans_response_data = json.loads(page_fans_response.text)
    page_fans_country_2020_09.append(page_fans_response_data)

Data Cleaning for Number of Fans by Country

The data we generated will be cleaned, and we will separate out all the columns we’re interested in, then convert to a DataFrame.

# We create an empty list here
bucket = []
# We loop through our data here and append to the bucket
for i in page_fans_country_2020_09:
    for k in i['data']:
        for s in k['values']:
            bucket.append(s)
# We create an empty list for each of the columns we need
Countries_code = []
Fans_per_country = []
End_time = []
# We loop through the bucket we opened earlier
for i in bucket:
    # We append the data using the dict keys and values
    Countries_code.append(i['value'].keys())
    Fans_per_country.append(i['value'].values())
    End_time.append(i['end_time'])
# We pass each list to a dictionary and convert to a DataFrame
datafram_2020_09 = {'Country_codes': Countries_code, 'Fans_per_country': Fans_per_country, 'Time': End_time}
df_2020_09 = pd.DataFrame(datafram_2020_09)
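Since each `value` here is a dict mapping country code to fan count, the keys/values pairs can also be flattened directly into one tidy row per country per day. A sketch on sample data shaped like the API response (the country codes and counts are made up for illustration):

```python
import pandas as pd

# Sample shaped like the entries appended to `bucket` above
bucket = [
    {'value': {'US': 120, 'NG': 300}, 'end_time': '2020-09-16T07:00:00+0000'},
    {'value': {'US': 121, 'NG': 305}, 'end_time': '2020-09-17T07:00:00+0000'},
]
# One row per (day, country) pair instead of dict-valued cells
rows = [
    {'Country_code': code, 'Fans': fans, 'Time': entry['end_time']}
    for entry in bucket
    for code, fans in entry['value'].items()
]
df_tidy = pd.DataFrame(rows)
```

This skips the later explode step entirely, since the cells are already scalars.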

I hope you found this post helpful. Thanks for reading and follow for more educative posts.

We will listen to your data. We can also teach you how to! Data analysis is our forte.