How to bypass Facebook API and collect data from posts/users

The Coding Rebel
5 min read · Feb 9, 2018


Web scraping is a very handy way of collecting information in bulk for your research subject. Whether you are a spammer looking to contact target audiences in bulk, or are just interested in what percentage of the posts in your timeline is fake news, Facebook does not support it. They have actually made it impossible to gather data from private Facebook pages and personal timelines through their API. Being in the latter camp and trying to investigate the legitimacy of posts, I got very upset.

During the 2017 elections there were rumors of fake news influencing the opinion of the voters. This sparked my attention, and I soon started looking into how to collect data from my own timeline to see whether or not I was being affected. However, I realized very quickly that a simple web-scraping tool or API call was not going to do it.

The problem

The Facebook API allows people to collect data from specified users and public pages. But there is no way to collect the posts from your own timeline. I could have collected data from major media pages and political figures, but that would have missed the point of my research. So I really needed access to my timeline.

Plain web scraping is not possible either, since Facebook requires a login to view a personal timeline (obviously). This meant I had to come up with a solution.

Solution

The way I made it work is by using a web driver, in my case Selenium, to log in to Facebook and then collect all my timeline posts. This way I was able to collect all the data from Facebook that would otherwise not be accessible for automation. If you already know how this works, then congratulations! But if you want to learn how to use the Selenium driver to mine the data yourself, then read on.

Disclaimer: if you are not familiar with Selenium or browser automation, read up on Selenium first and set it up for Python before continuing.

Step 1 Logging in to Facebook

After you have installed Selenium correctly and have the required web driver up and running, we can go ahead and start the automation!

The first thing we need to do is have Selenium set up the web driver and have it open Facebook’s login page (www.facebook.com/login).
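A minimal sketch of this step, assuming Selenium 4 for Python and Chrome with a matching chromedriver. The helper names make_driver and open_login_page are my own illustrations, not something Selenium provides.

```python
LOGIN_URL = "https://www.facebook.com/login"


def make_driver():
    """Start a Chrome session (assumes Selenium 4 and chromedriver are installed)."""
    # Imported here so Selenium is only needed when a real browser is launched.
    from selenium import webdriver
    return webdriver.Chrome()


def open_login_page(driver, url=LOGIN_URL):
    """Point the browser at the login page and hand the driver back."""
    driver.get(url)
    return driver


# Usage (opens a real browser window):
# driver = open_login_page(make_driver())
```

Any other browser supported by Selenium should work the same way; only make_driver would change.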

Once the page is loaded, have the driver search for the login boxes where it should put our email and password. We can find these boxes by their id attributes. The Selenium driver can locate any element by its id or class, and Selenium's documentation on locating elements covers the other strategies. In our case we want to find the email and password fields and fill them in with our information.
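Filling in the form could look roughly like this. The field ids "email" and "pass" are an assumption about Facebook's login page and may have changed, so inspect the page and adjust them. The string "id" is the value behind Selenium's By.ID constant, and fill_login_form is a hypothetical helper name.

```python
BY_ID = "id"  # same string as selenium.webdriver.common.by.By.ID


def fill_login_form(driver, email, password,
                    email_id="email", password_id="pass"):
    """Type the credentials into the two login boxes.

    The default ids are assumptions about the page markup, not guaranteed.
    """
    email_box = driver.find_element(BY_ID, email_id)
    email_box.send_keys(email)

    password_box = driver.find_element(BY_ID, password_id)
    password_box.send_keys(password)


# Usage, with a driver that already has the login page open:
# fill_login_form(driver, "you@example.com", "your-password")
```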

Now that we have filled in the login form, we of course need to click the button to log in. We do this by calling click() on the login button.
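As a sketch, the click is one lookup followed by click(). The button id "loginbutton" is an assumption about the page markup and click_login is a hypothetical helper name, so check both against the live page.

```python
def click_login(driver, button_id="loginbutton"):
    """Find the login button by its id (assumed value) and click it."""
    login_button = driver.find_element("id", button_id)  # "id" == By.ID
    login_button.click()


# Usage, after the form has been filled in:
# click_login(driver)
```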

Now we are logged in and ready to scrape for data!

Step 2 Collecting the data

In my case I was looking to collect data from my own Facebook timeline to search for fake news posts. Since we are now logged in, we can simply navigate to the desired page and collect the data. Bear in mind that I did this for my own timeline, but you could also do it for a private group, for example.

Now we are all set and ready to collect the posts. We just need to identify the posts on the page and collect them. This time we need to look for an HTML element that contains the post or whatever we are looking for. In my case I needed the post and the person who shared the post.

I found out that the thing all posts have in common is the CSS class “_1dwg”. An element with this class contains both the post itself, with the text to parse, and the person who posted it. We just put these posts into an array by asking the driver for every element with that class.
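A minimal sketch of that lookup, assuming Selenium 4's find_elements. The class name is the one I found at the time and Facebook's markup may have changed since. The string "class name" is the value behind By.CLASS_NAME, and collect_posts is a hypothetical helper name.

```python
POST_CLASS = "_1dwg"  # class observed at the time of writing; may be outdated


def collect_posts(driver, class_name=POST_CLASS):
    """Return every element on the current page that carries the post class."""
    return driver.find_elements("class name", class_name)  # "class name" == By.CLASS_NAME


# Usage, once the timeline page is open:
# posts_array = collect_posts(driver)
```

Only the posts currently loaded in the page are returned, so a long timeline may need scrolling before this call.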

Now we have collected all the HTML elements that belong to the posts. To extract the data we just need one last step.

Step 3 Parsing the Data

The posts in our posts_array are not yet usable. We want to get our data into a string variable or anything else we can work with. So for every post in our array, we find all the paragraphs in the post and add them together. We find the paragraphs by the tag name <p> inside our post.

Now we just add the paragraphs together, and we have a string holding all of the text in the Facebook post we found.
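The parsing step can be sketched like this, again assuming Selenium 4 element objects. The helper names post_text and all_post_texts are illustrative, and joining with a space is my own choice so that paragraphs do not run together.

```python
def post_text(post):
    """Join the text of every <p> element inside one post into a single string."""
    paragraphs = post.find_elements("tag name", "p")  # "tag name" == By.TAG_NAME
    return " ".join(paragraph.text for paragraph in paragraphs)


def all_post_texts(posts_array):
    """Apply post_text to every collected post."""
    return [post_text(post) for post in posts_array]


# Usage:
# texts = all_post_texts(posts_array)
```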

And if we want, say, the person who goes with this post, we look up the element holding the poster's name inside the same post.
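A sketch of that lookup follows. The selector that holds the poster's name is not given here, so author_selector is a placeholder you need to find by inspecting a post in your browser, and post_author is a hypothetical helper name. The string "css selector" is the value behind By.CSS_SELECTOR.

```python
def post_author(post, author_selector):
    """Return the text of the first element matching author_selector, or None.

    author_selector is a placeholder: inspect a post to find the right value.
    """
    matches = post.find_elements("css selector", author_selector)
    return matches[0].text if matches else None


# Usage, with a hypothetical selector:
# person = post_author(post, "h5 a")
```

Using find_elements and checking for an empty list avoids an exception on posts that have no matching element.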

Now in my case I was ready to see how badly I was getting influenced by fake news on Facebook! And I am sure there are many other ways this method could benefit other people trying to get data from Facebook.
