Crawling Twitter Data Without Authentication

Twitter provides an API for getting tweet data, so we can easily retrieve tweets from an official source. The free version of the Twitter API has many limitations; for example, we cannot retrieve tweets that are more than seven days old. I tried to find a way to retrieve tweet data without the official API and its limitations. This post is for educational purposes only; use it at your own risk.

Dea Venditama
Analytics Vidhya
3 min read · May 24, 2020

Photo by Sara Kurfeß on Unsplash

Twitter is an excellent source of data for a data scientist; the official API gives us the freedom to stream and retrieve tweets. With the free official API, we can retrieve tweets up to seven days old.

But how can we retrieve data that is more than seven days old?

We can get the premium API, but it is too expensive for an individual like us, as opposed to a company.

So I searched Google for a way to retrieve the data we want, without being limited to the last seven days.

I found this useful Python library on git, which can crawl Twitter data without the official API.

The documentation of this library is very clear. We can install the library using pip.

Then we can import it and retrieve tweets from an account.

I tried the library on an account that has 27k tweets, but it fell short: the library retrieved only about 800 of the 27k available tweets.

I tried to modify the library with my own code, hoping to fix this shortcoming.

But I failed.

I tried to fix it by modifying the search URL and the headers, but it still did not work. Like the original library, my version retrieved only about 800 of the 27k tweets.

This is my code; it uses a different method to access the data. I use BeautifulSoup to extract the data from the response, and I also added a function to save the data in CSV format.
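A minimal sketch of the parse-then-save idea might look like the following. It is an illustration only, not the code used for this post: the HTML snippet, the `tweet` and `tweet-text` class names, and the helper names `parse_tweets` and `save_to_csv` are all made up, and it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra dependencies.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: real Twitter pages use different (and changing) class names.
SAMPLE_HTML = """
<div class="tweet" data-id="1"><p class="tweet-text">Hello world</p></div>
<div class="tweet" data-id="2"><p class="tweet-text">Second tweet, with a comma</p></div>
"""


class TweetParser(HTMLParser):
    """Collect the id and text of every <div class="tweet"> block."""

    def __init__(self):
        super().__init__()
        self.tweets = []       # list of {"id": ..., "text": ...}
        self._in_text = False  # True while inside <p class="tweet-text">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "tweet" in classes:
            self.tweets.append({"id": attrs.get("data-id", ""), "text": ""})
        elif tag == "p" and "tweet-text" in classes and self.tweets:
            self._in_text = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_text = False

    def handle_data(self, data):
        if self._in_text:
            self.tweets[-1]["text"] += data


def parse_tweets(html):
    """Return a list of tweet dicts found in the given HTML string."""
    parser = TweetParser()
    parser.feed(html)
    parser.close()
    return parser.tweets


def save_to_csv(tweets, file_obj):
    """Write the tweet dicts to an open text file as CSV with a header row."""
    writer = csv.DictWriter(file_obj, fieldnames=["id", "text"])
    writer.writeheader()
    writer.writerows(tweets)


tweets = parse_tweets(SAMPLE_HTML)
buffer = io.StringIO()  # in-memory stand-in for a real file
save_to_csv(tweets, buffer)
print(buffer.getvalue())
```

To write to disk instead, the same function can be given a file opened with `open("tweets.csv", "w", newline="", encoding="utf-8")`; passing `newline=""` is what the `csv` module documentation recommends for files handed to a writer.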

Then I looked for online tools that could help me retrieve all of a person's tweets.

Then I found this website.

This is the result when we try to retrieve the tweets.

It not only lets us download the data in Excel or CSV format, but also provides insights into the data we retrieved.

But, again, we can't retrieve all of the tweets. We can retrieve only 3,200 tweets, and we must pay more to retrieve all of them.
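To put the numbers in perspective, here is a quick calculation of how much of the account each approach covered. It assumes the account has exactly 27,000 tweets (the "27k" above is rounded) and treats "about 800" as exactly 800, so the percentages are approximate.

```python
TOTAL_TWEETS = 27_000  # "27k" from the text, rounded

retrieved = {
    "library and my own scraper": 800,  # "about 800" in the text
    "online tool": 3_200,
}


def coverage(retrieved_count, total):
    """Share of the account's tweets that were retrieved, in percent."""
    return 100 * retrieved_count / total


for method, count in retrieved.items():
    pct = coverage(count, TOTAL_TWEETS)
    print(f"{method}: {count} of {TOTAL_TWEETS} tweets = {pct:.1f}%")
```

So the online tool retrieves roughly four times as many tweets as the scraper, but still only a small fraction of the account.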

Conclusion and Future Work

As far as I can tell, it is not possible to retrieve all of a person's tweets using an unofficial API (correct me if you have found a way).

The only way we can do it is by using a robot such as Kapow or Selenium. I built one before using Selenium, but when I tried the code again, it no longer worked because Twitter had changed its interface.

I am still working on a Selenium-based crawler to retrieve all of the data. The problems I face now are that Twitter's class names and XPaths seem random, and that Twitter pages rely heavily on AJAX.
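The core of such a crawler is a scroll-and-collect loop: read whatever tweets are currently visible, scroll down, and stop once several scrolls in a row bring nothing new. The sketch below shows only that loop, with the browser replaced by a fake page so it can run anywhere. `collect_all`, `FakeTimeline`, and the `max_idle_rounds` parameter are hypothetical names for illustration. In a Selenium version, `get_visible_items` would presumably wrap `driver.find_elements(...)` and `scroll_down` would call something like `driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")` followed by a wait for the AJAX content to load.

```python
def collect_all(get_visible_items, scroll_down, max_idle_rounds=3):
    """Scroll until no new items appear for `max_idle_rounds` rounds in a row.

    get_visible_items: callable returning (item_id, text) pairs currently on the page
    scroll_down:       callable that triggers loading of the next batch
    """
    seen = {}  # item_id -> text; a dict keeps first-seen order and removes duplicates
    idle_rounds = 0
    while idle_rounds < max_idle_rounds:
        before = len(seen)
        for item_id, text in get_visible_items():
            seen.setdefault(item_id, text)
        # Reset the counter whenever this round found something new.
        idle_rounds = idle_rounds + 1 if len(seen) == before else 0
        scroll_down()
    return seen


class FakeTimeline:
    """Stand-in for a browser page: shows a sliding window of 5 tweets."""

    def __init__(self, total):
        self.tweets = [(i, f"tweet {i}") for i in range(total)]
        self.position = 0

    def visible(self):
        return self.tweets[self.position:self.position + 5]

    def scroll(self):
        # Advance by 3 so consecutive windows overlap, like a real infinite feed.
        self.position = min(self.position + 3, max(len(self.tweets) - 5, 0))


page = FakeTimeline(total=20)
result = collect_all(page.visible, page.scroll)
print(len(result))
```

Keying the results by tweet id matters because an AJAX-driven page can show the same tweet in several consecutive reads, and may remove older tweets from the page as new ones load.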

It is hard, but I think it is possible to get all of a person's tweets using Selenium.

Thank You

Dea Venditama
Analytics Vidhya

Freelance programmer and Pranata Komputer (functional computer officer) at Badan Pusat Statistik RI (Statistics Indonesia)