Crawling Twitter Data Without Authentication

Twitter provides an API for retrieving tweet data, so we can easily get tweets from an official source. The free version of the Twitter API has many limitations; for example, we cannot retrieve tweets that are more than seven days old. In this post I look for a way to retrieve tweet data without the official API and its limitations. This post is for educational purposes only; use it at your own risk.

Published in Analytics Vidhya · 3 min read · May 24, 2020

Twitter is an excellent source of data for data scientists; the official API gives us the freedom to stream and retrieve tweets. With the free official API, we can retrieve tweets from up to seven days back.

But how can we retrieve data that is more than seven days old?

We can get the premium API, but it’s too expensive for individuals like us who are not a company.

So I googled for a way to retrieve the data we want without being limited to the last seven days.

I found this useful Python library on GitHub, which can crawl Twitter data without the official API.

bisguzar/twitter-scraper

Twitter’s API is annoying to work with, and has lots of limitations — luckily their frontend (JavaScript) has it’s own…

github.com

The documentation of this library is very clear. We can install the library with pip using the command below.

pip3 install twitter_scraper

Then we can import and retrieve tweets from an account.

from twitter_scraper import get_tweets

for tweet in get_tweets('twitter'):
    print(tweet['text'])

I tried the library on an account with 27k tweets, but it fell short: the library retrieved only about 800 of the 27k available tweets.
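To measure a shortfall like this, it helps to count how many distinct tweets actually came back. Below is a minimal sketch, not part of the library: `collect_unique_tweets` is a hypothetical helper, and the `tweetId` key is an assumption about the shape of the tweet dictionaries. The sample data stands in for a real scraper so the sketch runs offline.

```python
from typing import Dict, Iterable, List


def collect_unique_tweets(tweets: Iterable[Dict], id_key: str = "tweetId") -> List[Dict]:
    """Consume an iterable of tweet dicts, keeping the first copy of each ID.

    `id_key` is an assumed field name; tweets without it are always kept.
    """
    seen = set()
    unique = []
    for tweet in tweets:
        tweet_id = tweet.get(id_key)
        if tweet_id is not None:
            if tweet_id in seen:
                continue  # duplicate of a tweet we already stored
            seen.add(tweet_id)
        unique.append(tweet)
    return unique


# Stand-in data so the sketch runs without network access.
sample = [
    {"tweetId": "1", "text": "first"},
    {"tweetId": "2", "text": "second"},
    {"tweetId": "1", "text": "first"},
]
collected = collect_unique_tweets(sample)
print(f"Collected {len(collected)} unique tweets")
```

In practice the iterable passed in would be whatever the scraper yields, and the printed count can be compared with the tweet count shown on the account's profile.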

I tried to modify the library with my own code, hoping to fix this shortcoming.

But I failed.

I tried modifying the search URL and the headers, but it still did not work. As with the original library, I could retrieve only about 800 of the 27k tweets.

This is my code. It uses a different method to access the data: BeautifulSoup parses the response to the request. I also added a function to save the data in CSV format.
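As a rough illustration of the CSV-saving step only (a minimal sketch, not the original code), the function below writes a list of tweet dictionaries to a CSV file with Python's standard `csv` module. The name `save_tweets_to_csv`, the field names `time`, `text`, and `likes`, and the demo file name are all assumptions made for this example.

```python
import csv
import os
import tempfile
from typing import Dict, Iterable, List


def save_tweets_to_csv(tweets: Iterable[Dict], path: str, fieldnames: List[str]) -> int:
    """Write the chosen fields of each tweet to `path`; return the row count."""
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for tweet in tweets:
            # Missing fields become empty cells; extra fields are dropped.
            writer.writerow({name: tweet.get(name, "") for name in fieldnames})
            count += 1
    return count


# Hypothetical sample data standing in for scraped tweets.
sample_tweets = [
    {"time": "2020-05-01 10:00:00", "text": "Hello, world", "likes": 3},
    {"time": "2020-05-02 11:30:00", "text": "Second tweet", "likes": 0, "extra": "x"},
]

demo_path = os.path.join(tempfile.gettempdir(), "tweets_demo.csv")
written = save_tweets_to_csv(sample_tweets, demo_path, ["time", "text", "likes"])

# Read the file back to confirm that commas inside the text survive quoting.
with open(demo_path, newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))
print(written, rows[0]["text"])
```

Using `csv.DictWriter` rather than joining strings by hand means tweets that contain commas, quotes, or line breaks are quoted correctly.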

Then I looked for online tools that could help me retrieve all of a person's tweets.

I found this website:

Download and Export User Tweets into Excel for Free — Vicinitas : Twitter Analytics Tool for…

Once up to 3200 recent tweets of a user are downloaded, you can export the details of each of these tweets into Excel…

www.vicinitas.io

This is the result when we try to retrieve the tweets.

It not only lets us download the data in Excel or CSV format, but also provides insights into the data we retrieved.

But, again, we can’t retrieve all of the tweets. We can retrieve only 3200 tweets, and we must pay to retrieve all of them.

Conclusion and Future Work

It’s not possible to retrieve all of a person's tweets using an unofficial API (correct me if you have found a way).

The only way we can do it is by using a robot such as Kapow or Selenium. I built one before using Selenium, but when I tried the code again, it no longer worked because Twitter had changed its interface.

I am still working on a Selenium-based crawler to retrieve all of the data. The problems I face now are that Twitter's class names and XPaths seem very random, and that Twitter pages rely heavily on AJAX.
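Because the timeline loads more tweets through AJAX as the page scrolls, a crawler of this kind usually repeats "read what is visible, then scroll" until nothing new appears. The sketch below shows only that loop, under stated assumptions: `harvest_by_scrolling`, `read_visible`, `scroll_down`, and the `id` key are hypothetical names, and `FakeTimeline` is a stand-in for a real browser page so the example runs without Selenium.

```python
from typing import Callable, Dict, Iterable, List


def harvest_by_scrolling(
    read_visible: Callable[[], Iterable[Dict]],
    scroll_down: Callable[[], None],
    max_idle_rounds: int = 3,
    max_rounds: int = 1000,
) -> List[Dict]:
    """Collect tweets until several rounds in a row show nothing new."""
    seen_ids = set()
    harvested = []
    idle_rounds = 0
    for _ in range(max_rounds):
        new_found = False
        for tweet in read_visible():
            if tweet["id"] not in seen_ids:
                seen_ids.add(tweet["id"])
                harvested.append(tweet)
                new_found = True
        idle_rounds = 0 if new_found else idle_rounds + 1
        if idle_rounds >= max_idle_rounds:
            break  # the page has probably stopped loading older tweets
        scroll_down()
    return harvested


class FakeTimeline:
    """Stand-in for a browser page: shows a sliding window over a fixed list."""

    def __init__(self, total: int, window: int = 4, step: int = 3) -> None:
        self.tweets = [{"id": str(i), "text": f"tweet {i}"} for i in range(total)]
        self.window = window
        self.step = step
        self.position = 0

    def read_visible(self) -> List[Dict]:
        return self.tweets[self.position:self.position + self.window]

    def scroll_down(self) -> None:
        self.position = min(self.position + self.step, len(self.tweets))


page = FakeTimeline(total=10)
harvested = harvest_by_scrolling(page.read_visible, page.scroll_down)
print(f"Harvested {len(harvested)} tweets")
```

With Selenium, the two callables would wrap calls on the driver (for example, locating the tweet elements and scrolling the window with a script), and a real page would also need a wait after each scroll so the new content has time to load. Keeping the loop separate from the selectors means only `read_visible` has to change when Twitter changes its class names.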

It is hard, but I think it is possible to get all of a person's tweets using Selenium.

Thank You