How to Scrape Millions of Tweets Using Snscrape

Ibekwe kingsley
Machine learning Mastery
3 min read · May 23, 2022

In my previous post, I detailed step by step how to scrape millions of tweets using Tweepy and your authentication keys. You can check out the post here.

There are limitations to scraping tweets with Tweepy. The standard API only allows retrieval of tweets from seven days ago and scraping of 18,000 tweets per 15-minute window. Additionally, Tweepy only returns up to 3,200 of a user’s most recent tweets.

In this post, I will show you how to scrape millions of tweets using snscrape, without needing keys from your Twitter developer account.

Snscrape is a scraper for social networking services (SNS). It scrapes details like user profiles, hashtags, or searches and returns the discovered items, e.g. the relevant posts.

Let’s get to the code!

Snscrape works only with Python 3.8 or newer, so make sure you have Python 3.8 installed, or follow the steps in the code below.

Importing the necessary libraries

We will first install Python 3.8 in our Colab notebook.

# snscrape works only on Python 3.8 or newer versions
# install Anaconda3
!wget -qO ac.sh https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
!bash ./ac.sh -b
# a fake google.colab library
!ln -s /usr/local/lib/python3.7/dist-packages/google \
/root/anaconda3/lib/python3.8/site-packages/google
# start jupyterlab, which now has Python3 = 3.8
!nohup /root/anaconda3/bin/jupyter-lab --ip=0.0.0.0&
# access through ngrok, click the link
!pip install pyngrok -q
from pyngrok import ngrok
print(ngrok.connect(8888))
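Once the new runtime is up, it is worth confirming that the interpreter really is 3.8 or newer before installing snscrape. A minimal check, assuming nothing beyond the standard library (the helper function here is my own, not part of snscrape):

```python
import sys

def snscrape_supported(version_info=sys.version_info):
    """Return True if this interpreter meets snscrape's minimum (Python 3.8)."""
    return (version_info[0], version_info[1]) >= (3, 8)

print(f"Running Python {sys.version_info[0]}.{sys.version_info[1]}: "
      f"supported={snscrape_supported()}")
```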

Installing the other necessary libraries

# Run one of the pip install commands below if you don't already have the library
!pip install snscrape
# or install the development version from GitHub
!pip install git+https://github.com/JustAnotherArchivist/snscrape.git
# Run the below command if you don't already have pandas
!pip install pandas

# Imports
import snscrape.modules.twitter as sntwitter
import pandas as pd

Setting Variables to be used

We will scrape a maximum of 1,000,000 tweets containing the text “electric mobility” from Twitter, with the date range set from 1st January 2016 to 10th May 2022.

# Setting variables to be used below
maxTweets = 1000000
# Creating list to append tweet data to
tweets_list2 = []
# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('electric mobility since:2016-01-01 until:2022-05-10').get_items()):
    if i >= maxTweets:
        break
    tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.username])
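The string passed to TwitterSearchScraper is an ordinary Twitter search query, so the date range comes from the since: and until: operators. As a sketch, a small helper (hypothetical, my own, not part of snscrape) can assemble the query from its pieces:

```python
def build_query(text, since=None, until=None, user=None):
    """Assemble a Twitter search query string for TwitterSearchScraper.

    Hypothetical helper for illustration; not part of snscrape itself.
    """
    parts = [text]
    if user:
        parts.append(f"from:{user}")    # restrict to one account
    if since:
        parts.append(f"since:{since}")  # earliest date, YYYY-MM-DD
    if until:
        parts.append(f"until:{until}")  # latest date, YYYY-MM-DD
    return " ".join(parts)

print(build_query("electric mobility", since="2016-01-01", until="2022-05-10"))
# → electric mobility since:2016-01-01 until:2022-05-10
```

The resulting string is exactly what the loop above passes to the scraper, which keeps the query logic in one place if you want to vary the search text or dates.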

Creating the Dataframe and saving as a CSV file

# Creating a dataframe from the tweets list above
tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

# Saving the dataframe as a CSV file
tweets_df2.to_csv("electric_mobility.csv", index=False)
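To confirm the CSV survives a round trip, here is a small sketch that writes a one-row frame with the same columns and reads it back, parsing the Datetime column so it comes in as timestamps rather than strings (the sample row is made up for illustration):

```python
import pandas as pd

# A one-row stand-in with the same schema as the scraped dataframe
sample_df = pd.DataFrame(
    [["2022-05-09 12:00:00", 1523456789, "sample tweet text", "someuser"]],
    columns=["Datetime", "Tweet Id", "Text", "Username"],
)
sample_df.to_csv("electric_mobility_sample.csv", index=False)

# Read it back, parsing Datetime into proper timestamps
df = pd.read_csv("electric_mobility_sample.csv", parse_dates=["Datetime"])
print(df.dtypes)
```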

NB: Some lines of code may not render well here; you can access the code on GitHub.

Let’s see what we’ve got.

If this post was helpful don’t forget to leave a clap for me and follow the publication for more posts like this.

Enjoy Twitter Scraping!!
