How to Scrape Tweets from Twitter with Python Twint

Andika Pratama
Published in Analytics Vidhya
3 min read · Apr 29, 2020

Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter.

The advantage of Twint is that it works without Twitter’s API. Twint uses Twitter’s search operators to let you:

  • scrape Tweets from specific users
  • scrape Tweets relating to certain topics, hashtags and trends
  • sort out sensitive information from Tweets, such as e-mail addresses and phone numbers.

Some of the benefits of using Twint:

  • Can fetch almost all Tweets (the Twitter API is limited to the last 3200 Tweets only);
  • Fast initial setup;
  • Can be used anonymously and without Twitter sign up;
  • No rate limitations.

Prerequisites

  • Python 3.6
  • Twint 2.1.19
  • venv (Virtual Environment)

Create Virtual Environments (VENV)

In your terminal or command prompt, type this command:

#Windows
py -m venv venv
#Linux
python -m venv venv

To activate your venv, type this command in your command prompt (Windows) or terminal (Linux):

#Windows
.\venv\Scripts\activate.bat
#Linux
source venv/bin/activate
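
If you prefer to create the environment from a script instead of typing the command, the following is a minimal sketch using Python's standard-library venv module. The function name create_env is only an illustration and is not part of Twint. You still have to activate the environment in your shell with the commands above.

```python
import venv
from pathlib import Path


def create_env(target: str = "venv", with_pip: bool = True) -> Path:
    """Create a virtual environment in `target` and return its path."""
    env_dir = Path(target)
    # venv.create is a standard-library helper that builds the same kind of
    # environment as the `python -m venv` command.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Calling create_env("venv") creates a folder named venv in the current directory, just like the commands shown earlier.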

Installing Twint

In your venv install Twint by entering the following command in your terminal.

pip install twint

Wait a moment until the installation finishes, and voilà, Twint is now installed in your venv.

!! Make sure you install it inside your venv.
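
To double-check that the package landed in the venv and not in your system Python, you can run a small check like the sketch below. It uses only the standard library. The helper names is_installed and in_virtualenv are made up for this example.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


def in_virtualenv() -> bool:
    """Return True when Python runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside venv:", in_virtualenv())
    print("twint installed:", is_installed("twint"))
```

If the first line prints False, activate the venv and run the install command again.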

Create Scraper Program

We will use Twint from Python. After you have installed Twint, you need to import it:

import twint

For example, let's write a program that scrapes 10 tweets matching the query “bitcoin”:

import twint

#configuration
config = twint.Config()
config.Search = "bitcoin"
config.Limit = 10
#running search
twint.run.Search(config)

This program searches only for the word “bitcoin” and retrieves the 10 most recent tweets. For a more complex search, you can add a few more configuration options. The next program searches for “bitcoin”, limits the result to 100 English tweets posted between 2019-04-29 and 2020-04-29, and saves the output to a JSON file:

import twint

#configuration
config = twint.Config()
config.Search = "bitcoin"
config.Lang = "en"
config.Limit = 100
config.Since = "2019-04-29"
config.Until = "2020-04-29"
config.Store_json = True
config.Output = "custom_out.json"
#running search
twint.run.Search(config)

Here is what each option in the program above does:

  • Search = the query you want to search for
  • Lang = the language of the tweets you want to scrape, given as a language code
  • Limit = the maximum number of tweets to scrape
  • Since = the earliest tweet date to include. If scraping reaches this date before the limit is met, it stops
  • Until = like Since, but it sets the latest date, which is where scraping starts. Twint scrapes from the newest tweets to the oldest. A time can be included as well, for example “2020-01-18 15:51:31”
  • Store_json = set it to True to save the output as a JSON file (the value is True or False). You can save in CSV format instead by using Store_csv
  • Output = the file name or directory where the output is saved
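
One thing to watch with Since and Until: when you copy a date from a formatted page, the hyphens can turn into typographic dashes, and the date stops being valid. Below is a small sketch for checking the strings before you pass them to the config. It assumes the two formats used in this article (a plain date, or a date with a time). The helpers parse_bound and check_range are hypothetical and not part of Twint.

```python
from datetime import datetime

# Formats assumed here: a date with a time of day, or a plain date,
# matching the two forms shown in this article.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_bound(value: str) -> datetime:
    """Parse a Since/Until string, raising ValueError if it is malformed."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"not a valid date string: {value!r}")


def check_range(since: str, until: str) -> None:
    """Raise ValueError unless `since` is strictly earlier than `until`."""
    if parse_bound(since) >= parse_bound(until):
        raise ValueError("Since must be earlier than Until")


if __name__ == "__main__":
    check_range("2019-04-29", "2020-04-29")  # valid range, nothing is raised
    try:
        parse_bound("2019\u201304\u201329")  # en dashes instead of hyphens
    except ValueError as err:
        print(err)
```

Running check_range on your two dates before starting the search catches both a mistyped date and a swapped range.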

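After a search with Store_json has finished, you will probably want to load the result back into Python. The sketch below assumes the output file holds one JSON object per line, which is how I understand Twint writes it. If your file looks different, adjust the loader. The function name load_tweets is made up for this example.

```python
import json
from pathlib import Path


def load_tweets(path: str) -> list:
    """Read a file with one JSON object per line and return them as dicts."""
    tweets = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                tweets.append(json.loads(line))
    return tweets
```

For example, load_tweets("custom_out.json") returns a list of dictionaries, one per tweet. The key names inside each dictionary (such as the tweet text or the username) depend on Twint's output, so print one entry first to see which keys are available.
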
Twint has many more search features to play with. For other configuration options, check the Twint project's documentation.

Closing remarks

Thank you for following this tutorial to the end. I hope this article helps you. See you in the next article!

Andika Pratama
Analytics Vidhya

Fresh graduate in Computer Science from Universitas Syiah Kuala and Software Engineer. Check my GitHub at github.com/Andika7