Photo by MORAN on Unsplash

Data Science, Data Visualization, Programming

Getting Valuable Insights and Visualizations from Tweets Using Python and Twint

My first for-fun data science project

Zijing Zhu
Nov 8, 2020 · 10 min read

Introducing and installing Twint

Twint is an advanced Twitter scraping tool written in Python that scrapes Tweets from Twitter profiles without using Twitter's API. It uses Twitter's search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags, and trends, or sort out sensitive information from Tweets such as e-mail addresses and phone numbers. Twint also makes special queries to Twitter that let you scrape a user's followers, the Tweets a user has liked, and the accounts they follow, all without authentication, an API, Selenium, or browser emulation.

pip install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
import twint
import nest_asyncio
nest_asyncio.apply()
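The `nest_asyncio.apply()` call matters mostly in Jupyter, where an asyncio event loop is already running. Twint appears to drive its requests with `run_until_complete`, which plain asyncio refuses to do on a loop that is already running. The sketch below reproduces that refusal with the standard library only; `fetch` and `call_from_running_loop` are made-up names for illustration, not Twint functions.

```python
import asyncio


async def fetch():
    # Stand-in for an async request; not a Twint function.
    return "tweets"


async def call_from_running_loop():
    # Inside a coroutine the loop is already running, as it is in a Jupyter cell.
    loop = asyncio.get_running_loop()
    coro = fetch()
    try:
        loop.run_until_complete(coro)
    except RuntimeError as err:
        return str(err)
    finally:
        coro.close()  # avoid a "coroutine was never awaited" warning
    return "no error"


message = asyncio.run(call_from_running_loop())
print(message)
```

Patching the loop with `nest_asyncio.apply()` is what allows this kind of nested call to go through in a notebook.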

What are the goals?

https://twitter.com/tx_cherimaho

Getting historical tweets

Specify the engine:

Twitter search box
c = twint.Config()
c.Search = '#チェリまほ'

Give extra conditions for the search

c.Since = '2020-08-24'
#c.Username = "tx_cherimaho"
#c.Limit = 10
#c.Until = '2020-01-01'

Format the outputs

c.Hide_output = True
c.Store_json = True
c.Output = 'tweets.json'

Run the search

twint.run.Search(c)
def grab_tweets(search, file):
    # Run one Twint search and store the results as JSON lines in `file`
    c = twint.Config()
    c.Search = search
    c.Since = '2020-08-24'
    c.Hide_output = True
    c.Store_json = True
    c.Output = file
    twint.run.Search(c)
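With `Store_json` set, the output file holds one JSON object per line, which is why it is later read with `lines=True`. A minimal sketch for checking such a file before any analysis, using only the standard library. The two sample records are invented, and `load_tweets` is a hypothetical helper rather than part of Twint.

```python
import json
import os
import tempfile


def load_tweets(path):
    """Read a JSON-lines file into a list of dicts, skipping blank lines."""
    tweets = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                tweets.append(json.loads(line))
    return tweets


# Invented sample records standing in for real scraped tweets.
sample = [
    {"date": "2020-10-08", "language": "ja", "tweet": "#チェリまほ"},
    {"date": "2020-10-09", "language": "en", "tweet": "#CherryMagic"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tweets.json")
    with open(path, "w", encoding="utf-8") as handle:
        for record in sample:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    tweets = load_tweets(path)

print(len(tweets), tweets[0]["date"])
```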

What I found

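%matplotlib inline
import pandas as pd
#load the scraped tweets (one JSON object per line)
df = pd.read_json('tweets.json', lines=True)
df['date'].value_counts().plot()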
Number of tweets by date
df_short = df[df['date']>pd.to_datetime('2020-10-01')]
df_short.shape #(86021, 36)
#episode dates so far
eps_dates = [pd.to_datetime('2020-10-08'),pd.to_datetime('2020-10-15'),pd.to_datetime('2020-10-22'),pd.to_datetime('2020-10-29'),pd.to_datetime('2020-11-05')]
ax = df_short['date'].value_counts().plot(figsize=(10,5))
for i in range(len(eps_dates)):
    ax.axvline(eps_dates[i], color='r', linestyle='--')
    ax.text(eps_dates[i],17000,'EP{}'.format(i+1),color='red')
Number of tweets by date and episode dates
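One way to put a number on how much of the activity falls on episode dates, rather than reading it off the plot. This is a rough sketch, assuming a list of per-tweet dates like the `date` column above. The sample dates are invented and `episode_share` is a hypothetical helper.

```python
from collections import Counter
from datetime import date


def episode_share(tweet_dates, episode_dates):
    """Return the fraction of tweets posted on episode dates."""
    counts = Counter(tweet_dates)
    on_episode = sum(counts[d] for d in episode_dates)
    total = sum(counts.values())
    return on_episode / total if total else 0.0


# Invented sample: three tweets on an episode date, one on another day.
tweet_dates = [date(2020, 10, 8)] * 3 + [date(2020, 10, 10)]
eps_dates = [date(2020, 10, 8), date(2020, 10, 15)]

share = episode_share(tweet_dates, eps_dates)
print(share)
```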
#select the three metric columns before summing so text columns are left out
ax = (
    df_short.groupby("date")[['replies_count','retweets_count','likes_count']]
    .sum()
    .plot(figsize=(10,5))
)
for i in range(len(eps_dates)):
    ax.axvline(eps_dates[i], color='r', linestyle='--')
    ax.text(eps_dates[i],125000,'EP{}'.format(i+1),color='red')
Tweets’ action metrics by date
#read the json files
df_a = pd.read_json('tweets_a.json', lines=True)
df_k = pd.read_json('tweets_k.json', lines=True)
#convert value counts to dataframes
values_a = df_a['date'].value_counts().rename_axis('dates').reset_index(name='counts')
values_k = df_k['date'].value_counts().rename_axis('dates').reset_index(name='counts')
#merge dataframes and clean it
values = values_k.merge(values_a,on='dates',how='outer')
#after the merge, counts_x comes from values_k and counts_y from values_a
#(assumes tweets_k.json is the Keita Machida search and tweets_a.json the Eiji Akaso search)
values.rename(columns={"counts_x": "Keita Machida","counts_y":"Eiji Akaso"},inplace=True)
values.fillna(0,inplace=True)
values['dates'] = pd.to_datetime(values['dates'])
values.sort_values('dates',inplace=True)
ax = values.plot(x='dates', y=["Keita Machida", "Eiji Akaso"],figsize = (10,5))
for i in range(len(eps_dates)):
    ax.axvline(eps_dates[i], color='r', linestyle='--')
    ax.text(eps_dates[i],400,'EP{}'.format(i+1),color='red')
tweets about the two actors
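The outer merge followed by filling missing values with 0 keeps every date that appears in either dataset. The idea can be sketched with plain dictionaries rather than pandas. The counts below are invented and `outer_join_counts` is a hypothetical helper.

```python
def outer_join_counts(left, right):
    """Combine two {date: count} dicts, filling missing dates with 0."""
    all_dates = sorted(set(left) | set(right))
    return [(d, left.get(d, 0), right.get(d, 0)) for d in all_dates]


# Invented per-date tweet counts for two searches.
counts_k = {"2020-10-08": 5, "2020-10-09": 2}
counts_a = {"2020-10-09": 4, "2020-10-10": 1}

rows = outer_join_counts(counts_k, counts_a)
for row in rows:
    print(row)
```

A date present in only one search still gets a row, with 0 for the other search, so both lines can be drawn over the same date range.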
#read the json files and count the use of different languages
lan_t = tweets_total['language'].value_counts().rename_axis('language').reset_index(name='チェリまほ')
lan_c = df_cherry['language'].value_counts().rename_axis('language').reset_index(name='CherryMagic')
#note: lan_tx is built from df_cherry as in the original; swap in the @tx_cherimaho dataframe if it was loaded separately
lan_tx = df_cherry['language'].value_counts().rename_axis('language').reset_index(name='tx_cherimaho')
#merge datasets
lans = lan_tx.merge(lan_t.merge(lan_c,on='language',how='outer'),on='language',how='outer')
#Use language rather than language code
#source: https://www.w3schools.com/tags/ref_language_codes.asp
lan_list = pd.read_excel('language_code.xlsx')
lan_list.rename(columns={'ISO Code':'language'},inplace=True)
#merge
lans_f = lans.merge(lan_list, on='language',how='left')
#sum the three count columns only, so the text columns are left out
lans_f['total'] = lans_f[['tx_cherimaho','チェリまほ','CherryMagic']].sum(axis=1)
lans_f.sort_values(by=['total'],ascending=False, inplace=True)
#get rid of Japanese and delete unrecognized languages
lans_short = lans_f[lans_f['language']!='ja'].dropna()
lans_short.plot(x='Language',y='total',kind='bar',figsize=(12,8),rot=75)
The number of Japanese posts is 136170
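The language breakdown boils down to summing counts per language code across datasets, dropping Japanese, and dropping codes with no known name. A sketch with the standard library only; the counts and the small `names` lookup are invented stand-ins for the real data and the language-code spreadsheet, and `non_japanese_totals` is a hypothetical helper.

```python
from collections import Counter


def non_japanese_totals(datasets, names):
    """Sum language counts across datasets, then keep named, non-'ja' codes."""
    total = Counter()
    for counts in datasets:
        total.update(counts)
    ranked = [
        (names[code], n)
        for code, n in total.most_common()
        if code != "ja" and code in names
    ]
    return ranked


# Invented counts; "und" mimics a code missing from the lookup table.
datasets = [
    {"ja": 100, "en": 7, "th": 3},
    {"ja": 50, "en": 2, "und": 9},
]
names = {"ja": "Japanese", "en": "English", "th": "Thai"}

ranked = non_japanese_totals(datasets, names)
print(ranked)
```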

Future steps

Some thoughts


Zijing Zhu

Written by

PhD in Economics | Certified in Data Science | Passion in Life | https://zijing0926.github.io/Portfolio/ | https://www.linkedin.com/in/zijingzhu/

Towards AI

Towards AI is the world’s leading multidisciplinary science publication. Towards AI publishes the best of tech, science, and engineering. Read by thought-leaders and decision-makers around the world.
