Performing Sentiment Analysis on Twitter (Part 1a — Data Extraction from Twitter)

Hung Mai
4 min readNov 14, 2017


I performed sentiment analysis on social media using Python and its libraries to understand what people think about my university. This is a project I have been working on independently for the past two months. I am neither a developer nor a data scientist; I just Googled a ton and customized code to fit my objective, and I thought I would document everything so others could benefit from my work. A big chunk of the code in this post is from Randy Daw-Ran Liou. Randy, you are really awesome; thanks for the great work.

In this post I will go into detail on Twitter data extraction; Part 2 will cover data cleaning, and Part 3 data analysis.

Objective: extract tweets with hashtags related to Clark University.

Libraries: time and selenium (for scraping data from the web)

```python
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

base_url = u'https://twitter.com/search?q='
query = u'%23clarkuniversity'
url = base_url + query

browser = webdriver.Chrome()
browser.get(url)
time.sleep(1)

body = browser.find_element_by_tag_name('body')
for _ in range(500):
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.2)

tweets = browser.find_elements_by_class_name('tweet-text')
for tweet in tweets:
    print(tweet.text)
```

How to use it:

It’s pretty straightforward. Selenium is a library that automates a browser to scrape the web using your inputs. In this case I gave it three things: the Twitter search URL, the hashtag, and the number of scrolls (which controls how many tweets get loaded). So if you want to extract the tweets for a different hashtag, just replace %23clarkuniversity with it (%23 is the URL encoding of the # character).
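If you’d rather not hand-encode the hashtag, a small sketch of building the search URL for any hashtag (the URL pattern is the one from the script above; the helper name `search_url` is mine):

```python
from urllib.parse import quote

def search_url(hashtag):
    # quote() percent-encodes the '#' into '%23', which is exactly
    # why the query in the script reads '%23clarkuniversity'.
    return 'https://twitter.com/search?q=' + quote(hashtag)

print(search_url('#clarkuniversity'))  # https://twitter.com/search?q=%23clarkuniversity
```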

for _ in range(500):

500 is the number of times the script presses Page Down; each scroll loads more tweets onto the page. Clark University is more active on Facebook, and Twitter is a platform mainly used by faculty and parents, so 500 scrolls took me back to 2014, when I was a freshie.

After the code runs, a new Chrome window pops up and automatically extracts all the tweets (this is why Selenium is awesome).

This is the result.

Then I tried to use pandas to convert this raw data into a DataFrame, and eventually a CSV file, but that did not work. Individual letters from the tweets went into their own cells for some weird reason, so I went with the most efficient way: the good ol’ copy and paste.
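For what it’s worth, the letters-in-their-own-cells symptom is usually what happens when a single string is handed to pandas, which then iterates over its characters. A hedged sketch of what likely works instead, collecting the tweet texts into a list first (the example tweets here are made up):

```python
import pandas as pd

# Stand-ins for the tweet.text values Selenium returns
tweet_texts = ['Go Clark! #clarkuniversity',
               'Homecoming weekend #clarkuniversity']

# A list of strings gives one row per tweet; a bare string would be
# split character-by-character (e.g. pd.DataFrame(list('hi')) has 2 rows).
df = pd.DataFrame({'tweet': tweet_texts})
df.to_csv('tweets.csv', index=False, encoding='utf-8')
print(len(df))  # 2
```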

It came out pretty well, didn’t it? Some tweets will be too long for a cell (well, as I am writing this blog Twitter upped the max characters to 280, which…)

Anyways, if you extracted 500 tweets (I know, it’s 300 in the photo; what the heck is this consistency, Will?), you will get 1,000 rows. Don’t worry, the data will still be the same.

I repeated this process four times for the four hashtags relevant to my objective, and also because it was cool watching Selenium do all the work.

Sweet, now let’s get you to part 2: data cleaning and part 3: data analysis.

Exception/Potential Error:

When I applied Randy Daw-Ran Liou’s code, I first got this error:

No biggie: just go to the ChromeDriver download site, grab chromedriver, then put it in a folder on your PATH; mine is /usr/bin. This step took me a while, but after this it should be a smooth ride.
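A quick way to check whether the driver is actually visible on your PATH before launching Selenium is a sketch like this (the temporary fake driver below is purely hypothetical, just to make the demo self-contained):

```python
import os
import shutil
import stat
import tempfile

def find_driver(name='chromedriver'):
    """Return the full path of the driver if it's on PATH, else None."""
    return shutil.which(name)

# Demo: drop an empty, executable stand-in for chromedriver into a
# temporary directory and prepend that directory to PATH.
tmp = tempfile.mkdtemp()
fake = os.path.join(tmp, 'chromedriver')
open(fake, 'w').close()
os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)
os.environ['PATH'] = tmp + os.pathsep + os.environ.get('PATH', '')

print(find_driver())  # now resolves to the fake driver's path
```

If `find_driver()` returns None for the real chromedriver, Selenium’s `webdriver.Chrome()` will fail with the same error.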

Further Explanation?

I initially wanted to try scraping data using the Twitter API, but I noticed that I could only get one week’s worth of tweets, so that was a big no-no.

Then I tried Beautiful Soup, but it was still too complicated and time-consuming. Selenium worked best.

If you are still curious about how the code worked with Twitter, for example:

tweets = browser.find_elements_by_class_name('tweet-text')

Why tweet-text?

Selenium uses its webdriver tool to extract data from a website, and since we only want the tweets and nothing else, we have to find the class that contains them. Voilà: just right-click a tweet, hit Inspect, and you can see which class it comes from.
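To see what “find by class name” means under the hood, here is a toy parser over some made-up, simplified stand-in for Twitter’s markup; it collects the text of every element whose class is tweet-text, which is essentially what Selenium does against the live page’s DOM:

```python
from html.parser import HTMLParser

class TweetTextParser(HTMLParser):
    """Collect text inside elements with class 'tweet-text'."""
    def __init__(self):
        super().__init__()
        self.in_tweet = False
        self.tweets = []

    def handle_starttag(self, tag, attrs):
        if 'tweet-text' in dict(attrs).get('class', ''):
            self.in_tweet = True

    def handle_endtag(self, tag):
        self.in_tweet = False

    def handle_data(self, data):
        if self.in_tweet:
            self.tweets.append(data)

# Hypothetical, simplified HTML standing in for a Twitter search page
html = ('<div><p class="tweet-text">Go Cougars! #clarkuniversity</p>'
        '<span>retweet</span></div>')
parser = TweetTextParser()
parser.feed(html)
print(parser.tweets)  # ['Go Cougars! #clarkuniversity']
```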

I hope it helped.

Part 2: Data Cleaning

Part 3: Data Analysis
