Twitter Sentiment Analysis in NLP: Classifying Positive and Negative Tweets That Mention a Brand Name
1. Setting up a Twitter developer account
In order to pull data from Twitter, a developer account needs to be set up. Below are the steps to set up this account.
- Create a regular Twitter account
- Apply for a Twitter developer account: Settings -> About Twitter -> Developers
- Create a new app
- Fill in the information and click Create
- Get the API key and secret from the “Keys and Access Tokens” tab
- Get the Access Token and Access Token Secret
2. Setting up a Python environment
A few Python packages are used in this application:
- Tweepy: interacts with the Twitter API
- NumPy: mathematical and scientific operations in Python
- scikit-learn (sklearn): machine learning library
- Pandas: data manipulation and analysis
- NLTK: Natural Language Toolkit
- re: regular expression matching (part of the Python standard library)
Use pip in a terminal to install any missing packages, e.g. pip install tweepy
Open an IDE, create a Python file named twitter_nlp.py, then import all the packages and modules listed above.
3. Dataset
The training dataset is in CSV format with two columns: the first column holds the text content and the second holds the sentiment tag, 1 for positive or 0 for negative.
The dataset is divided into training data and testing data with a ratio of 80:20.
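The loading and 80:20 split described above could look roughly like the sketch below. It assumes the CSV has no header row; the column names text and sentiment, the helper name load_and_split, and the in-memory sample rows are all hypothetical and only stand in for the real file.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split


def load_and_split(csv_source, test_size=0.2, random_state=42):
    """Read a two-column CSV (text, sentiment) and split it 80:20."""
    # Assumption: no header row; the column names below are made up here.
    data = pd.read_csv(csv_source, header=None, names=["text", "sentiment"])
    return train_test_split(
        data["text"],
        data["sentiment"],
        test_size=test_size,        # 20% of the rows go to the testing set
        random_state=random_state,  # fixed seed so the split is repeatable
    )


# Tiny in-memory stand-in for the real CSV file, for illustration only.
sample_csv = io.StringIO(
    "I love this brand,1\n"
    "Great product,1\n"
    "Best service ever,1\n"
    "Really happy with it,1\n"
    "Works perfectly,1\n"
    "I hate this brand,0\n"
    "Terrible product,0\n"
    "Worst service ever,0\n"
    "Really unhappy with it,0\n"
    "Broken on arrival,0\n"
)

X_train, X_test, y_train, y_test = load_and_split(sample_csv)
print(len(X_train), len(X_test))  # prints: 8 2
```

With the real dataset, a file path can be passed in place of the StringIO object.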
4. Cleaning and Stemming text content
The text content is cleaned and then stemmed with the PorterStemmer algorithm before it goes into the bag of words.
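A minimal sketch of such a cleaning step is shown here. PorterStemmer comes from NLTK as the text says, but the exact cleaning rules (dropping URLs, @mentions and non-letter characters) and the function name clean_text are assumptions for illustration, not necessarily the steps used in the original application.

```python
import re

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def clean_text(text):
    """Strip noise from a tweet, lowercase it, and stem each word."""
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    text = re.sub(r"[^a-zA-Z]", " ", text)     # keep letters only
    words = text.lower().split()
    # Reduce each word to its stem, e.g. "running" becomes "run".
    return " ".join(stemmer.stem(word) for word in words)


print(clean_text("Running late again @brand http://t.co/x"))
```

The cleaned strings can then be passed to a vectorizer to build the bag of words.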
5. Pipeline
The pipeline transforms the bag of words into a vector before running the Gaussian Naive Bayes (GaussianNB) algorithm to train the model. This algorithm was chosen because it achieved the highest accuracy score among the algorithms tried.
Using a Pipeline also makes the workflow more efficient and the code reusable.
Because the output of CountVectorizer is a sparse matrix and GaussianNB requires dense data, a DenseTransformer class is needed as an intermediate step in the Pipeline.
The model is then trained on the training data, and its accuracy score is measured on the testing data.
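One way to put these pieces together is sketched below. DenseTransformer is written here as a small custom class that calls toarray() on the sparse matrix; the step names, the build_pipeline helper, and the toy sentences are assumptions for illustration, so the printed accuracy says nothing about the real dataset.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Convert the sparse output of CountVectorizer into a dense array."""

    def fit(self, X, y=None):
        return self  # nothing to learn in this step

    def transform(self, X):
        return X.toarray()


def build_pipeline():
    """Bag of words -> dense array -> Gaussian Naive Bayes."""
    return Pipeline([
        ("vectorizer", CountVectorizer()),
        ("to_dense", DenseTransformer()),
        ("classifier", GaussianNB()),
    ])


# Toy data for illustration only (1 = positive, 0 = negative).
train_texts = [
    "love this brand", "great product love it", "great service",
    "hate this brand", "terrible product hate it", "terrible service",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["love great service", "hate terrible product"]
test_labels = [1, 0]

model = build_pipeline()
model.fit(train_texts, train_labels)          # train on the training data
predictions = model.predict(test_texts)       # predict on the testing data
accuracy = accuracy_score(test_labels, predictions)
print(f"Accuracy: {accuracy:.2f}")
```

Keeping the vectorizer inside the pipeline means raw text can be passed straight to fit and predict, which is what makes the same object reusable on new tweets.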
6. Running on Twitter’s data
The application connects to Twitter through the Tweepy API, using the keys and tokens obtained in step 1.
7. Cleaning all data
Screenshot of the tweets appearing on the home timeline: one Positive comment and one Negative comment.
The model runs on this data and tries to predict which comment is Positive and which is Negative. The application then Likes the Positive comment and Retweets the Negative comment. It only does so if those actions have not already been taken.
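The predict-then-act logic described above might be sketched as follows. The function name act_on_timeline is hypothetical; api is assumed to be a Tweepy API client, model a fitted classifier that accepts raw text, and clean_text the cleaning function used at training time. The sketch relies on the favorited and retweeted flags that Twitter returns on each timeline status to avoid repeating an action.

```python
def act_on_timeline(api, model, clean_text):
    """Like tweets predicted positive and retweet tweets predicted negative."""
    actions = []
    for status in api.home_timeline():
        # Clean the tweet the same way as the training data, then classify it.
        label = model.predict([clean_text(status.text)])[0]
        if label == 1 and not status.favorited:
            api.create_favorite(status.id)  # Like the positive tweet
            actions.append((status.id, "like"))
        elif label == 0 and not status.retweeted:
            api.retweet(status.id)          # Retweet the negative tweet
            actions.append((status.id, "retweet"))
    return actions
```

Returning the list of actions taken makes it easy to log what the application did on each run.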
Screenshot of the running application, which predicted one Positive comment and one Negative comment, as expected:
Based on this prediction, the application takes the corresponding actions: Like for the Positive comment and Retweet for the Negative comment.
That's all for this project. I hope it helps you get off the ground and build something awesome; please share it and let me know if you do. Good luck!