Capstone project in the 4th year of college
In the first semester of the final year of B.Tech, we were required to work on a capstone project. The topic had to be related to what we had studied in the university curriculum and substantial enough to need about four months of work. I selected sentiment analysis of political reviews. I chose it firstly because it makes use of deep learning, which is a new-age technology, and secondly because it was something I thought I would be able to manage myself.
Sentiment analysis is a machine learning technique in which the model predicts the sentiment behind the remarks or reviews posted by users. In its simplest form, a word index file assigns points to each word, and with its help a score is calculated for every sentence. The sentiment of the sentence is then decided by whether that score is higher or lower than a threshold value that we have set.
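The scoring idea described above can be sketched in a few lines of Python. This is only a minimal illustration: the word scores in WORD_SCORES, the threshold of 0 and the function names are assumptions made for the example and are not taken from the word index file used in the project.

```python
# Hypothetical word index: each word maps to a sentiment score (assumed values).
WORD_SCORES = {"good": 2, "great": 3, "honest": 2, "bad": -2, "corrupt": -3, "hate": -3}

THRESHOLD = 0  # assumed cut-off between positive and negative


def sentence_score(sentence):
    """Sum the scores of the known words; unknown words count as 0."""
    return sum(WORD_SCORES.get(word, 0) for word in sentence.lower().split())


def sentiment(sentence, threshold=THRESHOLD):
    """Label a sentence by comparing its score with the threshold."""
    return "positive" if sentence_score(sentence) > threshold else "negative"


print(sentiment("The minister is honest and good"))
print(sentiment("I hate this corrupt government"))
```

The first sentence scores 4 and the second scores -6, so they fall on opposite sides of the threshold.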

The motivation for this project was that in today’s society many people try to spread hatred by passing offensive remarks based on the caste, religion or gender of a person, or remarks that contain abusive words. Such remarks do not make for a good environment in society and should be eliminated, and the people who make them should be penalized. To do that, we have to perform sentiment analysis and label the tweets containing bad words as ‘offensive’.

There are several algorithms and libraries in Python for sentiment analysis. Commonly known machine learning techniques such as Naïve Bayes, Random Forest Classifier, K-Nearest Neighbour and Support Vector Machine are used for this purpose. Useful libraries include keras, pandas, tensorflow and nltk, which is a library for natural language processing tasks.
Now we come to the main part of the discussion, which is the algorithm I worked on to perform sentiment analysis.
To start the work, all the necessary libraries are imported, such as numpy, pandas, seaborn, matplotlib, sklearn, nltk, tensorflow and keras. After importing the libraries, we read the dataset, which is a CSV file, into our code. The file consists of two columns: one is the political tweet and the other is the sentiment behind it. The data is initially in raw form and contains unwanted words, characters and symbols. Our job is to remove them and then split the clean sentence into a list of words.
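A rough sketch of the reading and cleaning step is given below. It uses only Python's standard csv and re modules so that it runs anywhere (pandas.read_csv would serve the same purpose for the real file). The sample rows, the column names tweet and sentiment, and the exact cleaning rules are assumptions for illustration, not the actual dataset or the exact rules used in the project.

```python
import csv
import io
import re

# Hypothetical stand-in for the dataset file: two columns, tweet and sentiment.
SAMPLE_CSV = """tweet,sentiment
"@user The new policy is GREAT!!! https://example.com #budget",positive
"Worst speech ever... 100% lies :(",negative
"""


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions and non-letters, and split into words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return text.split()                        # split on whitespace into a word list


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(clean_tweet(row["tweet"]), row["sentiment"])
```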
For the cleaning task, we import Python's ‘re’ library and use regular expressions to strip the unwanted characters and symbols. Firstly we remove all the stopwords, which are very common words such as ‘the’ and ‘is’ that carry little sentiment. After that, a tokenizer class is used to split the sentence into words. The PorterStemmer class of the ‘nltk’ package is used for stemming. Stemming is a natural language processing technique in which different forms of the same word are reduced to one common stem by stripping suffixes. For example, the words ‘playing’, ‘played’ and ‘plays’ are all reduced to ‘play’. Because only suffixes are stripped, a word such as ‘replay’ is not reduced to ‘play’. Stemming is a vital process in information retrieval systems like search engines.
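The stop word removal and stemming steps might look roughly like the sketch below. It assumes the nltk package is installed. The tiny STOP_WORDS set and the preprocess helper are made up for this example; NLTK also provides a fuller list in nltk.corpus.stopwords, which needs a one-time download.

```python
from nltk.stem import PorterStemmer

# Small hand-written stop word list, assumed for this example only.
STOP_WORDS = {"the", "is", "are", "a", "an", "and", "in", "of", "to", "he"}

stemmer = PorterStemmer()


def preprocess(words):
    """Drop stop words, then reduce each remaining word to its stem."""
    return [stemmer.stem(w) for w in words if w not in STOP_WORDS]


print(preprocess(["he", "plays", "and", "played", "the", "game"]))
print(stemmer.stem("playing"))
```

Both ‘plays’ and ‘played’ come out as ‘play’, which is why stemming shrinks the vocabulary the model has to learn.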

After this, we move to the next task, which is performing lemmatization on our data. Lemmatization is somewhat like stemming because it also reduces related words to a common root word. However, the major difference is that lemmatization depends on correctly identifying the intended part of speech and meaning of the word in a sentence, as well as within the larger context surrounding that sentence.
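To show why the part of speech matters, here is a toy lemmatizer built on a small lookup table. It is not how a real lemmatizer is implemented: the LEMMAS table and the lemmatize function are assumptions for this illustration. In practice a library class such as NLTK's WordNetLemmatizer, which also accepts a part-of-speech argument, would be used.

```python
# Toy lemma table keyed by (word, part of speech); entries are assumed for this example.
LEMMAS = {
    ("saw", "verb"): "see",
    ("saw", "noun"): "saw",
    ("better", "adjective"): "good",
    ("meeting", "verb"): "meet",
    ("meeting", "noun"): "meeting",
}


def lemmatize(word, pos):
    """Look up the dictionary form of a word for a given part of speech."""
    word = word.lower()
    return LEMMAS.get((word, pos), word)  # unknown words are returned unchanged


print(lemmatize("saw", "verb"))   # "I saw the rally" uses the verb
print(lemmatize("saw", "noun"))   # "a rusty saw" uses the noun
```

The same surface word gives two different lemmas, which a suffix-stripping stemmer cannot distinguish.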
When we are finished with pre-processing the data, it's time to move to the main part, which is training the model. The dataset is split into train and test datasets. To set up the model, the inputs are given and the neural network layers and the activation function to be used are specified. The model is then compiled with the Adam optimizer, which updates the weights during training, and the accuracy metric is used to evaluate the performance of the model.
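The train/test split can be sketched with the standard library alone, as below. The 80/20 ratio, the seed and the name split_dataset are assumptions for this example; sklearn.model_selection.train_test_split provides the same functionality and is the more usual choice.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle the samples reproducibly and split them into train and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed gives the same split every run
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_dataset(range(10))
print(len(train), len(test))
```

Shuffling before splitting matters because the rows of a dataset file are often grouped by label or by date, and the test set should be a fair sample of the whole.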

After running the model for the specified number of epochs and batch size, the training and testing accuracies are calculated. My model gave a good training accuracy of 95% and a decent testing accuracy of 72%. The gap between the two shows that the model is overfitting, and further work needs to be done to improve its testing accuracy and obtain better results.
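For reference, the accuracy metric is simply the fraction of correct predictions, and the difference between training and testing accuracy gives a quick measure of overfitting. The sketch below uses a hypothetical accuracy helper together with the two figures reported above.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Tiny made-up example: 3 of 4 predictions are correct.
print(accuracy(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))

train_acc, test_acc = 0.95, 0.72  # figures reported for this project
gap = train_acc - test_acc
print(f"gap between training and testing accuracy: {gap:.2f}")
```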
It was a wonderful experience working on this project. The best thing was that this semester we all had to do our projects alone. Until the third year, we were allowed to work in teams of three or four members. Working alone was a challenge at the start, but when we were able to accomplish our tasks, it gave us the confidence that we are ready to face the corporate world. In college, we lived in an environment where everybody was ready to help us. However, when we move to the job sector, it will be a highly competitive world where we need to have the right skill sets to survive and cannot ask others for help every time.
As the main aim was to challenge myself and learn new things, I selected a project in natural language processing (NLP). This is a subject I had not studied in the college curriculum, as I did not opt for the NLP elective. I am glad that I was able to manage this project myself. It has given me hands-on experience of NLP, which is an upcoming technology in the area of artificial intelligence. NLP has a number of applications, including automatic summarization and natural language generation.
I would like to thank Dr. Suneet Gupta, our mentor for this project, and Dr. Deepak Garg, the HOD of the CSE department, for guiding me in this project. They were a source of constant motivation and support throughout the semester. Without their support, I would not have been able to complete this project.
REFERENCES
1-https://www.researchgate.net/publication/339672676_Twitter_Sentiment_Analysis_A_Political_View
Twitter sentiment analysis
By- Joylin Pinto
2-https://www.sciencedirect.com/science/article/pii/S1877050920306669
A research paper by Mohd Zeeshan Ansari
3-https://aclanthology.org/W13-1106.pdf
Report written by Akshat Bakliwal, Jennifer Foster and 4 others.
4-https://www.hindawi.com/journals/complexity/2020/8892552/
Sentiment analysis of tweets by Rabia Batool and 5 others.
5-https://ieeexplore.ieee.org/document/8487879
Paper on political sentiment analysis published in IEEE.
6-https://ieeexplore.ieee.org/document/6581022
7-https://ieeexplore.ieee.org/document/9486841
Sentiment analysis on Spanish elections tweets. The paper was published in IEEE.
8-https://ieeexplore.ieee.org/document/8978440
Paper of an IEEE conference held in BANGALORE, INDIA.