Embedding a Sentiment Analysis Model into a Web Application
Sentiment analysis, sometimes also called opinion mining, is a popular subdiscipline of the broader field of NLP; it is concerned with analyzing the polarity of documents.
A popular task in sentiment analysis is classifying documents based on the opinions or emotions their authors express about a particular topic.
Today we'll build a Sentiment Analysis model on a large dataset of movie reviews from the Internet Movie Database (IMDb) and then deploy it to the cloud using https://www.pythonanywhere.com.
Sounds exciting? Let's start from scratch and build everything step by step.
The dataset for this problem is available as a .csv file on my GitHub: https://github.com/guptasoumya26/Sentiment-Analysis-model-on-web. Alternatively, you can download the raw data from http://ai.stanford.edu/~amaas/data/sentiment/
Model Creation
Step 1. Importing the necessary libraries we are going to use
import pandas as pd
import numpy as np
from nltk.tokenize import RegexpTokenizer
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
# one-time download of the NLTK stopword list, if you don't already have it:
# import nltk; nltk.download('stopwords')
Step 2. Reading the .csv file with pandas and examining the first 3 rows
df = pd.read_csv('movie_data.csv', encoding='utf-8')
df.head(3)
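Before cleaning anything, it is worth a quick sanity check on what was loaded; a minimal sketch, assuming the two columns are named review and sentiment as in this dataset:
print(df.shape)                        # expected: (50000, 2)
print(df['sentiment'].value_counts())  # this IMDb set is balanced: 25,000 reviews per class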
Step 3. Cleaning the text data
The shape of our data is (50000, 2): 50K rows with two columns, review and sentiment. We are going to define a helper function that handles the various text-cleaning steps, such as lowercasing, stopword removal, and stemming.
# init objects
tokenizer = RegexpTokenizer(r'\w+')
en_stopwords = set(stopwords.words('english'))
ps = PorterStemmer()

def getStemmedReview(review):
    review = review.lower()
    review = review.replace("<br /><br />", " ")
    # tokenize
    tokens = tokenizer.tokenize(review)
    # drop stopwords, then stem what remains
    new_tokens = [token for token in tokens if token not in en_stopwords]
    stemmed_tokens = [ps.stem(token) for token in new_tokens]
    clean_review = ' '.join(stemmed_tokens)
    return clean_review
In the above function, we lowercase the review, break it into tokens, keep only the tokens that are not in the predefined stopword list, and finally stem each remaining token to its root form. Once all of that is done, we return the clean text.
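A quick call on a made-up review shows the function in action; the exact output depends on the Porter stemmer, so treat it as approximate:
sample = "This movie was <br /><br />absolutely WONDERFUL, loved the acting!"
print(getStemmedReview(sample))
# roughly: 'movi absolut wonder love act'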
Step 4. Cleaning all the reviews and splitting our data for training and testing.
df['review'] = df['review'].apply(getStemmedReview)

X_train = df.loc[:34999, 'review'].values
y_train = df.loc[:34999, 'sentiment'].values
X_test = df.loc[35000:, 'review'].values
y_test = df.loc[35000:, 'sentiment'].values
As our data is 50K rows in total, we split it into 35K rows for training and 15K rows for testing (note that .loc slicing is inclusive at both ends, which is why the training slice ends at 34999).
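The slicing above keeps the file's original row order. If you would rather have a shuffled, stratified split, a variation (not what we use here) is scikit-learn's train_test_split:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df['review'].values, df['sentiment'].values,
    test_size=0.3, random_state=42, stratify=df['sentiment'].values)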
Step 5. Transforming words into feature vectors
To feed the data to the Machine Learning model, we have to convert categorical data, such as text or words, into a numerical form.
We are going to use TfidfVectorizer for this purpose, which is already available in the scikit-learn library; for more details, please refer to https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(sublinear_tf=True, encoding='utf-8',
                             decode_error='ignore')
vectorizer.fit(X_train)
X_train = vectorizer.transform(X_train)
X_test = vectorizer.transform(X_test)
Please note that in the above code, we perform the fit operation only on the training set; once the vectorizer has learned the vocabulary from the training data, we use that same learning to transform our test data.
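A toy example makes this concrete: the vocabulary is frozen at fit time, so words the vectorizer never saw during training are simply ignored when transforming new text. A minimal sketch:
from sklearn.feature_extraction.text import TfidfVectorizer

toy = TfidfVectorizer()
toy.fit(["great movie", "terrible movie"])         # learned vocabulary: great, movie, terrible
row = toy.transform(["great unseen blockbuster"])  # 'unseen' and 'blockbuster' are silently dropped
print(row.toarray())                               # only the 'great' column is non-zero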
Step 6. Creating the model and checking the score on training and test data
Here we are using the LogisticRegression model because its output is easy to interpret in terms of probability. Feel free to explore other models as per your preference.
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='liblinear')
model.fit(X_train, y_train)
print("Score on training data is: " + str(model.score(X_train, y_train)))
print("Score on testing data is: " + str(model.score(X_test, y_test)))

Score on training data is: 0.93597
Score on testing data is: 0.89766
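Because logistic regression is a linear model, its coefficients are directly interpretable: the largest positive weights belong to stems that push a review toward the positive class. A hedged sketch (get_feature_names_out requires scikit-learn >= 1.0; older versions expose get_feature_names instead):
feature_names = np.array(vectorizer.get_feature_names_out())
coefs = model.coef_[0]
print(feature_names[np.argsort(coefs)[-10:]])  # ten most positive stems
print(feature_names[np.argsort(coefs)[:10]])   # ten most negative stems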
Let's verify our model's output on a single review.
Below is our test point df.iloc[35000, 0], which corresponds to the first point in our test data, i.e. X_test[0] (shown here in its original, uncleaned form):
"If you haven't seen the gong show TV series then you won't like this movie much at all, not that knowing the series makes this a great movie. <br /><br />I give it a 5 out of 10 because a few things make it kind of amusing that help make up for its obvious problems."
When we perform a prediction on the above point, we can see below that our model is performing well enough.
# Here 0 denotes a negative sentiment
model.predict(X_test[0])

array([0], dtype=int64)

# ~79% probability that the given text is negative
model.predict_proba(X_test[0])

array([[0.78833439, 0.21166561]])
So now that our model is ready, let's start the process of creating a web application for it, so that it can be given to the end user without any external dependency.
Model Deployment and Creating a Web App
For deployment purposes, we are going to use the public server pythonanywhere.com. Let's cover the steps below one by one:
- Saving the current state of a trained machine learning model
- Developing a web application using the popular Flask web framework
- Deploying the machine learning application to a public web server.
Let's dive into the process now.
Step 1. Serializing fitted scikit-learn estimators
Because we don't want to retrain our model every time the web application loads, we go for model persistence through joblib, which offers the same interface as Python's built-in pickle module (https://docs.python.org/3.7/library/pickle.html) but is more efficient for objects that carry large NumPy arrays.
import joblib  # note: `from sklearn.externals import joblib` is deprecated and removed in newer scikit-learn

joblib.dump(en_stopwords, 'stopwords.pkl')
joblib.dump(model, 'model.pkl')
joblib.dump(vectorizer, 'vectorizer.pkl')
The above code will save three pickle files in our current working directory.
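Before moving the files to the server, it is worth reloading them once to confirm they round-trip cleanly; a quick check:
reloaded_model = joblib.load('model.pkl')
reloaded_vec = joblib.load('vectorizer.pkl')
print(reloaded_model.predict(reloaded_vec.transform([df.loc[35000, 'review']])))
# expected: array([0]), matching the prediction we made on X_test[0] earlier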
For better clarity, and to follow the same steps, please create a folder named "movieclassifier" in your current working directory and place the files below inside it.
a) Manually create the subdirectory structure inside the movieclassifier folder as shown in the image.
b) Move all the newly created .pkl files into the pkl_objects folder.
c) Create a new file flaskapp.py as shown and keep it blank for now. Before filling in flaskapp.py, let's pause for a brief introduction to Flask, a micro web framework written in Python.
A Very Brief Introduction to Flask
If you downloaded the Anaconda distribution, you already have Flask installed; otherwise, you will have to install it yourself with pip install flask.
Flask is very minimal, since you only bring in the parts you need. To demonstrate this, here's the Flask code to create a very simple web server.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == '__main__':
    app.run(debug=True)
Once executed, you can navigate to the web address shown in the terminal (http://127.0.0.1:5000/ by default) and observe the expected result.
Let’s review what the executed code is doing.
After importing Flask, we create an instance of the Flask class and pass in the __name__ variable that Python fills in for us. This variable will be "__main__" if the file is run directly through Python as a script. If we import the file instead, the value of __name__ will be the name of the imported module. For instance, if we had test.py and run.py, and we imported test.py into run.py, the __name__ value inside test.py would be "test".

Above our hello function definition is the line @app.route("/"). The @ denotes a decorator, which allows the function, property, or class it precedes to be dynamically altered. The hello function is where we put the code to run whenever the route of our app or API hits the top-level route /, the homepage.

If our __name__ variable is "__main__", indicating that we ran the file directly instead of importing it, the script starts the Flask app, which runs and waits for web requests until the process ends.
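To see the __name__ behavior for yourself, two throwaway files (the hypothetical test.py and run.py from the description above) are enough:
# test.py
print("in test.py, __name__ is", __name__)

# run.py
import test  # running `python run.py` prints: in test.py, __name__ is test

# running `python test.py` directly instead prints: in test.py, __name__ is __main__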
Now let's continue from where we left off.
Step 2. Importing Flask with other libraries and creating flaskapp.py
Using the above knowledge, let's create our flaskapp.py now.
from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators
import numpy as np
import joblib  # `from sklearn.externals import joblib` is deprecated in newer scikit-learn

# load the serialized objects created in the previous step
loaded_model = joblib.load("./pkl_objects/model.pkl")
loaded_stop = joblib.load("./pkl_objects/stopwords.pkl")
loaded_vec = joblib.load("./pkl_objects/vectorizer.pkl")

app = Flask(__name__)

def classify(document):
    label = {0: 'negative', 1: 'positive'}
    # note: the training reviews were stemmed, so for a fully consistent
    # pipeline the user's input should be cleaned the same way before transform
    X = loaded_vec.transform([document])
    y = loaded_model.predict(X)[0]
    proba = np.max(loaded_model.predict_proba(X))
    return label[y], proba

class ReviewForm(Form):
    moviereview = TextAreaField('', [validators.DataRequired(),
                                     validators.length(min=15)])

@app.route('/')
def index():
    form = ReviewForm(request.form)
    return render_template('reviewform.html', form=form)

@app.route('/results', methods=['POST'])
def results():
    form = ReviewForm(request.form)
    if request.method == 'POST' and form.validate():
        review = request.form['moviereview']
        y, proba = classify(review)
        return render_template('results.html', content=review,
                               prediction=y, probability=round(proba*100, 2))
    return render_template('reviewform.html', form=form)

if __name__ == '__main__':
    app.run(debug=True)
In the above code, we are performing the following steps:
- Loading the pickle files.
- Starting the app and creating a helper function that classifies text using the loaded model and vectorizer, returning the predicted class along with the probability of the output value.
- Creating the ReviewForm class, which validates that the input is at least 15 characters long.
- Associating the "/" home route with index(), which renders reviewform.html (we'll be creating all the HTML files in a while).
- Finally, creating a /results route that classifies the input given by the user and then displays the result on the results.html webpage.
Step 3. Setting up reviewform.html for the homepage of our app.
<!doctype html>
<html>
<head>
    <title>Movie Classification</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
</head>
<body>

<h2>Please enter your movie review:</h2>

<form method="post" action="/results">
    <dl>
        {{ form.moviereview(cols='30', rows='10') }}
    </dl>
    <div>
        <input type="submit" value="Submit review" name="submit_btn">
    </div>
</form>

</body>
</html>
Through this form, users can provide a movie review and submit it via the Submit review button displayed at the bottom of the page. The template renders the form's TextAreaField directly (so no extra Jinja macro is needed); it is 30 columns wide and 10 rows tall and will look like the screenshot below.
Step 4. Setting up results.html
<!doctype html>
<html>
<head>
    <title>Movie Classification</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
</head>
<body>

<h3>Your movie review:</h3>
<div>{{ content }}</div>

<h3>Prediction:</h3>
<div>This movie review is <strong>{{ prediction }}</strong>
(probability: {{ probability }}%).</div>

<div id="button">
    <form action="/">
        <input type="submit" value="Submit another review">
    </form>
</div>

</body>
</html>
Here we simply insert the submitted review, as well as the results of the prediction, in the corresponding placeholders {{ content }}, {{ prediction }}, and {{ probability }}. If a user hits the Submit another review button, the form's action="/" redirects the application back to the home page.
Place both reviewform.html and results.html inside the templates folder created previously.
Step 5. Setting up style.css
We linked this CSS file (style.css) at the top of both HTML files. Its setup is quite simple: it limits the width of the contents of this web application to 600 pixels and adds some top padding above the button (the selector below targets the div with id="button" from results.html).
body {
    width: 600px;
}

#button {
    padding-top: 60px;
}
Place this style.css inside the static folder created previously.
Step 6. Testing your application on the local server.
Before we advance to the next subsection and deploy the app on a public web server, start it locally: navigate to the directory containing flaskapp.py and run the following command from your command-line terminal:
python flaskapp.py
You should see your app running on localhost:5000 as below
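Besides clicking through the form in a browser, you can exercise the /results route from a script; a small smoke test with the standard library (assuming the app is running locally on port 5000 and the review is long enough to pass validation):
import urllib.parse
import urllib.request

data = urllib.parse.urlencode(
    {'moviereview': 'This movie was a complete waste of time, dull and predictable.'}).encode()
with urllib.request.urlopen('http://localhost:5000/results', data=data) as resp:
    html = resp.read().decode()
print('negative' in html)  # likely True for such a clearly negative review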
Step 7. Deploying the web application to a public server
We are now ready to deploy our web application onto a public web server. We will be using the PythonAnywhere web hosting service, which specializes in hosting Python web applications.
a) To create a new PythonAnywhere account, visit https://www.pythonanywhere.com/ and click on the Pricing & signup link located in the top-right corner. Next, click on the Create a Beginner account button and provide a username, password, and valid email address.
b) Go to the Web section and then click to add a new web app.
- Select manual configuration
- Select Python 3.7 as the environment
- Go to the Files tab and create a new directory "movieclassifier"
- Upload all the files from your local disk to the server, as below
c) Creating a virtual environment for all the libraries
Go to the Consoles section and start a new Bash console, then type the command below to create a new virtual environment:
mkvirtualenv myvirtualenv --python=/usr/bin/python3.7
After the above completes, install the libraries below one by one:
pip install Flask
pip install flask-wtf
pip install scikit-learn
Now it's time to link your virtual environment with the app: navigate to the Web section of pythonanywhere.com and give the path of your newly created virtualenv, as below.
d) Finally, edit the WSGI configuration file from the Web tab as below:
import sys

path = '/home/soumyansh26/movieclassifier'
if path not in sys.path:
    sys.path.append(path)

from flaskapp import app as application
e) Now reload your web application; at this point, you should be able to see your app live and running over the internet. Congratulations on successfully deploying your machine learning model over the web!
Link to the app- http://soumya26.pythonanywhere.com/
Thank you for reading this far. In this post, we learned how to create a sentiment analysis model, what Flask is, how to use it to create APIs, and, most importantly, how to apply this knowledge to deploy our machine learning model on a public web server for the end user.
I hope you found this tutorial useful. I'm curious about what you think, so hit me with some comments. You can also get in touch with me directly through email or connect with me on LinkedIn.