Natural Language Processing with Restaurant Reviews (Part 3)
So far, in the previous two parts of this blog, we analysed the data and built a machine learning classifier that predicts whether a given review is positive or negative.
Links to previous parts-
Let’s start by using the classifier to do a test prediction.
#doing a test prediction
test = ["the food was not very good, it was very rotten and tasted bad"]
#transforming for using on the model (using the count vectorizer)
test_vec = cv.transform(test)
#0 = not liked
#1 = liked the food
classifier.predict(test_vec)[0]
The output is “0”, which is the class for a negative review, so the model classifies this test review as expected.
Saving the model and text corpus using Pickle
In real life, pickling preserves food so that it can be stored for a long time. Similarly, Python's pickle module can store a machine learning model as a file for later use. To read more on pickle in Python, visit-
The code to proceed-
#saving the model
import pickle

filename = 'reviews_classifier.sav'
with open(filename, 'wb') as model_file:
    pickle.dump(classifier, model_file)
This will save the classifier model as a .sav file in the current working directory of your Jupyter notebook/Python IDE.
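As a quick sanity check, a pickled file can be loaded back and compared with the original object. The sketch below is only an illustration: it uses a plain dictionary as a stand-in for the trained classifier (an assumption, so that the snippet runs on its own), writes to a temporary directory instead of the notebook folder, and the helper names save_object and load_object are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Pickle obj to path; the with-block closes the file afterwards."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path):
    """Load a pickled object from path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for the trained classifier (assumption, for illustration only).
stand_in_model = {"name": "reviews_classifier", "classes": [0, 1]}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "reviews_classifier.sav")
    save_object(stand_in_model, model_path)
    restored_model = load_object(model_path)

# The restored object is an equal copy, not the same object in memory.
print(restored_model == stand_in_model)
```

One caution: unpickling can run arbitrary code, so only load pickle files that you created yourself or that come from a source you trust.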
Now we also have to save the corpus for later use. In linguistics and NLP, a corpus (literally Latin for body) is a collection of texts. Here the corpus is the collection of all the reviews. The count vectorizer learns its vocabulary from the corpus, and any new input has to be converted into a vector over that same vocabulary before it can be fed to the model. So we proceed with saving the corpus, which lets us fit the count vectorizer again later.
#saving the corpus
with open('corpus.data', 'wb') as filehandle:
    # store the data as binary data stream
    pickle.dump(corpus, filehandle)
So we save the corpus as a file as well. It gets saved as corpus.data.
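To see why the count vectorizer depends on the corpus, here is a minimal pure-Python sketch of the bag-of-words idea. It is not scikit-learn's CountVectorizer, which does considerably more (its own tokenization, max_features, sparse output). The function names and the tiny corpus are made up for illustration.

```python
def build_vocabulary(corpus):
    """Map every distinct word in the corpus to a column index (sorted for stability)."""
    words = sorted({word for text in corpus for word in text.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_count_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in text; unknown words are ignored."""
    vector = [0] * len(vocabulary)
    for word in text.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


# Toy corpus (made up); the real corpus is the list of cleaned reviews.
toy_corpus = ["food was good", "food was bad", "service was good"]
vocabulary = build_vocabulary(toy_corpus)

# "the" never appeared in the corpus, so it does not get a column.
new_review_vector = to_count_vector("the food was bad bad", vocabulary)

print(vocabulary)         # {'bad': 0, 'food': 1, 'good': 2, 'service': 3, 'was': 4}
print(new_review_vector)  # [2, 1, 0, 0, 1]
```

The vector for a new review always has one entry per vocabulary word, which is why the same corpus (or the same fitted vectorizer) is needed at prediction time: the model expects inputs with exactly the columns it was trained on.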
Creating a .py executable file to run the classifier
We shall load the corpus as well as the model from their files.
import pickle
from sklearn.feature_extraction.text import CountVectorizer

#model filename
filename = 'reviews_classifier.sav'
with open(filename, 'rb') as model_file:
    loaded_classifier = pickle.load(model_file)
cv = CountVectorizer(max_features = 2000)
#loading the corpus
with open('corpus.data', 'rb') as filehandle:
    # read the data as binary data stream
    corpus = pickle.load(filehandle)
#fitting the count vectorizer with the corpus
cv.fit(corpus)
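Refitting the count vectorizer on the saved corpus works. A possible alternative, not used in this project, is to pickle the fitted vectorizer itself, so the script does not have to refit it on every start. The sketch below assumes scikit-learn is installed and uses a tiny made-up corpus in place of the real reviews. It serialises in memory with pickle.dumps and pickle.loads, and writing to a file with pickle.dump works the same way.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer

# Tiny made-up corpus, standing in for the real review corpus.
toy_corpus = ["food was good", "food was bad", "service was good"]

fitted_cv = CountVectorizer(max_features=2000)
fitted_cv.fit(toy_corpus)

# Round trip through pickle: the learned vocabulary is stored with the object.
restored_cv = pickle.loads(pickle.dumps(fitted_cv))

new_review = ["the food was bad"]
original_vector = fitted_cv.transform(new_review).toarray()
restored_vector = restored_cv.transform(new_review).toarray()

print(restored_cv.vocabulary_ == fitted_cv.vocabulary_)
print((original_vector == restored_vector).all())
```

If you try this approach, it is safest to load the pickled vectorizer with the same scikit-learn version that created it.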
Now we shall work on the rest of the script, which is plain Python code.
print("Welcome to the Restaurant Review analyser")
print("The output will be either positive or negative.")
print("A count vectorizer was used")
print("---------------------------------------------------")
user_input = input("Enter the review of the restaurant: ")
test = [user_input]
test_vec = cv.transform(test)
val = loaded_classifier.predict(test_vec)[0]
print("---------------------------------------------------")
if val == 0:
    print("The review entered was negative.")
    print("The user did not like the restaurant.")
if val == 1:
    print("The review entered was positive.")
    print("The user liked the restaurant.")
When executed, this code lets the user enter a review in plain English and prints whether the review was positive or negative. Overall, this project is an implementation of basic NLP techniques.
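The prediction step of the script can also be wrapped in a small function, which makes it easier to reuse and to test. This is a sketch and not part of the original project files: classify_review, LABELS and the two stub classes are hypothetical names, and the stubs only imitate the transform and predict calls so that the snippet runs without the saved model. In the real script, cv and loaded_classifier would be passed in instead.

```python
LABELS = {0: "negative", 1: "positive"}  # 0 = not liked, 1 = liked the food


def classify_review(review, vectorizer, classifier):
    """Return 'negative' or 'positive' for a single review string."""
    review_vec = vectorizer.transform([review])
    predicted_class = int(classifier.predict(review_vec)[0])
    return LABELS[predicted_class]


class StubVectorizer:
    """Stand-in for the fitted count vectorizer: passes the text through unchanged."""

    def transform(self, texts):
        return texts


class StubClassifier:
    """Stand-in for the trained model: predicts 0 if the text contains 'bad', else 1."""

    def predict(self, texts):
        return [0 if "bad" in text.lower() else 1 for text in texts]


print(classify_review("The food tasted bad", StubVectorizer(), StubClassifier()))  # negative
print(classify_review("Lovely place", StubVectorizer(), StubClassifier()))         # positive
```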
One of the main goals of NLP is to derive meaning from human language in a smart and useful way. More advanced applications include text summarization, topic extraction, sentiment analysis, relationship extraction and much more. Follow me to learn more about Machine Learning, Data Science, NLP and much more.
The GitHub repo with all the project files-
Do give a star if you like the work.
Thank You.