Improving restaurant operations and customer satisfaction through data-driven decision making

7 min read · Sep 12, 2023

The goal of this project is to provide a tech-based solution that helps restaurant owners make informed decisions and improve their business operations. Currently, restaurant owners lack access to a comprehensive, user-friendly platform that provides real-time data and analytics; existing solutions are often expensive, difficult to use, and hard to customize. The project aims to address these issues and help restaurant owners understand their customers’ sentiment and make data-driven decisions to increase revenue and customer satisfaction.

We aim to help restaurant owners improve their business operations and decision-making through our analytics-based dashboard. Our solution provides quantifiable data and visualizations that help owners understand their business better, and combines them with business intelligence to grow revenue and the customer base.

We identified five key pillars of the restaurant industry: food quality, ambience, hygiene, service, and reviews. By analyzing user reviews and their sentiment, we can understand what customers think and help restaurant owners prioritize improvements.

Our solution is designed to provide a better understanding of the current situation of the business and help restaurant owners make data-driven decisions. We believe that our solution can help restaurant owners overcome challenges and improve their business operations.

If you’re interested in learning more or would like to collaborate, please feel free to contact us at joshiharsh0506@gmail.com. The code is available here.

I have built the solution. In this post I will walk through how the service was created: the steps I took, the things I learned, the difficulties I faced, and the decisions I made. The technical part comes first.

The plan

The plan consists of five steps:

Creating a sentiment analysis model for restaurant reviews
Building a CRUD-based API for the data and the model
Building the frontend for the business visualization dashboard
Dockerization and deployment
Presentation

Now that our plan of action is ready, let us start developing the solution for restaurants.

Step 1: Creating a Sentiment Analysis Model for Restaurant Reviews

What is Sentiment Analysis: Sentiment analysis involves understanding the emotional tone conveyed in a text.

The Need for Sentiment Analysis: In the restaurant business, customer reviews can vary widely in tone, ranging from positive and constructive to negative and critical. Sentiment analysis helps us extract valuable insights from these reviews, improving the overall dining experience.

Approach to Building a Sentiment Analysis Model: Developing a Sentiment Analysis model falls within the realm of Natural Language Processing (NLP) and Machine Learning (ML). NLP focuses on teaching machines to comprehend and process human language effectively. In this case, our objective is to train a machine to understand the sentiment expressed in restaurant reviews.

Data Gathering: To train our machine effectively, we require a dataset containing restaurant reviews and corresponding sentiment scores. This dataset can be sourced from various internet repositories. Specifically, we need a restaurant reviews dataset to train our machine for this specific use case. You can find a suitable dataset at link. Ensure that the dataset you choose is diverse and balanced to improve the model’s accuracy and reliability.

Data Exploration and Preprocessing:

Data exploration and preprocessing are crucial steps in building a sentiment analysis model. When working with raw data, it’s important to consider several factors. The data might contain features that are irrelevant to our specific use case, and it could have duplicate or empty rows. Handling these issues is essential to prevent errors or bias in our results.
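As a small illustration of the cleanup described above, here is a minimal sketch on a toy dataframe. The column names and values are made up for the example and are not taken from the project's datasets.

```python
import pandas as pd

# Toy stand-in for a raw reviews file; column names and rows are assumptions.
raw = pd.DataFrame({
    "Review": ["Great food", "Great food", None, "Slow service", "  "],
    "Rating": [5, 5, 4, 2, 3],
    "Reviewer": ["a", "a", "b", "c", "d"],  # irrelevant to sentiment
})

# Keep only the columns needed for sentiment analysis.
clean = raw[["Review", "Rating"]].copy()

# Treat whitespace-only reviews as missing, then drop empty rows.
stripped = clean["Review"].str.strip()
clean["Review"] = stripped.mask(stripped == "")
clean = clean.dropna(subset=["Review", "Rating"])

# Drop exact duplicate rows and renumber the index.
clean = clean.drop_duplicates().reset_index(drop=True)

print(clean)
```

Two rows survive here: one copy of the duplicated review and the one remaining non-empty review.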

To begin, we load the data from various sources, such as CSV and TSV files, into dataframes using Pandas. We then select the columns that are relevant to our sentiment analysis task. In this case, we focus on two columns: ‘Review’ (independent feature) and ‘Sentiment Score’ (dependent feature). These columns contain the necessary data for our analysis.

Additionally, we need to address the issue of imbalanced data. Imbalanced data can introduce bias into our model’s predictions. Depending on the dataset, we may employ techniques like oversampling the minority class, undersampling the majority class, or using methods like Synthetic Minority Over-sampling Technique (SMOTE) to balance the data.

Finally, we must also explore the data’s distribution and any underlying trends. Understanding the data’s characteristics is essential for building an effective sentiment analysis model.

import pandas as pd

# Load data from various sources (paths are relative to the working directory)
RDF1 = pd.read_csv('../data/Restaurant reviews.csv')
RDF2 = pd.read_csv('../data/Restaurant_Reviews.tsv', delimiter='\t')
RDF3 = pd.read_csv('../data/Yelp Restaurant Reviews.csv')

# Select relevant columns: the review text and its rating
RDF1 = RDF1[["Review", "Rating"]]
RDF3 = RDF3[["Review Text", "Rating"]]
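The class imbalance discussed earlier can be handled in several ways. The sketch below shows plain random oversampling with scikit-learn's `resample` on toy data. The rule that a rating of 4 or more counts as positive is an assumption made for the example, not necessarily the mapping used in this project.

```python
import pandas as pd
from sklearn.utils import resample

# Toy data; in the article the ratings come from the loaded review files.
df = pd.DataFrame({
    "Review": ["loved it", "tasty", "great staff", "will return",
               "amazing", "fresh food", "cold food", "rude waiter"],
    "Rating": [5, 4, 5, 4, 5, 4, 1, 2],
})

# Assumption: ratings of 4 and above are positive (1), the rest negative (0).
df["Sentiment"] = (df["Rating"] >= 4).astype(int)

counts = df["Sentiment"].value_counts()
majority = df[df["Sentiment"] == counts.idxmax()]
minority = df[df["Sentiment"] == counts.idxmin()]

# Random oversampling: draw minority rows with replacement
# until both classes have the same number of rows.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)

# Combine, shuffle, and renumber the index.
balanced = pd.concat([majority, minority_upsampled]).sample(frac=1, random_state=42)
balanced = balanced.reset_index(drop=True)

print(balanced["Sentiment"].value_counts())
```

Resampling is best applied to the training split only, so that copies of the same review do not end up in both the training and the test set.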

Data Cleaning and Text Vectorization:

After obtaining the necessary data columns, “reviews” and “sentiment scores,” the next step is data cleaning to remove unwanted elements. This includes eliminating HTML tags, emails, phone numbers, special characters, and URLs. We also remove common “stop words” that carry little meaning, convert the text to lowercase for uniformity, and apply stemming to reduce words to their base form.
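The cleaning steps above can be sketched with the standard library alone. The stop-word list and the `naive_stem` helper below are deliberately tiny, hypothetical stand-ins; a real pipeline would more likely use a full stop-word list and a proper stemmer such as Porter's.

```python
import re

# A tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "was", "were", "and", "or",
              "to", "of", "at", "me", "for", "it"}

def naive_stem(word: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean_review(text: str) -> str:
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                 # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"\S+@\S+", " ", text)                 # email addresses
    text = re.sub(r"\+?\d[\d\s().-]{6,}\d", " ", text)   # phone-like digit runs
    text = re.sub(r"[^a-z\s]", " ", text)                # digits, special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(naive_stem(t) for t in tokens)

sample = "<p>The waiters were AMAZING!</p> Call 555-123-4567 or visit https://example.com"
print(clean_review(sample))  # waiter amaz call visit
```

The truncated output ("amaz") is typical of stemming: the goal is a consistent token for related word forms, not a readable word.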

Text vectorization is crucial because machine learning models require numerical input. Various methods exist, such as bag-of-words, TF-IDF, n-grams, Word2Vec (CBOW, Skip-gram), GloVe, and FastText. Each has distinct advantages and disadvantages:

Bag-of-Words (BoW):

Advantage: Simple and efficient, suitable for small datasets.
Disadvantage: Ignores word order and context, losing meaning.

TF-IDF (Term Frequency-Inverse Document Frequency):

Advantage: Weights words by importance, favoring rare but meaningful words.
Disadvantage: Lacks word context and struggles with synonyms.

N-grams:

Advantage: Captures word order and context, valuable for text classification.
Disadvantage: Expands vocabulary significantly with higher n-values, causing sparsity.

Word2Vec (CBOW and Skip-gram):

Advantage: Captures word context and semantic relationships, effective in NLP tasks.
Disadvantage: Requires substantial training data, may struggle with out-of-vocabulary words.

GloVe (Global Vectors for Word Representation):

Advantage: Combines global and local context, yielding meaningful word embeddings.
Disadvantage: Demands extensive training data and can be memory-intensive.

FastText:

Advantage: Handles out-of-vocabulary words through subword embeddings, valuable for morphologically rich languages.
Disadvantage: Requires more storage space and can be computationally intensive due to subword modeling.
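To make the bag-of-words versus n-gram trade-off concrete, the following sketch compares the vocabulary built from unigrams alone with one that also includes bigrams. The two sentences are invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the food was not good",
    "the food was good",
]

# Unigram bag-of-words: word order is lost, so "not" and "good" are unrelated.
unigrams = CountVectorizer(ngram_range=(1, 1)).fit(docs)

# Unigrams plus bigrams: phrases such as "not good" become features of their own.
uni_bigrams = CountVectorizer(ngram_range=(1, 2)).fit(docs)

print(sorted(unigrams.vocabulary_))
print(sorted(uni_bigrams.vocabulary_))
print(len(unigrams.vocabulary_), "vs", len(uni_bigrams.vocabulary_))
```

Even on two short sentences the vocabulary doubles once bigrams are added, which is the sparsity cost mentioned above; in exchange, the negation in "not good" is captured as its own feature.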

In our case, we chose TF-IDF for three reasons:

Prioritizes Unique and Emotionally Charged Words: TF-IDF gives more weight to rare, distinctive words, and in reviews these are often the emotionally charged ones.
Noise Reduction: It reduces the impact of common, non-informative words.
Flexible and Interpretable Features: TF-IDF allows for feature selection and provides human-readable insights.

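from sklearn.feature_extraction.text import TfidfVectorizer

# X_train and X_test hold the cleaned review texts from an earlier train/test split (not shown)
# Keep the 6,000 most frequent unigrams and bigrams
tfidf = TfidfVectorizer(max_features=6000, ngram_range=(1,2), smooth_idf=False)
# Fit on the training reviews only, then transform both splits
tfidf.fit(X_train)
tfidf_X_train = tfidf.transform(X_train).toarray()
tfidf_X_test = tfidf.transform(X_test).toarray()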
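To see how TF-IDF plays down common words, this small sketch fits a vectorizer with default settings on four invented reviews and prints the inverse document frequency (IDF) of a few terms. It is an illustration only and does not use the project's data or parameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "the food was delicious",
    "the food was cold",
    "the service was slow",
    "the food was fine",
]

vectorizer = TfidfVectorizer()
vectorizer.fit(reviews)

# Map each term to its inverse document frequency.
idf = {term: vectorizer.idf_[index]
       for term, index in vectorizer.vocabulary_.items()}

# "the" appears in every review, "food" in three, "delicious" in one.
for term in ("the", "food", "delicious"):
    print(f"{term:>10}: idf = {idf[term]:.3f}")
```

The fewer reviews a term appears in, the higher its IDF, so a word like "delicious" contributes more to the feature vector than "the".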

Model Creation and Evaluation:

After converting text into numerical data, we applied various machine learning algorithms for sentiment analysis. Each algorithm has its advantages and disadvantages:

Support Vector Machine (SVM):

Advantage: Effective for high-dimensional data and complex decision boundaries.
Disadvantage: Requires careful hyperparameter tuning and can be slower for large datasets.

Random Forest:

Advantage: Ensembles decision trees for robustness and accuracy.
Disadvantage: May overfit on small datasets and lacks interpretability.

Logistic Regression:

Advantage: Simple, interpretable, and computationally efficient.
Disadvantage: Struggles with capturing complex relationships in data.

Multinomial Naive Bayes:

Advantage: Suitable for text classification tasks like sentiment analysis.
Disadvantage: Assumes independence between features, which may not hold for text data.

Bernoulli Naive Bayes:

Advantage: Effective for binary feature data like binary bag-of-words.
Disadvantage: Less suitable for capturing word frequencies and nuances in text.
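One way to compare the algorithms listed above is to train each on the same TF-IDF features and score it on the same held-out reviews. The sketch below does this on a handful of invented reviews with mostly default hyperparameters, so the numbers it prints say nothing about the real project; it only shows the shape of the comparison loop.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.svm import LinearSVC

# Invented toy data (1 = positive, 0 = negative).
train_texts = [
    "delicious food and friendly staff", "great ambience and tasty dishes",
    "loved the quick service", "clean place and amazing dessert",
    "cold food and rude waiter", "dirty tables and slow service",
    "terrible taste and long wait", "awful experience and stale bread",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["tasty food and friendly service", "rude staff and cold dishes"]
test_labels = [1, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

candidates = {
    "SVM (linear)": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial NB": MultinomialNB(),
    "Bernoulli NB": BernoulliNB(),
}

# Fit every candidate on the same features and record its test accuracy.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, train_labels)
    scores[name] = accuracy_score(test_labels, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {acc:.2f}")
```

On a real dataset, cross-validation and hyperparameter tuning per model would give a fairer comparison than a single split.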

In our case, Bernoulli Naive Bayes outperformed the others. Three properties likely contributed:

Binary Feature Handling: It models whether a word is present or absent, which suits short review texts. Note that scikit-learn’s BernoulliNB binarizes its input by default, so the TF-IDF values are reduced to presence/absence features.
Simple and Efficient: Offers simplicity and efficiency, making it a fast and lightweight choice for text classification.
Interpretability: Its probabilistic nature allows for easy interpretation of class probabilities and feature importance.
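A detail worth knowing when pairing TF-IDF with Bernoulli Naive Bayes: scikit-learn's `BernoulliNB` thresholds its input at `binarize=0.0` by default, so any positive TF-IDF weight becomes 1. The sketch below, on invented reviews, checks that a model trained on TF-IDF features gives the same probabilities as one trained on a binary bag-of-words.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

# Invented toy data (1 = positive, 0 = negative).
texts = [
    "delicious food and friendly staff",
    "great ambience and tasty dishes",
    "cold food and rude waiter",
    "dirty tables and slow service",
]
labels = [1, 1, 0, 0]

# Same tokenization, different values: TF-IDF weights vs. 0/1 presence flags.
tfidf_features = TfidfVectorizer().fit_transform(texts)
binary_features = CountVectorizer(binary=True).fit_transform(texts)

nb_on_tfidf = BernoulliNB().fit(tfidf_features, labels)
nb_on_binary = BernoulliNB().fit(binary_features, labels)

same = np.allclose(
    nb_on_tfidf.predict_proba(tfidf_features),
    nb_on_binary.predict_proba(binary_features),
)
print("Identical probabilities:", same)
```

In other words, with the default settings the TF-IDF weighting affects this classifier only through which features are kept (for example via `max_features`), not through the weights themselves.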

We evaluated the results using precision, recall, accuracy, and F1 score to assess the model’s performance:

Precision: Proportion of true positive predictions among all positive predictions.
Recall: Proportion of true positive predictions among all actual positives.
Accuracy: Fraction of correct predictions among all predictions made.
F1 Score: Harmonic mean of precision and recall, balancing both metrics.
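The four definitions above can be checked by hand on a small example. The labels below are invented; the sketch counts true and false positives and negatives directly and compares the results with scikit-learn's metric functions.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Invented labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(y_true)
f1 = 2 * precision * recall / (precision + recall)

print(f"manual : P={precision:.3f} R={recall:.3f} A={accuracy:.3f} F1={f1:.3f}")
print(f"sklearn: P={precision_score(y_true, y_pred):.3f} "
      f"R={recall_score(y_true, y_pred):.3f} "
      f"A={accuracy_score(y_true, y_pred):.3f} "
      f"F1={f1_score(y_true, y_pred):.3f}")
```

Here there are 4 true positives, 2 false positives, 1 false negative, and 3 true negatives, so precision is 4/6, recall is 4/5, and accuracy is 7/10.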

Here’s a code snippet for the evaluation:

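from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Train Bernoulli Naive Bayes on the TF-IDF features
bnb = BernoulliNB(alpha=0.39800561441561, fit_prior=True)
bnb.fit(tfidf_X_train, Y_train)

# Predict on both splits to compare train and test accuracy
bnb_Y_train_pred = bnb.predict(tfidf_X_train)
bnb_Y_test_pred = bnb.predict(tfidf_X_test)

bnb_train_accuracy = accuracy_score(Y_train, bnb_Y_train_pred)
bnb_test_accuracy = accuracy_score(Y_test, bnb_Y_test_pred)

# With default arguments these scores expect binary labels
bnb_precision = precision_score(Y_test, bnb_Y_test_pred)
bnb_recall = recall_score(Y_test, bnb_Y_test_pred)
bnb_f1 = f1_score(Y_test, bnb_Y_test_pred)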

print("Bernoulli Naive Bayes Train Accuracy:", bnb_train_accuracy)
print("Bernoulli Naive Bayes Test Accuracy:", bnb_test_accuracy)
print("Bernoulli Naive Bayes Precision:", bnb_precision)
print("Bernoulli Naive Bayes Recall:", bnb_recall)
print("Bernoulli Naive Bayes F1 Score:", bnb_f1)

As a result, we achieved 85% accuracy, which completes the first step.
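Before the model can sit behind an API, the vectorizer and the classifier need to travel together so that raw review text can be scored in one call. One possible way to do that, shown here as a sketch on invented reviews with default hyperparameters rather than the tuned values above, is a scikit-learn `Pipeline`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Invented toy data (1 = positive, 0 = negative).
train_reviews = [
    "delicious food and friendly staff", "great ambience and tasty dishes",
    "loved the quick service", "clean place and amazing dessert",
    "cold food and rude waiter", "dirty tables and slow service",
    "terrible taste and long wait", "awful experience and stale bread",
]
train_sentiment = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer and classifier are fitted and applied as a single object.
sentiment_model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("bnb", BernoulliNB()),
])
sentiment_model.fit(train_reviews, train_sentiment)

new_reviews = ["The food was delicious", "Terrible and slow service"]
predicted = sentiment_model.predict(new_reviews)
probabilities = sentiment_model.predict_proba(new_reviews)

for review, label, proba in zip(new_reviews, predicted, probabilities):
    print(f"{review!r}: label={label}, P(positive)={proba[1]:.2f}")
```

A fitted pipeline like this can be saved to disk and loaded by the service that handles incoming reviews, which avoids re-implementing the vectorization step there.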

Conclusion:

We’ve laid the foundation for a restaurant review sentiment analysis system. TF-IDF and Bernoulli Naive Bayes showed promise with an 85% accuracy rate. Now, we’ll build an API and website for user accessibility.

Next Steps in part 2:

API Development: Create a user-friendly API.
Website Creation: Develop an intuitive website.
Optimization: Ensure scalability and efficiency.
User Feedback: Continuously improve based on user input.

Part two:

Improving restaurant operations and customer satisfaction through data-driven decision making (Part 2)

About myself:

I’m Harsh Joshi, currently pursuing my master’s in AI and ML with a solid year of project development experience. My curiosity drives me to embrace new challenges and amalgamate diverse elements to achieve results. I firmly believe in approaching personal projects from a business perspective, striving to deliver practical and valuable solutions.

📧 Email: joshiharsh0506@gmail.com
GitHub: harsh0506
Portfolio Website: harshjcodes.netlify.app
LinkedIn: Harsh Joshi

Written by Joshiharsh