Sentiment Analysis on bagel shop reviews — comparing New York City & Montréal

Duncan Anderson · Published in Nerd For Tech · Jul 24, 2021 · 6 min read

It’s safe to say that most people who have been to New York City or Montreal have probably had a bagel — the reliable, ever-present food option, as ubiquitous as the 24-hour restaurants and corner bodegas (or dépanneurs in Montreal). Both cities are famous for their bagels, and each tends to claim theirs is better than the other’s. If you’re unaware of how contentious this debate is, just ask someone who lives in or has visited either city.

In 2008, when NASA officials asked Canadian astronaut Gregory Chamitoff what food item he wanted to bring aboard Discovery for a two-week flight into space, he requested fresh bagels from his cousin’s Montreal bagel shop, Fairmount Bagel.

Photo by Lucie Liz from Pexels

For locals and tourists alike, Yelp may sway you to visit one shop over another based on its business-page attributes, namely the star rating and the reviews left by visitors.

As an experiment, I wanted to see if I could determine which city bagel lovers prefer, and in order to answer this, I started an analysis and posed this question: Can one compare the same types of restaurants on Yelp (in my case, restaurants that serve bagels) between two places (in my case, New York City and Montreal) to determine which has the better product or service?

As someone who lives in Montreal, has visited New York, and has tried both styles of bagel, I felt like a suitable candidate to take a deep dive into the analysis. To clarify the distinction: Montreal-style bagels are handmade and wood-fire baked; compared to the New York-style bagel, they are smaller and thinner with a larger hole, and sweeter with a denser texture. New York-style bagels tend to be much larger (like everything in America) with a softer, chewy interior.

Despite not being able to say which city has the better bagel after my investigation, I believe my findings offer something that the numeric attributes already on Yelp, like star rating and number of reviews, do not: a look into the actual text data, converted into meaningful insights about the bagel experience at a given restaurant.

As far as I know, there haven’t been any data-driven approaches to answering the question I posed. However, many data scientists have used Yelp review data to predict restaurant ratings and to evaluate the future development of businesses with various Natural Language Processing (NLP) techniques, and their work was helpful when I was considering how to approach my research question.

In order to start my investigation, I needed data. I decided to scrape the following attributes from Yelp to form my dataset:

  • Restaurant name
  • Customer name
  • Customer location
  • Review text
  • Review rating (customer)
  • Review date

I kept getting blocked from Yelp’s website when using Python’s web scraping libraries — Selenium and BeautifulSoup — and in the end I discovered a free Google Chrome extension called ‘Web Scraper’. I never once got blocked when using their service, and would recommend it.

I wanted to get a general idea of review and rating behaviour in both cities, so to begin my analysis I simply scraped a combined ~10,000 reviews & ratings from Yelp covering a variety of restaurant types. This wasn’t proving to be very helpful in terms of insights, so I dove right into the bagel datasets, consisting of a combined ~7,300 reviews and ratings for New York and Montreal shops that sold bagels.

Once I read my CSV files containing the Yelp data into Python, I converted each of them into a pandas DataFrame. Before my investigation, I made sure that each CSV consisted of the same columns to make my cleaning process more streamlined.

I dropped unnecessary columns that were generated by the web scraper and renamed some columns for interpretability. There weren’t any null values or duplicates to take care of, however, I had to convert the ‘rating’ column since it was read in as a string. ‘5 star rating’ turned into a simple integer ‘5’.
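The cleaning step can be sketched in pandas like this. The column names below are hypothetical stand-ins, since the exact scraper output isn’t shown in the post:

```python
import pandas as pd

# Hypothetical scraper output -- stand-in column names and rows
raw = pd.DataFrame({
    "web-scraper-order": ["1682-1", "1682-2"],
    "name": ["Fairmount Bagel", "Ess-a-Bagel"],
    "text": ["Fresh out of the oven!", "Way too crowded."],
    "rating": ["5 star rating", "3 star rating"],
})

# Drop scraper-generated columns and rename the rest for interpretability
df = raw.drop(columns=["web-scraper-order"]).rename(
    columns={"name": "restaurant", "text": "review"}
)

# '5 star rating' -> 5: pull out the leading digit and cast to int
df["rating"] = df["rating"].str.extract(r"(\d)", expand=False).astype(int)
```

`str.extract(..., expand=False)` returns a Series rather than a one-column DataFrame, which keeps the assignment back into the `rating` column straightforward.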

I had a hunch during the scraping process that the star rating distribution would be very imbalanced, as I primarily saw 4 and 5 star ratings. In the future, I would like to import more review data into an ML pipeline to balance out the dataset, as I believe this skew affected the results in the end. Moving forward, I binarized the rating column so that reviews with a rating of 4 or 5 became a 1 (positive) and every rating under 4 became a 0 (negative). The class imbalance for Montreal was 25% negative and 75% positive, and for New York, 24% negative and 76% positive.
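The binarization follows the threshold above; here is a minimal sketch on made-up ratings:

```python
import pandas as pd

# Toy ratings standing in for the scraped data
ratings = pd.Series([5, 4, 3, 5, 1, 4, 5, 2])

# 4 and 5 stars -> 1 (positive); everything under 4 -> 0 (negative)
sentiment = (ratings >= 4).astype(int)

# Class balance as proportions of each class
balance = sentiment.value_counts(normalize=True)
```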

Looking at the review length distribution, there is a very similar distribution across the board for each star in each city, which gave me confidence that the model would be less biased.
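A review-length check like this can be done in a few lines, assuming a text column and an integer rating column (a sketch with invented reviews, not the author’s actual code):

```python
import pandas as pd

# Toy data standing in for the scraped reviews
df = pd.DataFrame({
    "rating": [5, 5, 4, 3, 1],
    "review": [
        "Loved it, best bagel I have ever had",
        "Fresh, warm and chewy",
        "Pretty good spot for a quick bite",
        "Just okay, honestly nothing special",
        "Cold, stale and overpriced",
    ],
})

# Word count per review, summarized per star rating
df["length"] = df["review"].str.split().str.len()
summary = df.groupby("rating")["length"].describe()
```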

Once I binarized the positive and negative reviews, I divided them into their own data frames with just the text column. Each review was then tokenized using TF-IDF (short for term frequency-inverse document frequency) with 1-grams, which gave the most consistent results and the clearest topics. For topic modelling, I tried Latent Semantic Analysis (LSA, also known as LSI), Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF). Who comes up with these names?!

The word weights for the negative review tokens were not very interpretable for LSI; however, I was able to determine the number of topics to use for my modelling based on the topic strengths.

I was hopeful about LDA, but due to my class imbalance and the lack of negative review data, the overlap in meaning between positive and negative reviews made it difficult to make sense of the topics. In the end I went with NMF, which gave the most interpretable results and the most distinct topics.

I moved forward with two different sets of topics for negative and positive reviews. This differed between Montreal and New York, but not by much.

The process going forward consisted of mapping all of the reviews to topics, normalizing the topic weights to sum up to 1, and averaging the topic distributions for all reviews of a bagel shop to map that topic to the restaurant.
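That mapping can be sketched like so; the topic weights and shop labels below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical review-topic weights (rows = reviews, cols = topics),
# i.e. the kind of W matrix NMF produces
W = np.array([
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
    [0.5, 0.3, 0.2],
    [0.0, 0.9, 0.3],
])
shops = ["Fairmount Bagel", "Fairmount Bagel", "St-Viateur", "St-Viateur"]

# Normalize each review's topic weights to sum to 1
W_norm = W / W.sum(axis=1, keepdims=True)

# Average the topic distributions over each shop's reviews
profiles = pd.DataFrame(W_norm, index=shops).groupby(level=0).mean()
```

Because each normalized row sums to 1, every shop’s averaged profile is itself a distribution over topics.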

The process can be seen clearly in my interactive Flourish visualization here:

The newly modelled data is used to show topic strengths for both positive and negative reviews.

I set out to determine which city has the better bagel, but as I spent more time with the data and started modelling, it became clear that review data doesn’t really answer that question: the reviews are conflated with other concerns, like the various topics that surfaced in my topic modelling.


My next steps with this project are to look into the customer metadata on a larger Yelp dataset to analyze the different review behaviour between locals and tourists. It would also be helpful to spend more time on better tokenization (with more data) to surface a greater variety of topics. Because of the overlap in meaning between negative and positive reviews, I could also try creating a single topic space for all reviews instead of separate positive and negative ones.

View the code on my GitHub: https://github.com/setadrift/capstone_1

Thanks,

Duncan Anderson
LinkedIn: https://www.linkedin.com/in/duncan-kg-anderson/
Twitter: @duncanand
