Exploring NYC’s Asian Food Spots with Yelp Fusion API

Finding out if New Yorkers care about price in rating their restaurants

Catharina Kartika Utami
6 min read · Oct 27, 2023
A picture I took on 16 October 2022 while queueing for a portion of their famous roasted pork and duck with white rice.

It was a cloudy and windy day when I first experienced queueing for food in NYC in October 2022. It was the celebration of my first month living in the US and I decided to go on a day trip to Chinatown to check out the world-famous takeout-only spot named “Wah Fung №1 Fast Food”. It was incredible.

This post won’t cover the details of that wonderful bite of roasted pork. Instead, I will share my journey of exploring NYC’s Asian food places and restaurants through my first experience working with API data. I will be pulling the data from the Yelp Fusion API, a free tool that gives developers (including beginners like me) access to Yelp’s local business data and user reviews for integration into applications, websites, and services. It offers detailed business information, including ratings, price levels, and location, which makes it valuable for local search and business insights. The API only returns data for businesses with reviews. For this research, I focus on businesses in New York City.

Landing page of Yelp Fusion API

This project aims to collect data from Yelp Fusion API, filter businesses with categories related to Asian cuisines, and then analyze the consumer pattern by calculating statistical correlation and creating visualizations.

Our research question in this case is:

Does the price point of food places and restaurants with Asian categories in NYC correlate with the ratings they received?

1. Filtering data

# Parameters for the API request; I looped over the request to collect 1,000 results.
# I included both restaurants and food places (for instance Wah Fung, which is take-out only).
params = {
    'limit': 50,  # 50 is the max per request
    'location': 'newyork',
    'term': 'restaurant , food',
    'radius': 20000,  # search radius in meters
}
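The loop itself is not shown above, so here is a minimal sketch of how such paging could look, assuming the Yelp Fusion business search endpoint and its `offset` parameter. The function names (`fetch_page`, `collect_businesses`) and the `api_key` argument are hypothetical and not taken from my original notebook.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def fetch_page(params, api_key):
    """Request one page of search results and return its 'businesses' list."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("businesses", [])


def collect_businesses(params, api_key, total=1000, fetch=fetch_page):
    """Page through the results with 'offset' until `total` rows are collected."""
    businesses = []
    page_size = params["limit"]
    for offset in range(0, total, page_size):
        page = fetch({**params, "offset": offset}, api_key)
        businesses.extend(page)
        if len(page) < page_size:  # a short page means there are no more results
            break
    return businesses[:total]
```

Calling `collect_businesses(params, api_key)` with the parameters above would issue up to 20 requests of 50 results each. The `fetch` argument only exists so the paging logic can be tried out with a fake fetcher, without touching the network.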

After retrieving the API data and doing some cleaning, I started to explore how to filter the 1,000-row dataset. I decided to print the unique keywords in the categories column, which gave me plenty of keywords to choose from for the filtering.
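Listing the unique keywords can be sketched as follows. This assumes each entry in the categories column is a list of dicts with `alias` and `title` keys, which is how the filter function later in this post reads them. The helper name and the sample rows are made up for illustration.

```python
def unique_category_titles(categories_column):
    """Return the sorted, de-duplicated category titles across all businesses."""
    titles = set()
    for categories_list in categories_column:
        for category in categories_list:
            titles.add(category["title"])
    return sorted(titles)


# Made-up sample rows, shaped like the categories column
sample_column = [
    [{"alias": "chinese", "title": "Chinese"}, {"alias": "noodles", "title": "Noodles"}],
    [{"alias": "pizza", "title": "Pizza"}],
    [{"alias": "chinese", "title": "Chinese"}],
]
print(unique_category_titles(sample_column))  # ['Chinese', 'Noodles', 'Pizza']
```

With a DataFrame, the same helper could be called on the column directly, for example `unique_category_titles(original_df['categories'])`.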

I handpicked the keywords that relate to Asian cuisines. In the process, I ran into the challenge of deciding which categories count. For example, do noodles really belong only to Asia?

The keywords I selected for the filter are presented below:

# Lowercase the keywords once so the comparison is case-insensitive
asian_keywords = [kw.lower() for kw in [
    'Shanghainese','Malaysian','Noodles','Thai', 'Chinese', 'Filipino','Persian/Iranian',
    'Lebanese','Taiwanese','Sushi Bars','Korean','Uzbek','Hong Kong Style Cafe','Hot Pot','Shaved Ice',
    'Middle Eastern','Bubble Tea','Bangladeshi','Japanese', 'Izakaya','Cambodian', 'Indian',
    'Kebab', 'Asian Fusion', 'Ramen','Cantonese', 'Szechuan', 'Dim Sum', 'Vietnamese'
]]

def has_asian_category(categories_list):
    # Each category is a dict with an 'alias' and a 'title'; match on either one
    for category in categories_list:
        if category['alias'].lower() in asian_keywords or category['title'].lower() in asian_keywords:
            return True
    return False

# Keep only the businesses with at least one Asian-related category
asian_df = original_df[original_df['categories'].apply(has_asian_category)]

2. Analysis

Screenshot of the dataset

In the analysis of places with an Asian category, I noticed that 'price' was indicated with symbols ranging from $ to $$$$. So before diving into the stats, I had to transform these into numbers: 1 for $, 2 for $$, and so on. I also decided to exclude entries without price indicators.
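The conversion can be sketched like this. The helper names are hypothetical, and the sketch works on plain dicts shaped like Yelp business records rather than on my actual DataFrame.

```python
def price_to_number(price):
    """Convert '$'..'$$$$' to 1..4; return None when the price is missing or malformed."""
    if not isinstance(price, str) or not price or set(price) != {"$"}:
        return None
    return len(price)


def priced_pairs(businesses):
    """Return (price_level, rating) pairs, skipping businesses without a price."""
    pairs = []
    for business in businesses:
        level = price_to_number(business.get("price"))
        if level is not None:
            pairs.append((level, business["rating"]))
    return pairs
```

Counting the dollar signs gives the numeric tier directly, and anything that is not a string of dollar signs (including a missing value) is dropped, which mirrors the decision to exclude entries without price indicators.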

I used linear regression to see whether higher meal prices are associated with better ratings. The finding? Every step up in price (like going from $ to $$) was associated with only a slight increase of 0.0396 in rating. The p-value, a measure of significance, was a relatively high 0.3917. This suggests that the relationship between price and rating is not statistically significant and could be due to random chance. The R-squared value was only 0.0043, meaning that just 0.43% of the variation in ratings of Asian food places and restaurants in NYC can be explained by price. (So, is it safe to say that New Yorkers agree with me that price might not matter when it comes to rating my delicious roasted pork over rice?)

Coefficient for Price: 0.039576365663321844
P-value for Price: 0.3917309049359192
R-squared: 0.004318427046182682
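For readers curious where the coefficient and R-squared come from, here is a minimal sketch of a one-variable least-squares fit in plain Python. It is an illustration of the technique, not the exact code I ran. It does not compute the p-value; a statistics library such as SciPy (`scipy.stats.linregress`) reports that alongside the slope.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept, r_squared).

    Assumes x and y have the same length and that neither is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation between x and y
    return slope, intercept, r_squared


# R-squared is a share of variance, so 0.0043 reads as 0.43%
print(f"{0.004318427046182682:.2%}")  # 0.43%
```

Here `x` would be the numeric price levels and `y` the ratings. The slope is the expected change in rating for one step up in price, and R-squared is the fraction of rating variance that the fitted line accounts for.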

After the regression analysis, I wanted to dig deeper into the data and show it visually. In the regression plot below, the red line represents the predicted rating trend across price tiers. Although there’s a very subtle incline, suggesting pricier restaurants might have marginally higher ratings, the surrounding shaded area signals substantial variation, emphasizing that price isn’t the sole determinant. The scattered data points further highlight the variance within each price category. Clearly, diners base their judgments on more than just price.

I also visualized the data with a histogram to further explore the relationship between price and ratings. It shows a significant concentration of mid-priced eateries (denoted by the blue bars) sitting comfortably at a 4.0 rating. On the fringes, cheaper and high-end establishments are scattered more widely. While mid-range dining spots consistently secure decent reviews, there’s a broader spectrum of experiences, both good and bad, at the price extremes. This opens an opportunity for future qualitative research on the topic.
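The counts behind such a histogram can be tabulated with a few lines of Python. This is a sketch with a hypothetical helper name and made-up sample pairs, not my plotting code.

```python
from collections import Counter


def rating_counts_by_price(pairs):
    """Count how often each rating occurs within each price level."""
    counts = {}
    for level, rating in pairs:
        counts.setdefault(level, Counter())[rating] += 1
    return counts


# Made-up (price_level, rating) pairs for illustration only
sample_pairs = [(2, 4.0), (2, 4.0), (2, 4.5), (1, 3.5), (4, 5.0)]
for level, counter in sorted(rating_counts_by_price(sample_pairs).items()):
    print(level, dict(counter))
```

Each price level gets its own tally of ratings, which is the same grouping a histogram colored by price tier draws as bars.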

3. Playing with Plotly Mapbox

Diving deeper into data visualization, I turned to Plotly Mapbox to show the distribution of Asian food places and restaurants in NYC. Mapbox is a dynamic tool that renders interactive maps, and it predominantly uses the Web Mercator Coordinate Reference System (CRS) to project geographical data. A handy feature of Mapbox is interactive hover; I took advantage of it by embedding essential information such as the name of the business, its rating, and its price directly in the map. To ensure a seamless experience for my readers on Medium, I used Plotly Chart Studio, which hosts the interactive map behind a link that I can embed here in my post. The map shows that some outlier places in New Jersey are sadly still included in the dataset; that is homework for the next project.
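A map like this can be sketched as follows, assuming business records shaped like Yelp search results (with `coordinates` and `location` fields) and Plotly's `Scattermapbox` trace. The helper names, marker size, and map center are my own illustrative choices, and the state filter is one possible way to handle the New Jersey outliers rather than something I applied in this project.

```python
def hover_text(business):
    """Build the hover label: name, rating, and price (if any)."""
    price = business.get("price") or "n/a"
    return f"{business['name']}<br>Rating: {business['rating']}<br>Price: {price}"


def in_state(business, state="NY"):
    """True when the business address is in the given state (drops e.g. New Jersey)."""
    return business.get("location", {}).get("state") == state


def build_map(businesses):
    """Return a Plotly figure with one hoverable marker per business."""
    import plotly.graph_objects as go  # imported here so the helpers above work without Plotly

    trace = go.Scattermapbox(
        lat=[b["coordinates"]["latitude"] for b in businesses],
        lon=[b["coordinates"]["longitude"] for b in businesses],
        mode="markers",
        marker={"size": 8},
        text=[hover_text(b) for b in businesses],
        hoverinfo="text",  # show only the custom label on hover
    )
    fig = go.Figure(trace)
    fig.update_layout(
        mapbox_style="open-street-map",  # a base map style that needs no Mapbox token
        mapbox_center={"lat": 40.7128, "lon": -74.0060},  # roughly lower Manhattan
        mapbox_zoom=10,
    )
    return fig
```

Filtering first, for example `build_map([b for b in businesses if in_state(b)])`, would keep only the markers whose address is in New York State.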

Map of Asian Category Businesses. This is the first map that I built using Plotly Mapbox. For more fun, try hovering over one of the location markers!

To complete the overall outlook, I crafted an additional map comparing the distribution of Asian food spots to the others listed in the API dataset. I intentionally designed the Asian category markers to be interactive, allowing users to delve into specifics while keeping the non-Asian category markers static to maintain focus and clarity in visualization.
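The two-layer idea can be sketched with plain trace dictionaries, which Plotly accepts in `plotly.graph_objects.Figure(data=[...])`. The helper names and colors below are hypothetical; the key point is `hoverinfo="skip"` for the static layer and `hoverinfo="text"` for the interactive one.

```python
def marker_trace(businesses, name, color, interactive):
    """Describe one map layer as a Plotly trace dict."""
    trace = {
        "type": "scattermapbox",
        "name": name,
        "lat": [b["coordinates"]["latitude"] for b in businesses],
        "lon": [b["coordinates"]["longitude"] for b in businesses],
        "mode": "markers",
        "marker": {"size": 7, "color": color},
        "hoverinfo": "text" if interactive else "skip",  # 'skip' disables hover
    }
    if interactive:
        trace["text"] = [b["name"] for b in businesses]
    return trace


def layered_traces(businesses, is_asian):
    """Static layer for other businesses first, interactive Asian layer on top."""
    asian = [b for b in businesses if is_asian(b)]
    others = [b for b in businesses if not is_asian(b)]
    return [
        marker_trace(others, "Other categories", "lightgray", interactive=False),
        marker_trace(asian, "Asian category", "crimson", interactive=True),
    ]
```

Here `is_asian` could be a small wrapper such as `lambda b: has_asian_category(b["categories"])`. Listing the static layer first means the interactive markers are drawn on top of it.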

Distribution Map of food places and restaurants with Asian category in New York City

Conclusion

While this research provides useful insight into the correlation between price and ratings of Asian food spots in NYC, it’s important to note its limitations. Relying only on Yelp Fusion API data has its constraints, as it focuses on Yelp user reviews, potentially sidelining establishments with feedback on other platforms. Additionally, the fact that price explains only 0.43% of the variance in ratings highlights the many other factors affecting ratings. This investigation spotlights the complexities of diner choices and calls for more comprehensive, qualitative research to truly grasp the dynamics shaping restaurant ratings in the city.

References:

Glaeser, E. L., Kim, H., & Luca, M. (2018). Nowcasting Gentrification: Using Yelp Data to Quantify Neighborhood Change. https://www.hbs.edu/faculty/Publication%20Files/18-077_a0e9e3c7-eceb-4685-8d72-21e0f518b3f3.pdf

Walsh, M. (2022). Embed an Interactive Plot with ggplot, plotly, & RStudio. https://melanie-walsh07.medium.com/sample-plotly-sharing-f045140f5f56

Data source:

Yelp, Inc. (2023). “Yelp Fusion API documentation,” from Yelp. Retrieved November 26, 2023, from https://fusion.yelp.com/.
