Bangalore Restaurant Analysis Using Power BI

Published in

Analytics Vidhya

7 min readOct 9, 2019

In this article, I will explain some useful insights that I was able to gather by analyzing a dataset sourced from Kaggle.com using Power BI Modeling and Reporting Capabilities. (The report can be accessed here). It will help anyone starting with Power BI to understand how we can quickly clean and model some data at hand to gain quick insights.

The dataset is a Zomato extract from around 6 months ago. The intent of analyzing the Zomato dataset is to get a fair understanding of the current situation of the foodservice industry in Bangalore. We also want to identify factors affecting the different types of restaurants in different places, the popular cuisines and meal types, Bengaluru being one such city that has more than 12,000 restaurants serving dishes from cuisines all over the world.

Cleaning & Transforming Data

The data source was a CSV file consisting of a dataset with 17 columns. On Power BI Desktop we click on “Get Data” from the CSV file and proceed onto transforming/cleaning the data in the Query Editor. By default, the first row is promoted to headers and the column types are changed.

Initially Applied Steps in the Query Editor

I removed the following columns since I would not be using them as a part of this analysis: reviews_list, menu_item, listed_in(city).

Note: The columns “rest_type” and “listed_in(type)” might seem to be redundant but on reviewing them closely we observe the difference in values. “rest_type” refers to the restaurant type and “listed_in(type)” refers to the meal type. Hence we retain both the columns.

2. I performed the replace value operation on the “rate” column multiple times to remove the “/5” from all values, to replace blank values, to replace “-” and to replace “NEW”.I then converted the data type for “rate” to decimal.

3. For restaurants with names like “Café” the “é” is observed to be replaced with special characters like “ƒ”,” Â”,”©”. I performed multiple replace operations on the column “name” to handle the same.

Note: There is a huge range of functions available in the power query language “M” which can be further explored to concatenate the replacing of characters in a single step.

Analysis & Modeling

Observations

There are around 51k rows in the dataset.
Different addresses have the same restaurant names (restaurant chains).
The same address applies to multiple restaurants at the same location.
Each entry in the dataset is already at a location level i.e. a single location per restaurant name and address but if we analyze the columns like “rest_type”, “dish_liked” and “cuisines”, they have multiple entries concatenated in a single column. We would want to extract the individual values for these to perform analysis like the top cuisines, dishes or restaurant types.
Multiple entries are present for a name-address combination differing in the column “listed_in(type)”.

Multiple entries for same restaurants with different listed_in(type)

6. Even for the same “listed_in(type)” values for a name-address combination, we notice multiple rows for different vote values or rate values. We also see an inconsistency between the values in the phone column concerning spaces between the numbers for the same combination of name-address-listed(type).

Multiple entries for the same name-address-listed_in(type) with different votes

Conclusions

Conclusion 1: It would not be correct to take the total count of “names” from this query to count the number of individual restaurants or the count of restaurants falling under a restaurant chain. We can create a new custom column called “Restaurant Address” which is a concatenation of the fields “name” and “address” to do uniquely identify a restaurant at a location.

Added custom column “Restaurant Address”

Note: I also added steps to transform the custom column “Restaurant Address” to upper case and trim it because I noticed inconstancies in letter casing and spacing for the same addresses.

Conclusion 2: We need to extract the unique values from the columns rest_type, cuisines and dishes_liked and implement them as individual queries relating them to the restaurants via a similar “Restaurant Address” column. As a result, we can also eliminate these columns from the query “Zomato.” Refer to the “Restaurant Type” table below.

Steps for Implementing query “Restaurant Type”

Implementation :

To start with creating an individual query for Restaurant Type, first, duplicate the current query “Zomato” and rename it to Restaurant Type.
Changed the “Remove Other Columns” step to only retain “address”, “name”, “rest_type”. Remove all the steps to clean the rate column as we are not retaining it for this query.
Retain the step to add a Custom Column “Restaurant Address”.
Additional Steps — Right-click on the rest_type column and click on “Split Column By Delimiter”. Make the selections as below by selecting a comma as the delimiter and split the data into rows.

Split Column Settings to split the column rest_type

5. Right-click on “rest_type” again to transform and trim the column. The final table should like the one shown in the above screenshot.

We can implement the Dish and Cuisine queries in a similar way.

Conclusion 3: In the “Zomato” query, we can aggregate the data for “vote”, “rate”,” phone” and “cost for 2” at all other column levels to have an aggregated set of data. To do that we right-click on a column and select “Group” to perform the following operation and name it as “Aggregate Rows” :

Group Settings to Aggregate the Zomato table based on Votes, Cost for 2, Rate and Phone

There will be many to many relationships established as shown below using our custom column Restaurant Address and this is how the final tables will look like.

Reporting (You can access the report here)

Now that the data is clean and ready to report on, I can discuss the reports and visualizations I have compiled as a part of this activity.

I have divided my reports to provide the following :

An overview of higher-level insights from the dataset.
Some analytical insights.
A report to help users select restaurants based on their inputs.

Introduction: The first report acts as a starting point. There is a brief introduction with a few basic numbers reported. It gives you an option to navigate to the Overview, Analytics or Search reports. (Standard use of bookmarks for navigation purposes)

Overview: There are 2 reports present as a part of the Overview.

Quick Insights from 1st Overview Report-

CCD, Just Bake, Domino’s, Five Star Chicken & Pizza Hut are among the restaurant chains with maximum restaurants.
The most available cuisine is North Indian followed by Chinese and South Indian.
A huge percentage of restaurants offer Online orders but very few offer table bookings.
Maximum no of restaurants (44.7 %) fall under the Meal Type Delivery.
The number of restaurants is dense towards central Bangalore.

All the visualizations on the report are interactive and can be used to filter the other and gain specific insights!

Quick Insights from 2nd Overview Report-

Chicken, Burger, Chocolate & Pizza are among the most popular dishes for Meal Type Cafes
Lavelle Road, Koramangala 3rd Block, St. Marks Road, Sankey Road, and Church street have the highest average rating of restaurants overall.
The two visualizations can be filtered down by selecting a Meal Type from the slicer available.

The two visualizations can be filtered down by selecting a Meal Type from the slicer available.

Analysis: There is 1 report available as a part of the analysis.

Quick Insights from Analysis Report

The majority of the restaurants serve food under Rs 1000 for 2!
The majority of the restaurants fall under the ratings 3–4!
Restaurants are able to maintain higher ratings even without offering Table Bookings.

All the visualizations on the report are interactive and can be used to filter the other and gain specific insights! The visualizations can be filtered down by selecting a Meal Type from the slicer available.

Search: This report offers to list down restaurants along with some useful information about them based on user inputs for location, cuisine and meal type. The results can be further filtered based on the availability of Table Booking, Online orders and price range. The restaurants can be sorted based on rating (highest rating first) and cost (lowest cost for 2 first).

There is still a huge scope to gain further insights from this dataset but I hope this serves as a helpful starting point. Any feedbacks are welcome:)!

The link to the report is here.

Bangalore Restaurant Analysis Using Power BI

Cleaning & Transforming Data

Analysis & Modeling

Observations

Conclusions

Written by Preksha Punwani