Casablanca’s Restaurants Clustering

Aissammy Zone
6 min read · Jul 12, 2020


Casablanca is the largest city in Morocco. Located in the central-western part of the country, bordering the Atlantic Ocean, it is the largest city in the Maghreb region and the eighth-largest in the Arab world. Casablanca is Morocco’s chief port and one of the largest financial centers in Africa. According to the 2019 population estimate, the city has a population of about 3.71 million in the urban area and over 4.27 million in Greater Casablanca. Casablanca is considered the economic and business center of Morocco, although the national political capital is Rabat.

The leading Moroccan companies and many international corporations doing business in the country have their headquarters and main industrial facilities in Casablanca. Recent industrial statistics show Casablanca holds its recorded position as the primary industrial zone of the nation. The Port of Casablanca is one of the largest artificial ports in the world, and the second-largest port in North Africa, after Tanger-Med, 40 km (25 mi) east of Tangier.

In this exercise I will focus on clustering the restaurants of this large economic hub, because it is the most populous city in Morocco.

Introduction/Business problem

Casablanca, the city I live in, attracts a large number of investors and a smaller number of tourists. It remains the country’s biggest economic center, where people can find many job opportunities, which is why its population is so diverse. For foreigners especially, finding the right place to eat can be a challenge, because the city is very large and Moroccan dishes may not suit their tastes.

Thus, the goal of this exercise is to give a simple recommendation to people in Casablanca: in which district of the city will you find a high concentration of which types of restaurants? Where can you eat Mediterranean, German, French or Italian food, and where can you get fast food? The target audience is foreigners and also investors who want to open a new restaurant, for whom it is crucial to know how the existing restaurants are distributed across Casablanca’s districts.

Description of the data

As requested by the assignment, I will use Foursquare data about restaurants in Casablanca. Foursquare is a US tech company from New York focusing on location data. Its technology and data power apps such as Apple’s Maps, Uber, Twitter and many other household names. Here is an example of a restaurant search in Casablanca on Foursquare: https://fr.foursquare.com/explore?mode=url&near=Al-Markaz%2C%20Casablanca&nearGeoId=10011708&q=Nourriture . I will use Foursquare data such as the restaurant name, ID, location and food category (vegetarian, Italian, etc.).

Also, I will use the overview of districts/city parts of Casablanca from Wikipedia: https://en.wikipedia.org/wiki/Casablanca

https://sites.google.com/site/collectivitesaumaroc/regions/grand-casablanca/provinces-et-communes-du-grand-casablance

Here, you will find a table “Districts” which shows the nine city districts and their neighborhoods/city parts. I will combine these districts with the Foursquare data about the restaurants located in them to show the density of restaurants in each district.

Methodology

In this section, I will describe the data analysis and how I used the data to yield the results.

Starting out, I scraped the district overview to create a data frame with the boroughs of Casablanca: https://sites.google.com/site/collectivitesaumaroc/regions/grand-casablanca/provinces-et-communes-du-grand-casablance. For this, I used the pandas read function. I had to clean the resulting data frame by removing unnecessary information and data that cannot be handled in a data frame, such as the pictures of each district’s coat of arms. The result is a clean data frame.
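As a rough illustration of this scraping and cleaning step, here is a minimal sketch. It assumes the "pandas read function" is `pandas.read_html`, which the article does not state explicitly. The table index, the column names and the toy rows are hypothetical stand-ins for the real scraped table.

```python
import pandas as pd


def load_district_table(url: str, table_index: int = 0) -> pd.DataFrame:
    """Read one HTML table from a page into a DataFrame.

    pandas.read_html returns a list with every table found on the page.
    Which index holds the districts table is an assumption, and the call
    needs an HTML parser such as lxml to be installed.
    """
    return pd.read_html(url)[table_index]


def clean_district_table(df: pd.DataFrame, keep: list) -> pd.DataFrame:
    """Keep only the useful columns and drop incomplete or duplicated rows."""
    cleaned = df[keep].dropna(how="any").copy()
    # Strip stray whitespace left over from the HTML.
    for col in keep:
        cleaned[col] = cleaned[col].astype(str).str.strip()
    return cleaned.drop_duplicates().reset_index(drop=True)


# Toy stand-in for the scraped table (column names are hypothetical).
raw = pd.DataFrame({
    "Prefecture": ["Anfa ", "Anfa", None],
    "Borough": [" Maarif", "Sidi Belyout", "Unknown"],
    "Coat of arms": ["img1.png", "img2.png", "img3.png"],
})
districts = clean_district_table(raw, keep=["Prefecture", "Borough"])
print(districts)
```

Selecting the columns to keep, rather than naming the ones to drop, makes the cleaning step less sensitive to extra columns such as the image column.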

Then I installed the geopy package from conda-forge and used its Nominatim geocoder to add geospatial data to the data frame, namely the latitude and longitude shown in the rightmost columns of the following table.
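The geocoding step could look roughly like the sketch below. The helper names, the `user_agent` string and the query suffix are assumptions made for illustration, and the demo uses a fake lookup with placeholder coordinates so that it runs without network access.

```python
from typing import Callable, Optional, Tuple

import pandas as pd

Coordinates = Optional[Tuple[float, float]]


def make_nominatim_lookup(user_agent: str = "casablanca_explorer") -> Callable[[str], Coordinates]:
    """Build a lookup function on top of geopy's Nominatim geocoder."""
    from geopy.geocoders import Nominatim  # lazy import; needs geopy installed

    geolocator = Nominatim(user_agent=user_agent)

    def lookup(query: str) -> Coordinates:
        location = geolocator.geocode(query)  # None when nothing is found
        if location is None:
            return None
        return (location.latitude, location.longitude)

    return lookup


def add_coordinates(df: pd.DataFrame, name_col: str,
                    lookup: Callable[[str], Coordinates],
                    suffix: str = ", Casablanca, Morocco") -> pd.DataFrame:
    """Return a copy of df with Latitude and Longitude columns appended."""
    out = df.copy()
    coords = [lookup(f"{name}{suffix}") for name in out[name_col]]
    out["Latitude"] = [c[0] if c else None for c in coords]
    out["Longitude"] = [c[1] if c else None for c in coords]
    return out


# Offline demo: a fake lookup with placeholder (not real) coordinates.
fake_table = {"Borough A, Casablanca, Morocco": (33.5, -7.6)}
boroughs = pd.DataFrame({"Borough": ["Borough A", "Borough B"]})
located = add_coordinates(boroughs, "Borough", fake_table.get)
print(located)
```

With geopy installed, `add_coordinates(boroughs, "Borough", make_nominatim_lookup())` would query the live service instead. Boroughs that cannot be resolved end up with missing coordinates and can be dropped or fixed by hand.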

Using the folium package and my data frame, I then created a map showing Casablanca’s boroughs.

Then I retrieved the Foursquare data for all venues within 900 meters of each borough center, shown as blue dots on the map above. The result was a list of 757 venues all over Casablanca. Out of these 414 venues, 89 were restaurants. These 89 restaurants come from 19 unique restaurant categories, such as Italian, Moroccan or French.
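A minimal sketch of this retrieval step follows. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual response layout, which the article does not name. The credentials are placeholders, the filter on the word "Restaurant" is a guess at how restaurants were separated from other venues, and the sample payload is invented for the demo.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the legacy Foursquare v2 "explore" API.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def fetch_venues(lat, lng, client_id, client_secret,
                 radius=900, limit=100, version="20200701"):
    """Request the venues within `radius` meters of a point (needs network)."""
    params = urlencode({
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    })
    with urlopen(f"{EXPLORE_URL}?{params}") as response:
        return json.load(response)


def parse_venues(payload, borough):
    """Flatten an explore response into one row per venue."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            rows.append({
                "Borough": borough,
                "Venue": venue.get("name"),
                "Venue ID": venue.get("id"),
                "Latitude": venue.get("location", {}).get("lat"),
                "Longitude": venue.get("location", {}).get("lng"),
                "Category": categories[0].get("name"),
            })
    return rows


def restaurants_only(rows):
    """Keep venues whose category name mentions 'Restaurant' (an assumption)."""
    return [row for row in rows if "Restaurant" in (row["Category"] or "")]


# Invented sample payload, shaped like an explore response.
sample = {"response": {"groups": [{"items": [
    {"venue": {"id": "a1", "name": "Venue One",
               "location": {"lat": 33.5, "lng": -7.6},
               "categories": [{"name": "Italian Restaurant"}]}},
    {"venue": {"id": "b2", "name": "Venue Two",
               "location": {"lat": 33.6, "lng": -7.5},
               "categories": [{"name": "Hotel"}]}},
]}]}}
rows = parse_venues(sample, "Borough A")
print(restaurants_only(rows))
```

Keeping the network call separate from the parsing makes it possible to check the parsing logic on a saved response without spending API quota.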

I plotted a bar chart of the 10 most frequently occurring restaurant categories in the whole city, using the seaborn/matplotlib packages. We can see that Fast Food, Italian, French, Seafood and Moroccan restaurants are the most frequently occurring restaurants in Casablanca, which seems pretty reasonable.
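Counting and plotting the most frequent categories might be sketched as follows. The column name `Category` and the toy rows are assumptions; the plotting function imports seaborn and matplotlib lazily and is not called in the demo.

```python
import pandas as pd


def top_categories(venues: pd.DataFrame, n: int = 10, col: str = "Category") -> pd.Series:
    """Count how often each category occurs and keep the n most frequent."""
    return venues[col].value_counts().head(n)


def plot_top_categories(counts: pd.Series):
    """Draw a horizontal bar chart of the counts (needs seaborn/matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.barplot(x=counts.values, y=counts.index)
    ax.set_xlabel("Number of restaurants")
    ax.set_ylabel("Category")
    plt.tight_layout()
    return ax


# Toy data, not the real Foursquare results.
toy = pd.DataFrame({"Category": [
    "Fast Food Restaurant", "Fast Food Restaurant", "Fast Food Restaurant",
    "Italian Restaurant", "Italian Restaurant", "Moroccan Restaurant",
]})
top = top_categories(toy, n=2)
print(top)
```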

To find clusters of restaurant types in the different boroughs, I first transformed the data frame of restaurants and their associated boroughs by one-hot encoding (0/1) the restaurant category, as seen in the picture below.
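One way to do this one-hot encoding is `pandas.get_dummies`, sketched here on toy data. The column names and rows are assumptions, not the article's actual table.

```python
import pandas as pd

# Toy restaurant table: one row per restaurant.
venues = pd.DataFrame({
    "Borough": ["Borough A", "Borough A", "Borough B"],
    "Category": ["Italian Restaurant", "Fast Food Restaurant", "Moroccan Restaurant"],
})

# One column per category, holding 1 where the restaurant belongs to it.
onehot = pd.get_dummies(venues["Category"], dtype=int)

# Put the borough back in front so every row stays attributable.
onehot.insert(0, "Borough", venues["Borough"])
print(onehot)
```

Each row contains exactly one 1, because every restaurant here carries a single category.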

Next, I used grouping to show the frequency of each category of restaurants in each borough.
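The grouping step can be sketched like this, assuming that "frequency" means the share of a borough's restaurants falling into each category, that is, the mean of the one-hot columns. The data is a toy example.

```python
import pandas as pd

venues = pd.DataFrame({
    "Borough": ["Borough A", "Borough A", "Borough B"],
    "Category": ["Italian Restaurant", "Fast Food Restaurant", "Moroccan Restaurant"],
})
onehot = pd.get_dummies(venues["Category"], dtype=int)
onehot.insert(0, "Borough", venues["Borough"])

# The mean of a 0/1 column is the share of rows equal to 1, so each cell
# becomes the relative frequency of that category within the borough.
freq = onehot.groupby("Borough").mean().reset_index()
print(freq)
```

Using `sum()` instead of `mean()` would give raw counts, which favor boroughs with more venues. Relative frequencies make boroughs of different sizes comparable.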

I used this information to create a data frame in which you can see the most common restaurant venue types for each borough.
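Ranking the categories per borough could look like the sketch below. The function name, the output column labels and the toy frequencies are hypothetical.

```python
import pandas as pd


def most_common_categories(freq: pd.DataFrame, top_n: int = 3,
                           key: str = "Borough") -> pd.DataFrame:
    """For each borough, list its top_n categories by frequency."""
    rows = []
    for _, row in freq.iterrows():
        # Sort this borough's category frequencies from high to low.
        ranked = row.drop(key).astype(float).sort_values(ascending=False, kind="stable")
        entry = {key: row[key]}
        for rank, category in enumerate(ranked.index[:top_n], start=1):
            entry[f"Most common #{rank}"] = category
        rows.append(entry)
    return pd.DataFrame(rows)


# Toy frequency table (one row per borough, one column per category).
freq = pd.DataFrame({
    "Borough": ["Borough A", "Borough B"],
    "Fast Food Restaurant": [0.2, 0.6],
    "Italian Restaurant": [0.5, 0.1],
    "Moroccan Restaurant": [0.3, 0.3],
})
common = most_common_categories(freq, top_n=2)
print(common)
```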

What we see in the table are the city districts and their most common venues, and they have now been assigned four different cluster labels from 0 to 3.
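The article does not say how the cluster labels were computed. A common choice for a per-borough frequency table like this one is k-means, so the sketch below assumes scikit-learn's `KMeans`. The function name, the seed and the toy frequencies are illustrative only.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_boroughs(freq: pd.DataFrame, n_clusters: int = 4,
                     key: str = "Borough", seed: int = 0) -> pd.DataFrame:
    """Assign a k-means cluster label to each borough."""
    features = freq.drop(columns=key)
    model = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    labels = model.fit_predict(features)
    out = freq[[key]].copy()
    out["Cluster"] = labels
    return out


# Toy frequencies: two fast-food-heavy and two Moroccan-heavy boroughs.
freq = pd.DataFrame({
    "Borough": ["Borough A", "Borough B", "Borough C", "Borough D"],
    "Fast Food Restaurant": [0.8, 0.7, 0.1, 0.2],
    "Italian Restaurant": [0.1, 0.2, 0.1, 0.1],
    "Moroccan Restaurant": [0.1, 0.1, 0.8, 0.7],
})
clustered = cluster_boroughs(freq, n_clusters=2)
print(clustered)
```

The label numbers themselves are arbitrary. What matters is which boroughs share a label, so each cluster is best named after inspecting the most common categories of its members.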

We can now use the cluster labels to show the city districts marked with a cluster-specific color on a map, again using folium:
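A minimal sketch of such a cluster map with folium is given here. The color palette, the map center (an approximate location for Casablanca), the zoom level and the row keys are assumptions. folium is imported lazily and the map is only built when the function is called.

```python
def cluster_color(label, palette=("red", "blue", "green", "purple")):
    """Map a cluster label to a color, cycling if there are more labels."""
    return palette[int(label) % len(palette)]


def build_cluster_map(rows, center=(33.57, -7.59), zoom_start=11):
    """Draw one colored bubble per borough (needs folium installed).

    `rows` is an iterable of dicts with the keys Borough, Latitude,
    Longitude and Cluster; the default center is approximate.
    """
    import folium

    fmap = folium.Map(location=list(center), zoom_start=zoom_start)
    for row in rows:
        folium.CircleMarker(
            location=[row["Latitude"], row["Longitude"]],
            radius=8,
            popup=f'{row["Borough"]} (cluster {row["Cluster"]})',
            color=cluster_color(row["Cluster"]),
            fill=True,
            fill_opacity=0.7,
        ).add_to(fmap)
    return fmap


# The color helper can be checked without folium.
print([cluster_color(label) for label in range(4)])
```

In a notebook, the object returned by `build_cluster_map(...)` renders inline; outside a notebook, `fmap.save("clusters.html")` writes a standalone page.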

You will see 15 bubbles for the boroughs that remain after removing the unclustered ones, in four different colors for the four different clusters.

Cluster 1: the French, Italian & Seafood Cluster

Cluster 2: the Fast Food Cluster

Cluster 3: the Moroccan Food Cluster

Cluster 4: the Sushi and Italian Food Cluster

Discussion

When I reflect on the work needed to create these results, what comes to mind is that for the typical tasks of scraping, cleaning, handling, transforming and visualizing data, all the tools are simply there. We just have to get to know the available open source packages and learn how to use them. What I find fantastic is that nearly all of them are free of charge. Also, a simple notebook computer is enough: in my case, I used a ThinkPad T430, more than three years old. All the rest is concentrated, creative, interesting and sometimes hard work, plus searching the web for hints, tips, examples and explanations. With these tools, many exciting data science use cases can be created, for all kinds of useful purposes.

Conclusion

The biggest constraint I encountered in this project was the lack of restaurant venue data. This is visible in the resulting clusters, which all seem logical except the fourth one.

We achieved the goal presented at the outset of this report: tourists can see in the results which city districts best match their food desires. This is just one example of the fantastic data science use cases one can realize by applying technology that is available for free today! What a time to be alive.
