Interactive Geospatial Data Visualization: Covid-19 Case Study

Diardano Raihan
Sekolah.mu Technology
12 min read · Jun 29, 2022

Build interactive visualizations to deliver insights on geospatial data using Python

A. Introduction

A.1. Motivation

As data analysts, sooner or later we will “play” with data tied to a geographical location on Earth, commonly known as geospatial data (the term is used interchangeably with spatial data), where an absolute location is defined by two coordinates: longitude and latitude. Revealing patterns and trends in this type of data has become an essential data science skill, because the resulting insights help organizations make better decisions. Suppose we want to know how companies pick their business locations, which regions deserve more marketing investment, or which areas are most affected by a pandemic. In each case, squeezing insights from geospatial data is a must-have skill!

Analyzing geospatial datasets can be a daunting task. Fortunately, Python offers many valuable libraries that help us deliver insights from geospatial data with cool and interactive visualizations.

A.2. Objectives

In this project, we will focus on confirmed cases of Covid-19 in Indonesia and around the world and then visualize them using the following techniques:

  • Choropleth Map
  • Geographical Scatter Plot
  • Marker
  • Marker Cluster
  • Geographical Heat Map

We will equip you with tools built on Python libraries such as Pandas, Plotly, Geopy, and Folium so that you can pick the visualization methods that best fit your future project.

Excited already? Let’s get started!

Notes:
This article is quite extensive and technical, so feel free to jump to the section that fits your needs. The project is also documented in my GitHub repository, so if you are curious about the full code, please pay it a visit.

Table of Contents
A. Introduction
A.1. Motivation
A.2. Objectives
B. Data Description
B.1. World Covid-19 Cases
B.2. Indonesia Covid-19 Cases
B.3. Indonesia Province Map
C. Geospatial Analysis with Plotly
C.1. Plot Confirmed Cases with Choropleth Map
C.2. Plot Confirmed Cases with Geographical Scatter Plot
D. Geospatial Analysis with Folium
D.1. Generate a Base Map
D.2. Plot Confirmed Cases with Marker
D.3. Plot Confirmed Cases with Marker Cluster
D.4. Plot Confirmed Cases with Geographical Heat Map
E. Conclusion
References

B. Data Description

B.1. World Covid-19 Cases [1]

The dataset contains daily records of Covid-19 cases in 198 countries from 22 January 2020 to date (last updated on 16 April 2022). Using the Python Pandas library, we can import the dataset via URL and store it as a DataFrame called df.
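The import step can be sketched as follows. The dataset URL is not reproduced in this text, so DATA_URL is a placeholder, and the inline CSV with its column names ("Date", "Country", "Confirmed") is an illustrative assumption that lets the sketch run offline.

```python
import io

import pandas as pd

# Placeholder: set this to the CSV URL of the dataset [1].
DATA_URL = None

# Illustrative rows with assumed column names, not real records.
SAMPLE_CSV = """Date,Country,Confirmed
2020-01-22,China,500
2020-01-22,Indonesia,0
2020-01-23,China,600
2020-01-23,Indonesia,0
"""


def load_world_cases(source):
    """Read the daily-case CSV from a URL, a file path, or a file-like object."""
    return pd.read_csv(source)


# Fall back to the inline sample when no URL is configured.
df = load_world_cases(DATA_URL if DATA_URL else io.StringIO(SAMPLE_CSV))
print(df.head())
```

pd.read_csv accepts a URL string directly, so no separate download step is needed once DATA_URL is set.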

B.2. Indonesia Covid-19 Cases [2]

The dataset records daily Covid-19 cases in 34 provinces in Indonesia from 1 March 2020 to 18 August 2021. Again using the Pandas library, we can import the downloaded dataset and store it as a DataFrame called df_ind.

B.3. Indonesia Province Map [3]

This is the administrative map of all Indonesian provinces in GeoJSON format. Notice that the key ‘features’ stores a list of province features, each with polygon geometries, properties, and an id that is the same as the province code (‘kode’) inside the properties.
Using the json library, we can load the downloaded file and store it as a Python dictionary called indo_provinces. A single feature looks like this:

{'type': 'Feature',
'geometry': {'type': 'MultiPolygon',
'coordinates': [[[[117.6272, -8.5064],
[117.6347, -8.5577],
. . .
[119.0875, -8.12935]]]]},
'id': 52,
'properties': {'ID': 2,
'kode': 52,
'Propinsi': 'NUSA TENGGARA BARAT',
'SUMBER': 'Peta Dasar BAKOSURTANAL Skala 1 : 250.000'}}
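A minimal sketch of the loading step, assuming the file is a standard GeoJSON FeatureCollection. The inline text is an abbreviated stand-in for the downloaded file: it holds a single feature with only a few of its coordinate pairs.

```python
import json

# Abbreviated stand-in for the downloaded file (one feature, shortened ring).
GEOJSON_TEXT = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "id": 52,
    "properties": {"ID": 2, "kode": 52, "Propinsi": "NUSA TENGGARA BARAT"},
    "geometry": {"type": "MultiPolygon",
                 "coordinates": [[[[117.6272, -8.5064],
                                   [117.6347, -8.5577],
                                   [119.0875, -8.12935],
                                   [117.6272, -8.5064]]]]}}
 ]}
"""

# With a file on disk, use: `with open(path) as f: indo_provinces = json.load(f)`
indo_provinces = json.loads(GEOJSON_TEXT)

# Each province is one entry in the "features" list.
first = indo_provinces["features"][0]
print(first["id"], first["properties"]["kode"], first["properties"]["Propinsi"])
```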

C. Geospatial Analysis with Plotly

Notes:
All Plotly plots in this article use the "Inferno" color scale (px.colors.sequential.Inferno). Visit this page to see the other built-in color scales, then pass the one that suits your needs to the color_continuous_scale parameter of the plotting function.

C.1. Plot Confirmed Cases with Choropleth Map

A choropleth map is a thematic map in which specific geographic areas (in the form of polygons) are colored or shaded according to the value of a quantity. The map therefore comes with a “key” that explains what the shading means. In Python, we can use the Plotly library to build the map via the px.choropleth function.

C.1.1. Choropleth Map: World Covid-19 Cases

Plotly has built-in geometries for countries and US states that allow us to visualize data in seconds without an external GeoJSON file. We only need to specify a few parameters of the px.choropleth function.

In our case, we pass df to the data_frame parameter. Then we set locations to "Country", the column of df that lists the areas to plot. Since we identify the areas by country name, we set the locationmode parameter to "country names". Finally, because we want to see the confirmed Covid-19 cases for each country, we set color to the "Confirmed" column.

Additionally, we can change the shape of the map through the projection parameter. My personal preference is "natural earth" rather than the default projection. We can also animate the map by assigning the "Date" column to animation_frame, which creates one frame per date. Put everything inside the function and assign the result to a variable called fig.

After setting the titles of the map and the color bar with fig.update_layout(), our first choropleth map looks like this:

Gif by Author

Beautiful, isn’t it? The map is interactive: we can zoom in and out, use the play button and slider to move to a specific frame or date, and inspect any country by hovering the mouse over it.

From the map, we can observe that China is the first country with confirmed Covid-19 cases, with 500+ cases recorded on 22 January 2020. By 16 April 2022, the US, India, and Brazil had the most confirmed cases: around 80 million in the US, followed by India (43 million) and Brazil (30 million). Indonesia, meanwhile, ranks 18th with 6 million cases, far more than China, which ranks 45th with 1.7 million cases.

C.1.2. Choropleth Map: Indonesia Covid-19 Cases

We just saw how easy it is to create a choropleth map for countries around the world. Now we are ready to step up the game. Plotly can also map arbitrary areas as long as we have their geometry stored in a GeoJSON file. For instance, we can plot Indonesian provinces on the map to see the trend of confirmed Covid-19 cases.

To make this possible, the choropleth map function expects us to have two main inputs:

  • geometry information supplied from a GeoJSON file, with an "id" stored inside each feature (or an identifier value inside "properties");
  • a matching feature identifier, such as "id", inside our DataFrame.

Now we need to map the features in indo_provinces to the rows of the DataFrame df_ind, because the choropleth function matches the two through a key called id in each feature of the GeoJSON. Here are the steps:

  • Create an empty dictionary called province_id_map to store each province and its id as a key-value pair.
  • Copy the kode inside properties of each feature in indo_provinces to the id of that feature, so that no feature is left without an id.
  • Fill province_id_map and use it to map df_ind["Province"] to a new field df_ind["id"]. This new column "id" will be the feature identifier inside our DataFrame.
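The steps above can be sketched as follows. Both inputs are small stand-ins (geometry omitted), and the upper-casing step is an assumption: it is only needed if the province names in the DataFrame differ in case from the ‘Propinsi’ values in the GeoJSON.

```python
import pandas as pd

# Stand-in GeoJSON dictionary with the geometry left out for brevity.
indo_provinces = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"kode": 31, "Propinsi": "DKI JAKARTA"}},
        {"type": "Feature",
         "properties": {"kode": 52, "Propinsi": "NUSA TENGGARA BARAT"}},
    ],
}

# Stand-in DataFrame with illustrative values.
df_ind = pd.DataFrame({
    "Province": ["DKI Jakarta", "Nusa Tenggara Barat"],
    "Confirmed": [100, 50],
})

# Step 1: empty dictionary for the province -> id pairs.
province_id_map = {}

for feature in indo_provinces["features"]:
    # Step 2: copy the province code into the top-level "id" of the feature.
    feature["id"] = feature["properties"]["kode"]
    # Step 3a: remember which id belongs to which province name.
    province_id_map[feature["properties"]["Propinsi"]] = feature["id"]

# Step 3b: add the identifier column (names normalized to upper case first).
df_ind["id"] = df_ind["Province"].str.upper().map(province_id_map)
print(df_ind)
```

Any province whose name has no match ends up with a missing id, so checking df_ind["id"].isna().sum() is a quick way to catch spelling differences.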

Now we will use a slightly different function to generate the choropleth map, called choropleth_mapbox(). The idea is to take advantage of Mapbox to produce a more impressive map than anything we have seen so far 😏.

Here, we assign df_ind and indo_provinces to the data_frame and geojson parameters respectively. Then we pass the "id" field of df_ind to locations, because the function matches it against the key named in featureidkey, which points to the "id" of each feature in indo_provinces. Next, after setting the color parameter to the field that holds the confirmed cases, we can pass a list of column names from df_ind to hover_data to show the quantities of interest when the mouse hovers over the map. Do not forget to set the center coordinates of the map. According to latitude.to, the GPS coordinates of Indonesia are -2.548926 latitude and 118.0148634 longitude.

The visual is even more stunning with Mapbox, isn’t it? As of 18 August 2021, there were 3.9 million confirmed cases, around 66.3% of them concentrated in Java, with DKI Jakarta (842 K), Jawa Barat (660 K), and Jawa Timur (457 K) as the top 3. The provinces with the fewest cases were Maluku Utara (11 K), Sulawesi Barat (10.7 K), and Gorontalo (10.4 K).

C.2. Plot Confirmed Cases with Geographical Scatter Plot

We already know what a scatter plot is. Put it on top of a map and voila, we have a geographical scatter plot. In other words, the plot uses longitude and latitude coordinates to place marker points on the map. To create the plot, we let the px.scatter_geo function do the job.

C.2.1. Geographical Scatter Plot: World Covid-19 Cases

Since Plotly has built-in geometries for countries and US states, the px.scatter_geo function automatically maps the countries we pass to latitude and longitude. The only difference from the choropleth function is that we can set the size of the marker points from a DataFrame column via the size parameter. We also use a dark template to show another way of presenting the visual.

Gif by Author

The markers in the geographical scatter plot change color and grow in size with the number of cases, which makes the trend easy to observe. For example, the virus started in China with just a few cases, and then boom, it spread rapidly to the entire world. If we look closer, almost all listed countries were affected within just 2 months; totally frightening if you ask me.

C.2.2. Geographical Scatter Plot: Indonesia Covid-19 Cases

Okay, now we apply the same concept to Indonesian territory. Again, since Plotly does not provide geometries and coordinates for Indonesian provinces, we will use a Python library called Geopy, which can fetch the coordinates of addresses, cities, countries, and landmarks around the world, a process known as geocoding.

Geopy lets us use geolocation services such as Google Maps, Bing Maps, or Nominatim. Each of these services has its own class in geopy.geocoders that wraps the service's API, which is convenient!

To use the class, we need to provide a unique string in the user_agent parameter, which is sent as an HTTP request header. The service uses this value to identify the application behind each request so that it can limit the number of requests per application.

Let’s collect all the unique provinces and turn them into a list. Then, for each province, retrieve all the coordinates information and store it into the df_ind DataFrame.
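One way to sketch this step keeps the network call separate from the DataFrame logic. The helper names (make_nominatim_geocoder, add_coordinates, offline_geocode), the ", Indonesia" query suffix, and the one-second delay are all assumptions of this sketch. The offline stand-in returns the country center quoted earlier for every query, purely so the example runs without a network.

```python
from types import SimpleNamespace

import pandas as pd


def make_nominatim_geocoder(user_agent):
    """Return a rate-limited Nominatim geocode function (needs geopy + network)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Wait at least one second between requests to respect the service limits.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def add_coordinates(df, geocode, column="Province"):
    """Geocode each unique value once and map the results back onto the rows."""
    coords = {}
    for name in df[column].unique():
        location = geocode(f"{name}, Indonesia")
        if location is not None:  # the geocoder returns None when nothing matches
            coords[name] = (location.latitude, location.longitude)
    out = df.copy()
    out["Latitude"] = out[column].map(lambda n: coords.get(n, (None, None))[0])
    out["Longitude"] = out[column].map(lambda n: coords.get(n, (None, None))[1])
    return out


def offline_geocode(query):
    """Offline stand-in: same coordinates for every query (not real geocoding)."""
    return SimpleNamespace(latitude=-2.548926, longitude=118.0148634)


df_ind = pd.DataFrame({"Province": ["DKI Jakarta", "Gorontalo", "DKI Jakarta"]})
df_ind = add_coordinates(df_ind, offline_geocode)
print(df_ind)

# Real lookup: add_coordinates(df_ind, make_nominatim_geocoder("my-covid-map"))
```

Geocoding only the unique provinces keeps the number of requests at 34 instead of one per row.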

Next, we will use scatter_mapbox() to utilize Mapbox to produce a more attractive map. We assign "Latitude" and "Longitude" fields to parameter lat and lon respectively and put the rest of the parameters the same as we build the choropleth map for Indonesia Covid-19 cases.

Now we have another stunning visual for observing the spread of the virus per province. Since we did not aggregate df_ind, each province has one marker per date stacked on the same spot, so the values in the tooltip change as the mouse moves across the marker: the farther the pointer is from the center of the marker, the more recent the record shown. The combination of color and marker size helps us identify the provinces with the most and the fewest confirmed cases, which are DKI Jakarta (842 K) and Gorontalo (10.4 K).

D. Geospatial Analysis with Folium

Notes:
We will only code on visualizing cases in Indonesia. We will leave the code for the world cases as homework for you and only show the visual result.

Folium is a great Python visualization library that lets us plot geospatial data on an interactive Leaflet map. Before we begin using Folium, we need a few preprocessing steps:

  • Fetch the coordinates of all unique provinces using geopy.geocoders and store them in the DataFrame.
  • Aggregate the data so that each row represents unique information for each province.

Notice that we have done the first part in the previous section. Now, we only need to aggregate the data by province using groupby and store it in a new DataFrame df_ind_sa.
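The aggregation might look like the sketch below. It assumes the "Confirmed" column holds cumulative totals, so the latest value per province is kept; if the column held daily new cases, "sum" would be the right aggregation instead. Rows and column names are illustrative.

```python
import pandas as pd

# Illustrative daily rows for two provinces (cumulative counts assumed).
df_ind = pd.DataFrame({
    "Date": ["2021-08-17", "2021-08-18", "2021-08-17", "2021-08-18"],
    "Province": ["DKI Jakarta", "DKI Jakarta", "Gorontalo", "Gorontalo"],
    "Confirmed": [90, 100, 8, 10],
    "Latitude": [-6.2, -6.2, 0.5, 0.5],
    "Longitude": [106.8, 106.8, 123.1, 123.1],
})

# One row per province: latest cumulative count plus its coordinates.
df_ind_sa = (
    df_ind.sort_values("Date")
    .groupby("Province", as_index=False)
    .agg(
        Confirmed=("Confirmed", "last"),
        Latitude=("Latitude", "first"),
        Longitude=("Longitude", "first"),
    )
)
print(df_ind_sa)
```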

D.1. Generate a Base Map

For all visuals that use Folium, we need to set a base location as the center of the map. According to latitude.to, the GPS coordinates of Indonesia are -2.548926 latitude and 118.0148634 longitude. In short, we will create a function generatebasemap_ind that has parameters to store the default coordinates location and the default zoom start. The arguments assigned to the parameters will act as the input for folium.Map to generate the base map.

The world base map where we use Corum City in Turkey as the geographical center of the Earth [4].

D.2. Plot Confirmed Cases with Marker

We will use folium.Marker to create markers for all provinces. The steps are:

  • define the base map using generatebasemap_ind function;
  • create a popup label and a marker for each data row using folium.Popup and folium.Marker, then add the marker to the base map;
  • show the updated base map.

Pro tips:

  • We can format the label text using HTML tags because Folium can render them (e.g., the label in folium.Popup and the tooltip in folium.Marker);
  • We can customize the marker icon using folium.Icon and pick a Glyphicons icon we like to make it more aesthetic.

The map for the World Confirmed Covid-19 Cases with Folium Marker.

Compared to the map generated by Plotly, we can now interact further: besides the tooltip that appears when we hover over a marker, a popup with more detailed information opens when we click it. However, the result can look unpleasant if the map shows so many markers that it is hard to inspect them one by one. We can use Marker Cluster to solve this.

D.3. Plot Confirmed Cases with Marker Cluster

The map with markers can be visually messy if many markers are close and collide. To solve this, we can group several markers that are close to each other into a cluster using MarkerCluster class from folium.plugins.

Every marker we create with folium.Marker() is added to the marker cluster as its child instead of being added directly to the base map. After we add the marker cluster to the base map, each clustered marker displays the number of markers it contains. Let’s take a look at the code for a better understanding.

The map for the World Confirmed Covid-19 Cases with Folium Marker Cluster.

Notice that whenever we zoom in or out of the map, the clusters adjust themselves. Moreover, if we click on a clustered marker, the map zooms in to the bounds that contain all markers in that cluster. When we have many markers to show and need to keep them all during a presentation, the marker cluster method can improve the user experience significantly because it helps deliver insight by “cutting through the clutter”.

D.4. Plot Confirmed Cases with Geographical Heat Map

Geographic heat maps are an interactive way to identify where something occurs and demonstrate areas of high and low density. Say that we will use the confirmed Covid-19 cases for the heat map feature. This means whenever we have a higher number of confirmed cases, it will give higher density and darker shades for that area and vice versa.

To create the map, simply import the HeatMap class from folium.plugins. Its required parameter, data, is a list holding the latitude and longitude of every location along with a weight, which in this case is the cumulative number of confirmed cases.

To make the map more interactive, after we add the heat map to the base map, we can still use the marker on top of it. However, instead of using folium.Marker, we will use folium.CircleMarker so the markers will not totally cover up the visual.

The geographical heat maps for confirmed Covid-19 cases around the world.

Instead of markers with icons, we now have small, barely visible markers that support the information the heat map tries to convey. From the maps above, we can spot regions of high density, which indicates that they suffered heavily from Covid-19. For example, in Indonesia, most cases are concentrated in DKI Jakarta and Jawa Barat. Furthermore, although the countries in Europe and Africa combine to radiate an even higher density, they still cannot beat the USA, which is “in a league of its own” as the country with the most confirmed Covid-19 cases globally.

E. Conclusion

Delivering insights on geospatial data with a proper visualization can be challenging because there are so many options to choose from. However, using the Covid-19 case study as an example, we hope this project shows you where to start and how to approach a similar case in the future.

Now that we have the tools for dealing with geospatial data, the next step is to explore different scenarios for applying these visualization methods and to share your story with the community. Thanks for dropping by and see you guys again soon!

References
