Application — Corona virus tracker — Update #1

Mark Monfort
Prosperity Advisers DnA
6 min read · Mar 25, 2020

NB: This article was published on LinkedIn by Mark Monfort on March 12, 2020. Republished here with permission.

In this update

  1. Update on adding Regional filters
  2. Update on adding Population and measuring cases by population size
  3. Details on the nature of the source data changing

As I mentioned in the previous article showcasing the app, I plan to keep updating it with new features. Below are the details of the update so far.

The App

To go straight to the app: CLICK HERE

Viewing the data by Regions

Firstly, people were interested in seeing things split by region. I’ve added that to all pages so you can split the views by Region.

As a result, I’ve been able to change the Country Ranks page (which has a bar chart showing confirmed cases, deaths and recoveries by country) from a split of all countries versus all countries excluding China

into a view split by 6 major global regions. At the time of writing, Africa and South America had still not seen much activity. Not yet anyway.

To do this, I scraped the region data from a Wikimedia page showing the ISO 3166 list of countries (https://meta.wikimedia.org/wiki/List_of_countries_by_regional_classification).

Unfortunately, the names there do not all align with the names in the Johns Hopkins data. This is common when joining datasets. It usually means you need a mapping file in between or, if one of the datasets you are joining does not change often, you can amend that dataset directly. So, I have amended the ISO 3166 list to match the names used in the coronavirus data. See the end of this article for some further details on naming issues.
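The mapping-file approach can be sketched in a few lines of Python. This is only an illustration and not how the app itself is built: the override entries, the region labels and the function names are all assumptions made up for the example.

```python
# Hypothetical overrides: spelling in the region list -> spelling in the case data.
# These entries are illustrative only, not the actual mismatches found.
NAME_OVERRIDES = {
    "United States of America": "US",
    "Republic of Korea": "South Korea",
}


def align_region_names(region_by_country, overrides):
    """Return a copy of the region lookup keyed by the case-data spelling."""
    return {
        overrides.get(name, name): region
        for name, region in region_by_country.items()
    }


def unmatched_countries(case_countries, region_by_country):
    """List countries in the case data that still have no region."""
    return sorted(set(case_countries) - set(region_by_country))


# Illustrative region lookup (region labels are assumptions).
regions = {
    "United States of America": "Americas",
    "Republic of Korea": "Asia",
    "Italy": "Europe",
}

aligned = align_region_names(regions, NAME_OVERRIDES)
missing = unmatched_countries(["US", "South Korea", "Italy", "San Marino"], aligned)
print(missing)  # anything listed here still needs a mapping entry
```

Listing the unmatched names after every refresh is a cheap way to spot a country that has silently dropped out of the join.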

Per population figures

One of the interesting points that has come up in discussions I’ve had with others is the need to look at the data relative to population. To start, I looked for population data and found some at The World Bank website (https://data.worldbank.org/indicator/SP.POP.TOTL). The downloaded dataset has historical population figures for a number of years up to 2018. We only need the most recent data so I’ve just grabbed that. As with the Regional data mapping above, not all country names matched, so I had to correct the few misaligned ones.

Once done, I was able to pull these population figures into the main data tables and create a measure that looks at the number of cases as a percentage of each country’s population. For some countries, the figures for confirmed cases, recoveries and deaths are so small relative to population that I needed to increase the number of decimals to at least 8 to see something other than 0. Another way to look at this is the number of confirmed cases per a larger block of people. I chose to measure it per 10,000 people, as that gives figures that are easier to read.

What it means is that in the Country Ranks page, you have a sub-menu option called “by % of Popn” and when you go to this and filter for Confirmed cases you will see this.

It shows that San Marino has the highest number of confirmed cases per 10,000 people.

If you hover over the bar you can see the number of cases (in this example 51) as well as the population (33.79k) and how this percentage of population measure has grown over time.

You will also see Australia had 107 cases yesterday (it’s now 150), but with a population of nearly 25 million, those 107 confirmed cases work out to 0.0428 per 10,000 people.

Doing it this way means that the bar chart shows the Holy See (formerly labelled as Vatican City) as having 10 confirmed cases per 10,000 people, but its population is only 1,000 (so you do the math: that is a single case). You will see oddities like this simply because we needed to pick a baseline.
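The per-10,000 figures above are easy to check by hand. Here is a minimal Python sketch using the numbers quoted in this article; the function names are my own for the example, Australia's population is rounded to 25 million, and the single Holy See case is inferred from the figures rather than read from the data.

```python
def cases_per_10k(cases, population):
    """Confirmed cases per 10,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 10_000


def cases_pct_of_population(cases, population):
    """Confirmed cases as a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100


# Figures quoted in the article (Australia's population rounded to 25 million).
san_marino = cases_per_10k(51, 33_790)
australia = cases_per_10k(107, 25_000_000)
holy_see = cases_per_10k(1, 1_000)  # one case is inferred, not quoted

print(f"San Marino: {san_marino:.2f} per 10,000")
print(f"Australia:  {australia:.4f} per 10,000")
print(f"Holy See:   {holy_see:.1f} per 10,000")

# The percentage view needs many decimals before anything shows up.
australia_pct = cases_pct_of_population(107, 25_000_000)
print(f"Australia as % of population: {australia_pct:.2f} vs {australia_pct:.8f}")
```

The last line shows why the percentage measure needed so many decimal places: at two decimals Australia's figure displays as zero.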

Another thing to note here is that it may seem like a concern that these percentage of population figures are on the rise. In fact they will keep growing, because confirmed cases are a cumulative count while the population denominator is static. So just be careful how you interpret this.

In any case, it’s something that can put where we are at into perspective. You can even couple this with changing the date slider at the top to see how this looked at a certain point in time.

What’s in a name?

Speaking of naming issues, Axios highlighted something I also saw in the data: a few weeks ago, the names in the Johns Hopkins dataset for countries like Iran, Taiwan and South Korea were changed. There were a few more on the list, but the Axios article shows just how concerned people were about the name changes, especially for Taiwan, which is now being referred to as “Taipei and environs”.

You can see the article here: https://www.axios.com/johns-hopkins-coronavirus-map-taiwan-china-5c461906-4f1c-42e7-b78e-a4b43f4520ab.html

I’ve dealt before with data that changes in source systems you have to rely upon. It can be tedious to keep up with those changes, but when the data is as important as this, it really matters.

I discovered the issues last night, after updating the files and noticing that confirmed cases had gone down from nearly 119,000 as of yesterday to 113,000. I had not read of any sudden misclassification of test results in the news, so the change had to come from the data.
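A drop like this is easy to catch automatically, since a cumulative total should not go down between refreshes. The following is a rough Python sketch of such a check, not something the app currently does; the function name is hypothetical and the totals are the approximate figures mentioned above.

```python
def find_drops(totals):
    """Return every point where a cumulative total decreased.

    `totals` is a list of (label, total) pairs in chronological order.
    """
    drops = []
    for (prev_label, prev_total), (label, total) in zip(totals, totals[1:]):
        if total < prev_total:
            drops.append((prev_label, label, prev_total - total))
    return drops


# Approximate totals from the article ("nearly 119,000" then 113,000).
confirmed = [("yesterday", 119_000), ("today", 113_000)]

for before, after, shortfall in find_drops(confirmed):
    print(f"Total fell by {shortfall:,} between {before} and {after}: check the source data")
```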

Luckily, whilst experimenting with the data from Johns Hopkins, I had downloaded some of it and had a record showing that some countries had different names as at the 5th of March versus what was being shown on the online GitHub repo today.

What this means is that I’ll build myself a monitor that checks for name changes as it pulls in the data. This is good practice, but it is often not necessary because source data should not change like this. But hey, it is important enough to create considering how transfixed the world is right now.
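As a starting point, such a monitor could be as simple as comparing the set of country names in the previous download with the set in the latest one. The Python below is a minimal sketch of that idea, not the finished monitor; the function name and the sample snapshots are assumptions, with only the Taiwan rename taken from the Axios article.

```python
def diff_country_names(previous, current):
    """Compare two snapshots of country names and report what changed."""
    prev_names, cur_names = set(previous), set(current)
    return {
        "removed": sorted(prev_names - cur_names),
        "added": sorted(cur_names - prev_names),
    }


# Illustrative snapshots: a saved earlier download vs the latest pull.
names_earlier = ["Australia", "Italy", "Taiwan"]
names_latest = ["Australia", "Italy", "Taipei and environs"]

changes = diff_country_names(names_earlier, names_latest)
if changes["removed"] or changes["added"]:
    print("Country names changed since the last pull:")
    print("  no longer present:", changes["removed"])
    print("  newly present:    ", changes["added"])
```

A name that appears in both lists on the same refresh is a strong hint of a rename rather than a genuinely new country, which is the case worth fixing in the mapping before the totals are trusted.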

So with all that completed, here’s an animation of what you can do in the app now.

Mark Monfort
Prosperity Advisers DnA

Data Analytics professional with 10+ years of experience in various industries including finance and consulting