Why you should cluster Covid-19 cases: focus on Africa

Webster Gova
May 31, 2020


In all the conversations I have had with peers, colleagues and strangers in social-distancing queues at the mall, the most interesting question I hear people ask is, “do you think we are doing better than country x?” In most cases, people have invariably settled on the number of positive cases as a reliable measure of success in suppressing the spread of the coronavirus. I have attempted to find a more efficient and reliable way to cluster Covid-19 time series, so that countries can be compared more intuitively over time without simply looking at the number of cases. I have explained in more detail in my previous article why absolute numbers and point estimates for Covid-19 cases are neither efficient nor reliable.

The first major problem with Covid-19 data lies in deciding which time series dimension to focus on. My approach has been to generate Rt values (the effective reproduction number: the average number of people each infected person goes on to infect at time t) using Bayesian resampling, an approach adapted from Kevin Systrom, as shown in this notebook: Covid-19 Rt values — Africa. The analysis shows that Burundi might have few cases but has the highest Rt value. This is not surprising, given that Burundi declared World Health Organisation (WHO) personnel persona non grata for “unacceptable interference” with its coronavirus management. One can only guess what Burundi might be hiding.
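For readers curious about what estimating Rt involves, here is a minimal sketch of the grid-based Bayesian update popularised by Kevin Systrom's notebook (which builds on Bettencourt and Ribeiro's method). It is not the code behind this analysis and it omits the resampling step: the 7-day serial interval (GAMMA), the random-walk standard deviation (SIGMA), the Rt grid and the flat starting prior are all assumptions chosen for illustration, and estimate_rt is a hypothetical name.

```python
import numpy as np

# Assumed parameters (not from the article): a 7-day serial interval and a
# random-walk standard deviation of 0.25 for day-to-day changes in Rt.
GAMMA = 1 / 7
SIGMA = 0.25
R_GRID = np.linspace(0, 12, 1201)  # candidate Rt values, step 0.01


def estimate_rt(smoothed_cases):
    """Return the posterior-mean Rt for each day after the first.

    smoothed_cases: 1-D array of smoothed daily new cases (for example a
    7-day rolling mean). Values must be strictly positive; they need not
    be integers.
    """
    k = np.asarray(smoothed_cases, dtype=float)
    # Expected cases today given yesterday's cases, for every candidate Rt.
    lam = k[:-1, None] * np.exp(GAMMA * (R_GRID[None, :] - 1))
    # Poisson log-likelihood; the log(k!) term is constant across R_GRID
    # and cancels when each day's posterior is normalised.
    log_lik = k[1:, None] * np.log(lam) - lam

    # Random-walk transition: today's Rt ~ Normal(yesterday's Rt, SIGMA).
    diff = R_GRID[None, :] - R_GRID[:, None]
    transition = np.exp(-0.5 * (diff / SIGMA) ** 2)
    transition /= transition.sum(axis=1, keepdims=True)

    posterior = np.full(R_GRID.size, 1 / R_GRID.size)  # flat prior
    means = []
    for day_log_lik in log_lik:
        prior = transition.T @ posterior  # carry yesterday's belief forward
        unnorm = prior * np.exp(day_log_lik - day_log_lik.max())
        posterior = unnorm / unnorm.sum()
        means.append(float(R_GRID @ posterior))
    return np.array(means)
```

Because the model links daily growth to exp(GAMMA·(Rt − 1)), a series growing 10% a day should give estimates settling close to 1 + 7·ln(1.1) ≈ 1.67, while a steadily shrinking series should settle below 1.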

Current Rt values for African countries over the period from 1 March to 30 May

The six k-means clusters that emerged from my analysis of Rt between 1 March and 30 May indicate that the clusters were driven by each country's average Rt value, the min-max range of its Rt values and their variance.
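The clustering code itself is not published here. As a hedged sketch, assuming each country's Rt series is reduced to the three features named above (mean, max-min range and variance), standardised, and grouped with plain Lloyd's k-means, the step could look like this in NumPy. The function names rt_features and kmeans are hypothetical, and the standardisation and fixed random seed are my assumptions, not necessarily what the original analysis did.

```python
import numpy as np


def rt_features(rt_by_country):
    """Summarise each country's Rt series as (mean, max - min range, variance).

    Returns the sorted country names and a standardised feature matrix.
    """
    names = sorted(rt_by_country)
    feats = []
    for name in names:
        s = np.asarray(rt_by_country[name], dtype=float)
        feats.append([s.mean(), s.max() - s.min(), s.var()])
    feats = np.array(feats)
    # Standardise so no single feature dominates the Euclidean distance.
    std = feats.std(axis=0)
    std[std == 0] = 1.0
    return names, (feats - feats.mean(axis=0)) / std


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct countries chosen at random.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each country to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

With real data you would call kmeans(x, k=6); because k-means depends on its starting centroids, it is worth rerunning with several seeds and checking that the groupings are stable.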

Covid-19 country Rt k-means clusters

Clusters 1 and 4 contain countries with data gaps and, possibly, under-reporting. For example, Angola, the only country in Cluster 4, reported 19 cases on 7 April and then nothing until 18 April, when it reported 5 new cases. The countries in Cluster 1 (Namibia, Tunisia and Western Sahara), on the other hand, most likely have unreliable data or lack the resources to test as frequently as other countries and report updates regularly. For example, it is difficult to find a reliable update on tests conducted in Western Sahara.
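Gaps like Angola's can be flagged automatically before clustering. A minimal sketch, assuming a day with zero new cases means “no report” (which conflates genuinely quiet days with missing reports) and an arbitrary three-day threshold; reporting_gaps is a hypothetical helper, not part of the original analysis.

```python
def reporting_gaps(daily_new_cases, min_days=3):
    """Return (start_index, length) for every run of at least `min_days`
    consecutive days with zero reported new cases."""
    gaps = []
    run_start = None
    for i, count in enumerate(daily_new_cases):
        if count == 0:
            if run_start is None:
                run_start = i  # a silent run begins
        else:
            if run_start is not None and i - run_start >= min_days:
                gaps.append((run_start, i - run_start))
            run_start = None
    # A silent run that reaches the end of the series is also a gap.
    if run_start is not None and len(daily_new_cases) - run_start >= min_days:
        gaps.append((run_start, len(daily_new_cases) - run_start))
    return gaps
```

For a hypothetical series shaped like Angola's (19 cases, ten silent days, then 5 cases), reporting_gaps([19] + [0] * 10 + [5]) returns [(1, 10)].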

Rt value mean and standard deviation per cluster

Another example is Tunisia’s 3-day and 7-day moving averages of new cases, which have both stayed below 10 since the beginning of May [Source: Worldometers].
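A moving average simply replaces each day's count with the mean of the last few days, which smooths out reporting noise. Worldometers' exact smoothing is not described here, so this minimal sketch assumes a trailing window; moving_average is a hypothetical helper.

```python
def moving_average(values, window):
    """Trailing moving average; the first window - 1 days have no value."""
    values = list(values)
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        out.append(running / window if i >= window - 1 else None)
    return out
```

moving_average(daily_cases, 3) and moving_average(daily_cases, 7) give the two curves described above; the 7-day version also evens out weekday reporting patterns.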

According to data available on Worldometers, Tunisia has conducted 51 881 tests to date, roughly 4394 tests per million people in a population of 11 807 839. Tunisia has reported 1077 cases so far, approximately 91 cases per million people.
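The per-million figures are straightforward arithmetic, shown here with the Tunisian numbers quoted above. The cases-to-tests ratio is an extra, rough proxy I have added for test positivity (one person can be tested more than once), not a figure reported by Worldometers.

```python
def per_million(count, population):
    """Scale a raw count to a rate per million people."""
    return count / population * 1_000_000


TUNISIA_POPULATION = 11_807_839  # figures quoted above, from Worldometers
tests, cases = 51_881, 1_077

tests_per_million = per_million(tests, TUNISIA_POPULATION)  # about 4394
cases_per_million = per_million(cases, TUNISIA_POPULATION)  # about 91
positivity = cases / tests  # rough share of tests that were positive, about 2.1%
```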

Tunisia Covid-19 cases [Source: Worldometers]

Countries in Cluster 2, shown on the cluster map, have high Rt values and moderate variance. These countries have relatively low testing intensity, ranging from 177 (Malawi) to 5 913 (Equatorial Guinea) tests per million people. Eritrea reported on 11 May that the “total number of infected individuals to-date is 39. All of them have recovered fully and the last patient was released from hospital yesterday.” No new cases have been reported since.

The Eritrean government’s strategy was to quarantine everyone who entered the country by air, sea or land, and to actively trace and quarantine the immediate contacts of anyone diagnosed positive for COVID-19. According to the government of Eritrea, as of 16 May “3,486 persons have been quarantined to-date in 70 centers”, of whom “2,400 were released”, while “over 1,000 individuals still remain in quarantine in 33 centers”. Since then, Eritrea has implemented random testing.

Clusters 3 and 5 are peculiar: their countries have few cases and low Rt (below 1), but relatively high variance in Rt. Results in these clusters do not indicate any relationship between testing capacity and number of cases. Rwanda and Togo have high testing capacity, 5183 and 2385 tests per million people respectively, while Burkina Faso and Burundi test considerably less.

Countries in Cluster 0, including South Africa and Egypt, show no sign of their case numbers falling any time soon.

Daily moving averages for South Africa [Source: Worldometers]

South Africa has conducted the most tests on the continent to date, and it currently has the most cases. Some might argue that this is a direct consequence of the country’s testing intensity. Despite the rising case count, South Africa will ease its lockdown from level 4 to level 3 on 1 June, allowing a significant share of previously restricted economic activity to resume.

Daily moving averages for Egypt [Source: Worldometers].

Egypt, on the other hand, announced the discharge of 152 infected people from isolation and quarantine hospitals on 28 May, expressing confidence in its management of the spread of the virus. Both South Africa and Egypt have suffered significant losses in tourism revenue due to travel restrictions, with far-reaching effects on industries connected to tourism, such as hospitality, food and travel.

Based on this analysis, and the k-means clusters, countries can be grouped more efficiently and labelled appropriately for further analysis. If this analysis has not helped explain coronavirus trends, I hope it has helped to settle some disputes over coffee on which country is doing better than the others at managing the spread of coronavirus in Africa. Please get in touch with me to share your ideas and data sources to improve this analysis.
