Heat-Maps in R
Heat-maps show individual values as colours in two dimensions. Whilst the term is modern, the idea of colouring cells to show values has existed for over a century.
This article looks at how to create a heat-map in R.
Hot, hot, hot
A cluster heat-map is a grid of tiles, where the horizontal and vertical positions refer to two different categories. Within that structure, analysts display the value through colour. One way to show lab-confirmed cases of the SARS-CoV-2 virus is through a heat-map.
An example is from Dr Robertson (Loughborough University):
Here, the horizontal position refers to the week, and the vertical position is the age group. Dr Robertson uses a colour scale to show lab-confirmed cases per 100,000 estimated people. Public Health England publish these figures, for pillar 1 and 2 testing in England.
This virus spreads through contact and through the air. An infected person being close to other people means a risk of passing the virus on.
A pre-print paper by Dr Adam Kucharski (LSHTM), Dr Hannah Fry (UCL) and others looked at social contacts. As part of the BBC’s Pandemic study, people downloaded an app. That app recorded approximate locations every hour for 24 hours.
People inputted information about their close contacts, including the type of interaction. Contacts were at home, school, work, or elsewhere.
Beyond their families, people tend to be close to others of similar ages:
The very strong diagonal density of contacts (Fig 2B) is characteristic of strong age-assortative mixing, and the sub-diagonal density captures interactions between children and their parents.
As the virus spreads, infections emanate across age groups. This pattern may be easier to see on a heat-map than a line graph. However, some readers could have trouble distinguishing between different shades of the same colour.
Bringing the heat-map
For this graph, I prepared a file. It shows Figure 4 statistics from a recent PHE COVID-19 national surveillance report. I added week dates to the statistics behind the week 39 graph.
After reading the Excel file, the code shifts the table into the tidy format:
phe_caserates_agegroup_df <- phe_figure4table_df %>%
  pivot_longer(cols = 4:13,
               names_to = "age_group",
               values_to = "case_rate") %>%
  filter(week_number >= week_number_min)
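To make the reshaping step concrete, here is a minimal sketch on a made-up table. The data frame `wide_df` and its values are hypothetical and are not PHE figures. The article's code selects the age-group columns by position (`cols = 4:13`); this sketch instead selects every column except `week_number`, which is an assumption made to keep the toy table small.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per week, one column per age group.
# The values are invented for illustration only.
wide_df <- tibble(
  week_number = c(38, 39),
  `0 to 4` = c(10.2, 12.5),
  `5 to 9` = c(8.1, 9.7)
)

# Reshape so that each row holds one week, one age group, and one case rate.
long_df <- wide_df %>%
  pivot_longer(cols = -week_number,
               names_to = "age_group",
               values_to = "case_rate")

print(long_df)
```

The tidy table has one row per combination of week and age group, which is the shape that ggplot2 expects when mapping columns to position and fill.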
When I was testing the code, the date breaks always fell on Mondays. We want those breaks to show the Sunday dates instead, so we set them explicitly:
week_end_breaks <- phe_figure4table_df %>%
filter(week_number >= week_number_min) %>%
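The snippet above stops at a pipe, so the final step is not shown here. One plausible way to finish such a pipeline, offered only as a sketch, is to extract the date column as a vector with `pull()`. The table `toy_table_df` below is a made-up stand-in, and the assumption that the dates live in a column called `week_end_date` comes from the plotting code rather than from the truncated snippet.

```r
library(dplyr)

# Made-up stand-in for the prepared table; column names are assumptions.
toy_table_df <- tibble(
  week_number = 37:39,
  week_end_date = as.POSIXct(c("2020-09-13", "2020-09-20", "2020-09-27"),
                             tz = "UTC")
)
week_number_min <- 38

# Keep the weeks of interest, then extract the dates as a plain vector.
week_end_breaks <- toy_table_df %>%
  filter(week_number >= week_number_min) %>%
  pull(week_end_date)

# wday is 0 for Sunday, so this checks that every break falls on a Sunday.
as.POSIXlt(week_end_breaks, tz = "UTC")$wday
```

Passing a vector of dates to the `breaks` argument puts the axis marks exactly on those dates, rather than leaving ggplot2 to choose them.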
In order to organise the age groups, we use factors:
phe_caserates_agegroup_df$age_group <- factor(
  phe_caserates_agegroup_df$age_group,
  levels = c("0 to 4", "5 to 9", "10 to 19", "20 to 29", "30 to 39",
             "40 to 49", "50 to 59", "60 to 69", "70 to 79", "80 or over"))
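The reason for setting the levels by hand can be seen in a small base R sketch. The vector `age_groups` is a hypothetical subset of the labels above. Without explicit levels, `factor()` sorts the labels as text, which puts "10 to 19" ahead of "5 to 9".

```r
# Hypothetical subset of the age-group labels, in arbitrary order.
age_groups <- c("10 to 19", "0 to 4", "5 to 9")

# Default levels are sorted as text, so "10 to 19" comes before "5 to 9".
default_levels <- levels(factor(age_groups))
print(default_levels)

# Explicit levels keep the intended youngest-to-oldest order.
ordered_groups <- factor(age_groups,
                         levels = c("0 to 4", "5 to 9", "10 to 19"))
print(levels(ordered_groups))
```

ggplot2 draws a discrete axis in level order, so the explicit levels place the age groups from youngest to oldest along the vertical axis.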
The graph code then starts, using geom_raster to create the heat-map:
ggplot(data = phe_caserates_agegroup_df,
       mapping = aes(x = week_end_date, y = age_group, fill = case_rate)) +
  geom_raster() +
As I intend to use data labels, ‘guide = "none"’ switches off the legend (older versions of ggplot2 also accepted ‘guide = FALSE’, which is now deprecated):
  scale_fill_gradient(name = "",
                      low = "#FFFFFF",
                      high = "#d1112e",
                      guide = "none") +
This line labels each tile with the rounded case rate:
geom_text(aes(label = round(case_rate))) +
The horizontal axis scale uses the preset break marks:
  scale_x_datetime(expand = c(0,0),
                   breaks = week_end_breaks,
                   date_labels = "%d-%b") +
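Since the plotting code above appears in pieces, the following self-contained sketch joins the same layers on a small made-up data set. The data frame `toy_df`, its case rates, and the axis titles in `labs()` are assumptions for illustration; they are not PHE figures and are not the article's own labelling code.

```r
library(ggplot2)

# Made-up inputs for illustration only (not PHE figures).
weeks <- as.POSIXct(c("2020-09-13", "2020-09-20", "2020-09-27"), tz = "UTC")
groups <- c("0 to 4", "5 to 9", "10 to 19")

# One row per combination of age group and week.
toy_df <- data.frame(
  week_end_date = weeks[rep(1:3, times = 3)],
  age_group = factor(rep(groups, each = 3), levels = groups),
  case_rate = c(5.2, 7.9, 11.4, 3.1, 4.8, 6.6, 20.3, 35.7, 48.2)
)

toy_heatmap <- ggplot(data = toy_df,
                      mapping = aes(x = week_end_date, y = age_group,
                                    fill = case_rate)) +
  geom_raster() +                               # one tile per week and age group
  scale_fill_gradient(low = "#FFFFFF",
                      high = "#d1112e",
                      guide = "none") +         # labels replace the legend
  geom_text(aes(label = round(case_rate))) +    # rounded value on each tile
  scale_x_datetime(expand = c(0, 0),
                   breaks = weeks,              # marks on the week-ending dates
                   date_labels = "%d-%b") +
  labs(x = "Week ending", y = "Age group")      # hypothetical axis titles

print(toy_heatmap)
```

Note that `geom_raster()` expects evenly spaced positions, which weekly dates satisfy; for irregular spacing, `geom_tile()` would be the usual alternative.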
After adding labels, the heat-map graph is:
Heat-maps suit values that vary across two categories, such as week and age group. Analysts can make professional graphs of this kind in R.
Public Health England publish weekly COVID-19 national surveillance reports. The graph uses the week 39 report. The data file and code are on GitHub and R Pubs.