Raincloud plots for clear, precise and efficient data communication

Guilherme A. Franchi, PhD
Nov 16, 2023

Raw data visualization involves presenting data in its most basic form, without any manipulation or aggregation, and is fundamental for initial understanding and quality assessment of your data.

Specifically, raw data visualization allows for:

  1. Initial Exploration: It helps in the initial exploration of data, permitting us to get a sense of the patterns, outliers, and overall characteristics present in the raw dataset.
  2. Identifying Anomalies: Raw data visualization allows for the quick identification of anomalies, outliers, or errors in the dataset, which might need further investigation or cleaning.
  3. Understanding Data Distribution: It provides a clear understanding of the distribution of the data points, helping us comprehend the nature and spread of the dataset.
  4. Data Quality Assessment: By visualizing raw data, it is easier to assess the quality of the dataset, including missing values, inconsistencies, or inaccuracies.
  5. Contextual Understanding: Visualizing raw data aids in understanding the context in which the data was collected, helping us make informed decisions about the appropriate data preprocessing steps.
  6. Enhanced Communication: When collaborating with others, presenting raw data visually can facilitate better communication among team members, allowing everyone to start from the same foundational view.

Typically, raw data and summary statistics are presented through box plots, including in scientific manuscripts and presentations.

Although box plots combine many summary statistics (median, quartiles, whiskers and outliers) in one chart, they may still mislead readers because they give no clue about the sample size or about underlying patterns in the data.
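To make the sample-size point concrete, here is a minimal base-R sketch. The two vectors are made up for illustration and are not from the lamb dataset. `boxplot.stats()` returns the five values a box plot draws, and two samples of very different size can produce exactly the same box.

```r
# Two made-up samples with very different sizes: 5 vs. 100 observations
small <- c(1, 2, 3, 4, 5)
large <- rep(c(1, 2, 3, 4, 5), times = 20)

# boxplot.stats() returns the values a box plot draws in $stats
# (lower whisker, lower hinge, median, upper hinge, upper whisker)
# and the number of non-missing observations in $n
small_box <- boxplot.stats(small)
large_box <- boxplot.stats(large)

print(small_box$stats)
print(large_box$stats)
cat("n =", small_box$n, "vs. n =", large_box$n, "\n")

# TRUE: both samples produce exactly the same box and whiskers
identical(small_box$stats, large_box$stats)
```

A reader looking only at the two boxes cannot tell that one group has 20 times more observations; a layer that shows every data point makes the difference visible.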

Even if the individual data points are added to the box plot, they can overlap and cluster, which makes it hard to see where these points actually lie.

Example of a “not-so-good” illustration of data and summary statistics. Data was retrieved from the data frame “ilri.sheep” (Baker et al., 2003; https://doi.org/10.1017/S1357729800053388) located in the R library agridat v.1.22 (Wright, 2023).

Alternatively, a raincloud plot, which is a hybrid plot mixing a halved violin plot, a box plot, and scattered raw data, can help us visualize raw data, the distribution of the data, and key summary statistics at the same time.

The raincloud plot improves upon the traditional box plot by revealing multiple modes (peaks in the distribution), which may indicate distinct groups within the data.

Unlike the box plot, which does not show where the data points are concentrated, the raincloud plot does precisely that!
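The base-R sketch below illustrates this with simulated data (hypothetical values, not the lamb dataset). A box plot summarises two well-separated groups as a single box whose median sits in the gap where almost no observations fall, while a kernel density estimate, the kind of curve behind the half-violin “cloud”, recovers both modes. Base R's `density()` is used here as a stand-in; ggdist's own density estimator and bandwidth defaults may differ.

```r
set.seed(1)
# Simulated (hypothetical) data: two groups of 200 observations each,
# centred at 80 and 140 with the same spread
x <- c(rnorm(200, mean = 80, sd = 8), rnorm(200, mean = 140, sd = 8))

# Box plot view: one box, whose median falls in the gap between the groups
med <- boxplot.stats(x)$stats[3]
share_near_median <- mean(abs(x - med) < 10)

# Density view: a kernel density estimate of the same data
d <- density(x)
# Indices where the estimated density switches from rising to falling (local maxima)
idx <- which(diff(sign(diff(d$y))) == -2) + 1
# Keep only peaks that reach at least 10% of the highest peak
modes <- d$x[idx[d$y[idx] > 0.1 * max(d$y)]]

cat("Box plot median:", round(med, 1), "\n")
cat("Share of observations within 10 units of the median:", share_near_median, "\n")
cat("Density modes:", round(modes, 1), "\n")
```

The median is a perfectly valid statistic here, yet it describes a value that almost no observation takes; the two density peaks tell the real story.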

Now it is your turn. Check out this simple raincloud plot tutorial using the R library ggdist v.3.3.0 (Kay, 2023; https://mjskay.github.io/ggdist/).

1. Loading the libraries and data
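# Install the packages once (dplyr provides mutate(); the native pipe |> needs R >= 4.1)
install.packages(c("agridat", "dplyr", "ggplot2", "ghibli", "ggdist"))

# Load the packages used below
library(dplyr)
library(ggplot2)
library(ggdist)
library(ghibli)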

#Line below selects the dataset "Birth weight and weaning weight of Dorper x Red Maasi lambs"
data <- agridat::ilri.sheep

data <- data |> mutate(birthwt = as.numeric(birthwt),
                       weanwt = as.numeric(weanwt),
                       weanage = as.numeric(weanage),
                       # Line below creates the variable "weight gain from birth to weaning" displayed in grams per day
                       weight_gain_gram = round(((weanwt - birthwt) / weanage) * 1000, 2))

data <- subset(data, select=c(lamb,gen,weight_gain_gram)) #selecting the variables of interest for this exercise

head(data,10) #shows the first 10 rows only
#>    lamb gen weight_gain_gram
#> 1   627  DD           108.80
#> 2   629  DD           138.39
#> 3   635  DD           111.93
#> 4   636  DD           119.44
#> 5   638  DD               NA
#> 6   639  DD            78.50
#> 7   640  RD           113.08
#> 8   642  DD               NA
#> 9   643  DD           142.99
#> 10  644  DD            83.18
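The weight-gain formula in the `mutate()` call can be checked by hand. The sketch below wraps it in a hypothetical helper, `daily_gain_g()` (a name introduced here for illustration, not a function from agridat or dplyr), assuming weights are recorded in kg and weaning age in days, which is what the `* 1000` conversion to g/d implies. The lamb values are made up.

```r
# Hypothetical helper, not part of agridat or dplyr: average daily gain in g/d.
# Assumes weights are in kg and weaning age in days (hence the * 1000).
daily_gain_g <- function(birthwt_kg, weanwt_kg, weanage_d) {
  round(((weanwt_kg - birthwt_kg) / weanage_d) * 1000, 2)
}

# Made-up lamb: 3.2 kg at birth, 14.7 kg at weaning, weaned at 115 days
# (14.7 - 3.2) / 115 * 1000 = 11.5 / 115 * 1000 = 100 g/d
daily_gain_g(3.2, 14.7, 115)

# A missing value in any input propagates to NA
daily_gain_g(3.2, NA, 115)

# Vectorised, so it works on whole columns, as inside mutate()
daily_gain_g(c(3.2, 2.5), c(14.7, 15.0), c(115, 100))
```

NA propagation would explain the `NA` rows in the output above: a lamb missing any of the three inputs gets no weight-gain value. ggplot2 later drops those rows (with a warning) when plotting.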

2. Plotting the raincloud using ggplot2 and ggdist

ggplot(data, aes(x = gen, y = weight_gain_gram, fill = gen)) +
  # Line below sets the Studio Ghibli color palette, for the sake of nostalgia =)
  scale_fill_ghibli_d("SpiritedMedium", direction = -1) +
  geom_boxplot(width = 0.1) +
  xlab('Lamb genotype') +
  ylab('Weight gain, in g/d') +
  ggtitle("Weight gain from birth to weaning in 4 lamb genotypes") +
  theme_classic(base_size = 18, base_family = "serif") +
  theme(text = element_text(size = 18),
        axis.text.x = element_text(angle = 0, hjust = .5, vjust = 0.5, color = "black"),
        axis.text.y = element_text(color = "black"),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "none") +
  scale_y_continuous(breaks = seq(0, 180, by = 20), limits = c(0, 180), expand = c(0, 0)) +
  # Line below adds dot plots (the "rain") from {ggdist} package
  stat_dots(side = "left", justification = 1.12, binwidth = 1.9) +
  # Line below adds half-violin (the "cloud") from {ggdist} package
  stat_halfeye(adjust = .5, width = .6, justification = -.2, .width = 0, point_colour = NA)
Raincloud plot illustrating data and summary statistics. Data was retrieved from the data frame “ilri.sheep” (Baker et al., 2003; https://doi.org/10.1017/S1357729800053388) located in the R library agridat v.1.22 (Wright, 2023).
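Since box plots alone hide the sample size, it can also help to report n next to the statistics the box layer draws. Below is a sketch of a hypothetical base-R helper, `summarise_by_group()` (a name introduced here for illustration, not a function from ggplot2, ggdist or agridat). It drops missing values and uses `quantile()` with its default method; other quartile definitions, such as the Tukey hinges used by base R's `boxplot()`, can give slightly different values.

```r
# Hypothetical helper: per-group sample size and quartiles, ignoring missing values
summarise_by_group <- function(df, group, value) {
  groups <- split(df[[value]], df[[group]])
  rows <- lapply(names(groups), function(g) {
    v <- groups[[g]]
    v <- v[!is.na(v)]
    q <- quantile(v, probs = c(0.25, 0.5, 0.75), names = FALSE)
    data.frame(group = g, n = length(v), q1 = q[1], median = q[2], q3 = q[3])
  })
  do.call(rbind, rows)
}

# Usage with the tutorial's data frame (run after step 1):
# summarise_by_group(data, group = "gen", value = "weight_gain_gram")

# Self-contained toy example with made-up values
toy <- data.frame(gen = c("DD", "DD", "DD", "RD", "RD"),
                  weight_gain_gram = c(100, 120, NA, 90, 110))
summarise_by_group(toy, "gen", "weight_gain_gram")
```

Reporting the table alongside the raincloud gives readers both the exact numbers and the shape of each distribution.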

You just learned how to make a raincloud plot with R libraries ggplot2 and ggdist! Congrats!

Stay tuned for more data visualization posts as well as more content on statistical programming and data analysis.

Cheers!

Guilherme
