Raincloud plots for clear, precise and efficient data communication
Raw data visualization involves presenting data in its most basic form, without any manipulation or aggregation, and is fundamental for initial understanding and quality assessment of your data.
Specifically, raw data visualization allows for:
- Initial Exploration: It helps in the initial exploration of data, permitting us to get a sense of the patterns, outliers, and overall characteristics present in the raw dataset.
- Identifying Anomalies: Raw data visualization allows for the quick identification of anomalies, outliers, or errors in the dataset, which might need further investigation or cleaning.
- Understanding Data Distribution: It provides a clear understanding of the distribution of the data points, helping us comprehend the nature and spread of the dataset.
- Data Quality Assessment: By visualizing raw data, it is easier to assess the quality of the dataset, including missing values, inconsistencies, or inaccuracies.
- Contextual Understanding: Visualizing raw data aids in understanding the context in which the data was collected, helping us make informed decisions about the appropriate data preprocessing steps.
- Enhanced Communication: When collaborating with others, presenting raw data visually can facilitate better communication among team members, allowing everyone to start from the same foundational view.
Typically, raw data and summary statistics are presented as box plots, including in scientific manuscripts and presentations.
Box plots combine several summary statistics (median, quartiles, whiskers and outliers) in a single chart. Even so, they can mislead readers, because they give no indication of the sample size or of underlying patterns in the data.
Overlaying the individual data points on the box plot does not fully solve this: the points can overlap and cluster, which makes it hard to see where they actually lie.
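The base-R sketch below makes the sample-size point concrete. It uses two made-up vectors (hypothetical values, not taken from the lamb dataset). `boxplot.stats()` returns the five numbers a box plot draws. Both samples produce the same five numbers, yet one has 5 evenly spread observations and the other has 9 observations clustered around two values.

```r
# Two hypothetical samples, made up for illustration only
a <- c(1, 2, 3, 4, 5)              # n = 5, evenly spread
b <- c(1, 2, 2, 2, 3, 4, 4, 4, 5)  # n = 9, clustered around 2 and 4

# boxplot.stats()$stats holds what the box plot draws:
# lower whisker, lower hinge, median, upper hinge, upper whisker
stats_a <- boxplot.stats(a)$stats
stats_b <- boxplot.stats(b)$stats
print(stats_a)  # 1 2 3 4 5
print(stats_b)  # 1 2 3 4 5

# The sample sizes differ, but nothing in the drawn box shows it
print(c(n_a = length(a), n_b = length(b)))
```

Both boxes would look identical. Only the raw points reveal that `b` has almost twice as many observations and two clusters.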
Alternatively, a raincloud plot, which is a hybrid plot mixing a halved violin plot, a box plot, and scattered raw data, can help us visualize raw data, the distribution of the data, and key summary statistics at the same time.
The raincloud plot improves upon the traditional box plot by revealing multiple modes, which may indicate distinct groups within the data.
The box plot does not show where the data are concentrated; the raincloud plot does precisely that!
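Here is a small illustration using a hypothetical sample made of two separate clusters, one around 2 and one around 8. The median is the line a box plot draws, and here it falls at 5, in the empty gap between the clusters. A kernel density estimate (the "cloud") behaves differently: it is high at the two cluster centres and close to zero at the median. The density is computed with base R's `density()` and an arbitrarily chosen bandwidth of 0.5. ggdist uses its own density estimator, so treat this only as an analogy for the half-violin.

```r
# Hypothetical bimodal sample: two tight clusters around 2 and 8
x <- c(2 + seq(-0.5, 0.5, length.out = 20),
       8 + seq(-0.5, 0.5, length.out = 20))

med <- median(x)  # the line a box plot would draw
print(med)        # 5, a value no observation is close to

# Kernel density estimate (bandwidth 0.5 chosen for illustration)
d <- density(x, bw = 0.5)

# Estimated density at the two cluster centres and at the median
dens_at <- approx(d$x, d$y, xout = c(2, med, 8))$y
print(signif(dens_at, 3))
```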
Now it is your turn. Check out this simple raincloud plot tutorial using the R library ggdist v.3.3.0 (Kay, 2023; https://mjskay.github.io/ggdist/).
1. Loading the libraries and data
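install.packages(c("agridat", "ggplot2", "ghibli", "ggdist", "dplyr"))
library(ggplot2) #plotting
library(ggdist)  #raincloud layers: stat_dots() and stat_halfeye()
library(ghibli)  #Studio Ghibli color palettes
library(dplyr)   #mutate()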
#Line below loads the dataset "Birth weight and weaning weight of Dorper x Red Maasai lambs" from the agridat package
data <- agridat::ilri.sheep
#Convert the weight and age columns to numeric (the native pipe |> requires R >= 4.1)
data <- data |> mutate(birthwt=as.numeric(birthwt),
weanwt=as.numeric(weanwt),
weanage=as.numeric(weanage),
#Line below creates the variable "weight gain from birth to weaning" in grams per day
#(weights are in kg and weaning age in days, hence the factor of 1000)
weight_gain_gram=round(((weanwt-birthwt)/weanage)*1000, 2))
data <- subset(data, select=c(lamb,gen,weight_gain_gram)) #selecting the variables of interest for this exercise
head(data,10) #shows the first 10 rows only
lamb gen weight_gain_gram
1 627 DD 108.80
2 629 DD 138.39
3 635 DD 111.93
4 636 DD 119.44
5 638 DD NA
6 639 DD 78.50
7 640 RD 113.08
8 642 DD NA
9 643 DD 142.99
10 644 DD 83.18
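Missing values matter for the plot too. Rows with `NA` in `weight_gain_gram` cannot be drawn, so ggplot2 drops them (typically with a warning). As a result, each genotype is represented by fewer lambs than it has rows. A quick count of observed and missing values per genotype is worth running before plotting.

The sketch below uses base R's `tapply()`. It runs on a toy data frame rebuilt from the ten rows printed above, not on the full dataset. To apply the same two `tapply()` calls to the full `data` object, replace `toy` with `data`.

```r
# Toy data frame rebuilt from the 10 rows printed above (not the full dataset)
toy <- data.frame(
  lamb = c(627, 629, 635, 636, 638, 639, 640, 642, 643, 644),
  gen  = c("DD", "DD", "DD", "DD", "DD", "DD", "RD", "DD", "DD", "DD"),
  weight_gain_gram = c(108.80, 138.39, 111.93, 119.44, NA,
                       78.50, 113.08, NA, 142.99, 83.18)
)

# Per-genotype counts of observed and missing weight gains
n_obs <- tapply(!is.na(toy$weight_gain_gram), toy$gen, sum)
n_na  <- tapply(is.na(toy$weight_gain_gram), toy$gen, sum)
print(rbind(observed = n_obs, missing = n_na))
```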
2. Plotting the raincloud using ggplot2 and ggdist
ggplot(data, aes(x = gen, y = weight_gain_gram, fill=gen)) +
# Line below sets the Studio Ghibli color palette, for the sake of nostalgia =)
scale_fill_ghibli_d("SpiritedMedium", direction = -1) +
geom_boxplot(width = 0.1) +
xlab('Lamb genotype') +
ylab('Weight gain, in g/d') +
ggtitle("Weight gain from birth to weaning in 4 lamb genotypes") +
theme_classic(base_size=18, base_family="serif")+
theme(text = element_text(size=18),
axis.text.x = element_text(angle=0, hjust=.5, vjust = 0.5, color = "black"),
axis.text.y = element_text(color = "black"),
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
legend.position="none")+
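# Note: limits set on the scale drop observations outside 0-180 g/d before any statistics are computed;
# coord_cartesian(ylim = c(0, 180)) would zoom in without dropping data
scale_y_continuous(breaks = seq(0, 180, by=20), limits=c(0,180), expand = c(0, 0)) +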
# Line below adds the raw data as a dot plot (the "rain") from the {ggdist} package
stat_dots(side = "left", justification = 1.12, binwidth = 1.9) +
# Line below adds the half-violin (the "cloud") from the {ggdist} package
stat_halfeye(adjust = .5, width = .6, justification = -.2, .width = 0, point_colour = NA)
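In `stat_halfeye()`, `adjust = .5` multiplies the bandwidth of the density estimate by 0.5. The cloud therefore follows the data more closely, and may show more bumps, than it would with the default bandwidth. ggdist computes its densities with its own estimator, which may use a different default bandwidth, so the base-R sketch below is only an analogy. `density()`'s `adjust` argument works the same way, scaling its default bandwidth rule (`bw.nrd0()`). The values are the eight non-missing weight gains from the rows printed in step 1.

```r
# Eight observed weight gains (g/d) from the 10 rows printed in step 1
x <- c(108.80, 138.39, 111.93, 119.44, 78.50, 113.08, 142.99, 83.18)

d_default <- density(x)                # default bandwidth, bw.nrd0(x)
d_half    <- density(x, adjust = 0.5)  # same rule, bandwidth multiplied by 0.5

print(c(default = d_default$bw, adjusted = d_half$bw))
```

A smaller bandwidth reveals finer structure but also amplifies noise, especially with few observations per group. Try a few values of `adjust` before settling on one.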
You just learned how to make a raincloud plot with R libraries ggplot2 and ggdist! Congrats!
Stay tuned for more data visualization posts as well as more content on statistical programming and data analysis.
Cheers!
Guilherme