My first visualisation class with R

“You won’t go to bed without knowing something else” is a saying that refers to the idea that every day we learn something new.

This phrase highlights the nature of lifelong learning: it is continuous and unstoppable, and it grows day by day through small things such as a new activity, a piece of information we didn’t know, or a different way of looking at things.

The saying implies that every day we should add a little to our knowledge, and that we should not waste time but use it to learn new things.

It is generally used right after we have learned something new or interesting. For example, someone tells us that the Atacama Desert in Chile is the driest in the world, and we respond, satisfied, “you won’t go to bed without knowing something else”.

Photo by Simon Berger on Unsplash

Variations of this saying are “you will not go to bed without knowing one more thing” and “you will never go to bed without knowing one more thing”.

Today I attended my first master class on graphics and visualization! How your perspective on the data changes when you see it in charts instead of in a table.

As an example, they showed us the “datasauRus” package.

The Datasaurus data package

This package wraps the awesome Datasaurus Dozen dataset, which contains 13 sets of x-y data. Each sub-dataset has five statistics that are (almost) the same in each case. (These are the mean of x, mean of y, standard deviation of x, standard deviation of y, and Pearson correlation between x and y). However, scatter plots reveal that each sub-dataset looks very different. The dataset is intended to be used to teach students that it is important to plot their own datasets, rather than relying only on statistics.

The Datasaurus was created by Alberto Cairo in this great blog post.

Datasaurus shows us why visualization is important, not just summary statistics.

He’s been subsequently made even more famous in the paper Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing by Justin Matejka and George Fitzmaurice.

In the paper, Justin and George simulate a variety of datasets that have the same summary statistics as the Datasaurus but very different distributions.

This package looks to make these datasets available for use as an advanced Anscombe’s Quartet, available in R as anscombe.
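
Since anscombe ships with base R, the same idea can be tried before installing anything. The snippet below is only a minimal sketch of mine, not code from the package: it computes the same five statistics for each of the four x-y pairs in anscombe. The name anscombe_stats is my own, purely for illustration.

```r
# anscombe comes with base R (datasets package): columns x1..x4 and y1..y4
data(anscombe)

# Compute the five summary statistics for each of the four x-y pairs
anscombe_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set       = i,
    mean_x    = mean(x),
    mean_y    = mean(y),
    std_dev_x = sd(x),
    std_dev_y = sd(y),
    corr_x_y  = cor(x, y)
  )
}))

# The four rows should look almost identical
print(round(anscombe_stats, 2))
```

Plotting each pair, for example with plot(anscombe$x1, anscombe$y1), shows how different the four sets really are despite the matching numbers.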

Load packages

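# Packages used in this post (dplyr is needed for the summary further down)
packages <- c("datasauRus", "dplyr", "ggplot2", "gganimate")
newpack <- packages[!(packages %in% installed.packages()[, "Package"])]

# Install whatever is missing, then attach everything
if (length(newpack)) install.packages(newpack)
invisible(lapply(packages, library, character.only = TRUE))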


To see that statistics are (almost) the same for each sub-dataset, you can use dplyr.

datasaurus_dozen %>%
  group_by(dataset) %>%
  summarize(
    mean_x = mean(x),
    mean_y = mean(y),
    std_dev_x = sd(x),
    std_dev_y = sd(y),
    corr_x_y = cor(x, y)
  )

To see that each sub-dataset looks very different, you can draw scatter plots.

ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  theme(legend.position = "none") +
  facet_wrap(~dataset, ncol = 3)

Let’s animate it!


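p <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal() +
  # Relative timing: 3 units moving between datasets, 1 unit pausing on each
  transition_states(dataset, transition_length = 3, state_length = 1)

animate(p)  # render the animation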


I hope you like it.

No matter what books or blogs or courses or videos one learns from, when it comes to implementation everything can look like “Outside the Curriculum”.

The best way to learn is by doing! The best way to learn is by teaching what you have learned!

Never give up!

See you on Linkedin!

Master in Data Science. Passionate about learning new skills. Former branch risk analyst.
