Data Visualisations — Part V

Photo by Pixabay from Pexels

Facts are stubborn, but statistics are more pliable

— Mark Twain

Introduction

This article shows how to visualise the distribution of observations in a dataset with histograms. It also discusses bubble charts, which convey a third variable for each observation: the size of each bubble indicates the magnitude of that variable.

Bubble plot

A bubble chart, or bubble plot, is a variation of a scatter plot used when three variables are visualised together. Each data point is positioned according to two of the variables, and the size of its bubble represents the third.

Bubble chart depicting the number of deaths against total COVID tests performed and total confirmed cases

The bubbles in the chart show the number of deaths due to COVID-19. The number of deaths is analysed with respect to two variables: the total number of COVID-19 tests performed and the total number of confirmed cases. The data covers a selection of countries around the world as of 31st May 2021 and is taken from Worldometer. All three series can be read from the bubble chart for each country: the position of a bubble gives the tests performed and the confirmed cases, and the size of the bubble indicates the number of deaths in that country. The larger the bubble, the higher the number of deaths due to COVID-19 in that country.
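The mapping from a third variable to bubble size can be sketched as follows. This is a minimal illustration, not the code behind the chart above: the country labels and numbers in `SAMPLE` are hypothetical placeholders rather than real COVID-19 statistics, and the helper names `bubble_areas` and `plot_bubbles` are our own. It assumes matplotlib is installed for the plotting step, where the `s` argument of `scatter` is the marker area in points squared.

```python
# Hypothetical illustrative values, NOT real COVID-19 statistics.
# Each row: (country label, total tests, confirmed cases, deaths)
SAMPLE = [
    ("Country A", 5_000_000, 400_000, 8_000),
    ("Country B", 20_000_000, 1_500_000, 30_000),
    ("Country C", 1_000_000, 50_000, 500),
]


def bubble_areas(values, max_area=2000.0):
    """Scale values linearly so that the largest one maps to max_area."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("at least one value must be positive")
    return [max_area * v / largest for v in values]


def plot_bubbles(rows, path="bubble.png"):
    """Draw tests (x) against confirmed cases (y), with deaths as bubble size."""
    # Imported here so that bubble_areas can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels, tests, cases, deaths = zip(*rows)
    fig, ax = plt.subplots()
    ax.scatter(tests, cases, s=bubble_areas(deaths), alpha=0.5)
    for label, x, y in zip(labels, tests, cases):
        ax.annotate(label, (x, y))
    ax.set_xlabel("Total tests performed")
    ax.set_ylabel("Total confirmed cases")
    ax.set_title("Deaths (bubble size) by tests and confirmed cases")
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_bubbles(SAMPLE)` writes the chart to `bubble.png`. Scaling the area, rather than the radius, in proportion to the value is a common choice because readers tend to compare bubbles by area.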

Histogram

A histogram is a bar plot in which the axis representing the data variable is divided into a set of discrete bins, and the count or percentage of observations falling within each bin is shown as the height of the corresponding bar. The histogram shows how a continuous variable is distributed, including features such as outliers and skewness. Hence, the histogram is a significant data visualisation technique that gives an overall summary of the spread of the data. The dataset used to plot the graph is the percentage distribution of the total number of confirmed cases for the country of Ireland within each age group and is taken from here for the month of December 2020.

Histogram of the distribution of confirmed COVID-19 cases across age groups
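The binning step described above can be sketched in a few lines. This is an illustrative example under stated assumptions: the ages in `AGES` and the bin edges in `EDGES` are made up for demonstration and are not the Irish dataset, and the helper names `percentage_histogram` and `plot_histogram` are our own. The plotting step assumes matplotlib is installed.

```python
# Hypothetical ages of confirmed cases, for illustration only.
AGES = [3, 17, 22, 25, 31, 38, 44, 47, 52, 66]
# Bin edges: [0, 20), [20, 40), [40, 60), [60, 80]
EDGES = [0, 20, 40, 60, 80]


def percentage_histogram(values, bin_edges):
    """Return the percentage of values falling within each bin.

    Bins are half-open [low, high), except the last, which also
    includes its upper edge. Values outside all bins are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            low, high = bin_edges[i], bin_edges[i + 1]
            if low <= v < high or (i == last and v == high):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bins")
    return [100.0 * c / total for c in counts]


def plot_histogram(values, bin_edges, path="histogram.png"):
    """Draw one bar per bin, with height equal to the percentage in that bin."""
    # Imported here so that percentage_histogram works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    percentages = percentage_histogram(values, bin_edges)
    widths = [high - low for low, high in zip(bin_edges, bin_edges[1:])]
    fig, ax = plt.subplots()
    ax.bar(bin_edges[:-1], percentages, width=widths,
           align="edge", edgecolor="black")
    ax.set_xlabel("Age")
    ax.set_ylabel("Percentage of confirmed cases")
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_histogram(AGES, EDGES)` writes the chart to `histogram.png`. With these sample values the four bins hold 2, 4, 3 and 1 observations, so the bars sum to 100 percent.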

Takeaways

In this article, we discussed histograms for showing how data is distributed across bins, and bubble charts for inferring a three-way relationship between variables. The entire code for this article can be found here.

Do you have any questions?

Kindly ask your questions via email or comments and we will be happy to answer :)

Insights on Modern Computation
Perspectives on data science

A communal initiative by Meghana Kshirsagar (BDS | Lero | UL, Ireland) and Gauri Vaidya (Intern | BDS). Each concept is accompanied by sample datasets and Python code.