5 Visualisations In R

(with 5 lines or less of code)

Introduction

Open data is often released in huge chunks that cannot be easily digested. Picture a library after a storm has passed through, with hundreds of thousands of books and loose papers strewn across the floor, and you have a fair image of a large dataset crammed into one spreadsheet. This is why tools and techniques exist for organising, analysing and interpreting data, so that it is easy to take in, can be shared in context, and can draw feedback from the audience it is meant for.

Recently, I came across a very interesting post, 5 data visualisations in 5 minutes: each in 5 lines or less of R. The title, the work done by Sharon Machlis, and a name as straightforward as R were so captivating that I decided to install, learn and use R to create visualisations in a local context, using data obtained from The National Treasury, Kenya. The goal? To create simple R code snippets that can be easily understood and reused by anyone.


What is R?

R is a programming language and software environment for statistical computing and graphics (Wikipedia, 2015). It helps you create powerful visualisations (maps, charts, graphs) using your own data. Here’s how to install R and RStudio (recommended).


The Data

a) The first dataset has information on how much money in donor funding was invested in each of the 8 Millennium Development Goals (MDGs) in Kenya — https://www.dropbox.com/s/e87onbxlhel7fsp/MDG%20Kenya%20Data.csv?dl=0

b) The second dataset shows how much money in donor funding was invested in each of the 47 counties in Kenya — https://www.dropbox.com/s/dz2nsoilyw1h6vz/donor%20funding%20per%20county%20Kenya.csv?dl=0
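A minimal sketch of loading one of these files into R, assuming you have already downloaded it to your working directory. The helper name `load_funding` and the local file name are assumptions made for illustration; the real column names depend on the file, so inspect them before plotting.

```r
# Hypothetical helper: read a downloaded CSV into a data frame.
load_funding <- function(path) {
  # check.names = FALSE keeps the column headers exactly as they appear in the file
  read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)
}

# Assumed local file name; adjust it to wherever you saved the download.
mdg_path <- "MDG Kenya Data.csv"
if (file.exists(mdg_path)) {
  mdg <- load_funding(mdg_path)
  str(mdg)  # inspect column names and types before plotting
}
```

The same helper can be pointed at the county file once it has been downloaded.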


Visualising the Data

In 5 lines of code or less, we will create visualisations that put a new spin on, and add context to, the data on donor funding across MDGs and counties.

1. Bar Charts

Bar charts are oriented horizontally and are great for drawing comparisons between groups of data, especially when the data values vary greatly.

a) Using the first dataset on MDGs, we can create a bar chart showing the total number of funded projects affiliated with each MDG.

At a glance, we can tell that most donor-funded projects aim at eradicating extreme poverty and hunger in Kenya (MDG 1).
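A chart like this one can be sketched with base R's `barplot()`. The data frame below is a stand-in: the column names `MDG` and `Projects` and all of the values are placeholders, not the Treasury figures, so swap in the columns from the real file.

```r
# Placeholder data standing in for the MDG dataset; the column names
# (MDG, Projects) and the values are illustrative only.
mdg <- data.frame(MDG = c("MDG 1", "MDG 2", "MDG 3"),
                  Projects = c(30, 12, 5))

# Horizontal bars are drawn bottom-up, so sorting in ascending order
# puts the largest bar at the top of the chart.
mdg <- mdg[order(mdg$Projects), ]
mids <- barplot(mdg$Projects, names.arg = mdg$MDG, horiz = TRUE, las = 1,
                main = "Donor-funded projects per MDG",
                xlab = "Number of projects")
```

Without the placeholder data and the sorting step, the plot itself is a single `barplot()` call.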

b) Using the same dataset, we can create a bar chart showing the total amount of money invested in each MDG.

KSh 10.5 trillion was invested in MDG 8, Develop a Global Partnership for Development, i.e. in making more effective but costly medicine available to Kenyans.
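For amounts, it helps to rescale the raw figures before plotting so the axis stays readable. This is a sketch under assumptions: the `Amount` column name, its unit (raw KSh) and the values are placeholders, not the real figures.

```r
# Placeholder data; column names (MDG, Amount) and values are illustrative only.
mdg <- data.frame(MDG = c("MDG 1", "MDG 2", "MDG 3"),
                  Amount = c(4.0e9, 2.5e9, 1.5e9))  # assumed to be raw KSh

# Convert to billions so the axis labels stay short
amount_bn <- mdg$Amount / 1e9
mids <- barplot(amount_bn, names.arg = mdg$MDG, horiz = TRUE, las = 1,
                main = "Donor funding per MDG",
                xlab = "Amount (KSh billions)")
```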

2. Column Charts

Column charts, unlike bar charts, are oriented vertically and are great to use when the data values span a smaller range (between the minimum and maximum). In this example, the Amount is in billions and is placed side by side with the corresponding Number of Projects.
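One way to draw side-by-side columns in base R is to pass `barplot()` a matrix and set `beside = TRUE`. The column names `Amount_bn` and `Projects` and the values below are placeholders chosen for illustration.

```r
# Placeholder data; column names and values are illustrative only.
mdg <- data.frame(MDG = c("MDG 1", "MDG 2", "MDG 3"),
                  Amount_bn = c(40, 25, 15),
                  Projects = c(30, 12, 5))

# barplot() expects one row per series and one column per group
values <- t(as.matrix(mdg[, c("Amount_bn", "Projects")]))
mids <- barplot(values, beside = TRUE, names.arg = mdg$MDG,
                col = c("steelblue", "orange"),
                legend.text = c("Amount (KSh billions)", "Number of projects"),
                main = "Funding and projects per MDG")
```

Putting both series on one axis only reads well when their values fall in a similar range, which is why the amounts are expressed in billions here.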

3. Pie Charts

Who doesn’t like a good pie? ☺ Pie charts make use of pie slices to show variations in data sizes. They work best where the data values are fewer and vary greatly. We will still use the first data set for this visualization.

11.1% of donor money was invested in MDG 8.
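A percentage like this can be computed and shown on the slices with base R's `pie()`. The sketch below uses placeholder values and assumed column names (`MDG`, `Amount`), so the percentages it prints are not the real ones.

```r
# Placeholder data; column names and values are illustrative only.
mdg <- data.frame(MDG = c("MDG 1", "MDG 2", "MDG 3", "MDG 8"),
                  Amount = c(40, 30, 20, 10))

# Each slice's share of the total, as a percentage with one decimal place
share <- round(100 * mdg$Amount / sum(mdg$Amount), 1)
slice_labels <- paste0(mdg$MDG, " (", share, "%)")
pie(mdg$Amount, labels = slice_labels,
    main = "Share of donor funding per MDG")
```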

4. Bubble Chart

The first three charts are great when you need to show data with one dimension, say amounts or counts. But what happens when you need to show data with two to four dimensions, say time, amount and number, all at once?

Bubble charts are the answer.

In this example, the smaller the bubble, the fewer the projects affiliated with that MDG. Bubbles further to the right indicate MDGs that received more investment.
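A bubble chart along these lines can be sketched with base R's `symbols()`, with the amount on the x-axis and the bubble size driven by the number of projects. The column names and values are placeholders, and placing one MDG per row on the y-axis is an assumption about the layout.

```r
# Placeholder data; column names and values are illustrative only.
mdg <- data.frame(MDG = c("MDG 1", "MDG 2", "MDG 3"),
                  Amount_bn = c(40, 25, 15),
                  Projects = c(30, 12, 5))

# sqrt() makes the bubble area, rather than the radius, proportional
# to the number of projects
radius <- sqrt(mdg$Projects / pi)
rows <- seq_along(mdg$MDG)  # one row per MDG on the y-axis (assumed layout)
symbols(x = mdg$Amount_bn, y = rows, circles = radius, inches = 0.4,
        bg = "lightblue", fg = "grey40",
        xlab = "Amount (KSh billions)", ylab = "MDG (row)",
        main = "Funding vs. number of projects per MDG")
text(mdg$Amount_bn, rows, labels = mdg$MDG, cex = 0.8)
```

Scaling by area keeps a bubble with four times the projects from looking sixteen times as large.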

5. Maps

Maps make use of geospatial data to represent information about a place in an easy and sometimes interactive manner. They are a great tool for exploration and learning.

1. In this example, we are going to use one of R’s powerful packages, maps, to render a simple coloured map of Kenya. Alternative packages that can be used within R include “rworldmap” and “mapdata”.

A simple map of Kenya created using two lines of R. Pretty neat.
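A sketch of what those two lines might look like, assuming the maps package is installed. The fill colour is an arbitrary choice, and the `requireNamespace()` guard is only there so the snippet does not fail when the package is missing.

```r
# Assumes the maps package is installed: install.packages("maps")
if (requireNamespace("maps", quietly = TRUE)) {
  library(maps)
  # Draw Kenya from the package's built-in "world" database, filled with one colour
  kenya <- map("world", regions = "Kenya", fill = TRUE, col = "darkseagreen")
}
```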

2. In this example, we use Leaflet, a JavaScript library, together with the second dataset, which has information on each of Kenya’s 47 counties, to create an interactive map.
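A minimal sketch of an interactive map using the leaflet package for R, assuming it is installed. The three rows below are placeholders: the coordinates are approximate locations of the county capitals, the `Funding` values are made up for illustration, and the column names (`County`, `lat`, `lng`, `Funding`) are assumptions, since the county file may not contain coordinates at all.

```r
# Placeholder rows; coordinates are approximate and Funding values are
# illustrative, not figures from the county dataset.
counties <- data.frame(County = c("Nairobi", "Mombasa", "Kisumu"),
                       lat = c(-1.29, -4.04, -0.09),
                       lng = c(36.82, 39.67, 34.77),
                       Funding = c(30, 20, 10))

# Assumes the leaflet package is installed: install.packages("leaflet")
if (requireNamespace("leaflet", quietly = TRUE)) {
  library(leaflet)
  m <- leaflet(counties) %>%
    addTiles() %>%  # default OpenStreetMap base layer
    addCircleMarkers(lng = ~lng, lat = ~lat,
                     radius = ~sqrt(Funding) * 3,  # marker size grows with funding
                     popup = ~paste0(County, ": ", Funding))
  if (interactive()) print(m)  # opens the map in the viewer or browser
}
```

Clicking a marker shows the county name and its value in a popup.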


Conclusion

While three to five lines may not be sufficient for a more complex set of visualisations, it is clear that R is quite powerful and well stocked with packages and documentation to help you visualise your data and tell meaningful stories with it.

Interested in learning R? Code School and DataCamp offer introductory sessions, for free, to help you get started with R and find your way from there!

Happy learning.