Data visualization with ggplot2 — part 2 of “R for Applied Economics” guide

Dima Diachkov
13 min read · Jan 25, 2023

Brief intro

In this story, we will explore the capabilities of the ggplot2 library in R for creating plots and charts. We will look at the main functions and chart types, as well as typical issues at a basic level.

For illustrative purposes, the visuals will be based on the data on inflation, which we now automatically parse directly from the ECB data warehouse (please go to part 1 for details). Do you need some extra motivation for free? Here you go, buddy.

“Even the smallest person can change the course of the future.” — Galadriel

Now you are ready to rumble.

Why do we need this ‘ggplot2’?

The ggplot2 library in R is widely used for the analysis of economic data. One of the key strengths of ggplot2 is its ability to create highly customizable and visually appealing plots, which can be used to effectively communicate economic trends and patterns. The package’s grammar of graphics allows for easy separation of the data, aesthetics, and statistical transformations, making it simple to create a wide range of plots such as line plots, scatterplots, and heatmaps.

By the way, the “grammar of graphics” is not just a set of technical rules for charting. It is also a widely shared convention (not a formal standard like ISO, but close in spirit), so when you follow it, any user/reader/analyst knows what to expect from your visuals. Probably, one of the next posts will be about this fascinating framework (you stay tuned, right?). The approach is based on layering data and visual components.

Credit: Teodor Kuduschiev (unsplash.com)

Additionally, ggplot2 can draw maps (for example, with geom_sf()), which can be useful for visualizing economic data at a geographic level. Cleaning and preparing economic data for analysis, on the other hand, is the job of its companion tidyverse packages, such as dplyr and tidyr, which work smoothly with ggplot2.

This package will be a great help throughout your projects. Isn’t it beautiful?

One area where ggplot2 and R can be particularly useful is the analysis of inflation. As you may already know, inflation is a measure of the rate at which prices for goods and services increase over time, and it is an important indicator of a country’s economic health. With ggplot2, we can create line plots to visualize inflation data over time and identify patterns and trends. Of course, that is not the only thing ggplot2 can do, but it is what our imaginary task from Gandalf in part 1 requires.

You have made it through the introduction and some theory. So it means that you are ready to conquer. First, we start with a toy example. Then, as soon as we are ready to tackle real-world data — we will jump into it.

How to install and load the ggplot2 package

Let’s install the package (just in case), and then I will provide some basic examples of how to create different types of plots, such as scatter plots, line plots, and bar plots.

To get started with ggplot2, you first need to install and load the package. You can do this by running the following code:

# chunk 1
install.packages("ggplot2")
install.packages("scales")
install.packages("zoo")
library(ggplot2) # for beautiful charts
library(scales) # for beautiful scaling
library(zoo) # for moving average function

Once the package is loaded, you can start creating plots. The basic structure of a ggplot2 plot is as follows:

# chunk 2
# general approach to creation of plots # do not run
ggplot(data, aes(x, y)) + geom_type()

Here, data is the dataset you want to use, x and y are the variables you want to map to the horizontal and vertical axes, and geom_type() stands for the type of plot you want to create (e.g., geom_point() for a scatter plot, geom_line() for a line plot, etc.).

Here’s an example of how to create a simple scatter plot using the mpg dataset that comes with ggplot2:
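To make the three ingredients concrete, here is a minimal sketch on a tiny hand-made data frame (the toy object and its values are made up purely for illustration). It also shows that a ggplot is an ordinary R object: you can store it, add more layers later, and it is only drawn when printed.

```r
library(ggplot2)

# Hypothetical toy data: made-up numbers, for illustration only
toy <- data.frame(
  year  = 2015:2020,
  value = c(1, 3, 2, 4, 3, 5)
)

# data = toy; aes() maps year to x and value to y; the geom is a line
p <- ggplot(toy, aes(x = year, y = value)) +
  geom_line()

# A stored plot can receive more layers later
p <- p + geom_point()

print(p)  # the chart is drawn only when the object is printed
```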

# demonstration on test data
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point()
Output for the code above

This code will create a scatter plot of highway miles per gallon (hwy) against engine displacement (displ) for all the cars in the mpg dataset.

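You can also create a line plot by replacing geom_point() with geom_line(). Note that geom_line() connects the points in order of the x variable, and several cars share the same displacement here, so the result is a jagged line that mostly serves as a syntax demonstration: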

# demonstration on test data
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_line()
Output for the code above
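Why the line looks jagged: geom_line() sorts the data by x before connecting the points, while its sibling geom_path() connects them in the order the rows appear. The sketch below shows the difference on three hypothetical points stored out of order on purpose; layer_data() returns the data each layer ends up with after the plot is built.

```r
library(ggplot2)

# Hypothetical data, deliberately not sorted by x
unsorted <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20))

# geom_line(): points are connected in increasing order of x
p_line <- ggplot(unsorted, aes(x, y)) + geom_line()

# geom_path(): points are connected in row order
p_path <- ggplot(unsorted, aes(x, y)) + geom_path()

# Compare the x order each layer works with
layer_data(p_line)$x
layer_data(p_path)$x
```

For time series such as monthly inflation, the two give the same picture as long as the data are sorted by date.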

A bar plot works slightly differently: geom_bar() counts how many rows fall into each category, so it needs only an x aesthetic (here, the car class):

# demonstration on test data
ggplot(mpg, aes(x = class)) +
geom_bar()
Output for the code above
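To see what geom_bar() computes, the sketch below builds the same chart twice: once letting geom_bar() count the rows, and once from a pre-computed frequency table with geom_col(), which plots y values as given. The class_counts object is introduced here for illustration; it is simply table() turned into a data frame.

```r
library(ggplot2)

# geom_bar() counts rows per category itself (stat = "count")
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# Same chart from pre-computed counts: geom_col() uses y as given
class_counts <- as.data.frame(table(class = mpg$class))  # columns: class, Freq
p_col <- ggplot(class_counts, aes(x = class, y = Freq)) + geom_col()

# Bar heights computed by each layer
layer_data(p_bar)$count
layer_data(p_col)$y
```

In practice, geom_col() is handy when your data already contain the values to plot, for example yearly averages computed beforehand.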

These are just a few basic examples of the types of plots you can create with ggplot2, but there are many more options available.

You can use these code examples as a starting point for creating your own plots and experimenting with different options. You can also use different data and “aes” to create different types of plots.

Customizing plots

One of the strengths of ggplot2 is its ability to easily customize different aspects of a plot, such as the axis labels, the legend, and the theme. Here are a few examples of how to customize a plot.

To change the axis labels, you can add the xlab() and ylab() functions to the plot with the + operator. For example, to change the x-axis label of the scatter plot created in the previous example to "Engine Displacement" and the y-axis label to "Highway Miles per Gallon":

# demonstration on test data
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
xlab("Engine Displacement") +
ylab("Highway Miles per Gallon")
Output for the code above

You can also customize the title of the plot using the ggtitle() function:

# demonstration on test data
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
xlab("Engine Displacement") +
ylab("Highway Miles per Gallon")+
ggtitle("Scatter plot of Engine Displacement vs Highway Miles per Gallon")
Output for the code above
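As an alternative, ggplot2’s labs() function sets the axis labels and the title in a single call (it also accepts subtitle and caption). A minimal sketch that should produce the same chart as the xlab() + ylab() + ggtitle() version:

```r
library(ggplot2)

# labs() bundles axis labels and the title into one layer-like call
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(
    x = "Engine Displacement",
    y = "Highway Miles per Gallon",
    title = "Scatter plot of Engine Displacement vs Highway Miles per Gallon"
  )

print(p)
```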

To change the appearance of the plot, you can use the theme() function to modify various elements of the plot's theme, such as the background color, the font size, and the grid lines. For example, to remove the major grid lines and change the panel background color to white:

# demonstration on test data
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
xlab("Engine Displacement") +
ylab("Highway Miles per Gallon")+
ggtitle("Scatter plot of Engine Displacement vs Highway Miles per Gallon")+
theme(panel.grid.major = element_blank(),
panel.background = element_rect(fill = "white"))
Output for the code above

The major grid lines are gone. The minor grid lines are actually still drawn, but in white, so they are invisible against the new white background.
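If you want to remove all grid lines explicitly, one option is to blank the parent element panel.grid, from which both panel.grid.major and panel.grid.minor inherit. The sketch below also stores the settings in a reusable object (clean_theme is a name chosen for illustration), so the same look can be added to any plot with +.

```r
library(ggplot2)

# panel.grid is the parent of both major and minor grid lines,
# so blanking it removes all of them
clean_theme <- theme(
  panel.grid       = element_blank(),
  panel.background = element_rect(fill = "white")
)

# The stored theme can be added to any plot
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  clean_theme

print(p)
```

ggplot2 also ships complete themes, such as theme_minimal() and theme_classic(), that you can use as a starting point instead.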

You can also change the scale of the plot to control the appearance of the data. For example, to change the y-axis to be on a logarithmic scale:

# demonstration on test data
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
xlab("Engine Displacement") +
ylab("Highway Miles per Gallon")+
ggtitle("Scatter plot of Engine Displacement vs Highway Miles per Gallon")+
theme(panel.grid.major = element_blank(),
panel.background = element_rect(fill = "white"))+
scale_y_log10()
Output for the code above

Log scaling is a very common procedure in real-world data processing, especially for financial and economic data.
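Why log scaling helps: on a logarithmic axis, equal vertical distances correspond to equal ratios (going from 10 to 100 looks the same as going from 100 to 1,000), which suits series that grow multiplicatively, such as price levels. The sketch below, on the same mpg data, checks that scale_y_log10() transforms the y values before they reach the geom, while the axis labels are still shown in original units.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_y_log10()

# The layer works with log10(hwy); the axis labels stay in miles per gallon
head(layer_data(p)$y)
head(log10(mpg$hwy))
```

One caveat for our topic: a log axis cannot show zero or negative values, so it fits positive levels (such as a price index) rather than inflation rates, which can drop below zero.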

These are just a few examples of the many ways you can customize a ggplot2 plot. With a little experimentation, you can create plots that are both informative and visually appealing.

One of the key features of ggplot2 is the ability to add multiple layers to a plot, each with its own set of aesthetics and geoms. This allows you to create complex plots with multiple types of data, such as points, lines, and polygons.

Adding layers

To add a layer to a plot, you simply use the + operator to add another geom_ function to the plot. For example, to add a line to the scatter plot created earlier, you can use the geom_smooth() function:

# demonstration on test data
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
Output for the code above.

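This adds a smoothed trend line to the scatter plot, showing the general trend of the data. By default, geom_smooth() fits a LOESS curve for datasets with fewer than 1,000 observations (like mpg) and draws a shaded confidence band around it. So you get the idea, right? ggplot2 is all about the layers!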

Each layer can also have its own set of aesthetics, which are specified within the aes() function. This allows you to create plots with multiple variables and different levels of detail.
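A minimal sketch of the difference between global and layer-specific aesthetics: aes() inside ggplot() is inherited by every layer, while aes() inside a geom applies to that layer only. Here only the points are colored by car class, and a single fit is drawn for all cars. Note that method = "lm" is swapped in for geom_smooth()’s default smoother to get a straight linear fit; formula = y ~ x is stated explicitly to keep ggplot2 from printing a message.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +        # inherited by all layers
  geom_point(aes(color = class)) +                  # color applies to points only
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, color = "black")          # one linear fit for all cars

print(p)
```

Had color = class been placed inside ggplot(aes(...)) instead, geom_smooth() would inherit it and fit a separate line for each class.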

As the examples above show, you can combine layers with different geoms and aesthetics, and the same mechanism lets you build histograms, box plots, and many other chart types. This layering is what gives ggplot2 its flexibility for complex, informative plots with multiple variables and different levels of detail.

Let me remind you of some basics of the inflation data we are dealing with.

The economics behind EU Inflation

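The European Central Bank (ECB) conducts monetary policy for the euro area (the EU member states that use the euro), not for the EU as a whole. For many years it defined price stability as inflation “below, but close to, 2%” over the medium term; since its 2021 strategy review, the target is a symmetric 2%. The inflation rate can vary widely across different EU member states and is affected by various factors such as economic growth, labor market conditions, and energy prices.

In general, moderate inflation tends to accompany a growing economy, while very low inflation or deflation can indicate economic weakness; inflation persistently above target, in turn, erodes purchasing power. The ECB monitors inflation and adjusts monetary policy accordingly to achieve its target.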

Inflation rates can vary widely across different countries, and it can be useful to compare inflation data between EU member states (btw — ggplot2 provides the ability to create maps and choropleth plots, which can be used to visualize inflation data at a country level).

So, for now, we will apply all the knowledge we obtained to produce simple ggplot2 graphics with EU inflation, formatted, labeled, properly named, and supplemented with reference info about the 2% targeted threshold. Should be enough for one day, don’t you think so?

Visualization of the EU inflation trends

We will start with the most basic step: the ordinary plot. First, we do the basic prep. I assume that you have the data from part 1 in your R environment: a data frame with two columns, the period of observation (Period) and the inflation rate (Inflation).

If not, please run this code to import it from GitHub.

# chunk 3
# I assume that you have data from part 1 in your R environment
# if not - please run this code to import code from GitHub
library(devtools) # <<< if you don't have it - you know what to do: install.packages("packagename")
source_url("https://raw.githubusercontent.com/TheLordOfTheR/R_for_Applied_Economics/main/Part1.R")
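If sourcing the part 1 script is not an option (for example, when working offline), a minimal sketch like the one below creates a stand-in object with the structure the code in this part assumes: a Date column Period and a numeric column Inflation. The values are synthetic (a sine wave around 2%) and are NOT ECB data, so use this only to check that the charts run.

```r
# Hypothetical stand-in for clean_df: synthetic values, NOT ECB data
clean_df <- data.frame(
  Period = seq(as.Date("2018-01-01"), as.Date("2022-12-01"), by = "month")
)

# Deterministic made-up series oscillating between 0.5% and 3.5%
clean_df$Inflation <- round(2 + 1.5 * sin(seq_len(nrow(clean_df)) / 6), 2)

str(clean_df)
```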

Then we do the magic.

# this action will make our code more explicit: let's put clean_df into a new df object
df <- clean_df

ggplot(df, aes(x = Period, y = Inflation)) +
geom_point()
Basic plot produced by the code above

Well, that is something. But what do we need to upgrade? What would you say if you were a sophisticated analyst and saw a chart like this in a paper or report? The first questions would obviously be:

  • “Why is it not a line plot?”
  • “Why is it so dark?”
  • “Where is the title?”
  • “What are the units?”
  • “What do these gridlines mean?”
  • “WHERE IS THE STORY?”

And you would be right to ask these and other such questions. We will tackle them right now. I will gradually increase the complexity by adding a layer for each component we need (going line by line, you can check out the interim states yourself).

# here we start with our basic plot BUT add layers one by one
# on top of it, so we can check the output at each step

ggplot(df, aes(x = Period, y = Inflation)) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_line(aes(color = "Inflation"), show.legend = FALSE, linewidth = 0.8) +
  theme_bw() +
  xlab("Period of observation") +
  ylab("Inflation rate, %") +
  ggtitle("Inflation in European Union") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_x_date(date_breaks = "6 months",
               date_labels = "%b %y",
               expand = c(0.01, 0.01)) +
  scale_y_continuous(breaks = seq(0, max(df$Inflation), 1))

This one is way better, right? Here is a recap of what we did.

  1. We started with a basic line plot;
  2. We switched from the default grey theme to a lighter one: theme_bw();
  3. After that, we renamed the x and y axes to make them more explicit: xlab(), ylab();
  4. Then we added the title: ggtitle();
  5. Then we rotated the x-axis labels by 90 degrees: theme(axis.text.x = (…));
  6. Then we placed the x-axis breaks (and their gridlines) every six months with scale_x_date(date_breaks = (…)), switched the date labels to an abbreviated month plus two-digit year format with scale_x_date(date_labels = (…)), and reduced the empty space to the left and right of the line with scale_x_date(expand = c(…));
  7. And the last action is to put a y-axis tick at every percentage point from 0 up to the maximum value in the dataset: scale_y_continuous(breaks = (…)). Note that breaks only control where ticks and gridlines appear; they do not change the range of the axis itself.
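A point worth checking about the y-axis step: the breaks argument of scale_y_continuous() only decides where tick marks and gridlines go, and ticks outside the data range are simply not shown. To actually force the axis to start at 0, one option is expand_limits() (or the limits argument of the scale). A minimal sketch on a hypothetical positive series (demo and its values are made up):

```r
library(ggplot2)

# Hypothetical series that stays above zero
demo <- data.frame(t = 1:5, v = c(1.2, 2.5, 3.1, 2.2, 4.8))

# breaks choose tick positions only; the axis still follows the data
p_breaks <- ggplot(demo, aes(t, v)) +
  geom_line() +
  scale_y_continuous(breaks = seq(0, 5, 1))

# expand_limits() widens the trained range so that it includes 0
p_zero <- p_breaks + expand_limits(y = 0)

# Trained y ranges (before the default padding is added)
layer_scales(p_breaks)$y$get_limits()
layer_scales(p_zero)$y$get_limits()
```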

Yeah, the next part of the journey will be about the grammar of graphics, so you will be able to feel the pervasive beauty of its simplicity. Basically, we have addressed the list of improvements mentioned above. But we still lack the story.

So what do we do next? Try this.

# here we add some story
ggplot(df, aes(x = Period, y = Inflation)) +
  geom_line(aes(color = "Inflation"), linewidth = 0.8) +
  theme_bw() +
  xlab("Period of observation") +
  ylab("Inflation rate, %") +
  ggtitle("Inflation in European Union") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_x_date(date_breaks = "6 months",
               date_labels = "%b %y",
               expand = c(0.01, 0.01)) +
  scale_y_continuous(breaks = seq(0, max(df$Inflation), 1)) +
  # 2% reference line; its actual color is assigned in scale_color_manual()
  geom_hline(aes(yintercept = 2, color = "Target threshold 2%"),
             linewidth = 1) +
  # 12-month moving average (zoo); na.pad = TRUE keeps the length equal to nrow(df)
  geom_line(aes(y = rollmean(Inflation, 12, na.pad = TRUE), color = "MA(12)"),
            show.legend = TRUE) +
  theme(plot.title = element_text(hjust = 0.5),
        text = element_text(size = 12, family = "Comic Sans MS")) +
  theme(legend.position = "top") +
  # unnamed values follow the alphabetical order of the mapped strings:
  # "Inflation", "MA(12)", "Target threshold 2%"
  scale_color_manual(name = "",
                     values = c("black", "blue", "red"),
                     labels = c("Inflation", "MA(12)", "Target 2%"))

What exactly have we done? Again: the grammar of graphics and ggplot2 in particular is all about the layers, namely:

  1. geom_hline: This function adds a horizontal line to the plot. The yintercept argument is set to 2, so the line is placed at 2%, the ECB’s inflation target. Instead of fixing the color directly, we map color to the label "Target threshold 2%" inside aes(), which gives the line its own legend entry; the actual color (red) is assigned later in scale_color_manual(). The size argument (called linewidth in ggplot2 3.4.0 and later) is set to 1, which sets the width of the line. If you prefer a dashed line, you can also pass linetype = "dashed". Super simple, isn’t it? Hence, I will not comment on stuff like that further if you don’t mind (but if you do, please let me know);
  2. geom_line: This function adds a line to the plot. We used it already, but there is one interesting thing about it here. The rollmean() function (package zoo) calculates the moving average of the Inflation variable with a window size of 12. The na.pad = TRUE argument pads the result with NA values so that it has the same length as the data. Without it, rollmean() returns 11 fewer values than there are rows, and ggplot2 stops with an error about aesthetic lengths, so it cannot be omitted here. In recent versions of zoo, fill = NA is the preferred spelling of the same option. Also note that rollmean() centers the window by default (align = "center"); for a trailing average of the latest 12 months, add align = "right".
  3. theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 12, family = "Comic Sans MS")) centers the title (hjust = 0.5 sets its horizontal justification) and sets all text in the plot to size 12 in Comic Sans MS. The font has to be available to your graphics device; if it is not, you may get warnings or font errors, depending on the device.
  4. theme(legend.position = "top") places the legend at the top of the plot.
  5. scale_color_manual(name = "", values = c("black", "blue", "red"), labels = c("Inflation", "MA(12)", "Target 2%")): This function creates a manual color scale for the plot. The name argument sets the legend title (an empty string hides it), the values argument specifies the colors, and the labels argument specifies the legend label for each color. Because values and labels are unnamed, they are matched to the mapped strings in alphabetical order ("Inflation", "MA(12)", "Target threshold 2%"), so the order of the vectors matters.
Our implementation of the chart for inflation in the EU, 1996–2022, with ggplot2
Output for the code above
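The list above touches on three details that are easy to get wrong: padding the moving average, where its window sits, and how unnamed colors are matched to labels. The sketch below revisits them on a small synthetic series (the demo data frame and its values are made up, not ECB data). It assumes a zoo version that accepts fill = NA (the replacement for na.pad = TRUE), uses align = "right" for a trailing 12-month window, and passes a named values vector to scale_color_manual() so each color is tied to its label explicitly.

```r
library(ggplot2)
library(zoo)

# Hypothetical monthly series: made-up values, not ECB data
demo <- data.frame(
  Period = seq(as.Date("2020-01-01"), by = "month", length.out = 36)
)
demo$Inflation <- round(1 + 0.1 * seq_len(36), 2)

# fill = NA pads the result to the full length of the data;
# align = "right" averages the current and the previous 11 months
demo$MA12 <- rollmean(demo$Inflation, k = 12, fill = NA, align = "right")

p <- ggplot(demo, aes(x = Period)) +
  geom_line(aes(y = Inflation, color = "Inflation")) +
  geom_line(aes(y = MA12, color = "MA(12)"), na.rm = TRUE) +
  geom_hline(aes(yintercept = 2, color = "Target 2%")) +
  # named values: each color is matched to its label by name,
  # independent of alphabetical order
  scale_color_manual(
    name = "",
    values = c("Inflation" = "black", "MA(12)" = "blue", "Target 2%" = "red")
  )

print(p)
```

In the chart above, the names would have to match the strings used inside aes(), e.g. values = c("Inflation" = "black", "MA(12)" = "blue", "Target threshold 2%" = "red").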

Looks much better. At this stage, we should probably take a pause, relax, and explore the capabilities of ggplot2 and scales for visualization.

From an economic perspective, we see that current inflation is above both its recent 12-month average and the 2% threshold targeted by the ECB. But this is an aggregated view (overall for the EU). Maybe we should dig deeper into country-level data? Or we could look at how other macroeconomic variables are behaving to shed more light on the causes or related consequences. In the next parts, we will explore both ideas.

Stay tuned.

As usual, the FULL code is available in the designated GitHub repo for your convenience: https://raw.githubusercontent.com/TheLordOfTheR/R_for_Applied_Economics/main/Part2.R

Time for conclusion

Now we can conclude that R can be helpful for analysis in different areas of research, such as econometrics, financial analysis, and macroeconomics. For our purposes, we used ggplot2, which is based on the “grammar of graphics”. It makes it easy to create a wide range of plots, such as line plots, scatterplots, and many others.

With all the above information, you are now ready to create your own plots and visualize economic data effectively.

Further, we will talk more about the “grammar of graphics”, try to match inflation with other relevant datasets, find out how to JOIN data based on a set of keys (package dplyr), clean data (package tidyr), check out country-level data for EU inflation (packages rvest and others) and plot them (package ggplot2 again), then prepare output for reports (package stargazer), add animation and interactivity (gganimate, plotly), generate AI-written comments for the text, and cover plenty of other topics. Stay tuned.

Please clap 👏 and subscribe if you want to support me. Thanks!❤️‍🔥