How to Fetch Multiple Economic Datasets from the ECB Database with R

Dima Diachkov
7 min readApr 29, 2023

--

Hello, fellow data enthusiasts and R users! Welcome back to another episode (#22) of “R for Applied Economics” guide, where we have a good time exploring data and economics at the same time. 😄 In the previous article, we explored the R package ‘ecb’. In this article, we will continue to learn how to use the powerful ‘ecb’ package to fetch multiple datasets at once. So, let’s dive into part 22 of “R for Applied Economics” guide.

Today we will explore inflation datasets by country and product type and find out why coffee is better than wine in the current economic environment.

Source: Janko Ferlič on Unsplash

Quick Prep

Before we start, make sure you have installed and loaded the ‘ecb’ package. If you haven’t, here are the commands you need to run:

install.packages("ecb")
library(ecb)

Design the Query

All is set. Let’s tune our query and we will make it based on the example from the last part. Last time we used “ICP.M.U2.N.000000.4.ANR” to access data for inflation in EU (recap of part 21). Today I want to explain these sets of characters, separated with dots, and why this key gives us what we want.

Basically, the key is indeed a key, that gradually leads us into the deepth of dataset. Adding more characters after each dots makes us query more and more specific.

In order to understand the meaning of this items there are two ways:
1) go to https://sdw.ecb.europa.eu/, find data you need, then copy-paste its key;

2) go to https://sdw.ecb.europa.eu/, find data you need and opens its metadata to see all definitons for each items.

First approach is okay when you need only one indicator (for example, one-off task to parse data on inflation).

SDW page for the key ICP.M.U2.N.000000.4.ANR

You just copy series key and good to go. But second appoach works for all other cases (for example, when you need to create a customized query which parses inflation data by country and with different start date every time. To do that and to understand what exactly we are doing, we need to open “Data structure definition”, which sheds light on the key structure.

Data structure definitions

Here we have a list of cube dimensions and additional attributes. We will focus on cube dimensions because it helps us to specify our key.

For example, if you click on codelist description for frequencies “CL FREQ”, you will get a list of all possible values for frequency.

Examples of possible frequency values

But this list contains ALL possible values, applicable to all datasets of ECB. It does not mean that data is available for all of them and you need to check metadata pages anyway. But once you know what is available, you immediately know how to request it. Plain and simple.

So the whole key we used can be broken down into:

  • ICP — dataset family “Indices of Consumer prices”
  • M — data frequency “Monthly”
  • U2 — reference area “Euro area (changing composition)”
  • N — adjustment type “Neither seasonally nor working day adjusted”
  • 000000 — classification context “HICP — Overall index”
  • 4 — Institution originating the data flow “Eurostat”
  • ANR — Series variation “Annual rate of change”

So if you want to change some indicators — just go for it! Play with it. For example, you can request i.e. annual data with average growth rates, simply do that. At first, navigating the ECB website will be almost unavoidable but you will get used to it and start to notice some patterns.

how to change the query to get another dataset

But the benefit of this DSD/metadata keys is following. You can request multiple datasets at the same time. You just concatenate them with a “plus” sign. But again — you need to make sure that such request is fulfillable (of course if is does not work — then your will just get HTTP 404 error and try again).

So, to fetch multiple datasets with the ecb package, we need to create a query with a list of dataset codes, separated by a ‘+’ sign. This is like inviting all your favorite datasets to a party. So, let’s imagine that we again have to parse data on inflation in specific countries in EU (let’s say Germany, France, Spain, Italy and Netherlands) and EU as a whole. So when we send request for data, we specify more territories and concatenate them with a plus sign “DE+FR+ES+IT+NL+U2” instead of just “U2”.

# Define the SDW series key for all countries and filter options
key <- “ICP.M.DE+FR+ES+IT+NL+U2.N.000000.4.ANR”
filter <- list(detail = “full”)

But how did I know which country code to put? Easy, as I said earlier I just went for the dictionary of countries in DSD scheme. Example is below:

Example of country codes from DSD scheme

So let’s just run our original code but with a new country list. To check out what we have now, we just plot it as a laser chart with a country breakdown.

# Get the data from ECB Data Warehouse and convert dates
df <- get_data(key, filter)
df$obstime <- convert_dates(paste0(df$obstime,"-01"))


# Create the plot of inflation in the EU
ggplot(df, aes(x = obstime, y = obsvalue, color = ref_area)) +
geom_line(group = df$ref_area) + # Add a line to the plot
theme_bw(8) + # Use black and white theme
theme(legend.position = "bottom") + # Position legend at the bottom
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + # Rotate x-axis text
labs(x = NULL, y = "Percent per annum\n", # Set axis labels and title
title = "Inflation in EU", color = "Territory")
The output for the code above

So it works. We have extracted data by country. Now let’s try to find data on some other indicators from the ICP dataset. For example, let’s compare inflation dynamics to food inflation, especially — wine and coffee price change in the EU. So again, we go to the DSD scheme, find the codes, add them to our code, and run. Meanwhile, we decode technical codes into human-friendly indicator names.

# Define the new SDW series key and filter options
key2 <- “ICP.M.U2.N.000000+010000+021200+012110.4.ANR”
filter2 <- list(detail = “full”)

convert_code <- function(code){
switch(code,
"000000" = "Overall inflation",
"010000" = "Food inflation",
"021200" = "Wine inflation",
"012110" = "Coffee inflation",
"Invalid code")
}

# Get the data from ECB Data Warehouse and convert dates
df2 <- get_data(key2, filter2)
df2$indicator <- apply(df2["icp_item"], 1, convert_code)
df2$obstime <- convert_dates(paste0(df2$obstime,"-01"))

# Create the plot of inflation in the EU
ggplot(df2, aes(x = obstime, y = obsvalue, color = indicator)) +
geom_line() + # Add a line to the plot
theme_bw(8) + # Use black and white theme
theme(legend.position = "bottom") + # Position legend at the bottom
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + # Rotate x-axis text
labs(x = NULL, y = "Percent per annum\n", # Set axis labels and title
title = "Inflation in EU", color = "Product group")
The output for the code above

Here we see that overall has started to go down in the second half of 2022, prices for food in general skyrocketed and still do. 🚀 Fortunately for me (an incorrigible coffee person ☕), prices for coffee started to decrease following the overall inflation, even though they grew faster. In contrast, prices for wine are constantly growing from 2021–2022 and do not show any signs of improvement…Why am I showing you this? Give up wine, and switch to coffee while you have a chance. 😉🍷➡️☕

And finally, we can make complex queries and combine quiery. Let’s access specific data for Spain’s and Italy’s prices for coffee and wine.

# Define the new SDW series key and filter options
key3 <- "ICP.M.IT+ES.N.021200+012110.4.ANR"
filter3 <- list(detail = "full")

# Get the data from ECB Data Warehouse and convert dates
df3 <- get_data(key3, filter3)
df3$indicator <- apply(df3["icp_item"], 1, convert_code)
df3$obstime <- convert_dates(paste0(df3$obstime,"-01"))

# Create the plot of inflation in the EU
ggplot(df3, aes(x = obstime, y = obsvalue, color = indicator)) +
facet_wrap(~ref_area) +
geom_line() + # Add a line to the plot
theme_bw(8) + # Use black and white theme
theme(legend.position = "bottom") + # Position legend at the bottom
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + # Rotate x-axis text
labs(x = NULL, y = "Percent per annum\n", # Set axis labels and title
title = "Coffee and Wine Inflation in Italy and Spain", color = "Product group")
The output for the code above

Nice. Now we can compare and extract data for any product groups with various backgrounds. You can take any dataset to play, explore, modify, and add it to your applications or research.👌

Wrap-up

And that’s it! We’ve covered how to fetch multiple datasets using the ‘ecb’ package and created interesting plots to compare different economic indicators. Keep exploring and having fun with data!

Please clap 👏 and subscribe if you want to support me. Thanks!❤️‍🔥

--

--

Dima Diachkov

Balancing passion with reason. In pursuit of better decision making in economic analysis and finance with data science via R+Python