An overview of the FRED-MD database
FRED-MD is an open-source dataset of monthly U.S. macroeconomic indicators maintained by the Federal Reserve Bank of St. Louis. It was introduced to provide a common benchmark for comparing model performance and to facilitate the reproducibility of research results [1]. The indicators are grouped into eight categories:
- Output and Income
- Labor Market
- Housing
- Consumption, Orders and Inventories
- Money and Credit
- Interest Rates and Exchange Rates
- Prices
- Stock Market
The time series included in the FRED-MD dataset are sourced from the Federal Reserve Economic Data (FRED) database, the St. Louis Fed’s main publicly available economic database. The FRED-MD dataset applies several adjustments to the raw data sourced from FRED, such as seasonal adjustments, inflation adjustments and backfilling of missing values.
The FRED-MD dataset also takes into account data changes and revisions. For instance, in the main FRED database the same indicator can be released with different names and, potentially, be reported in different units, over different time periods. In the FRED-MD dataset each indicator is instead always represented by a single time series with a unique name and is always reported in the same units.
The FRED-MD dataset was first released in 01–2015, when it contained 134 time series. As of 12–2023, it contains 127 time series. A total of 118 time series are included in all monthly releases from 01–2015 to 12–2023. The first date included in the FRED-MD dataset is 01–1959, even though a few time series start several years later.
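The number of series shared by all releases can be checked by intersecting the column names across vintages. The sketch below assumes the vintages have already been loaded into a dictionary of data frames keyed by vintage label. The function name `common_series`, the toy values and the `OLDSERIES` and `NEWSERIES` columns are hypothetical and only serve as an illustration.

```python
import pandas as pd


def common_series(vintages):
    """Return the sorted names of the series present in every vintage.

    `vintages` is assumed to be a dict mapping a vintage label
    (e.g. "2015-01") to a DataFrame with one column per series.
    """
    frames = list(vintages.values())
    if not frames:
        return []
    # start from the first vintage and keep only the names found in all others
    names = set(frames[0].columns)
    for frame in frames[1:]:
        names &= set(frame.columns)
    return sorted(names)


# toy vintages with made-up values, for illustration only
toy = {
    "2015-01": pd.DataFrame({"RPI": [1.0], "INDPRO": [2.0], "OLDSERIES": [3.0]}),
    "2023-12": pd.DataFrame({"RPI": [1.1], "INDPRO": [2.1], "NEWSERIES": [4.0]}),
}
print(common_series(toy))
```

Series that were dropped from or added to the dataset between releases fall out of the intersection, so only the names present in every vintage remain.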
The FRED-MD dataset is updated on a monthly basis. Each monthly release is referred to as a vintage. A different CSV file is released for each month. The CSV files can be downloaded from the URL below, where {year} and {month} are the year and the two-digit month of the release.
"https://files.stlouisfed.org/files/htdocs/fred-md/monthly/{year}-{month}.csv"
Each CSV file contains the data from 01-1959 up to the end of the previous month. For instance, the 2015-01.csv file contains the data from 01-1959 to 12-2014, the 2015-02.csv file contains the data from 01-1959 to 01-2015, and so on.
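The relationship between a vintage, its file URL and the last month it covers can be sketched as follows. This is a minimal illustration based on the URL template above. The helper names `vintage_url` and `last_observation` are hypothetical and not part of any FRED-MD tooling.

```python
import pandas as pd

# URL template of the monthly vintages, as given in the text
BASE_URL = "https://files.stlouisfed.org/files/htdocs/fred-md/monthly"


def vintage_url(year, month):
    """Build the URL of the CSV file for a given vintage (month is zero-padded)."""
    return f"{BASE_URL}/{year}-{month:02d}.csv"


def last_observation(year, month):
    """Return the last month covered by a vintage, i.e. the month before its release."""
    return pd.Period(year=year, month=month, freq="M") - 1


print(vintage_url(2015, 1))
print(last_observation(2015, 1))
```

Using a monthly `pandas.Period` handles the year boundary, so that the vintage released in January maps to December of the previous year.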
The datasets released on a monthly basis since 01–2015 are referred to as real-time vintages. The authors have also made available the datasets from 08–1999 to 12–2014, which are referred to as historical vintages. The historical vintages can be downloaded from this link.
The first row below the header of each CSV file contains the codes of the suggested transformations, which are intended to make the time series stationary before they are used in a statistical model. The transformation codes are defined as follows:
1. no transformation
2. first order difference
3. second order difference
4. logarithm
5. first order logarithmic difference
6. second order logarithmic difference
7. first order difference of the percentage change
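Written out for a series x_t, with Δ denoting the first-difference operator, the seven codes correspond to the transformations below. This follows the definitions given in [1], where code 7 is the first difference of the percentage change.

```latex
\begin{aligned}
1:&\quad x_t \\
2:&\quad \Delta x_t = x_t - x_{t-1} \\
3:&\quad \Delta^2 x_t = x_t - 2 x_{t-1} + x_{t-2} \\
4:&\quad \log x_t \\
5:&\quad \Delta \log x_t = \log x_t - \log x_{t-1} \\
6:&\quad \Delta^2 \log x_t = \log x_t - 2 \log x_{t-1} + \log x_{t-2} \\
7:&\quad \Delta \left( \frac{x_t}{x_{t-1}} - 1 \right)
\end{aligned}
```

Codes 2 and 5 lose the first observation, while codes 3, 6 and 7 lose the first two, so the transformed series start with missing values.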
The FRED-MD dataset has been used extensively for forecasting U.S. inflation. In [2] it was shown that a random forest model trained on the FRED-MD dataset outperforms several standard inflation forecasting models at different forecasting horizons. The analysis in [2] was extended in [3] to include an LSTM model, which was found not to significantly outperform the random forest model. In [4] different dimension reduction techniques were applied to the FRED-MD dataset in order to forecast U.S. inflation, and autoencoders were found to provide the best performance. In [5] it was shown that machine learning models trained on the FRED-MD dataset outperform the standard linear regression model in all considered forecasting periods.
Code
In this section, we provide the Python code for downloading and processing the FRED-MD dataset. We start by importing the dependencies.
import os
import pandas as pd
import numpy as np
After that we define a function for transforming the time series based on their assigned transformation code.
def transform_series(x, tcode):
    '''
    Transform the time series.

    Parameters:
    ______________________________
    x: pandas.Series
        Time series.

    tcode: int.
        Transformation code.
    '''
    if tcode == 1:
        # no transformation
        return x
    elif tcode == 2:
        # first order difference
        return x.diff()
    elif tcode == 3:
        # second order difference
        return x.diff().diff()
    elif tcode == 4:
        # logarithm
        return np.log(x)
    elif tcode == 5:
        # first order logarithmic difference
        return np.log(x).diff()
    elif tcode == 6:
        # second order logarithmic difference
        return np.log(x).diff().diff()
    elif tcode == 7:
        # first order difference of the percentage change, as defined in [1]
        return x.pct_change().diff()
    else:
        raise ValueError(f"unknown `tcode` {tcode}")
We can now define a function for downloading and, optionally, transforming the time series.
def get_data(year, month, transform=True):
    '''
    Download and (optionally) transform the time series.

    Parameters:
    ______________________________
    year: int
        The year of the dataset vintage.

    month: int.
        The month of the dataset vintage.

    transform: bool.
        Whether the time series should be transformed or not.
    '''
    # get the dataset URL
    file = f"https://files.stlouisfed.org/files/htdocs/fred-md/monthly/{year}-{month:02d}.csv"

    # get the time series, skipping the row with the transformation codes
    data = pd.read_csv(file, skiprows=[1], index_col=0)
    data.columns = [c.upper() for c in data.columns]

    # process the dates, dropping any rows without a date
    data = data.loc[pd.notna(data.index), :]
    data.index = pd.date_range(start="1959-01-01", freq="MS", periods=len(data))

    if transform:
        # get the transformation codes
        tcodes = pd.read_csv(file, nrows=1, index_col=0)
        tcodes.columns = [c.upper() for c in tcodes.columns]

        # transform the time series
        data = data.apply(lambda x: transform_series(x, tcodes[x.name].item()))

    return data
We can then use the above function for downloading the 12–2023 dataset vintage as follows:
dataset = get_data(year=2023, month=12, transform=False)
dataset.head(n=3)
dataset.tail(n=3)
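The parsing steps can also be tried without a network connection on a small in-memory CSV. The sketch below assumes that a vintage file consists of a header row, one row of transformation codes labelled `Transform:` and then the monthly observations, with the date column named `sasdate`. The layout labels and all values are assumptions made up for illustration and are not actual FRED-MD data.

```python
import io

import numpy as np
import pandas as pd

# made-up values in the assumed layout: header, transformation codes, observations
csv_text = """sasdate,RPI,INDPRO
Transform:,5,2
1/1/1959,100.0,20.0
2/1/1959,110.0,21.0
3/1/1959,121.0,23.0
"""

# the first row below the header holds the transformation codes
tcodes = pd.read_csv(io.StringIO(csv_text), nrows=1, index_col=0).iloc[0].astype(int)

# the observations are read by skipping the row with the transformation codes
data = pd.read_csv(io.StringIO(csv_text), skiprows=[1], index_col=0)
data.index = pd.date_range(start="1959-01-01", freq="MS", periods=len(data))

# apply the transformations by hand for the two codes used in this toy file
transformed = pd.DataFrame({
    "RPI": np.log(data["RPI"]).diff(),  # code 5: first order logarithmic difference
    "INDPRO": data["INDPRO"].diff(),    # code 2: first order difference
})

print(tcodes.to_dict())
print(transformed)
```

The first row of the transformed data is missing because differencing needs a previous observation, which is also what happens at the start of each series when the data is downloaded with `transform=True`.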
A Python notebook with additional functions for working with the FRED-MD dataset is available in our GitHub repository.
References
[1] McCracken, M. W., & Ng, S. (2016). FRED-MD: A monthly database for macroeconomic research. Journal of Business & Economic Statistics, 34(4), 574–589. doi: 10.1080/07350015.2015.1086655.
[2] Medeiros, M. C., Vasconcelos, G. F., Veiga, Á., & Zilberman, E. (2021). Forecasting inflation in a data-rich environment: the benefits of machine learning methods. Journal of Business & Economic Statistics, 39(1), 98–119. doi: 10.1080/07350015.2019.1637745.
[3] Paranhos, L. (2023). Predicting Inflation with Recurrent Neural Networks. Working Paper.
[4] Hauzenberger, N., Huber, F., & Klieber, K. (2023). Real-time inflation forecasting using non-linear dimension reduction techniques. International Journal of Forecasting, 39(2), 901–921. doi: 10.1016/j.ijforecast.2022.03.002.
[5] Malladi, R. K. (2023). Benchmark Analysis of Machine Learning Methods to Forecast the US Annual Inflation Rate During a High-Decile Inflation Period. Computational Economics, 1–41. doi: 10.1007/s10614-023-10436-w.
Originally published at https://fg-research.com. See fg-research’s disclaimer.