Estimating Monthly G.D.P. Figures Via an Income Approach

Jonathan Legrand
LSEG Developer Community
Jun 18, 2020 · 23 min read

Only quarterly U.S.A. G.D.P. data is published; this article describes a method of estimating such figures monthly using monthly Total Compensation figures.

Introduction

Gross Domestic Product (G.D.P.) is often thought of and calculated with an Expenditure Approach such that GDP=C+I+G+(X−M) where C = Consumption, I = Investment, G = Government Spending, & (X–M) = Net Exports, but it is also possible to calculate it via a per worker Income Approach. With this approach, we may estimate G.D.P. Per Worker (G.D.P.P.W.) with Total Compensation Per Worker (T.C.P.W.) figures such that (as per Appendix 1 on GitHub’s Notebook):

GDPPWt = rt * TCPWt

where rt is the (time-varying) ratio of G.D.P. Per Worker to Total Compensation Per Worker.
We can then attempt to interpolate G.D.P.P.W. values at higher frequencies than they are officially released by using higher-frequency wage/income figure releases (exempli gratia (e.g.): to compute monthly G.D.P. figures, when only quarterly ones are released, using monthly published Total Compensation data).

We have to be careful of a few difficulties outlined by Feldstein (2008) (MF hereon): “The relation between productivity [ id est (i.e.): G.D.P.] and wages has been a source of substantial controversy, not only because of its inherent importance but also because of the conceptual measurement issues that arise in making the comparison.” (Feldstein (2008)). MF outlines several key factors (KF hereon) to consider when studying the question:

  1. “[An undeserved] focus on wages rather than total compensation. Because of the rise in fringe benefits and other noncash payments, wages have not risen as rapidly as total compensation. It is important therefore to compare the productivity rise with the increase of total compensation rather than with the increase of the narrower measure of just wages and salaries.” This is why this article will first focus on using Total Compensation (TC).
  2. “[… The] nominal compensation deflated by the CPI is not appropriate for evaluating the relation between productivity and compensation […] this implies that the real marginal product of labor should be compared to the wage deflated by the product price and not by some consumer price index. […] The CPI differs from the nonfarm product price in several ways. The inclusion in the CPI, but not in the nonfarm output price index, of the prices of imports and of the services provided by owner occupied homes is particularly important.” This is why this article will first focus on using Nominal T.C. deflated (made ‘real’) with the Nonfarm Business Sector Output Price Index Deflator (NBSOPID).

Keeping these points in mind, we will attempt to investigate the relationship between the United States of America (U.S.A.)'s Real G.D.P. (R.G.D.P.) and the U.S.A.'s Real National Total Compensation (R.N.T.C.) in order to construct a framework allowing us to estimate monthly R.G.D.P. data when only quarterly data is published.

Numerical Approximation

Background

As per Appendix 1 (on GitHub's Notebook), we may write:

RGDPPWt = rt * RNTCPWt, i.e.: rt = RGDPPWt / RNTCPWt

It may seem needlessly convoluted to insert rt the way it was done above, but this is necessary because we only have estimates of rt (rt with a hat); we thus build our framework around it: since Total Compensation figures are released on a monthly basis, if we can construct an estimate of the time-varying rt, we can then construct monthly G.D.P. figures (even though only quarterly ones are published).

This article will study the efficiency of the U.S.A. Real G.D.P. Per Worker To National Real Total Compensation Per Worker Ratio (RGDPPWTNRTCPWR) as the ratio rt.

Development Tools & Resources

The example code demonstrating the use case is based on the following development tools and resources:

  • Refinitiv’s DataStream Web Services (DSWS): Access to DataStream data. A DataStream or Refinitiv Workspace IDentification (ID) will be needed to run the code below.
  • Python Environment: Tested with Python 3.7
  • Packages: DatastreamDSWS, Numpy, Pandas, statsmodels, scipy and Matplotlib. The Python built-in modules statistics, datetime and dateutil are also required.

Import libraries

# The ' from ... import ' structure here allows us to only import the
# module ' python_version ' from the library ' platform ':
from platform import python_version
print("This code runs on Python version " + python_version())

This code runs on Python version 3.7.4

We need to gather our data. Since Refinitiv’s DataStream Web Services (DSWS) allows access to an accurate and comprehensive end-of-day (E.O.D.) economic database (DB), it is naturally more than appropriate. We can access DSWS via the Python library “DatastreamDSWS”, which can be installed simply by using pip install.

import DatastreamDSWS as DSWS

# We can use our Refinitiv DataStream Web Services (DSWS) API keys, which allow us to be identified by
# Refinitiv's back-end services and enable us to request (and fetch) data:
# The username is placed in a text file so that it may be used in this code without showing it itself:
DSWS_username = open("Datastream_username.txt", "r")
# Same for the password:
DSWS_password = open("Datastream_password.txt", "r")

ds = DSWS.Datastream(username = str(DSWS_username.read()), password = str(DSWS_password.read()))

# It is best to close the files we opened in order to make sure that we don't stop any other
# services/programs from accessing them if they need to:
DSWS_username.close()
DSWS_password.close()

The following are Python built-in modules/libraries, and therefore do not have specific version numbers.

import os # os is a library used to manipulate files on machines
import datetime # datetime will allow us to manipulate Western World dates
import dateutil # dateutil will allow us to manipulate dates in equations
import warnings # warnings will allow us to manipulate warning messages (such as deprecation messages)

Apart from the built-in statistics module, the following are third-party libraries, so we print their version numbers after importing them.

import statistics # This is a Python-built-in module/library, therefore it does not have a specific version number.
import numpy
import statsmodels
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
import scipy
from scipy import stats
for i, j in zip(["numpy", "statsmodels", "scipy"], [numpy, statsmodels, scipy]):
    print("The " + str(i) + " library imported in this code is version: " + j.__version__)

The numpy library imported in this code is version: 1.16.5
The statsmodels library imported in this code is version: 0.10.1
The scipy library imported in this code is version: 1.3.1

pandas will be needed to manipulate data sets

import pandas

pandas.set_option('display.max_columns', None) # This line will ensure that all columns of our dataframes are always shown
print("The pandas library imported in this code is version: " + pandas.__version__)

The pandas library imported in this code is version: 0.25.1

Setup Original Functions

Setup Data Sets

The cell below defines a function that translates DataStream data into a shape we can work with: a pandas DataFrame normalised to the 1st of the month that it was collected in.

def Translate_to_First_of_the_Month(data, dataname):
    """This function will put Refinitiv's DataStream data into a pandas DataFrame
    and normalise it to the 1st of the month it was collected in."""

    # The first 8 characters of the index are the ' yyyy-mm- ', onto which '01' is appended:
    data.index = data.index.astype("str").str.slice_replace(8, repl = "01")

    data.index = pandas.to_datetime(data.index)
    data.columns = [dataname]

    return data
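As a quick standalone check of what the `slice_replace` normalisation does (toy dates, not DataStream output):

```python
import pandas as pd

# A toy index of mid-month dates, as strings:
idx = pd.Index(["2020-03-15", "2020-04-17"])

# Replace everything from character 8 onwards with "01", normalising to the 1st of the month:
normalised = idx.astype("str").str.slice_replace(8, repl="01")
print(list(normalised))  # → ['2020-03-01', '2020-04-01']
```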

The cell below defines a function that appends chosen data to our monthly data-frame.

df = pandas.DataFrame([])

def Add_to_df(data, dataname):
    global df # This allows the function to take the pre-defined variable ' df ' for granted and work from there
    DS_Data_monthly = Translate_to_First_of_the_Month(data, dataname)

    df = pandas.merge(df, DS_Data_monthly,
                      how = "outer",
                      left_index = True,
                      right_index = True)

Setup plot functions

The cell below defines a function to plot data on one y axis.

# Using an implicitly registered datetime converter for a matplotlib plotting method is no
# longer supported by matplotlib. Current versions of pandas require explicitly
# registering matplotlib converters.
import matplotlib.pyplot as plt
pandas.plotting.register_matplotlib_converters()

def Plot1ax(dataset, ylabel = "", title = "", xlabel = "Year", datasubset = [0],
            datarange = False, linescolor = False, figuresize = (12,4),
            facecolor = "0.25", grid = True):
    """ Plot1ax Version 1.0:
    This function returns a Matplotlib graph with default colours and dimensions
    on one y axis (on the left, as opposed to two, one on the left and one on the right).

    datasubset (list): Needs to be a list of the number of each column within
    the data-set that needs to be labelled on the left.

    datarange (bool/list): (Default: False) If wanting to plot the graph from and to
    specific points, make datarange a list of start and end dates.

    linescolor (bool/list): (Default: False) This needs to be a list of the colour of each
    vector to be plotted, in the order they are shown in their data-frame from left to right.

    figuresize (tuple): (Default: (12,4)) This can be changed to give graphs of different
    proportions. It is defaulted to a 12 by 4 (ratioed) graph.

    facecolor (str): (Default: "0.25") This allows the user to change the
    background colour as needed.

    grid (bool): (Default: True) This allows us to decide whether or
    not to include a grid in our graphs.
    """

    if datarange == False:
        start_date = str(dataset.iloc[:,datasubset].index[0])
        end_date = str(dataset.iloc[:,datasubset].index[-1])
    else:
        start_date = str(datarange[0])

        if datarange[-1] == -1:
            end_date = str(dataset.iloc[:,datasubset].index[-1])
        else:
            end_date = str(datarange[-1])

    fig, ax1 = plt.subplots(figsize = figuresize, facecolor = facecolor)
    ax1.tick_params(axis = 'both', colors = 'w')
    ax1.set_facecolor(facecolor)
    fig.autofmt_xdate()
    plt.ylabel(ylabel, color = 'w')
    ax1.set_xlabel(str(xlabel), color = 'w')

    if linescolor == False:
        for i in datasubset: # This labels all the lines so that matplotlib can create a legend
            ax1.plot(dataset.iloc[:, i].loc[start_date : end_date],
                     label = str(dataset.columns[i]))
    else:
        for i in datasubset: # This labels all the lines so that matplotlib can create a legend
            ax1.plot(dataset.iloc[:, i].loc[start_date : end_date],
                     label = str(dataset.columns[i]),
                     color = linescolor)

    ax1.tick_params(axis = 'y')

    if grid == True:
        ax1.grid(color = 'black', linewidth = 0.5)

    ax1.set_title(str(title) + " \n", color = 'w')
    plt.legend()
    plt.show()

The cell below defines a function to plot data on two y axes.

def Plot2ax(dataset, rightaxisdata, y1label, y2label, leftaxisdata,
            title = "", xlabel = "Year", y2color = "C1", y1labelcolor = "C0",
            figuresize = (12,4), leftcolors = ('C1'), facecolor = "0.25"):
    """ Plot2ax Version 1.0:
    This function returns a Matplotlib graph with default colours and dimensions
    on two y axes (one on the left and one on the right).

    leftaxisdata (list): Needs to be a list of the number of each column within
    the data-set that needs to be labelled on the left.

    figuresize (tuple): (Default: (12,4)) This can be changed to give graphs of
    different proportions. It is defaulted to a 12 by 4 (ratioed) graph.

    leftcolors (str / tuple of (a) list(s)): This sets up the line colour for data
    expressed with the left-hand y-axis. If there is more than one line,
    the list needs to be specified with each line colour specified in a tuple.

    facecolor (str): (Default: "0.25") This allows the user to change the
    background colour as needed.
    """

    fig, ax1 = plt.subplots(figsize = figuresize, facecolor = facecolor)
    ax1.tick_params(axis = "both", colors = "w")
    ax1.set_facecolor(facecolor)
    fig.autofmt_xdate()
    plt.ylabel(y1label, color = y1labelcolor)
    ax1.set_xlabel(str(xlabel), color = "w")
    ax1.set_ylabel(str(y1label))

    for i in leftaxisdata: # This labels all the lines so that matplotlib can create a legend
        ax1.plot(dataset.iloc[:, i])

    ax1.tick_params(axis = "y")
    ax1.grid(color = 'black', linewidth = 0.5)

    ax2 = ax1.twinx() # Instantiate a second axes object that shares the same x-axis
    color = str(y2color)
    ax2.set_ylabel(str(y2label), color = color) # We already handled the x-label with ax1
    plt.plot(dataset.iloc[:, list(rightaxisdata)], color = color)
    ax2.tick_params(axis = 'y', labelcolor = 'w')

    ax1.set_title(str(title) + " \n", color = 'w')

    fig.tight_layout() # Otherwise the right y-label is slightly clipped
    plt.show()

The cell below defines a function to plot data regressed on time.

def Plot_Regression(x, y, slope, intercept, ylabel,
                    title = "", xlabel = "Year", facecolor = "0.25", data_point_type = ".",
                    original_data_color = "C1", time_index = [], time_index_step = 48,
                    figuresize = (12,4), line_of_best_fit_color = "b"):
    """ Plot_Regression Version 1.0:
    Plots the regression line with its data with appropriate default Refinitiv colours.

    facecolor (str): (Default: "0.25") This allows the user to change the background colour as needed.

    figuresize (tuple): (Default: (12,4)) This can be changed to give graphs of different
    proportions. It is defaulted to a 12 by 4 (ratioed) graph.

    line_of_best_fit_color (str): (Default: "b") This allows the user to change the colour
    of the fitted line as needed.

    ' time_index ' and ' time_index_step ' allow us to dictate the frequency of the ticks
    on the x-axis of our graph.
    """

    fig, ax1 = plt.subplots(figsize = figuresize, facecolor = facecolor)
    ax1.tick_params(axis = "both", colors = "w")
    ax1.set_facecolor(facecolor)
    fig.autofmt_xdate()
    plt.ylabel(ylabel, color = "w")
    ax1.set_xlabel(xlabel, color = "w")
    ax1.plot(x, y, data_point_type, label = 'original data', color = original_data_color)
    ax1.plot(x, intercept + slope * x, label = 'fitted line', color = line_of_best_fit_color)
    ax1.tick_params(axis = "y")
    ax1.grid(color = 'black', linewidth = 0.5)
    ax1.set_title(title + " \n", color = 'w')
    plt.legend()

    if len(time_index) != 0:
        plt.xticks(numpy.arange(len(y), step = time_index_step),
                   [i for i in time_index[0::time_index_step]])

    plt.show()

Setup Statistics Tools

def Single_period_Geometric_Growth(data):
    """ This function returns the geometric growth of the data
    entered at the frequency it is given. id est (i.e.): if
    daily data is entered, a daily geometric growth rate is returned.

    data (list): list including int/float data recorded at a specific
    frequency.
    """

    data = list(data)

    # Note that we use ' len(data) - 1 ' instead of ' len(data) ' as the former already
    # accounts for the lost degree of freedom (n observations span only n - 1 periods).
    single_period_geometric_growth = ((data[-1] / data[0])**(1 / (len(data) - 1))) - 1

    return single_period_geometric_growth
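To sanity-check the formula with toy numbers (hypothetical, not data from this study): a series that doubles over three periods implies a single-period geometric growth rate of 2^(1/3) − 1 ≈ 25.99%.

```python
# Toy series: 4 observations, i.e. 3 growth periods:
data = [100, 130, 160, 200]

# Same formula as in Single_period_Geometric_Growth above:
growth = (data[-1] / data[0]) ** (1 / (len(data) - 1)) - 1
print(round(growth, 4))  # → 0.2599
```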

The cell below defines a function that creates and displays a table of statistics for all vectors (columns) within any two specified datasets.

def Statistics_Table(dataset1, dataset2 = ""):
    """ This function returns a table of statistics for the one or two pandas
    data-frames entered, be it ' dataset1 ' or both ' dataset1 ' and ' dataset2 '.
    """

    Statistics_Table_Columns = list(dataset1.columns)

    if str(dataset2) != "":
        [Statistics_Table_Columns.append(str(i)) for i in dataset2.columns]

    Statistics_Table = pandas.DataFrame(columns = ["Mean",
                                                   "Mean of Absolute Values",
                                                   "Standard Deviation",
                                                   "Median", "Skewness",
                                                   "Kurtosis",
                                                   "Single Period Growth Geometric Average"],
                                        index = [c for c in Statistics_Table_Columns])

    def Statistics_Table_function(data):
        for c in data.columns:
            Statistics_Table["Mean"][str(c)] = numpy.mean(data[str(c)].dropna())
            Statistics_Table["Mean of Absolute Values"][str(c)] = numpy.mean(abs(data[str(c)].dropna()))
            Statistics_Table["Standard Deviation"][str(c)] = numpy.std(data[str(c)].dropna())
            Statistics_Table["Median"][str(c)] = numpy.median(data[str(c)].dropna())
            Statistics_Table["Skewness"][str(c)] = scipy.stats.skew(data[str(c)].dropna())
            Statistics_Table["Kurtosis"][str(c)] = scipy.stats.kurtosis(data[str(c)].dropna())

            # This if statement is needed in case we end up asking for the computation of a value divided by 0:
            if len(data[str(c)].dropna()) != 1 or data[str(c)].dropna()[0] != 0:
                Statistics_Table["Single Period Growth Geometric Average"][str(c)] = Single_period_Geometric_Growth(data[str(c)].dropna())

    Statistics_Table_function(dataset1)

    if str(dataset2) != "":
        Statistics_Table_function(dataset2)

    return Statistics_Table

U.S.A. Working Population Growth

We will use Yearly Geometric Growth rate when looking at geometric growth rates throughout this article, as outlined in Appendix 2.
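The monthly-to-yearly conversion used below (compounding a single-period geometric rate over 12 months) can be sketched with a hypothetical monthly rate:

```python
# Hypothetical monthly geometric growth rate of 0.1%:
monthly_g = 0.001

# Compound over 12 months to obtain the yearly geometric growth rate:
yearly_g = (1 + monthly_g) ** 12 - 1
print(round(yearly_g, 6))  # → 0.012066
```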

We now look at the proportion of the population which works. This includes only employed adults, excluding children (~ 20% of the U.S.A. population) and the elderly (~ 14%). The cell below adds the United States of America’s Working Population (USAWP) values to the data-frame ‘df’:

df = Translate_to_First_of_the_Month(data = ds.get_data(tickers = 'USEMPTOTO', fields = "X", start = '1950-01-01', freq = 'M') * 1000,
                                     dataname = "USAWP (monthly data)")

Next we take the USA Working Population’s first difference, i.e.: USAWPDt = USAWPt − USAWPt-1

USAWPD = pandas.DataFrame(df["USAWP (monthly data)"].diff())
USAWPD.rename(columns = {'USAWP (monthly data)':'USAWPD (monthly data)'}, inplace = True)
df = pandas.merge(df, USAWPD, how = 'outer', left_index = True, right_index = True)

U.S.A.’s Working Population’s single period (in this case: monthly) Geometric Growth is then defined as USAWPG as per Appendix 2:

# Create our USAWPYGG (note that due to the use of logarithms in the scipy.stats.gmean function/class, we cannot use it here without yielding an error)
USAWPYGG = ((1 + Single_period_Geometric_Growth(data = df["USAWP (monthly data)"].dropna()))**12) - 1

df["USAWPG (monthly data)"] = df["USAWPD (monthly data)"] / df["USAWP (monthly data)"]

Plot2ax(title = "USA Working Population (in blue) and its Growth (in orange) (USAWP & USAWPG) over the years (monthly data)",
        dataset = df, leftaxisdata = [-3], rightaxisdata = [-1], y1label = "USAWP", y2label = "USAWPG", leftcolors = "w")

print("Our U.S.A. Working Population's Yearly Geometric Growth is " + str(USAWPYGG) + ", i.e.: " + str(USAWPYGG*100) + "%")

Our U.S.A. Working Population's Yearly Geometric Growth is 0.012018187303719507, i.e.: 1.2018187303719507%

Total Compensation

As per MF’s first KF, we use U.S.A. Nominal National Total Compensation (N.N.T.C.); its nominal value is derived from National Nominal Personal Income Accounts. Note that this dataset is the inflows into accounts; this means that we do not need to include incomes from assets. Note also that the deflators used to compute ‘real’ figures from ‘nominal’ ones are defined in Appendix 3.

National Nominal Total Compensation

We can now add our National Nominal Total Compensation (N.N.T.C.) data to our data-frame and plot it for good measure.

Add_to_df(data = ds.get_data(tickers = "USPERINCB",
                             fields = "X",
                             start = '1950-01-01', freq = 'M').dropna() * 1000000000,
          dataname = "NNTC (monthly data)")

df["NNTC in trillions (monthly data)"] = df["NNTC (monthly data)"] / 1000000000000

Plot1ax(title = "U.S.A. National Nominal Total Compensation (monthly data)",
        dataset = df, datasubset = [-1], ylabel = "Nominal $ in trillions",
        linescolor = "#ff9900")

As per MF, Nonfarm Business Sector Output Price Index Deflator (NBSOPID) is used to calculate U.S.A.’s Real Total Compensation figures.

Since the incoming data is indexed on 2012, such that the ‘RAW’ and unaltered value from DataStream is RAWNBSOPIDt(2012−01−01) = 100 (where t(2012−01−01) is the time subscript value for 01 January 2012), we define NBSOPIDt = RAWNBSOPIDT / RAWNBSOPIDt, where T denotes the latest observation. This inversion was necessary because the raw series is a price index, whereas we need a deflator to multiply nominal figures by. NBSOPID figures are released every quarter; that is fine, since we do not expect inflation to change much from one month to the next, so quarterly data is sufficient.

NBSOPID = ds.get_data(tickers = "USRCMNBSE",
                      fields = "X",
                      start = '1950-01-01',
                      freq = 'M').dropna()

# Our NBSOPID is released quarterly, so we have to shape it into a monthly database. The function below will allow for this:
def Quarterly_To_Monthly(data, dataname, deflator = False):

    data_proper, data_full_index = [], []
    for i in range(0, len(data)):
        # Repeat each quarterly value three times (once per month of the quarter):
        [data_proper.append(float(data.iloc[j])) for j in [i, i, i]]
        [data_full_index.append(datetime.datetime.strptime(data.index[i][0:8] + "01", "%Y-%m-%d") + dateutil.relativedelta.relativedelta(months =+ j)) for j in range(0, 3)]

    data = pandas.DataFrame(data = data_proper, columns = [dataname], index = data_full_index)

    if deflator == True:
        data = pandas.DataFrame(data = 1 / (data / data.iloc[-1]), columns = [dataname], index = data_full_index)

    return data

Add_to_df(data = Quarterly_To_Monthly(data = NBSOPID,
                                      deflator = True,
                                      dataname = "NBSOPID"),
          dataname = "NBSOPID (quarterly data)")
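The repetition and deflator-rebasing logic can be sanity-checked on a toy quarterly series (hypothetical index values, not the USRCMNBSE data):

```python
import pandas as pd

# Two quarterly index observations:
q = pd.Series([100.0, 104.0], index=pd.to_datetime(["2020-01-01", "2020-04-01"]))

# Repeat each quarterly value for the 3 months of its quarter, as Quarterly_To_Monthly does:
monthly_index = pd.date_range("2020-01-01", periods=6, freq="MS")
m = pd.Series(q.to_numpy().repeat(3), index=monthly_index)

# Deflator rebasing, i.e. 1 / (m / m.iloc[-1]): divide the final observation by each value:
deflator = m.iloc[-1] / m
print(float(deflator.iloc[0]))  # → 1.04
```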

U.S.A. National Real Total Compensation is defined as NRTCt = NBSOPIDt * NNTCt

df["NRTC (monthly data)"] = df["NBSOPID (quarterly data)"] * df["NNTC (monthly data)"]
df["NRTC in trillions (monthly data)"] = df["NRTC (monthly data)"] / 1000000000000

U.S.A. National Real Total Compensation Per Worker is defined as NRTCPWt = NRTCt / USAWPt

df["NRTCPW (monthly data)"] = df["NRTC (monthly data)"] / df["USAWP (monthly data)"]

How does Total Compensation compare to G.D.P. through time?

The cell below adds U.S.A. Real G.D.P. quarterly data (stored both at its native quarterly frequency and repeated monthly), in levels and in trillions, to the pandas data-frame ‘df’:

Add_to_df(data = ds.get_data(tickers = "USGDP...B",
                             fields = "X",
                             start = '1950-01-01',
                             freq = 'Q').dropna() * 1000000000,
          dataname = "RGDP (quarterly data)")

Add_to_df(data = Quarterly_To_Monthly(data = ds.get_data(tickers = "USGDP...B",
                                                         fields = "X",
                                                         start = '1950-01-01',
                                                         freq = 'Q').dropna() * 1000000000,
                                      dataname = "RGDP (monthly data)"),
          dataname = "RGDP (monthly data)")

df["RGDP in trillions (quarterly data)"] = df["RGDP (monthly data)"] / 1000000000000

The cell below adds U.S.A. Real G.D.P. Per Worker (RGDPPW) to our pandas data-frame (df):

df["RGDPPW (monthly data)"] = df["RGDP (monthly data)"] / df["USAWP (monthly data)"]

The cell below adds U.S.A. Real G.D.P. Per Worker To National Real Total Compensation Per Worker Ratio (RGDPPWTNRTCPWR) to our Pandas data-frame (df) and plots it:

df["RGDPPWTNRTCPWR (monthly data)"] = df["RGDPPW (monthly data)"] / df["NRTCPW (monthly data)"]

Plot2ax(dataset = df, leftaxisdata = [8,-2], y1labelcolor = "w", rightaxisdata = [-1],
        title = "U.S.A. Real G.D.P. Per Worker (yellow) To National Real Total Compensation Per Worker (blue) Ratio (green) (monthly data)",
        y1label = "Today's $", y2color = "#328203", y2label = "RGDPPWTNRTCPWR", leftcolors = "w")

The graph above displays G.D.P. and Total Compensation per worker figures as well as the ratio between the two (the latter shown in green). It is this ratio that we use as our ‘r’ in the estimation model:

Modelling:

The aim of our investigation is to find a ratio ‘r’ such that RGDPPWt = rt * NRTCPWt.

Since (as per Appendix 1 on GitHub) we defined rt = (RGDPPWt / RNTCPWt), rt-1 values do not need to be forecasted or modelled; we have the data needed to compute rt-1 = (RGDPPWt-1 / RNTCPWt-1).

rt values, however, require values of RGDPPWt for month t, which are not published: this is where an estimate of rt (rt hat) comes in useful.
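As a minimal numerical sketch of the idea (hypothetical figures, not DataStream data): if we use last month's realised ratio as our estimate rt hat, then last month's ratio times this month's published compensation figure gives this month's G.D.P. estimate. For simplicity the sketch uses aggregate figures; the per-worker ratio works identically, since the worker counts cancel.

```python
# Hypothetical monthly figures (in $ trillions):
nrtc = [11.0, 11.1]        # real national total compensation, published monthly
rgdp_last_month = 12.6     # real G.D.P., known for last month only

# Realised ratio for last month, r_{t-1}:
r_prev = rgdp_last_month / nrtc[0]

# Naive estimate rt hat = r_{t-1}, so the monthly G.D.P. estimate is:
rgdp_hat = r_prev * nrtc[1]
print(round(rgdp_hat, 4))  # → 12.7145
```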

How may we model our estimate of rt? The first models that come to mind are linear regression models; below, we investigate the regression of the U.S.A. Real G.D.P. Per Worker To National Real Total Compensation Per Worker Ratio (RGDPPWTNRTCPWR) against time (i.e.: months).

Regression of U.S.A. Real G.D.P. Per Worker To National Real Total Compensation Per Worker Ratio against time

Here we use RGDPPWTNRTCPWR as ‘r’ such that

rt = c + beta * xt + et

where xt is time in months, and c & beta are computed to minimise the error of this model via Ordinary Least Squares (OLS) estimation:

# First: Fit the trend
x = numpy.array([numpy.float64(i) for i in range(0, len(df["RGDPPWTNRTCPWR (monthly data)"].dropna()))])
y = numpy.array([numpy.float64(i) for i in df["RGDPPWTNRTCPWR (monthly data)"].dropna()])
(slope, intercept, r_value, p_value, std_err) = (scipy.stats.linregress(x = x, y = y))
df["RGDPPWTNRTCPWR regressed on months"] = (intercept + slope * x) - (df["RGDPPWTNRTCPWR (monthly data)"].dropna() * 0)
df["RGDPPWTNRTCPWR regressed on months' errors"] = df["RGDPPWTNRTCPWR regressed on months"] - df["RGDPPWTNRTCPWR (monthly data)"].dropna()
# Second: Plot it:
Plot_Regression(title = "U.S.A. Real G.D.P. Per Worker To National Real Total Compensation Per Worker Ratio (monthly data) (in orange) regressed on time (in blue)",
                x = x, y = y, slope = slope, intercept = intercept,
                data_point_type = "-", ylabel = "RGDPPWTNRTCPWR",
                time_index_step = 12*10, figuresize = (16,4),
                time_index = df["RGDPPWTNRTCPWR (monthly data)"].dropna().index)

# Third: Evaluate it:
Statistics_Table(dataset1 = df).loc[["RGDPPWTNRTCPWR (monthly data)",
                                     "RGDPPWTNRTCPWR regressed on months",
                                     "RGDPPWTNRTCPWR regressed on months' errors"],
                                    ["Mean of Absolute Values"]]

The statistics table shows a reassuringly low mean absolute (as in: ignoring the sign) error of 0.0241528 for RGDPPWTNRTCPWR’s regression model. We seem to have gathered evidence that a time-linear model may be a good way to estimate r, and then monthly G.D.P.. Remember that we will measure the effectiveness of our models by how well they estimate G.D.P. values at higher frequencies than they are published by official bodies (here: monthly, using monthly released Total Compensation data); an accurate estimation of our ratio is therefore paramount. One may thus attempt to look at estimates produced by methods other than straightforward linear regressions, e.g.: the Holt-Winters method. Let’s investigate that method too:

Holt-Winters method

All estimation methods are to be used recursively. The Holt-Winters method makes sure that forecasts are heavily influenced by the latest values by design. It also offers better (i.e.: lower) model error than our linear regression (see the model-fit graph below). Nota Bene (N.B.): A non-multiplicative Holt-Winters method is used here, as computational limits mean that multiplicative Holt-Winters methods result in a great many unusable ‘NaN’ estimates.

The cell below creates recursive RGDPPWTNRTCPWR Holt-Winters exponential smoothing estimates (RGDPPWTNRTCPWRHW); this means that we are using RGDPPWTNRTCPWRt-1 as rt-1 and RGDPPWTNRTCPWRt as rt hat:

# The following cell will throw warnings informing us that the unspecified Holt-Winters Exponential
# Smoother parameters will be chosen by the default ' statsmodels.tsa.api.ExponentialSmoothing '
# optimal ones. This is preferable and doesn't need to be stated each time.
warnings.simplefilter("ignore")

RGDPPWTNRTCPWRHW_model_fit = statsmodels.tsa.api.ExponentialSmoothing(df["RGDPPWTNRTCPWR (monthly data)"].dropna(),
                                                                      seasonal_periods = 12,
                                                                      trend = 'add',
                                                                      damped = True).fit(use_boxcox = True)

df["RGDPPWTNRTCPWRHW fitted values (monthly data)"] = RGDPPWTNRTCPWRHW_model_fit.fittedvalues
df["RGDPPWTNRTCPWRHW forecasts (monthly data)"] = RGDPPWTNRTCPWRHW_model_fit.forecast(12*4)
df["RGDPPWTNRTCPWRHW fitted values' real errors (monthly data)"] = df["RGDPPWTNRTCPWRHW fitted values (monthly data)"] - df["RGDPPWTNRTCPWR (monthly data)"]
# See the HW model's forecasts:
RGDPPWTNRTCPWRHW_model_fit.forecast(12)

2020–04–01: 1.143114
2020–05–01: 1.143141
2020–06–01: 1.143158
2020–07–01: 1.143169
2020–08–01: 1.143176
2020–09–01: 1.143180
2020–10–01: 1.143183
2020–11–01: 1.143185
2020–12–01: 1.143186
2021–01–01: 1.143186
2021–02–01: 1.143187
2021–03–01: 1.143187
Freq: MS, dtype: float64

# Plot the newly created Exponential Smoothing data:
fig1, ax1 = plt.subplots(figsize = (12,4), facecolor = "0.25")
df["RGDPPWTNRTCPWR (monthly data)"].dropna().plot(style = '-',
                                                  color = "blue",
                                                  legend = True).legend(["RGDPPWTNRTCPWR (monthly data)"])
RGDPPWTNRTCPWRHW_model_fit.fittedvalues.plot(style = '-',
                                             color = "C1",
                                             label = "RGDPPWTNRTCPWRHW model fit",
                                             legend = True)
RGDPPWTNRTCPWRHW_model_fit.forecast(12).plot(style = '-',
                                             color = "#328203",
                                             label = "RGDPPWTNRTCPWRHW model forecast",
                                             legend = True)
ax1.set_facecolor("0.25")
ax1.tick_params(axis = "both", colors = "w")
plt.ylabel("Ratio", color = "w")
ax1.set_xlabel("Years", color = "w")
ax1.grid(color = "black", linewidth = 0.5)
ax1.set_title("Forecasting U.S.A. Real G.D.P. Per Worker To Real Income Per Worker Ratio (RGDPPWTNRTCPWR)" +
              "\nusing the Holt-Winters method (HW) (monthly data) (forecasts in green)" +
              " \n", color = 'w')
ax1.legend()
plt.show()

# We want our program to let us know of warnings from now on; they were only disabled above.
warnings.simplefilter("default")

Statistics_Table(dataset1 = df).loc[["RGDPPWTNRTCPWR regressed on months' errors",
                                     "RGDPPWTNRTCPWRHW fitted values' real errors (monthly data)"],
                                    ["Mean of Absolute Values"]]

The Holt-Winters model’s estimates (r hat, in yellow) closely follow the realised RGDPPWTNRTCPWR values. This only fails at the start (approximately the first 2 years), while the model needs training. The (short) green line shows monthly forecasts for the next 4 years computed with this Holt-Winters model. The statistics table clearly shows that the Holt-Winters model performs much better than the time-linear regression model: its errors are, on average, a lot lower (i.e.: 0.00773767 < 0.0241528).

Backtesting our Method

The Python function defined in the cell below returns the In-Sample Recursive Prediction values of ‘ data_set ‘ using the Holt-Winters exponential smoother, Plus it then provides forecasts for One Out-Of-Sample period (thus the acronym ‘ISRPHWPOOOF’):

def ISRPHWPOOOF(data_set, dataset_name, start, result_name, enforce_cap = False, cap = 1.1):
""" This is a function created specifically for this article/study. returns the In-Sample Recursive
Prediction values of ' data ' using the Holt-Winters exponential smoother Plus it then provides
forecasts for One Out-Of-Sample period (thus the acronym 'ISRPHWPOOOF').

data_set (pandas dataframe): Pandas data-frame where the data is stored

dataset_name (str): Our data is likely to be in a pandas dataframe with dates
for indexes in order to alleviate any date sinking issues. As a result, this
line allows the user to specify the column within such a dataset to use.

start (str): The start of our in-sample recursive prediction window. This value can be changed.

result_name (str): name given to the result column.

enforce_cap (bool): Set to False by default. The H-W model sometimes returns extremely large
values that do not fit in our set. This is a convergence issue. Applying an arbitrary cap allows
us to account for this.

cap (int/float): Set to ' 1.1 ' by default. This is the cap applied to the returned values if
' enforce_cap ' is set to True.
"""

# The following for loop will throw errors informing us that the unspecified Holt-Winters
# Exponential Smoother parameters will be chosen by the default
# ' statsmodels.tsa.api.ExponentialSmoothing ' optimal ones.
# This is preferable and doesn't need to be stated for each iteration in the loop.
warnings.simplefilter("ignore")


# We create an empty list (to get appended/populated)
# for our dataset's forecast using the Holt-Winters exponential smoother.
data_model_in_sample_plus_one_forecast = []

# Empty lists to be populated by our forecasts (in- or out-of-sample)
value, index = [], []

# The ' len(...) ' function here returns the number of rows in our dataset from our starting date till its end.
for i in range(0, len(data_set[dataset_name].loc[pandas.Timestamp(start):].dropna())):
    # This line defines ' start_plus_i ' as our starting date plus i months
    start_plus_i = str(datetime.datetime.strptime(start, "%Y-%m-%d")
                       + dateutil.relativedelta.relativedelta(months=+i))[:10]

    HWESi = statsmodels.tsa.api.ExponentialSmoothing(
        data_set[dataset_name].dropna().loc[:pandas.Timestamp(start_plus_i)],
        trend = 'add', seasonal_periods = 12,
        damped = True).fit(use_boxcox = True).forecast(1)

    # The following outputs a forecast for one period ahead (whether in-sample or
    # out-of-sample). Since we start at i = 0, this provides a H-W forecast for
    # period i + 1; similarly, at the last i (say T), it provides the first and
    # only out-of-sample forecast (for period T + 1).
    data_model_in_sample_plus_one_forecast.append(HWESi)

    try:
        # ' HWESi ' is a one-element pandas Series: its single value is the
        # forecast and its index holds the forecast's date.
        forecast_value = float(HWESi.iloc[0])
        if enforce_cap and forecast_value > cap:
            # Discard forecasts above the cap (a convergence artefact):
            value.append(numpy.nan)
        else:
            value.append(forecast_value)
    except:
        # This adds NaN (Not a Number) to the list ' value ' if there is no value to add.
        # The statsmodels.tsa.api.ExponentialSmoothing function sometimes (rarely) outputs NaNs.
        value.append(numpy.nan)
    finally:
        # This adds our indices (dates) to the list ' index '
        index.append(str(HWESi.index[0])[:10])

# We want our program to let us know of warnings from now on; they were only disabled for the for loop above.
warnings.simplefilter("default")

return pandas.DataFrame(data = value, index = index, columns = [result_name])
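As an aside on the warning handling above: toggling `warnings.simplefilter` globally can leave the "ignore" state in place if the loop raises. A `warnings.catch_warnings()` context manager restores the previous filters automatically; a sketch below, with a hypothetical `noisy_fit` standing in for the statsmodels fitting call:

```python
import warnings

def fit_quietly(fit_fn, *args, **kwargs):
    # Suppress warnings only within this scope; the previous warning
    # filters are restored automatically on exit, even on exceptions.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return fit_fn(*args, **kwargs)

def noisy_fit(x):
    # Hypothetical stand-in for a fitting routine that emits warnings:
    warnings.warn("unspecified parameters chosen by the optimiser")
    return x * 2

result = fit_quietly(noisy_fit, 21)
print(result)  # 42
```

This keeps the rest of the notebook's warning behaviour untouched no matter how the loop exits.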

We will back-test our models from January 1990

start = "1990-01-01"
df["RGDPPWTNRTCPWRHW fitted values' real errors from " + str(start) + " (monthly data)"] = df["RGDPPWTNRTCPWRHW fitted values (monthly data)"].loc[start:] - df["RGDPPWTNRTCPWR (monthly data)"].loc[start:]
# Model:
# Model:
RGDPPWTNRTCPWRHW_model_in_sample_plus_one_forecast = ISRPHWPOOOF(
    data_set = df,
    dataset_name = "RGDPPWTNRTCPWR (monthly data)",
    start = start,
    result_name = "RGDPPWTNRTCPWRHW model in sample plus one forecast")
df = pandas.merge(df, RGDPPWTNRTCPWRHW_model_in_sample_plus_one_forecast,
                  how = "outer", left_index = True, right_index = True)

# Model's errors:
RGDPPWTNRTCPWRHW_model_in_sample_plus_one_forecast_error = pandas.DataFrame(
    df["RGDPPWTNRTCPWR (monthly data)"] - df["RGDPPWTNRTCPWRHW model in sample plus one forecast"],
    columns = ["RGDPPWTNRTCPWRHW model in sample plus one forecast error"])
df = pandas.merge(df, RGDPPWTNRTCPWRHW_model_in_sample_plus_one_forecast_error.dropna(),
                  how = "outer", left_index = True, right_index = True)

r = df["RGDPPWTNRTCPWR (monthly data)"].dropna()
r_f = df["RGDPPWTNRTCPWRHW model in sample plus one forecast"].dropna()
df["r"] = r
df["r_f"] = r_f
Plot1ax(df, datasubset = [-1, -2], datarange = ["1990-01-01", -1], ylabel = "Ratio",
        title = "Real G.D.P. Per Worker To National Real Total Compensation Per Worker Ratio realised (r) and\nmodeled estimates using forecasts (r_f) starting on Jan. 1990")

Modeled estimates (r_f) follow the realised r values so closely that it is difficult to differentiate them.

RGDPPW_estimate = (r_f / r.shift(1)) * (df["NRTCPW (monthly data)"] / df["NRTCPW (monthly data)"].shift(1)) * df["RGDPPW (monthly data)"].shift(1)
df["RGDPPW estimates"] = RGDPPW_estimate.dropna()
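The chaining formula above deserves a quick sanity check: because r = RGDPPW / NRTCPW, a perfect ratio forecast (r_f = r) recovers RGDPPW exactly from last month's level and compensation growth. A minimal sketch with made-up numbers (not real data):

```python
import pandas as pd

# Toy monthly series (hypothetical values):
rgdppw = pd.Series([100.0, 102.0, 103.5])  # real G.D.P. per worker
nrtcpw = pd.Series([60.0, 60.8, 61.5])     # real total compensation per worker

r = rgdppw / nrtcpw  # the ratio our Holt-Winters model forecasts

# Under a perfect-forecast assumption (r_f = r), the chaining identity
# (r_f / r.shift(1)) * compensation growth * last month's RGDPPW
# reproduces the realised RGDPPW series exactly:
r_f = r
estimate = (r_f / r.shift(1)) * (nrtcpw / nrtcpw.shift(1)) * rgdppw.shift(1)
print(estimate.round(6).tolist())  # [nan, 102.0, 103.5]
```

Any discrepancy between the estimate and the realisation therefore comes solely from the ratio forecast error, which is what we plot next.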

Yet again, looking at the realisations and model outputs on the same graph(s), it is difficult to distinguish the model outputs because they follow the realisations so closely; the errors' graph and table shine a light on how small the errors are:

Plot1ax(df, datasubset = [12,-1], datarange = ["1990-01-01",-1], ylabel = "Today's $", title = "Real G.D.P Per Worker Realised and Estimated Figures starting on Jan. 1990")
df["RGDPPW estimates' errors"] = df["RGDPPW (monthly data)"] - df["RGDPPW estimates"]
df["RGDPPW estimates' proportional errors"] = df["RGDPPW estimates' errors"] / df["RGDPPW (monthly data)"]
Plot2ax(dataset = df, leftaxisdata = [-2], rightaxisdata = [-1], title = "Real G.D.P Per Worker Estimates' Real (blue) and Proportional (orange) Errors",
y1label = "RGDPPW estimates' real errors", y2label = "RGDPPW estimates' proportional errors", leftcolors = "w")

The difference between the lowest and second-lowest troughs (as well as the highest and second-highest peaks) is less significant in the proportional-error graph than in the straightforward error graph.

Since R.G.D.P. figures are only released quarterly, it is best to investigate errors at each quarter:

df["RGDPPW estimates' errors every quarter"] = df["RGDPPW (quarterly data)"] - df["RGDPPW estimates"]
Statistics_Table(dataset1 = df).loc[["RGDPPW estimates' errors", "RGDPPW estimates' proportional errors"],["Mean of Absolute Values"]]

Our model gave a healthily low mean absolute proportional error of 0.662307%.
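For reference, the 'Mean of Absolute Values' statistics can be reproduced directly with pandas (the `Statistics_Table` helper is defined earlier in the notebook); a sketch on hypothetical error series, not the article's actual data:

```python
import pandas as pd

# Hypothetical realised and estimated figures standing in for the article's columns:
realised = pd.Series([100.0, 101.0, 99.5, 100.5])
estimated = pd.Series([100.2, 100.7, 99.9, 100.4])

errors = realised - estimated
proportional_errors = errors / realised

mae = errors.abs().mean()                      # mean absolute error, in level units
mape = proportional_errors.abs().mean() * 100  # mean absolute proportional error, in %
print(f"MAE: {mae:.4f}, MAPE: {mape:.4f}%")
```

The proportional (percentage) version is the one quoted above, since it is comparable across series measured in different units.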

Conclusion

We may now compute monthly estimates of G.D.P. figures using an Income Approach; since:

RGDP_t = RGDPPW_t × USAWP_t

(where USAWP_t denotes the U.S.A. working population at time t) then:

RGDP-estimate_t = RGDPPW-estimate_t × USAWP_t

df["RGDP estimates"] = df["RGDPPW estimates"] * df["USAWP (monthly data)"]
df[["RGDP (monthly data)", "RGDP estimates"]].dropna()

It would be redundant testing the validity of these figures since we already did so when investigating Real G.D.P. Per Worker estimates above.

Note also that for values today (at time T) and/or close to today (e.g.: next month), RGDP is approximately equal to GDP, and population growth is negligible, such that USAWP_T is approximately equal to USAWP_(T+1); then:

GDP-estimate_(T+1) ≈ RGDPPW-estimate_(T+1) × USAWP_T

It is therefore possible to use our method to predict future GDP values on a monthly basis.
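Putting those approximations together, a next-month nowcast only needs the latest out-of-sample ratio forecast, the month-on-month compensation growth, and today's workforce figure. The numbers below are purely illustrative, not real releases:

```python
# Illustrative latest figures (hypothetical):
r_f_next = 1.702         # H-W out-of-sample forecast of the ratio at T+1
r_today = 1.700          # realised ratio at T
nrtcpw_growth = 1.002    # month-on-month real total compensation per worker growth
rgdppw_today = 101_000   # real G.D.P. per worker at T, in today's $
workforce_today = 150e6  # USAWP at T, assumed ~ USAWP at T+1

# Chaining formula for next month's G.D.P. per worker, then scale by workforce:
rgdppw_next = (r_f_next / r_today) * nrtcpw_growth * rgdppw_today
rgdp_next = rgdppw_next * workforce_today
print(round(rgdp_next / 1e12, 3), "trillion $")
```

In practice the ratio forecast would come from the `ISRPHWPOOOF` function's final out-of-sample value and the compensation growth from the latest monthly release.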

df

References

You can find more detail regarding the DSWS API and related technologies for this article from the following resources:

For any question related to this example or Eikon Data API, please use the Developers Community Q&A Forum.

Literature:

Feldstein, Martin, 2008. "Did wages reflect growth in productivity?," Journal of Policy Modeling, Elsevier, vol. 30(4), pages 591–594.

C. C. Holt (1957). Forecasting seasonals and trends by exponentially weighted moving averages. ONR Research Memorandum, Carnegie Institute of Technology.

P. R. Winters (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6(3), 324–342.

DOWNLOADS

Article.DataStream.Python.EstimatingMonthlyGDPFiguresViaAnIncomeApproach (GitHub)
