RL in Economics

ML Blogger
7 min read · Jul 3, 2022

Economics

Economics is a multi-agent problem where each agent interacts with other agents while trying to maximise its own utility. There has been an explosion in the application of reinforcement learning ( RL ) to economics, and RL can be applied to many different economic problems. A recent survey covers the areas where RL can be applied to economics; some of the key areas are the following

  • Modelling consumption and income dynamics : The agent tries to optimise its wealth given a time-dependent income stream, expenses and taxation.
  • Bounded Rationality : Bounded rationality is the idea that rationality is limited when individuals make decisions: humans’ “preferences are determined by changes in outcomes relative to a certain reference level”. This can be modelled using RL.
  • Rational Expectations : In economics, “rational expectations” are model-consistent expectations, in that agents inside the model are assumed to “know the model” and on average take the model’s predictions as valid. Rational expectations ensure internal consistency in models involving uncertainty.
  • Multi Agent / Game Theory : Economic problems can be modelled as a multi-agent game where each agent tries to maximise its own reward. Various RL algorithms have been shown to work in multi-agent settings.

There have been recent successes in applying RL to economics, and we will look at a few of the recent papers over the next few posts in this series, starting with the paper reviewed below.

Reinforcement Learning : An overview

Reinforcement Learning is a framework for sequential decision making. In this setup an agent continuously interacts with an environment: at each step it performs an action, receives a reward and observes the next state.

[Figure: the RL agent and environment interaction loop]

The goal of the agent is to maximise the cumulative reward it receives over time.

Unlike Supervised Learning, Reinforcement Learning doesn’t use a fixed dataset to learn; the agent learns by interacting with the environment through trial and error.
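As a concrete illustration of this loop, here is a minimal sketch using the Gymnasium API; the CartPole environment and the random action are stand-ins for a real task and a learnt policy:

```python
import gymnasium as gym

# Minimal agent-environment interaction loop ( illustrative only ).
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a learnt policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

env.close()
print(f"Cumulative reward for this episode: {total_reward}")
```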

Basic concepts / terms in RL

  • Agent : This is the model which interacts with the Environment and learns using the reward it receives. Usually approximated using a neural network in Deep RL.
  • Environment : This is the system the agent is acting on and trying to control. The Environment responds to the agent’s action, transitions to the next state and returns a reward to the agent.
  • Action Space : This is the set of actions which can be performed by the agent. Actions can be discrete or continuous.
  • State Space : The state space is the set of states describing the Environment. The transition probability describes the probability of moving from one state to another; it depends on the current state and the current action.
  • Reward : This is the feedback from the Environment which helps the agent to learn.
  • Discount Factor : The objective of the agent is to maximise the cumulative reward it receives. When computing the cumulative reward, future rewards are discounted by a discount factor at each time step ( see the return formula after this list ).
  • MDP : Markov Decision Processes ( MDPs ) satisfy the Markov property: the next state depends only on the current state and the current action, so the agent doesn’t need the history of past states to predict the next state.
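Putting the reward and the discount factor together, the cumulative discounted reward ( the return ) that the agent maximises from time step t is:

```latex
G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad 0 \le \gamma < 1
```

where \( \gamma \) is the discount factor and \( r_{t+k+1} \) is the reward received k + 1 steps in the future.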

Optimal Monetary Policy using Reinforcement Learning

This paper tries to learn a Central Bank policy ( the interest rate decision ) using Reinforcement Learning. According to the authors, this is the first paper to discuss such an approach; earlier approaches used optimal control ideas to derive the optimal Central Bank policy. The advantages of using RL over optimal control methods are twofold:

  • RL can model asymmetric Central Bank outcomes like the ZLB ( Zero Lower Bound ). More generally, RL can model non-differentiable loss functions, increasing the flexibility of the model.
  • RL doesn’t suffer from the curse of dimensionality in the same way, as RL algorithms can work with incomplete state information and model-free approaches are available.

Basic Concepts / terms

  • Output Gap : The output gap is an economic measure of the difference between the actual output of an economy and its potential output. Potential output is the maximum amount of goods and services an economy can turn out when it is most efficient — that is, at full capacity ( see the formula after this list ).
  • Inflation : Inflation is typically a broad measure, such as the overall increase in prices or the increase in the cost of living in a country. But it can also be more narrowly calculated — for certain goods, such as food, or for services, such as a haircut, for example.
  • Central Bank : A central bank is a public institution that manages the currency of a country or group of countries and controls the money supply — literally, the amount of money in circulation. The main objective of many central banks is price stability.
  • DSGE Models : Dynamic stochastic general equilibrium (DSGE) models use modern macroeconomic theory to explain and predict co-movements of aggregate time series over the business cycle and to perform policy analysis.
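Since the output gap is central to the rest of the post, it helps to state it as a formula; this matches the percentage deviation computation used in the Data section below:

```latex
\text{output gap}_t = \frac{Y_t - Y_t^{*}}{Y_t^{*}} \times 100
```

where \( Y_t \) is actual GDP and \( Y_t^{*} \) is potential GDP.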

Modelling

The main idea of the paper is to model the Central Bank as an agent which interacts with the economy in order to stabilise it. The Central Bank’s action space is the interest rate: the economy responds to the Central Bank’s interest rate action and transitions to a different state.

  • Central Bank : The Central Bank is modelled as a neural network. Two models are chosen : a linear and a non-linear model. The output of the Central Bank ( i.e. the action ) is the interest rate.
  • Environment / Economy : The economy is the environment, and it is modelled with two neural networks : one models the output gap, the other models inflation. Both networks are trained on historical economic data to learn a model of the economy.
  • State Space : The observation seen by the agent: lagged values of the output gap and inflation.
  • Reward : The reward is the sum of two equally weighted terms: the first is the difference between the agent’s rate recommendation and the base policy rate ( the rate the Central Bank would be expected to set ); the second is a risk term which penalises the output gap ( a sketch of this reward follows the list ).
  • RL Algorithm used
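Below is a minimal sketch of this reward, assuming a squared penalty for both terms and equal weights; the exact functional form and weights are assumptions, not reproduced from the paper:

```python
def reward(agent_rate: float, base_policy_rate: float, output_gap: float,
           w_rate: float = 0.5, w_gap: float = 0.5) -> float:
    """Illustrative reward: penalise deviation of the agent's rate
    recommendation from the base policy rate, and penalise the output gap.
    Equal weights and squared penalties are assumptions."""
    rate_term = -(agent_rate - base_policy_rate) ** 2
    risk_term = -(output_gap ** 2)
    return w_rate * rate_term + w_gap * risk_term
```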

Data

  • Quarterly data from 1987 Q3 to 2007 Q2 is used.
  • The output gap is computed as the percentage deviation of actual GDP from its potential; the potential output values are estimates from the U.S. Congressional Budget Office ( see the sketch after this list ).
  • The effective Federal Funds Rate is used as the Central Bank’s actual behaviour.
  • The data used is revised data rather than real-time data, but it is used only to estimate the transition function; the Central Bank reaction is then estimated using this transition function.
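A small sketch of that output gap computation, assuming two aligned quarterly GDP series ( the numbers and column names are illustrative, not the paper’s data ):

```python
import pandas as pd

# Illustrative quarterly series; a real analysis would load actual GDP
# and the CBO potential-output estimates.
df = pd.DataFrame({
    "gdp_actual":    [19800.0, 19950.0, 20100.0],
    "gdp_potential": [20000.0, 20050.0, 20120.0],
})

# Output gap as the percentage deviation of actual GDP from potential.
df["output_gap"] = (df["gdp_actual"] - df["gdp_potential"]) / df["gdp_potential"] * 100
print(df)
```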

Discussion of the modelling

  • Environment / Economy Model : The economy is modelled using both a linear and a non-linear neural network. The model takes in lagged values of the output gap, inflation and the interest rate and outputs the next values. Multi-layer networks achieve better accuracy when modelling the economy and are therefore preferable ( a sketch of such a transition model follows this list ).
  • Policy : This is the interest rate control function learnt by the agent. The non-linear policy performs better than the linear one.
  • Historical Counterfactual Analysis : To test the robustness of the learnt policy ( a critical step for most RL approaches ), it is evaluated on DSGE models to see how effective it is.
  • Conclusion : The authors find that the RL agent is able to find a policy which reduces the Central Bank’s cost quite significantly ( the non-linear model does better ).
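For concreteness, here is a minimal PyTorch sketch of this kind of two-network transition model; the lag length, layer sizes and activation are assumptions, not the paper’s actual architecture:

```python
import torch
import torch.nn as nn

N_LAGS = 4  # assumed number of lagged quarters fed to the model

class TransitionNet(nn.Module):
    """One small MLP per target variable ( output gap or inflation ).
    Input: lagged output gap, inflation and interest rate values."""
    def __init__(self, n_lags: int = N_LAGS, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * n_lags, hidden),  # 3 variables x n_lags lags each
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Two separate networks, as described above: one per state variable.
output_gap_model = TransitionNet()
inflation_model = TransitionNet()

# Illustrative one-step transition for a batch of lagged feature vectors.
lagged_features = torch.randn(8, 3 * N_LAGS)  # dummy stand-in for real data
next_output_gap = output_gap_model(lagged_features)
next_inflation = inflation_model(lagged_features)
```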
