Modelling Emergent Behavior in Financial Agents with Q-Learning: Phase 1

Published in Bucknell AI & CogSci, Nov 5, 2018

Background

Financial crises are as fascinating as they are destructive. Each crisis gives us a very real taste of bounded rationality, an idea that its originator, Herb Simon, put as follows:

Rationality is bounded because there are limits to our thinking capacity, available information, and time.

**Figure 1.** Panicked crowds gather outside the New York Stock Exchange following the 1929 crash (Library of Congress)

There’s some order to this chaos, with some excellent observations offered here on how and why economic crises have struck in cycles of roughly 18 years for two centuries, bar the two world wars. I’m neither an economic historian nor an expert on trading strategies, so I’m not going to go into too much depth on crises. However, something I have found interesting and have explored over the past few weeks is the set of modelling techniques that failed to predict the crash from the events leading up to it.

At the time of the 2008 recession, models rooted in dynamic stochastic general equilibrium (DSGE) were the primary tools for policy analysis in market economies, but the crisis itself exposed their inadequacies (Fagiolo and Roventini 2017). Much of the criticism came from the standpoint of agent-based analysis: DSGE models overlooked the dual nature of decentralized market economies, which can be viewed as both top-down and bottom-up structures.

Here, I would like to introduce the culture-dish analogy (Tesfatsion 2002). The modeler starts by constructing an “economy” with an initial population of agents, some of which represent traders, financial institutions, and so on, while others may represent social and environmental phenomena. I found this lower-level modelling paradigm more attractive, since it allows us to observe emergent behavior in a well-explored setting such as stock market trading.

To set up my trading agents, I took inspiration from two previous studies: the first (Peck and Yang 2011) models investment returns using Markov Decision Processes (MDPs) and treats portfolio managers as agents, while the second (Pendharkar and Cusatis 2018) examines the use of reinforcement learning techniques when trading with a fixed-asset portfolio.

Setting up a single trading agent

**Figure 2.** A UML class diagram outlining the basic structure of a single adaptive trading agent and its associations

The initial attributes of an agent might include type characteristics, internalized behavioral norms, internal modes of behavior (including modes of communication and learning), and internally stored information about itself and other agents. An overview of those attributes is given in Figure 2, and we can see the adaptive agent relies on and trains a Q-learner to achieve its goals. More details on the model-free process of Q-learning can be found here. For our purposes we will refer to the Q-learner by the functionality specified in Figure 2. The historical data to train the adaptive agent was obtained through Yahoo Finance.
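To make the idea concrete, here is a minimal sketch of a tabular Q-learner. The class name `QLearner`, its method names, the three-action encoding, and the default parameter values are all assumptions made for illustration; they are not the interface from Figure 2 or the values found in the parameter sweep.

```python
import random
from collections import defaultdict

# Hypothetical action encoding; the article does not specify one.
ACTIONS = ("buy", "hold", "sell")


class QLearner:
    """Minimal tabular Q-learner (illustrative only)."""

    def __init__(self, alpha=0.2, gamma=0.9, epsilon=0.1, seed=None):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor: higher values favour long-term rewards
        self.epsilon = epsilon  # probability of taking a random exploratory action
        self.rng = random.Random(seed)
        # Q-table: state -> {action: estimated value}, initialised to zero.
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def choose_action(self, state):
        """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        return max(ACTIONS, key=values.get)

    def update(self, state, action, reward, next_state):
        """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(self.q[next_state].values())
        target = reward + self.gamma * best_next
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Usage: one update on a made-up state label.
learner = QLearner(alpha=0.5, gamma=0.9, epsilon=0.0)
learner.update("uptrend", "buy", reward=1.0, next_state="uptrend")
print(learner.q["uptrend"]["buy"])  # 0.5
```

In a trading setting the states would typically be discretised indicator values and the reward a change in portfolio value, but those choices are left open here.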

**Figure 3.** The ideal parameters for our Q-learning paradigm, obtained through a parameter sweep. Note that the high discount factor helps the agent seek long-term rewards, while the impact is the effect of high-volume trading on the stock price (this wiki page gives a surprisingly succinct summary of market impact and Kyle’s coefficient).
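One common way to think about market impact is a linear, Kyle-style model in which the price moves in proportion to the signed volume traded. The sketch below assumes that form; the function names and the numbers in the usage lines are hypothetical, and the article does not say that its impact parameter is defined exactly this way.

```python
def execution_price(mid_price, signed_volume, impact):
    """Per-share price under an assumed linear impact model.

    signed_volume > 0 is a buy and < 0 is a sell. `impact` is the price
    change per share traded (a Kyle-style coefficient, assumed here).
    """
    return mid_price + impact * signed_volume


def trade_cash_flow(mid_price, signed_volume, impact):
    """Cash flow of a trade: negative when buying, positive when selling."""
    return -signed_volume * execution_price(mid_price, signed_volume, impact)


# Usage: buy and then sell 100 shares at an unchanged mid price of 50.
buy = trade_cash_flow(50.0, 100, impact=0.001)
sell = trade_cash_flow(50.0, -100, impact=0.001)
round_trip_cost = -(buy + sell)
print(round_trip_cost)
```

With these made-up numbers the round trip loses roughly 20 purely to impact, which is why a learner that is penalised for impact tends to avoid large, frequent trades.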

Setting up the Environment

I had the opportunity to explore some agent-based modelling tools, and while there are plenty of great ones out there, the most extensively used is NetLogo. For reference, Figure 4 gives an idea of just how large its share of agent-based models on CoMSES.net is.

**Figure 4.** Platforms used by agent-based models on CoMSES.net (CoMSES 2018)

As a consequence of its extensive usage, NetLogo is also relatively well documented and has a varied collection of models uploaded by the community.

Python Controller for NetLogo

The end goal is to simulate multiple agents within a financial market in NetLogo, where each agent has its own Q-learning table to query state-action pairs. In other words, I wanted to simulate multiple adaptive agents as specified in Figure 2. I also wanted a solution that enabled rapid prototyping and scaling through Python, which I used to implement the adaptive agent. Enter NL4Py, a NetLogo controller package for Python (Gunaratne and Garibay 2018). Not only did it allow me to specify parameters and schedule reporters to collect simulation data, it also significantly reduced overhead by offering headless (no GUI) workspace control.

The Model

One model I found to be of particular interest was an older one titled Artificial Financial Markets, which is described on its webpage as:

This is a model of an artificial financial market with heterogeneous boundedly rational agents that are influenced by the sentiment of their most close colleagues regarding the future evolution of the market.

**Figure 5.** A visualization of the Artificial Financial Markets model in NetLogo

While this model does not have the most exciting visualization, it offers a lot for how I want to model the market, given its capability to simulate volatility, clustering, bubbles, and crashes. I adopted several ideas from it in the creation of my own network.

**Figure 6.** A tree diagram showing the different classes of networks used.

**Figure 7.** The network structure reflects the level of randomness involved.

As shown in Figure 6, the model I pursued was an exchange network, while the randomness of interaction between agents would be encapsulated by the network structure itself, as shown in Figure 7. For my model, I want the randomness to be medium-high, which would lie somewhere between the Small-world and Random networks in Figure 7.
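One standard way to dial randomness between a regular lattice, a small-world network, and a random network is Watts-Strogatz rewiring, where each edge of a ring lattice is rewired with probability `p`. The sketch below is a plain-Python version of that idea; the function name, the parameter values in the usage lines, and the choice of this particular generator are assumptions, not necessarily how the networks in Figure 7 were produced.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Return a set of undirected edges (i, j) with i < j.

    n: number of nodes; k: even number of nearest neighbours in the
    starting ring; p: probability of rewiring each lattice edge.
    """
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    rng = random.Random(seed)
    edges = set()
    # Start from a ring lattice: each node links to k/2 neighbours per side.
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            edges.add((min(i, j), max(i, j)))
    # Visit each lattice edge once and rewire its far end with probability p.
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            if rng.random() >= p:
                continue
            j = (i + offset) % n
            old = (min(i, j), max(i, j))
            # Valid new endpoints: no self-loops and no duplicate edges.
            candidates = [m for m in range(n)
                          if m != i and (min(i, m), max(i, m)) not in edges]
            if candidates:
                m = rng.choice(candidates)
                edges.remove(old)
                edges.add((min(i, m), max(i, m)))
    return edges


# Usage: p = 0 keeps the regular ring, p = 1 is close to a random network.
# A "medium-high" setting would sit somewhere in between, e.g. p = 0.5.
ring = watts_strogatz(20, 4, 0.0, seed=1)
mixed = watts_strogatz(20, 4, 0.5, seed=1)
print(len(ring), len(mixed))
```

Rewiring never changes the number of edges, so the only thing `p` controls is how far the structure drifts from the ordered ring.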

What exactly will my network look like? First, we have to look at the four classes of agents provided by NetLogo:

Observer: It doesn’t have a location — you can imagine it as looking out over the world of turtles and patches, and giving them instructions.
Turtles: Mobile agents which can be controlled and visualized.
Links: They provide a relationship between different turtles, with the nature of the relationship to be specified by the user.
Patches: Immobile agents which are fixed in place but can otherwise behave as turtles.

In my model, the traders are represented by turtles and the relationships between traders by links, with each link specifying the propensity of the traders at its ends to adopt each other’s sentiment. Each trader may have links to any number of other traders between a minimum and a maximum threshold.
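The structure described above can be sketched in Python as follows. Everything here is an assumption for illustration: the `Link` class, the +1/-1 sentiment encoding, and the rule that a link fires with probability equal to its propensity and then copies sentiment in a random direction. The actual NetLogo model may update sentiment differently.

```python
import random
from dataclasses import dataclass


@dataclass
class Link:
    a: int             # index of the trader at one end
    b: int             # index of the trader at the other end
    propensity: float  # probability that one end adopts the other's sentiment


def degrees_within(n_traders, links, min_links, max_links):
    """Check that every trader's link count lies within the thresholds."""
    degree = [0] * n_traders
    for link in links:
        degree[link.a] += 1
        degree[link.b] += 1
    return all(min_links <= d <= max_links for d in degree)


def step_sentiment(sentiments, links, rng):
    """One synchronous update: each disagreeing link may spread a sentiment."""
    new = list(sentiments)
    for link in links:
        disagree = sentiments[link.a] != sentiments[link.b]
        if disagree and rng.random() < link.propensity:
            # Pick at random which end copies the other (assumed rule).
            if rng.random() < 0.5:
                src, dst = link.a, link.b
            else:
                src, dst = link.b, link.a
            new[dst] = sentiments[src]
    return new


# Usage: three traders in a line, +1 = bullish, -1 = bearish.
links = [Link(0, 1, propensity=0.8), Link(1, 2, propensity=0.3)]
sentiments = [1, -1, -1]
rng = random.Random(42)
print(degrees_within(3, links, min_links=1, max_links=2))
print(step_sentiment(sentiments, links, rng))
```

Varying the propensities and the number of links per trader is one way to get the heterogeneity that bounded rationality calls for.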

Going towards Phase 2 …

The end-goal is to draw out some of the emergent behavior one would observe in financial markets, and so what I want to achieve through my network is some agent clustering and ‘bubbles’. At the very start, we briefly saw the idea of bounded rationality, which we can emulate in this case through variance in links between traders, and the propensity of traders themselves to adopt sentiments. However, I want the agents to go beyond a greedy class of algorithms and strive towards long-term rewards. This was achieved through the adaptive agent which relied on Q-learning with a mixture of fundamental and technical indicators to guide it. In Phase 2, I want to transfer the functionality of the adaptive trading agent to the NetLogo model I have using the Python controller for NetLogo, which is bound to draw out some interesting things…

References

Simon H.A. (1990) Bounded Rationality. In: Eatwell J., Milgate M., Newman P. (eds) Utility and Probability. The New Palgrave. Palgrave Macmillan, London

Colander, David et al. (2008). “Beyond DSGE models: toward an empirically based macroeconomics.” In: American Economic Review 98.2, pp. 236–40.

Fagiolo, Giorgio and Andrea Roventini (2017). “Macroeconomic Policy in DSGE and Agent-Based Models Redux: New Developments and Challenges Ahead.” In: Journal of Artificial Societies and Social Simulation 20.1, p. 1. issn: 1460–7425. doi:10.18564/jasss.3280. url: http://jasss.soc.surrey.ac.uk/20/1/1.html

Gunaratne, Chathika and Ivan Garibay (2018). “NL4Py: Agent-Based Modeling in Python with Parallelizable NetLogo Workspaces.” In: arXiv preprint arXiv:1808.03292.

Peck, James and Huanxing Yang (2011). “Investment Cycles, Strategic Delay, And Self-Reversing Cascades.” In: International Economic Review 52.1, pp. 259–280.

Pendharkar, Parag C and Patrick Cusatis (2018). “Trading financial indices with reinforcement learning agents.” In: Expert Systems with Applications 103, pp. 1–13.

Sklar, Elizabeth (2007). NetLogo, a multi-agent simulation environment.