Graph networks are easy on the eye, aren’t they?

Exploring the London Stock Exchange using Graph Networks in Neo4j — Part 1

A practical guide, using Graph Databases, Python and Docker

Applied Data Science
Feb 24, 2020


Back in December I attended an event called ‘Network Science in Financial Service’ at the Alan Turing Institute. I found the approach of using graph networks to visualise relationships between stocks very interesting, so I thought I’d give it a shot.

The Data

For this project I used stock price data from the London Stock Exchange (LSE), but there are two main caveats I should put out there:

  • The data only covers a three-month period, between 11 November 2019 and 7 February 2020.
  • There’s only data for 365 stocks, out of the 3,000+ stocks on the LSE.

This was basically down to the difficulty of obtaining the data. I thought Yahoo Finance was the go-to source for this kind of data, but apparently it’s not as open anymore. I tried sources like Alpha Vantage and built a pipeline for it, but I realised that its price data had errors (maybe it’s fine for US exchanges?). In the end I used IEX Cloud’s API, which worked really well. However, their free tier limits are the reason my dataset has those limitations. If anyone knows how to get access to historical price data for stocks on the London Stock Exchange, please let me know!

The Code

Because we all like easy set-ups, I built this project using Docker Compose to manage and connect the different containers. They look more or less like this:

Why isn’t the Docker Compose logo related to music?
docker-compose.yml

    version: '3'
    services:
      neodb:
        image: 'neo4j'
      pgdb:
        image: 'postgres'
      app:
        build:
          context: .

Feel free to explore the code, or run your own version of it, from this repository.

Processing

I worked with the data using the process described below:

  1. Chose a random list of stocks and pulled in the price data for as many as I could, which resulted in the previously mentioned 365 stocks.
  2. Built a script which pivots the price data and calculates the daily percentage change in price.
  3. Calculated the pairwise correlations between the stocks’ price changes and obtained the unique stock combinations.

These previous steps were all executed using Pandas and Postgres. The rationale for using percentage change rather than the price of the stocks is explained in this amazing blog post by QuantDare.
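To make steps 2 and 3 a bit more concrete, here is a minimal sketch of the daily percentage change and the pairwise correlation over unique stock pairs. It is not the code from the repository: it uses only the Python standard library instead of Pandas and Postgres so that it runs anywhere, and the symbols and prices are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pct_change(prices):
    """Daily percentage change: (today - yesterday) / yesterday."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(prices_by_symbol):
    """Return {(symbol_a, symbol_b): correlation} for each unique pair."""
    changes = {s: pct_change(p) for s, p in prices_by_symbol.items()}
    return {
        (a, b): pearson(changes[a], changes[b])
        for a, b in combinations(sorted(changes), 2)
    }


# Made-up closing prices, already pivoted to one series per (hypothetical) symbol.
prices = {
    "AAA": [100.0, 102.0, 101.0, 105.0],
    "BBB": [50.0, 51.0, 50.5, 52.5],
    "CCC": [20.0, 19.0, 21.0, 20.0],
}
correlations = pairwise_correlations(prices)
```

Using `combinations` is what gives the unique pairs: each pair of stocks appears once, with no self-pairs and no mirrored duplicates. In Pandas the same steps correspond roughly to `DataFrame.pivot`, `DataFrame.pct_change` and `DataFrame.corr`.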

Finally, I used the py2neo library to take my table of correlations and insert it into the Neo4j database.
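For anyone curious what that load step could look like, here is a rough sketch. The node label `Stock`, the relationship type `CORRELATED_WITH` and the property names are my assumptions for illustration, not necessarily the schema used in the repository. The function accepts any object with a `run(cypher, **parameters)` method, which is how a py2neo `Graph` is called, and the small `RecordingGraph` stand-in lets the sketch run without a database.

```python
# Hypothetical schema: (:Stock {symbol})-[:CORRELATED_WITH {correlation}]->(:Stock)
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (a:Stock {symbol: row.source})
MERGE (b:Stock {symbol: row.target})
MERGE (a)-[r:CORRELATED_WITH]->(b)
SET r.correlation = row.correlation
"""


def to_rows(correlations):
    """Turn {(a, b): corr} into a list of dicts usable as a Cypher parameter."""
    return [
        {"source": a, "target": b, "correlation": float(corr)}
        for (a, b), corr in correlations.items()
    ]


def load_correlations(graph, correlations, batch_size=1000):
    """Send the correlation rows to Neo4j in batches; returns the row count."""
    rows = to_rows(correlations)
    for start in range(0, len(rows), batch_size):
        graph.run(LOAD_QUERY, rows=rows[start:start + batch_size])
    return len(rows)


class RecordingGraph:
    """Stand-in for a py2neo Graph: records calls instead of contacting Neo4j."""

    def __init__(self):
        self.calls = []

    def run(self, cypher, **parameters):
        self.calls.append((cypher, parameters))


# Dry run with made-up values. With a real database you would pass a py2neo
# Graph instead, e.g. Graph("bolt://localhost:7687", auth=(user, password)),
# where the URI and credentials depend on your own set-up.
dry_run = RecordingGraph()
loaded = load_correlations(dry_run, {("AAA", "BBB"): 0.82, ("AAA", "CCC"): 0.10})
```

`MERGE` rather than `CREATE` means re-running the load should not duplicate stocks or relationships, and `UNWIND` sends a whole batch in one query instead of one round trip per pair.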

Analysis

Having done the previous work, I could now use the Neo4j interface (available on localhost:7474) to explore the data that was just loaded into it.

  • Exploring highly correlated stocks

As a first step, I wanted to explore highly correlated stocks. Here I would expect to see connections between stocks from similar or related industries, or maybe from similar regions, or both.

Cypher is a very visual language

This Cypher query yields the following graph network. Most highly correlated stocks are connected in pairs or triples (trios?), and there are only two ‘clusters’ that are more densely connected.
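The query itself was shown as an image, so the version below is only a guess at its general shape: the label, relationship type, property name and threshold are all assumptions. Alongside it is a small Python sketch of the same idea, which keeps only pairs above a threshold and groups stocks that are linked directly or indirectly, which is how the pairs, trios and clusters come about. The symbols and values are made up.

```python
from collections import defaultdict

# Hypothetical Cypher; names are assumptions, not the author's schema.
# In the Neo4j interface you could replace $threshold with a literal value.
HIGH_CORRELATION_QUERY = """
MATCH (a:Stock)-[r:CORRELATED_WITH]-(b:Stock)
WHERE r.correlation > $threshold
RETURN a, r, b
"""


def connected_groups(correlations, threshold):
    """Group symbols linked, directly or indirectly, by correlations above threshold."""
    neighbours = defaultdict(set)
    for (a, b), corr in correlations.items():
        if corr > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this symbol.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Made-up correlations between hypothetical symbols.
correlations = {
    ("AAA", "BBB"): 0.82,
    ("AAA", "CCC"): 0.10,
    ("BBB", "CCC"): 0.05,
    ("DDD", "EEE"): 0.71,
    ("EEE", "FFF"): 0.65,
    ("DDD", "FFF"): 0.40,
}
groups = connected_groups(correlations, threshold=0.6)
```

With this toy data the result is one pair and one trio. Stocks with no correlation above the threshold simply drop out, which is why the graph view shows far fewer than 365 nodes.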

Highly correlated stocks

Exploring the clusters, there are some interesting, but perhaps not very surprising, pairings. Not surprising because we’d expect stocks from companies in the same industry to be exposed to the same risk factors. For example, a spike in the price of oil should have a similar effect on two oil companies.

Fresnillo and Polymetal


While Fresnillo is a Mexican company and Polymetal is Russian, they’re both precious metal mining companies.

Kosmos Energy and Tullow Oil


Both are oil extraction and production companies; Kosmos is American and Tullow is Irish.

These pairings don’t really show the usefulness Neo4j provides, as we could have easily found them in a SQL query, but what if we explore one of the bigger groups?

Large Cluster 1

Stocks related to industrial / engineering industries?

WEIR (The Weir Group) is one of the ‘central’ nodes in this cluster. It is an engineering company which focuses on mining, oil and gas and power markets. It’s got relationships (correlation greater than 0.6) with:

  • AGK (Aggreko), a supplier of power generation and temperature control equipment;
  • AAL (Anglo American), a mining company and the largest supplier of platinum;
  • CRH (CRH), a manufacturer and supplier of materials for the construction industry;
  • SXS (Spectris), a supplier of precision instruments and controls;
  • EVR (Evraz), a steel-making and mining company;
  • MNDI (Mondi), a packaging and paper group;
  • HL (Hargreaves Lansdown), a financial services company selling funds and shares to retail investors.

Now we can start understanding (hypothesising about) dynamics in price movements. This cluster (with maybe the exception of HL), it seems, is related to a general engineering sector. With Weir Group as a central node, could it be that companies connected to it are large customers (or companies similar to their customers), and thus their performance directly impacts Weir’s?
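I’ve been using ‘central’ informally here. One simple way to make it concrete is to count how many relationships above the threshold each stock has (its degree). The sketch below does that in plain Python with made-up symbols and correlation values. Treat it as one possible reading rather than the method behind the picture above; PageRank would be another option.

```python
from collections import Counter


def degree_by_symbol(correlations, threshold):
    """Count, per symbol, the relationships with correlation above threshold."""
    degrees = Counter()
    for (a, b), corr in correlations.items():
        if corr > threshold:
            degrees[a] += 1
            degrees[b] += 1
    return degrees


# Made-up correlations; "HUB" is a hypothetical stock linked to several others.
correlations = {
    ("HUB", "AAA"): 0.72,
    ("HUB", "BBB"): 0.66,
    ("HUB", "CCC"): 0.61,
    ("AAA", "BBB"): 0.64,
    ("BBB", "CCC"): 0.30,
}
degrees = degree_by_symbol(correlations, threshold=0.6)
most_connected = degrees.most_common(1)
```

Degree only looks at direct neighbours, so it is a blunt measure, but it is usually enough to spot which node a cluster is built around.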

Large Cluster 2

Stocks in the construction/property industry?

The central node in this cluster is FORT (Forterra), who manufacture building products for the construction industry.

  • Its highest correlation is with LAND (Land Securities Group) who are a commercial property development and investment company. Land Securities are in turn related to WTB (Whitbread), who are a hotel and restaurant company. Could it be that LAND invest in Whitbread or businesses similar to it, and are exposed to similar risks?
  • It is also connected to TEP (Telecom Plus), who are a multi-utility supplier to residences and businesses. They are then connected to DFS, who are a furniture supplier. Could this correlation reflect their similar dependence on the construction of new housing and commercial outlets?

Maybe Forterra is related to the previous companies because they all benefit from new housing and commercial developments. Where Forterra provide the materials used to build, Land Securities Group provide the capital, Telecom Plus supply utilities to the final users and DFS the furniture? Maybe it’s a bit of a push to think of it like this, but it’s still interesting to observe the correlations between these related industries.

If anyone knows the answer to any of these questions, do let me know whether I got them right!

Graph networks are a great way to explore multiple relationships simultaneously and Neo4j makes it easy to insert the data and query it. It also has a set of algorithms, such as PageRank and the Louvain community detection algorithm, which I will explore in Part 2 of this blog.

Applied Data Science Partners is a London-based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website. Follow us on LinkedIn for more AI and data science stories!
