Pattern-Driven Insights: Visualize Stock Volume Similarity with Neo4j and Power BI

Bryant Avey

Published in CodeX, Aug 9, 2021

Uncover and visualize hidden patterns for Pattern-Driven Insights

This pattern-driven visualization technique was presented at the Power BI Bootcamp in July 2021. The Bootcamp session walked through creating Power Query templates to import data from Neo4j and then visually telling the story of the patterns that Neo4j surfaced. In this article, we’ll focus on visualizing the time series patterns created by Neo4j’s Graph Data Science algorithms, which score and categorize volume similarity between stocks.

If you’re interested in how to create Power Query templates that can be used to automatically schedule online Power BI data refreshes from Neo4j graphs, refer to the article Importing Neo4j Graph Data with Power BI. The Power Query scripts are also available as a GitHub Gist. They use a technique I developed that wraps multiple Power Query steps into a single applied step and creates double-underscore (“dunder”) input-variable steps within the script.

Example of dunder variable steps in Power Query with wrapped steps combining multiple applied steps into a single step.

If you need more information on this technique, let me know, as I won’t be covering that in this article.

Before we begin, let’s be clear that this article is not offering stock advice or providing stock tips in any way. Instead, it’s showing how Neo4j and Power BI can be used together, in a general way, to gain pattern-driven insight.

Neo4j Graph Data

We’ve all heard the hype about there being “hidden patterns” in your data. But how do you discover these hidden patterns? One of the easiest ways to find patterns is to use a property graph database. Specifically, we’ll use the leading graph database provider, Neo4j; they invented the property graph database.

Interested in the differences between the top three native graph databases? Read my article: Labeled vs Typed Property Graphs — All Graph Databases are not the same.

For data, we’ve pulled in daily stock price history from TD Ameritrade. The data is organized as a series of linked nodes, with a [:NEXT_PERIOD] relationship between each pair of consecutive price history nodes. Data has been loaded from 1991-01-02 through the current date. A full year of Microsoft (MSFT) price history looks like this in the graph:

Price time series data for MSFT showing 1 year of stock prices with the [:NEXT_PERIOD] relationship

Each [:NEXT_PERIOD] relationship has a property for the price gap between the two trading periods:

priceGap of 3 cents on the [:NEXT_PERIOD] relationship

Here we can see a 3-cent price gap stored as a property on the relationship between the 2019-04-04 price node, where MSFT closed at $119.36, and the 2019-04-05 price node, where it opened at $119.39. So, the price gap between April 4th and April 5th was 3 cents.

Each price node contains the daily period high, low, open, close, and volume data properties along with the ticker symbol and a timestamp.
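The following Python sketch models that linked structure so the relationship property is easy to reason about. It is not the author’s loading code: the `PriceNode` class and `link_periods` helper are hypothetical, and the rule that `priceGap` equals the next period’s open minus the previous period’s close is inferred from the MSFT example above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PriceNode:
    """One daily price node; price fields are optional so examples can omit unknown values."""
    ticker: str
    timestamp: str  # ISO date, e.g. "2019-04-04"
    open: Optional[float] = None
    high: Optional[float] = None
    low: Optional[float] = None
    close: Optional[float] = None
    volume: Optional[int] = None


def link_periods(nodes: List[PriceNode]) -> List[Tuple[PriceNode, PriceNode, Optional[float]]]:
    """Return one (previous, next, priceGap) triple per [:NEXT_PERIOD] relationship.

    Assumption: priceGap = next period's open - previous period's close, in dollars,
    rounded to cents; it is None when either price is missing.
    """
    ordered = sorted(nodes, key=lambda n: (n.ticker, n.timestamp))  # group by stock, then by date
    links = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.ticker != nxt.ticker:
            continue  # consecutive periods are only linked within the same stock
        gap = None
        if prev.close is not None and nxt.open is not None:
            gap = round(nxt.open - prev.close, 2)
        links.append((prev, nxt, gap))
    return links


# The two MSFT values quoted above; all other fields are left empty.
msft = [
    PriceNode("MSFT", "2019-04-05", open=119.39),
    PriceNode("MSFT", "2019-04-04", close=119.36),
]
prev, nxt, gap = link_periods(msft)[0]
print(prev.timestamp, "->", nxt.timestamp, gap)  # 2019-04-04 -> 2019-04-05 0.03
```

Sorting by ticker and then by ISO date means each stock forms its own chain, mirroring how the graph never links one stock’s price node to another stock’s next period.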

For this example, we want to find similarity between the trading volume of stocks to see if these similarity patterns could prove valuable in identifying tops, bottoms, or significant support or resistance areas in stock prices.

To do this, we’ve run three of Neo4j’s graph data science algorithms to identify hidden similarity patterns between the volumes of daily stock prices. The first algorithm to run against the price data is K-Nearest Neighbors, or KNN. In its classic form, KNN is a supervised algorithm for classification and regression; Neo4j’s Graph Data Science implementation instead uses the nearest-neighbor idea to compare node properties (here, trading volume) and find, for each price node, the most similar price nodes, whether they belong to the same stock or to other stocks in our data set. KNN creates a [:SIMILAR_KNN_VOLUME] relationship in the graph between the price nodes where it finds similar volume. On each relationship, it also records a “volume similarity score” property giving the volume similarity between the two nodes. A score of 1 means 100% similar volume: the stock volume traded on the two nodes is identical.

Graph showing the property: volumeSimilarityScore between Microsoft and Tesla.
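To make the scoring concrete, here is a brute-force Python sketch of a KNN similarity pass over a single volume property. It is an illustration under stated assumptions, not Neo4j’s implementation: it assumes the similarity function GDS uses for a single scalar numeric property, 1 / (1 + |a - b|), which returns 1.0 for identical volumes, and it compares every pair of nodes, whereas GDS uses a faster approximate search. Because that function depends on the absolute difference, the hypothetical volumes below are assumed to be rescaled (for example, to millions of shares) before comparison.

```python
from typing import Dict, List, Tuple


def scalar_similarity(a: float, b: float) -> float:
    """Assumed scalar similarity: 1.0 for identical values, falling toward 0.0 as they diverge."""
    return 1.0 / (1.0 + abs(a - b))


def knn_volume_edges(volumes: Dict[str, float], k: int) -> List[Tuple[str, str, float]]:
    """Brute-force top-k: for every node, keep its k most similar other nodes.

    Each (source, target, score) triple stands in for one
    [:SIMILAR_KNN_VOLUME] relationship with a similarity score property.
    """
    edges = []
    for node, volume in volumes.items():
        scored = [
            (other, scalar_similarity(volume, other_volume))
            for other, other_volume in volumes.items()
            if other != node
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        edges.extend((node, other, score) for other, score in scored[:k])
    return edges


# Hypothetical price-node ids with volumes rescaled to millions of shares.
volumes = {"MSFT:2020-03-16": 2.0, "TSLA:2020-03-16": 2.0, "FB:2020-03-16": 3.5}
for edge in knn_volume_edges(volumes, k=1):
    print(edge)
```

The two nodes with equal volume link to each other with a score of 1.0, while the third node still gets its single nearest neighbor, just with a lower score (0.4).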

Next, we run the Louvain algorithm to detect communities, or clusters, of volume similarity between stock price nodes. Louvain groups nodes so as to maximize modularity, a measure of how much denser the relationships inside each group are than would be expected by chance. It is a heuristic, so it does not guarantee the best possible grouping, but it usually finds a good one quickly. Here, Louvain analyzes the [:SIMILAR_KNN_VOLUME] relationships generated by KNN, using the “volume similarity score” as the relationship weight, and groups the price nodes into communities. The generated community id is written back to the graph as a property called Volume Community.
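The full Louvain method is more involved than fits here, but its core idea can be sketched. The Python below is a simplified, hypothetical illustration of Louvain’s first (local-moving) phase only: it treats the similarity relationships as undirected edges weighted by their scores, moves each node into a neighbor’s community whenever that raises modularity, and stops when no move helps. It omits Louvain’s aggregation phase and recomputes modularity from scratch, so it is only suitable for tiny graphs; the toy graph is not stock data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (node, node, weight), treated as undirected


def modularity(edges: List[Edge], community: Dict[str, str]) -> float:
    """Weighted modularity: sum over communities c of L_c / m - (d_c / 2m)^2."""
    m = sum(w for _, _, w in edges)
    internal = defaultdict(float)  # L_c: total edge weight inside each community
    degree = defaultdict(float)    # d_c: total weighted degree of each community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(edges: List[Edge]) -> Dict[str, str]:
    """Louvain's first phase only: move nodes to a neighbor's community while modularity improves."""
    neighbors = defaultdict(set)
    for u, v, _ in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)
    community = {n: n for n in nodes}  # every node starts in its own community
    improved = True
    while improved:
        improved = False
        for node in nodes:
            best_comm, best_q = community[node], modularity(edges, community)
            for nb in sorted(neighbors[node]):  # sorted for a deterministic result
                candidate = community[nb]
                if candidate == community[node]:
                    continue
                trial = dict(community)
                trial[node] = candidate
                q = modularity(edges, trial)
                if q > best_q + 1e-12:  # require a strict gain so the loop terminates
                    best_comm, best_q = candidate, q
            if best_comm != community[node]:
                community[node] = best_comm
                improved = True
    return community


# Two tightly linked triangles joined by a single bridge (all weights 1.0).
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0), ("c", "d", 1.0),
         ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0)]
communities = local_moving(edges)
print(communities)
print(round(modularity(edges, communities), 3))  # 0.357
```

Each triangle ends up as its own community, the partition with the higher modularity (about 0.357, versus 0 if all six nodes were placed in one community).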

Finally, we run the Label Propagation algorithm, or LPA. LPA assigns labels to previously unlabeled data points: each node repeatedly adopts the label that is most common (or, with weights, most heavily weighted) among its neighbors until the labels stabilize. It can also run in a semi-supervised mode when some nodes are seeded with initial labels. Using the network of similarity relationships created by KNN, LPA identifies significant volume clusters and writes a “similarity label” property back to each price node in the graph. These LPA clusters are hierarchically structured within the Louvain communities. The result is that each price node carries a Volume Community and a similarity label, and each [:SIMILAR_KNN_VOLUME] relationship carries a similarity score.
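The label-spreading rule can be sketched the same way. This hypothetical Python function shows only the unseeded, weighted form: every node starts with its own label and repeatedly adopts the label with the highest total relationship weight among its neighbors. Neo4j’s implementation differs in details such as parallel execution, iteration order, and tie-breaking, so treat this as an illustration of the idea only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def label_propagation(edges: List[Tuple[str, str, float]], max_iterations: int = 10) -> Dict[str, str]:
    """Weighted label propagation over undirected (node, node, weight) edges."""
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))
    labels = {node: node for node in sorted(neighbors)}  # every node starts with a unique label
    for _ in range(max_iterations):
        changed = False
        for node in sorted(neighbors):  # fixed order keeps the sketch deterministic
            votes = defaultdict(float)
            for nb, w in neighbors[node]:
                votes[labels[nb]] += w  # each neighbor votes for its label with its edge weight
            top = max(votes.values())
            new_label = min(label for label, score in votes.items() if score == top)  # tie-break: smallest label
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # converged: no node switched labels during a full pass
            break
    return labels


# Two groups of price nodes joined by one weak (low-similarity) relationship.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0), ("c", "d", 0.1),
         ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0)]
print(label_propagation(edges))
```

Because the bridge between c and d carries a low weight (0.1), it does not pull the two groups together, and the nodes settle into two labels.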

Neo4j’s Graph Data Science library contains an entire catalog of graph algorithms. We used only three of the more than fifty algorithms available in the library for this example.

Categories of Neo4j Graph Data Science Algorithms

After running the KNN, Louvain, and LPA algorithms, we have a graph that captures volume similarity between the same stock on different days, as well as between different stocks on various days. Here we see a portion of the graph showing similar volume between Microsoft, Facebook, Nvidia, Advanced Micro Devices, Tesla, Disney, and others.

Graph showing example of [:SIMILAR_KNN_VOLUME] relationships between stock prices and stocks.

This next graph is an overview of all the stocks contained in the graph along with all the Similar KNN Volume relationships. As you can see, the Neo4j Graph Data Science Algorithms created a rich environment of new volume patterns between stocks. This is the environment we’ll be exploring in Power BI by plotting these relationships, communities, and similarity scores on candlestick and time series charts to gain Pattern-Driven Insights:

Graph of all stock symbols in the graph, showing the complex mix of volume similarity relationships created between stocks.

Power BI Visualization for Pattern-Driven Insight

Power BI allows us to visualize Neo4j’s graph data in hundreds of ways with its large library of built-in visualizations. Using R and Python visuals in Power BI extends its visualization capability to thousands of options.

After pulling the data into Power BI, we need to check for communities of volume similarity on a timeline. We also want to visualize any significant activity in the stock market to see if there are connecting patterns. To start, we’ll look at a High Low comparison chart showing stock price highs and lows on a timeline. This will help us see areas where all the stocks were collectively hitting new lows or new highs.
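The article doesn’t specify how the High Low comparison is calculated. One common way to measure stocks collectively hitting new highs or lows is a breadth count: for each date, count the stocks whose high exceeds, or whose low falls below, every value in a trailing window. The Python sketch below assumes that definition; the function name, the `lookback` parameter, and the toy tickers and prices are hypothetical.

```python
from typing import Dict, List, Tuple

Bar = Tuple[str, float, float]  # (ISO date, high, low) for one trading day


def new_highs_lows(history: Dict[str, List[Bar]], lookback: int) -> Dict[str, Tuple[int, int]]:
    """For each date, count (new highs, new lows) across all stocks.

    A stock makes a new high (low) when its high (low) is above (below) every
    high (low) in its previous `lookback` trading days.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    counts: Dict[str, List[int]] = {}
    for bars in history.values():
        ordered = sorted(bars)  # ISO dates sort chronologically
        for i in range(lookback, len(ordered)):
            date, high, low = ordered[i]
            window = ordered[i - lookback:i]
            tally = counts.setdefault(date, [0, 0])
            if high > max(bar[1] for bar in window):
                tally[0] += 1
            if low < min(bar[2] for bar in window):
                tally[1] += 1
    return {date: (highs, lows) for date, (highs, lows) in sorted(counts.items())}


# Toy data for two hypothetical tickers.
history = {
    "AAA": [("2020-03-16", 10.0, 9.0), ("2020-03-17", 9.5, 8.0), ("2020-03-18", 9.0, 7.0)],
    "BBB": [("2020-03-16", 20.0, 19.0), ("2020-03-17", 21.0, 19.5), ("2020-03-18", 19.0, 18.0)],
}
print(new_highs_lows(history, lookback=2))  # {'2020-03-18': (0, 2)}
```

A date where the new-lows count approaches the number of stocks tracked is the kind of collective low the chart is meant to surface.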

In addition, we want to see a corresponding visualization of the Volume Communities to see if there are significant communities of volume similarity during severe pricing highs and lows. The following Power BI chart shows how this all comes together. We’ve highlighted a specific period of time on which to focus: February through April of 2020.

Power BI report page showing High Low comparison of stock prices with volume communities on a timeline

Here we can see that around March 18, 2020, all the stocks were collectively hitting new lows. We also see a spike cluster of Volume Communities, plotted against average trading volume. This looks like a good place to drill in and see what our volume similarity indicators can show us.

Candlestick pricing chart for Facebook stock with significant volume similarity scores and distributions.

The Power BI chart above shows a candlestick chart for Facebook along with a distribution chart showing Volume Communities and Similarity labels. The distributions allow us to focus in on significant clusters of communities and labels, while the Similarity Score from the KNN Relationships allows us to focus on significant similarity events.

Facebook candlestick chart with volume similarity score correlation.

We see that on February 6, 2020, Facebook had an indecision candlestick and a significant spike in volume similarity, just 8 days before a major price decrease. We see the same pattern on February 19th, just before the 20-day collapse in which Facebook fell $72. Halfway through the fall, Facebook tested the $185 to $195 price range. Here we see another volume similarity spike, indicating the price may fail support.
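The spikes above are identified visually. To flag them programmatically, one simple option is a trailing-threshold rule: mark a day when its similarity score rises more than `threshold` standard deviations above the mean of the preceding `window` days. This is an assumed indicator, not the author’s method, and the series below is made-up data; in practice each value might be, for example, the highest volume similarity score on a stock’s price node for that day.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def similarity_spikes(series: List[Tuple[str, float]], window: int = 10, threshold: float = 2.0) -> List[str]:
    """Return the dates whose score exceeds the trailing mean by more than `threshold` std devs.

    `series` is a chronological list of (date, score) pairs.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    spikes = []
    for i in range(window, len(series)):
        trailing = [score for _, score in series[i - window:i]]
        date, score = series[i]
        if score > mean(trailing) + threshold * pstdev(trailing):
            spikes.append(date)
    return spikes


series = [("t1", 0.20), ("t2", 0.25), ("t3", 0.20), ("t4", 0.22), ("t5", 0.90), ("t6", 0.21)]
print(similarity_spikes(series, window=4))  # ['t5']
```

Using only the preceding days keeps the rule free of look-ahead, so it could in principle be evaluated at the close of each trading day.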

Facebook candlestick chart with volume similarity score correlation at failed support line.

March 25th shows another similarity spike, indicating a potential support level or another resistance level:

Facebook candlestick chart with volume similarity score correlation indicating support levels

Then on both April 9th and April 13th (a Thursday and the following Monday, the trading days on either side of the Good Friday market holiday) we have two more spikes in volume similarity:

Facebook candlestick chart with volume similarity score correlation confirming support level.

This shows that the earlier spike on March 25th did indeed mark a new support level.

If we shift and look at Adobe instead, we can see that there’s an even tighter correlation between the similarity score and the prices.

Adobe candlestick chart with volume similarity score correlation. Shows trend bottom with double inverted hammer and major support confirmation.

March 16th shows a similarity score spike at the very bottom of Adobe’s price fall which also corresponds with a significant candlestick “double inverted hammer” pattern. We also see that the major similarity spike on March 23rd corresponds to a very significant support line around the $290 price level. In this case, the volume similarity seems to have pinpointed the bottom of the bear trend.

This raises the question: “How are these things statistically correlated?” To better understand how the various communities, labels, scores, and pricing information are correlated, we need to plot them on a timeline, filter to specific stocks, and look at the correlations.

Here we can see that, for the same time period, Adobe has a slight negative correlation between the similarity score and the closing price, as well as a strong negative correlation between the similarity labels and the closing price. There is also a strong correlation between the label and the price gap.

Looking at Facebook, we see a completely different picture: very significant negative correlations between the volume similarity score and average volume, and significant positive correlations between the close price and the similarity scores.
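The article doesn’t say which correlation measure the charts use; Pearson’s coefficient is a common default. A minimal Python sketch of that calculation follows, with hypothetical column values standing in for one stock’s daily data. Because Volume Community and similarity label ids are categorical identifiers, correlations involving them are best read as rough signals.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series (range -1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical daily values for one stock over the same date range.
columns = {
    "similarityScore": [0.9, 0.7, 0.5, 0.3],
    "close": [100.0, 110.0, 120.0, 130.0],
    "priceGap": [0.4, -0.1, 0.2, 0.1],
}
for a, b in combinations(columns, 2):
    print(f"{a} vs {b}: {pearson(columns[a], columns[b]):+.2f}")
```

In this toy data the similarity score falls exactly as the close rises, so that pair prints -1.00; real data will rarely be so clean.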

To further visualize how and where significant clusters of volume communities and similarity labels correlate between various groups of stocks, it’s helpful to look at various distribution clusters to see which groups of stocks tend to be affected similarly by the volume similarity.

Example of Python distribution charts and correlations for volume similarity communities, labels, and volume.

Here, several Python visuals in Power BI help us see the distribution slices from different angles and perspectives. By isolating the volume communities and similarity label communities, we can eliminate the “noise” and focus on areas with tight correlations to discover which stocks are trending together inside our communities of volume similarity.

Native Power BI scatter chart highlighting ability to filter and zoom in to clusters to eliminate the “noise” and identify stocks that are grouped together into similarity communities.

Here, we’ve focused on the Similarity Labels and Volume Communities with the highest concentration of stock volume similarity. This reveals specific stocks that tend to group together, so we can dive more deeply into analyzing them and creating indicators that could be used to make better stock trading decisions.
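The scatter-chart filtering is interactive in Power BI, but the underlying idea is simple: count price nodes per (Volume Community, similarity label) pair, keep the most concentrated pairs, and list the stocks in each. A hypothetical Python sketch follows; the function name and the community and label ids are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[str, int, int]  # (ticker, volumeCommunity, similarityLabel) for one price node


def densest_groups(rows: List[Row], top_n: int = 3) -> List[Tuple[Tuple[int, int], int, Set[str]]]:
    """Return the top_n (community, label) pairs by price-node count, with the tickers in each."""
    counts = Counter((community, label) for _, community, label in rows)
    tickers: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for ticker, community, label in rows:
        tickers[(community, label)].add(ticker)
    return [(key, count, tickers[key]) for key, count in counts.most_common(top_n)]


rows = [("MSFT", 1, 10), ("TSLA", 1, 10), ("FB", 1, 10),
        ("ADBE", 2, 20), ("MSFT", 2, 20), ("NVDA", 3, 30)]
for key, count, group in densest_groups(rows, top_n=2):
    print(key, count, sorted(group))
```

Dropping the sparse pairs (here, the single NVDA node) is the programmatic equivalent of zooming past the noise in the scatter chart.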

Conclusion

Relational databases and other table-based data structures make it hard to discover patterns when you don’t already know what you’re looking for. Neo4j’s integrated machine learning and graph data science algorithms make uncovering hidden patterns much easier. Uncovering hidden patterns can lead to better trading decisions and higher profits. By combining Neo4j with Power BI, we can view data as a time series and uncover volume similarity patterns. This amplifies the value of our stock market analysis by adding a time-based view of volume similarity.

Hopefully, we’ve helped you see how using Neo4j for pattern discovery and Power BI for visualization creates an immensely powerful and flexible analytics platform.

Neo4j + Power BI = Pattern-Driven Insight
