Using Machine Learning to programmatically determine Stock Support and Resistance Levels

judopro
7 min read · Dec 22, 2019


In this article we will talk about locating support and resistance levels of stock prices (or any price data) using an unsupervised classification technique (K-means). I was looking for something that would calculate support and resistance levels on the fly with real-time data, to use as part of my day-trading strategy. I saw some people taking a stab at the problem: one approach smoothed the chart to find similar points; another described an algorithm that found similar prices and then applied a delta, scaled to the size of the stock price, to group them.

My main issue with those approaches was that they were essentially instructions the developer gave to the computer. You looked at your data, found a way to determine the support and resistance levels yourself (say, by smoothing the numbers), and then instructed the computer to follow that recipe and compute.

Machine learning, on the other hand, is all about the machine looking at the data and telling us what relationships exist. That's exactly what I wanted to do. I didn't want to describe how to define support and resistance levels. But then how would the machine know what to classify?

Let's take a step back and talk about what support and resistance are. Support and resistance can be defined as price levels at which the price action of a stock may stop and/or reverse due to a larger number of interested investors at those levels. For example, if a stock's price was dropping and then hovered around a low price before starting to go up again, we would consider that low price a support level. Resistance is the same thing in reverse: in an up market, if the price hits a certain high and then fails to penetrate it (either stopping at that level or going down), that high price is the resistance.

In other words, finding the key price levels the stock moves between, and consolidating them down to a few levels, is exactly what we are looking for. This sounded like a perfect description of an unsupervised classification problem. Unsupervised classification is the computer looking at data (without being given any labels or prior data, hence unsupervised), finding similarities among records, and segmenting them into different clusters: classification.

K-means is a very popular unsupervised machine learning algorithm. In essence, it takes your data, tries to create the K groups that you specify (we will come to choosing K later), and assigns each data point to a group based on its proximity to the center of each of the K groups/clusters.
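To make that concrete, here is a tiny sketch (not the article's code; the prices below are made up) of K-means grouping a handful of one-dimensional prices with scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans

# Six made-up prices that visibly form two groups around 5.85 and 6.55.
prices = np.array([5.80, 5.90, 5.85, 6.50, 6.55, 6.60]).reshape(-1, 1)

# Ask K-means for K=2 clusters; each center ends up at the mean of its group.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(prices)
print(sorted(c[0] for c in kmeans.cluster_centers_))  # one center per group
```

Each cluster center lands at the mean of the prices assigned to it, which is exactly the "consolidated level" behavior we want.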

https://www.mathworks.com/help/examples/stats/win64/PartitionDataIntoTwoClustersExample_02.png

If you want to learn more about how K-Means works, I recommend reading https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1

Now that we have the basics covered, let's move on to the implementation. All of the code is on my GitHub, by the way. Let's first read the data using the mpl_finance Python package, which uses the Yahoo API. I am interested in day trading, so I am looking at a one-day period with one-minute bars, but you can look at daily bars and find long-term support/resistance; the idea is still the same.

The last line is a function we use to plot the stock data so we know what we are working with. It creates two subplots, one for price and one for volume, using EST since I am working with Nasdaq/NYSE-listed securities and the trading hours are 09:30-16:00 EST.
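The article's actual plotting code lives in the linked GitHub repo. As a rough stand-in, the two-panel price/volume chart could be sketched like this (the OHLCV bars below are made up, and plain matplotlib is used instead of mpl_finance so the snippet runs without extra packages):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt

# Made-up one-minute bars: (open, high, low, close, volume)
bars = [
    (6.00, 6.10, 5.95, 6.05, 12000),
    (6.05, 6.20, 6.00, 6.18, 15000),
    (6.18, 6.25, 6.10, 6.12, 9000),
]

fig, (ax_price, ax_vol) = plt.subplots(2, 1, sharex=True, figsize=(8, 5))
for i, (o, h, l, c, v) in enumerate(bars):
    color = "g" if c >= o else "r"
    ax_price.vlines(i, l, h, color=color)                         # high-low wick
    ax_price.vlines(i, min(o, c), max(o, c), color=color, lw=4)   # open-close body
    ax_vol.bar(i, v, color=color)
ax_price.set_ylabel("Price")
ax_vol.set_ylabel("Volume")
fig.savefig("chart.png")
```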

If I plot the first 60 minutes after the open for CBIO on 12/19/2019, we get the following chart. I actually traded CBIO that day and was watching it live, so I know we rendered the chart correctly to the minute 🙂

Now, before we move on to the next section: at the top we briefly mentioned choosing K, the number of clusters in our classification. Given this stock data, how do we choose how many clusters to ask for? Fortunately, there is a straightforward solution to this problem.

In general it is recommended that K be a number between 1 and 10; that's the range the algorithm works best in. If you have a classification problem with 1,000 or more unique classes, for example, you may want to look at other alternatives (such as neural networks). But since we are looking at a chart with fewer than 100 data points (for the first trading hour or so), or at most a few hundred for the entire trading day, it is highly likely that the number of groups we need is less than 10. And there is actually a scientific method for determining that number.

What we can do is create K-means models with K varying from 1 to 10 and, for each model, compute the cumulative coherence of its clusters, which is called the inertia. Then we choose the K at which the change in inertia becomes minimal; this is the elbow method.

Inertia is computed internally this way: let's say K is 3, so we have 3 clusters. For each cluster, the algorithm computes the center (mean) of its members, then computes the distance of each member to that center. The sum of these squared distances across all clusters is the inertia. So the lower the inertia, the better the coherence of the clusters we created. It should go without saying that inertia = 0 means every point sits exactly on its cluster center, which means either there is no variance in your data, or you have set K to the number of data points, in which case you are over-fitting. If the inertia is very large, you may be under-fitting and may want to increase K.
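A quick sanity check of that definition (the four values below are made up; note that scikit-learn's `inertia_` is specifically the sum of *squared* distances to each cluster center):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups: {1.0, 1.2} around 1.1 and {5.0, 5.4} around 5.2.
X = np.array([1.0, 1.2, 5.0, 5.4]).reshape(-1, 1)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Recompute the inertia by hand: squared distance of each point
# to the center of the cluster it was assigned to.
manual = sum((x[0] - km.cluster_centers_[label][0]) ** 2
             for x, label in zip(X, km.labels_))
print(km.inertia_, manual)  # the two values should match
```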

If we plot the K-inertia graph, we can see that after the elbow point at K = 3 the change in inertia becomes minimal, i.e., it is getting close to convergence. Hence K is chosen as 3 in this example.

Courtesy of GeeksforGeeks

To choose this programmatically, there are a couple of ways you can go. The simplest is to set a delta threshold between inertias at each step: if the progression of K from i to i + 1 does not give us at least that delta of improvement in inertia, we stop and return i as the value of K.
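One way to sketch that delta-threshold idea (the function name, the 50% relative threshold, and the sample lows are my assumptions, not the article's exact code):

```python
import numpy as np
from sklearn.cluster import KMeans

def choose_k(values, max_k=10, min_improvement=0.5):
    """Return (k, fitted_model) via a simple elbow heuristic."""
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    models = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
              for k in range(1, max_k + 1)]
    for i in range(len(models) - 1):
        drop = models[i].inertia_ - models[i + 1].inertia_
        # Stop once going from k to k+1 no longer buys enough improvement.
        if drop < min_improvement * models[i].inertia_:
            return i + 1, models[i]
    return max_k, models[-1]

# Made-up candle lows forming three tight groups near 5.81, 6.21 and 6.585.
lows = [5.80, 5.81, 5.82, 6.20, 6.21, 6.22, 6.57, 6.58, 6.59, 6.60]
k, model = choose_k(lows)
print(k, sorted(model.cluster_centers_[:, 0]))
```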

Now that we have the optimum K and the clusters it created (optimum_clusters), we can access the centers of the clusters using the cluster_centers_ property of optimum_clusters.

I wanted to create two separate K-means models, one for lows and one for highs. To find supports we use the lows of the candles; to find resistances we use the highs.
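A sketch of that lows/highs split (the arrays are illustrative, and K = 3 is hard-coded here for brevity; the article picks K via the elbow method above):

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative candle lows and highs, not real CBIO data.
lows  = np.array([5.80, 5.82, 6.20, 6.22, 6.44, 6.46]).reshape(-1, 1)
highs = np.array([5.90, 5.92, 6.30, 6.32, 6.70, 6.72]).reshape(-1, 1)

# One model per series: supports from lows, resistances from highs.
low_centers  = sorted(KMeans(n_clusters=3, n_init=10, random_state=0)
                      .fit(lows).cluster_centers_[:, 0])
high_centers = sorted(KMeans(n_clusters=3, n_init=10, random_state=0)
                      .fit(highs).cluster_centers_[:, 0])
print(low_centers)   # candidate support levels
print(high_centers)  # candidate resistance levels
```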

And if we print the sorted low_centers and high_centers, we get:

Lows: [[5.80], [5.95], [6.10], [6.22], [6.44], [6.58]]
Highs: [[5.90], [6.12], [6.30], [6.56], [6.70]]

How do we know if these are any good? Well, we can plot them on the chart and see for ourselves. All we have to do is add these levels to the price plot. Here I am plotting only the lowest two support levels and the highest resistance level to keep the clutter down. We also access the highs in reverse, since they are sorted in ascending order.
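A minimal sketch of that overlay step, using the levels printed above (the variable names and line styling are my assumptions, not the article's exact code):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt

low_centers  = [5.80, 5.95, 6.10, 6.22, 6.44, 6.58]  # sorted ascending
high_centers = [5.90, 6.12, 6.30, 6.56, 6.70]

fig, ax = plt.subplots()
for level in low_centers[:2]:            # the two lowest supports
    ax.axhline(level, color="g", ls="--", label=f"support {level}")
ax.axhline(high_centers[-1], color="r", ls="--",
           label=f"resistance {high_centers[-1]}")  # the top resistance
ax.legend()
fig.savefig("levels.png")
```

In the real script these horizontal lines are drawn on the same price axis as the candles, so the levels sit directly on top of the price action.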

Chart including calculated support and resistance levels

Seems like it did a good job. How about different time frames: 15 minutes, 30 minutes? How about the whole trading day?

15 Minute Chart

15-minute chart of the opening, determining immediate key levels at 5.83, 5.98 and 6.52

30 Minute Chart:

On the 30-minute chart, our key levels didn't change much.

Full day chart:

Entire day chart

Here I plotted one more support line since there are more data points. As you can see, our key levels are now 5.82, 6.22 and 6.44, with resistance at around 7.80.

That's it. Without many lines of code, and without any prior training or hard-coded definitions/algorithms, we let the machine learning model figure out the support and resistance levels for us. The full code can be found on my GitHub. Thanks for reading.
