Predicting Netflix stock prices using Machine Learning, with Python

A hands-on application of SARIMA processes and Linear Regression to predict Netflix stock prices

Piero Paialunga

Published in Geek Culture · 5 min read · Jan 13, 2022

I’m going to start this article by stating the obvious: Netflix is one of the major streaming companies in the world.

In Italy there is a saying: “Roma non è stata costruita in un giorno” (“Rome wasn’t built in a day”).
It basically means that everything great requires time to be built.

In this notebook we will try to predict Netflix stock prices using some well-known Machine Learning techniques.

It is important, though, to be very precise about what we want to predict.

To do so, we are going to split our goal into three parts:

Long Term Prediction
Middle Term Prediction
Short Term Prediction

I want to make this introduction as short as possible so I will explain what I mean while we’re doing it.

0. The Dataset + the libraries

The dataset is open source and can be downloaded here. It is small and quick to download.
These are the libraries I used:

And this is how you import the data and plot it:
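Here is a minimal sketch of the loading step, assuming the downloaded file is a CSV with a `Date` column and a `High` column. The helper name `load_high_prices`, the column names and the file name `NFLX.csv` are assumptions of mine, and the two inline rows are made-up numbers that only show the expected layout.

```python
import io

import pandas as pd


def load_high_prices(csv_source, date_col="Date", price_col="High"):
    """Return one price column as a Series indexed by date, oldest first."""
    frame = pd.read_csv(csv_source, parse_dates=[date_col])
    series = frame.set_index(date_col)[price_col].sort_index()
    return series.astype(float)


# Tiny inline stand-in for the real file (made-up numbers, illustration only).
sample_csv = io.StringIO(
    "Date,Open,High,Low\n"
    "2021-01-05,10.0,12.0,9.5\n"
    "2021-01-04,9.0,11.0,8.5\n"
)
high = load_high_prices(sample_csv)
print(high)

# With the downloaded file (hypothetical name) and matplotlib installed:
# load_high_prices("NFLX.csv").plot(title="Netflix High stock price")
```

Switching to the Low or Open price is then just a matter of passing a different `price_col`.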

P.S. I don’t have any specific reason for using the High stock price. You can use the Low or the Open one instead, and I don’t think it makes much of a difference in our case: I believe the general approach, and maybe even the conclusions, will be very similar. In any case, you can easily run this code on another column of the dataset without any problems!

1. Long Term Prediction

By Long Term Prediction I mean an all-time prediction based on all the data we have. If you consider the whole dataset, from the beginning to today, you are of course looking for the general behavior of your function. This means that you know you won’t be able to capture the ups and downs of the months or the days, but you want to be able to capture the trend of the stock price.

The tool we are going to use is called Polynomial Regression, and it is a natural extension of Linear Regression. There is a lot to read about it, starting from the more classical books like this one or a more direct approach like the Sklearn documentation (here, here and here).

In a nutshell, we are expanding the time axis into a matrix whose columns are the powers of time up to degree N (t, t², ..., t^N) and applying a linear regression model to this matrix. I applied it to the whole dataset with a three-part split:

90% for the training set
5% for the validation set
5% for the test set
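The split above can be sketched in a few lines. I am assuming here that the split is chronological (oldest data for training, most recent for testing, no shuffling), which is the usual choice for time series; the function name `chronological_split` is just an illustrative one.

```python
def chronological_split(values, train_frac=0.90, val_frac=0.05):
    """Split an ordered sequence into train / validation / test, no shuffling."""
    n = len(values)
    n_train = int(round(n * train_frac))
    n_val = int(round(n * val_frac))
    train = values[:n_train]
    val = values[n_train:n_train + n_val]
    test = values[n_train + n_val:]  # whatever is left, about 5%
    return train, val, test


days = list(range(200))  # stand-in for 200 trading days
train, val, test = chronological_split(days)
print(len(train), len(val), len(test))
```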

I used the training set to fit several models, one for each polynomial degree. Then I used the validation set to pick the degree with the lowest Mean Squared Error (MSE), and applied the model with that optimal degree to the test set.

This is how to test the different polynomial models:
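A minimal sketch of this selection loop, assuming scikit-learn's `PolynomialFeatures` and `LinearRegression` chained in a pipeline. The helper `pick_degree`, the range of degrees and the synthetic signal (a noisy cubic, not Netflix data) are assumptions made only for illustration. Time is rescaled to [0, 1] so that high powers stay numerically well behaved.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def pick_degree(t_train, y_train, t_val, y_val, degrees=range(1, 8)):
    """Fit one polynomial per degree on the training set.

    Returns (best_degree, {degree: validation MSE}).
    """
    val_mse = {}
    for degree in degrees:
        model = make_pipeline(
            PolynomialFeatures(degree, include_bias=False),
            LinearRegression(),
        )
        model.fit(t_train, y_train)
        val_mse[degree] = mean_squared_error(y_val, model.predict(t_val))
    best = min(val_mse, key=val_mse.get)
    return best, val_mse


# Synthetic stand-in signal (NOT Netflix data): a noisy cubic in rescaled time.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 400).reshape(-1, 1)
y = 5.0 * t.ravel() ** 3 + rng.normal(scale=0.05, size=400)

n_train, n_val = 360, 20  # 90% / 5%, the last 5% is kept for the test set
best_degree, val_mse = pick_degree(
    t[:n_train], y[:n_train],
    t[n_train:n_train + n_val], y[n_train:n_train + n_val],
)
print(best_degree, val_mse)
```

The test set is never touched inside `pick_degree`: it is only used once, at the end, with the degree chosen on the validation set.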

And this is the result of this process:

Thus, the simpler the better :)

The model with the lowest MSE is the one with degree = 3.
This is what it looks like:

We can see that we are capturing the trend pretty well. Of course, it is just a degree-3 polynomial, so it doesn’t follow all the ups and downs of the real stock price. The good thing is that it is not supposed to predict the ups and downs, since it is meant to be a long-term prediction.

Now that this prediction has been made, we can isolate it and de-trend the original signal (that is, subtract the fitted trend from it) to study these ups and downs by themselves:
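De-trending is just a subtraction, as this minimal sketch shows. For brevity it uses NumPy's `Polynomial.fit` rather than a scikit-learn pipeline, and it fits the trend on the whole series; both are simplifying assumptions of mine, and the signal is synthetic (a cubic plus a sine wave).

```python
import numpy as np
from numpy.polynomial import Polynomial


def detrend(y, degree=3):
    """Fit a polynomial trend over the sample index and subtract it.

    Returns (trend, residual) with trend + residual == y.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    trend_fn = Polynomial.fit(t, y, deg=degree)  # least-squares fit
    trend = trend_fn(t)
    return trend, y - trend


# Synthetic signal: slow cubic trend plus faster "ups and downs".
t = np.arange(120, dtype=float)
cubic = 1e-4 * t**3 - 0.01 * t**2 + 0.5 * t + 3.0
wiggle = 2.0 * np.sin(t / 5.0)
trend, residual = detrend(cubic + wiggle)
print(residual[:5])
```

The residual oscillates around zero, which is what the ARIMA-type models used later are better suited to.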

This is the new signal, which is ready to be considered for the Middle Term Prediction.

2. Middle Term Prediction

To do this kind of prediction, the first thing to do is to average the signal and consider its monthly mean:
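With pandas, the monthly mean is a one-liner, sketched below on the assumption that the signal is a Series indexed by date. The helper name `monthly_mean` and the toy daily values are illustrative only.

```python
import numpy as np
import pandas as pd


def monthly_mean(daily):
    """Average a date-indexed Series within each calendar month."""
    return daily.resample("MS").mean()  # "MS" labels each bin by month start


# Toy daily series: the values 0, 1, 2, ... over January to March 2021.
dates = pd.date_range("2021-01-01", "2021-03-31", freq="D")
daily = pd.Series(np.arange(len(dates), dtype=float), index=dates)
monthly = monthly_mean(daily)
print(monthly)
```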

Now we want to consider what happened in a certain number of months (let’s say 44) and predict what is going to happen in the next three months.

P.S. We don’t use the entire dataset because it is not going to be helpful to look at what happened in 2010 to understand what is going to happen in 2021… at least not in this model :)

The tool we are going to use in our case is an ARIMA model. Again, we are trying to estimate the coefficients of a linear model, with its uncertainty modeled as noise (here you can find the documentation).

Let’s consider what happens from 2017:

Given what we have in blue we want to predict what we will see in red.
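In code, the blue and red parts are two consecutive slices of the series. This sketch assumes that the 44 months are the history and that the 3 months to predict come right after them; `history_and_target` is a hypothetical helper name.

```python
def history_and_target(series, n_history=44, n_target=3):
    """Return (history, target): the last n_target points are held out,
    and the n_history points just before them are used to fit the model."""
    needed = n_history + n_target
    if len(series) < needed:
        raise ValueError(f"need at least {needed} points, got {len(series)}")
    window = series[-needed:]
    return window[:n_history], window[n_history:]


months = list(range(60))  # stand-in for 60 monthly means
history, target = history_and_target(months)
print(len(history), target)
```

The same helper covers other time scales by changing the two window lengths.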

This is the function we are going to use:

This is the optimization process (it shouldn’t take more than 5 minutes):

These are the values of the AIC (Akaike Information Criterion) for the candidate models: we are going to pick the model with the lowest AIC.

This is the summary table of our best model:

Here we define the results variable:

And here we plot it:

A few considerations:

The boundaries are pretty large, but that is because we want 95% confidence. We can narrow them if we accept a lower confidence level, that is, a higher chance that the true value falls outside them
The model is actually predicting an upward trend for the next three months. Yay. :D

3. Short Term Prediction

It’s pretty much the same theory and process, but now we want to predict the next three days given the previous 41 days.

I will very briefly explain the process and highlight the differences:

This is the definition of the new dataset and its plot:

This is the optimization process:

This is the model selection and result generation part:

And this is the final part:

Some considerations:

Short term predictions are much more uncertain and difficult to make
Still, our model predicts the first big drop, and then predicts a very slight rise that stays mostly flat. This is not exactly what happens, but it is broadly in line with what is going to happen in reality.

4. Conclusions

The goal of this notebook is just to highlight a concept that my professors have stressed a million times: prediction is a matter of time scales.

It is impossible to predict something for an “indefinite period of time”, because a short-term prediction requires different data and a different approach than a middle-term or long-term prediction.

Ciao :)