Forecasting & Anomaly Detection of Starbucks Stock Prices using TensorFlow and ADTK

Let’s predict the stock prices of the world’s largest coffee chain ☕️

Aziz Budiman
Data And Beyond
8 min read · Jul 1, 2023


Photo by Hiroko Nishimura on Unsplash

Introduction

Starbucks has grown from a local coffee retailer in Seattle to a global franchise over the past five decades. Founded by Jerry Baldwin, Zev Siegl, and Gordon Bowker in 1971, the company has expanded to 80 countries worldwide, with over 15,000 outlets in the United States alone.

Methodology

For the forecasting portion, we will apply shallow neural network models: first a simple neural network, followed by a recurrent neural network (RNN) known as the Gated Recurrent Unit (GRU). Model performance will be evaluated using the mean squared error (MSE) and mean absolute error (MAE).

Code Development

As per usual practice, we will import the necessary libraries for data exploration and transformation. The Starbucks stock prices will be extracted using the Yahoo! Finance Python library yfinance. For the model development portion, we will import the Keras module from the TensorFlow library to construct both the simple Neural Network and the GRU.

#Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, GRU
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler


# Import Starbucks stock prices from Yahoo! Finance
data = yf.download('SBUX')
data
Starbucks stock price data as of 29/06/2023. Image from author.

The initial dataset displays the historical stock prices of Starbucks from 1992 to the present day. For time-series forecasting, we will only require the Close price for each business day.
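As a minimal sketch, the Close series can be pulled out as follows (the variable name close_data is my own, not from the original gist):

# Keep only the daily Close prices for forecasting
close_data = data['Close']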

Exploratory Data Analysis and Visualisation

From the time-series chart, we observe that the Starbucks stock price has grown from a few cents in the early 90s to over a hundred dollars since 2018. After the 2008 financial crisis, its market value began to grow rapidly alongside its global expansion to 80 countries, and the company recently posted a record-high revenue of $8.4 billion in the final quarter of fiscal year 2022.
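A chart like the one described can be reproduced with a few lines of matplotlib (a sketch, assuming the close_data series extracted earlier):

# Plot the full history of daily Close prices
plt.figure(figsize=(12, 5))
plt.plot(close_data.index, close_data.values)
plt.title('Starbucks (SBUX) Closing Stock Price')
plt.xlabel('Date')
plt.ylabel('Close Price (USD)')
plt.show()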

Forecasting using Simple Neural Network

The data is split at the beginning of 2020. We will fit the predictive models on the training data and assess their performance on the test data using the mean squared error (MSE) and mean absolute error (MAE).

After splitting the data, we will create a windowed dataset that reshapes the series into 30-day windows. This will be the input dataset fitted to the simple neural network model. The model will consist of a single hidden Dense layer of 32 neurons feeding a one-neuron output, compiled with Huber as the loss function, Stochastic Gradient Descent as the optimizer, and MAE as the evaluation metric.

#Split the data into training and test
split_time = '2020-01-01'
train_data = data.loc[:split_time]
test_data = data.loc[split_time:]


# Reshape the dataset based on a 30-day window
def window_dataset(series, window_size, batch_size=32,
                   shuffle_buffer=1000):
    dataset = tf.data.Dataset.from_tensor_slices(series)
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
    dataset = dataset.shuffle(shuffle_buffer)
    dataset = dataset.map(lambda window: (window[:-1], window[-1]))
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset

# Setup the Keras Backend and Random Seeds
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)


window_size = 30

# Extract the Close price series from the train/test split
train_close_data = train_data['Close'].values
valid_close_data = test_data['Close'].values

train_set = window_dataset(train_close_data, window_size)
valid_set = window_dataset(valid_close_data, window_size)

#Model building
def simple_NN(units):
    model = Sequential()
    # Hidden layer of `units` neurons
    model.add(Dense(units, input_shape=[window_size], activation='relu'))
    # Output layer predicting the next day's Close price
    model.add(Dense(1))
    model.compile(loss=keras.losses.Huber(),
                  optimizer=keras.optimizers.SGD(learning_rate=1e-5,
                                                 momentum=0.9),
                  metrics=["mae"])
    return model

nn_model = simple_NN(32)

#Model fit with early stopping and a learning rate schedule
def fit_NN(model):
    early_stopping = EarlyStopping(patience=10,
                                   restore_best_weights=True)
    lr_schedule = LearningRateScheduler(
        lambda epoch: 1e-6 * 10**(epoch / 30))
    history = model.fit(train_set, epochs=200,
                        validation_data=valid_set,
                        callbacks=[early_stopping, lr_schedule])
    return history

history = fit_NN(nn_model)
Model MAE and Model Loss of Simple NN. Image from author.

The loss curves for both the MAE and MSE indicate that the learning-rate schedule and early-stopping parameters complement the simple Neural Network model well. The optimal value is found at roughly the 58th epoch of training.
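Curves like these can be reproduced from the History object returned by fit_NN (a sketch; the key names follow the metrics configured above):

# Plot training and validation MAE and loss per epoch
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['mae'], label='train')
plt.plot(history.history['val_mae'], label='validation')
plt.title('Model MAE')
plt.xlabel('Epoch')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.legend()
plt.show()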

Forecast Model (Simple Neural Network)
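The forecast chart can be generated by sliding the trained network across the validation series, one 30-day window at a time. A minimal sketch (the model_forecast helper is my own, not from the original gist):

# Predict the next value for every 30-day window in the series
def model_forecast(model, series, window_size):
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size))
    ds = ds.batch(32).prefetch(1)
    return model.predict(ds)

# Drop the final window, whose target lies beyond the series
nn_forecast = model_forecast(nn_model, valid_close_data, window_size)[:-1, 0]

plt.figure(figsize=(12, 5))
plt.plot(test_data.index[window_size:], valid_close_data[window_size:], label='actual')
plt.plot(test_data.index[window_size:], nn_forecast, label='forecast')
plt.title('Simple NN Forecast vs Actual')
plt.legend()
plt.show()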

Gated Recurrent Unit (GRU)

Unlike the simple Neural Network, the GRU is a type of recurrent neural network (RNN) that resolves the vanishing-gradient problem of standard RNNs. It employs a gating mechanism to control what the network remembers and forgets. For a detailed explanation of the GRU, you may refer to the article below by Simeon Kostadinov.

Similar to building the simple neural network, we create a dataset based on a 30-day window. For this model, we will normalize the data using MinMaxScaler from the sklearn library to improve overall model performance. We will also split the data into a 60% training and 40% test set.

from sklearn.preprocessing import MinMaxScaler

# Keep only the Close prices (drop the Date column)
closedf = data.reset_index()[['Date', 'Close']]
del closedf['Date']

# Scale the data using the Min-Max Scaler
scaler = MinMaxScaler(feature_range=(0, 1))
closedf = scaler.fit_transform(np.array(closedf).reshape(-1, 1))
print(closedf.shape)

# Split into a 60% training and 40% test set
training_size = int(len(closedf) * 0.60)
test_size = len(closedf) - training_size
train_data, test_data = (closedf[0:training_size, :],
                         closedf[training_size:len(closedf), :1])


#Convert an array of values into a dataset matrix
def create_dataset(dataset: np.ndarray, time_step=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset[i:(i + time_step), 0]
        dataX.append(a)
        dataY.append(dataset[i + time_step, 0])
    return np.array(dataX), np.array(dataY)

time_step = 30

X_train, y_train = create_dataset(train_data, time_step)
X_test, y_test = create_dataset(test_data, time_step)

## Reshape inputs to 3 dimensions: [samples, time steps, features]
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

print("X_train: ", X_train.shape)
print("X_test: ", X_test.shape)
Training and test data reshaped into 3-dimensions for GRU model fit. Image from author.

The GRU model will be built with 3 hidden layers of 32 neurons each. Every hidden layer is followed by a Dropout layer with a rate of 0.2 to combat overfitting on the test data. The output layer has a single neuron that predicts the normalized Starbucks stock price. Finally, we will use mean_squared_error as the loss function and Adam as the optimizer.

tf.keras.backend.clear_session()

def gru_model(units):
    model = Sequential()
    #First GRU Layer
    model.add(GRU(units=units, return_sequences=True,
                  input_shape=(time_step, 1)))
    model.add(Dropout(0.20))
    #Second GRU Layer
    model.add(GRU(units=units, return_sequences=True))
    model.add(Dropout(0.20))
    #Third GRU Layer with Dropout Regularization
    model.add(GRU(units=units))
    model.add(Dropout(0.20))
    #Output Layer
    model.add(Dense(1))
    #Compile Model
    model.compile(loss='mean_squared_error', optimizer='adam',
                  metrics=['mae'])
    return model

model_gru = gru_model(32)
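The training call itself is not shown in the excerpt above; a minimal sketch reusing the same callbacks as the simple Neural Network (the epoch count and batch size are my assumptions):

# Fit the GRU on the windowed, scaled training data
early_stopping = EarlyStopping(patience=10, restore_best_weights=True)
lr_schedule = LearningRateScheduler(lambda epoch: 1e-6 * 10**(epoch / 30))
history_gru = model_gru.fit(X_train, y_train,
                            validation_data=(X_test, y_test),
                            epochs=200, batch_size=32,
                            callbacks=[early_stopping, lr_schedule])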
Model MAE and Model Loss using GRU. Image from author.

From the GRU training-history chart, we observe that the MAE and loss drop close to zero after 40 epochs. Employing the same early stopping and learning-rate schedule as the simple Neural Network, the optimal number of epochs for this deep learning model is around 132.

Forecast Model (GRU)
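Since the GRU was trained on Min-Max-scaled values, its predictions must be inverse-transformed back to dollar prices before plotting. A sketch (the variable names are my own):

# Predict on the test windows and undo the scaling
gru_pred = scaler.inverse_transform(model_gru.predict(X_test))
y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))

plt.figure(figsize=(12, 5))
plt.plot(y_test_actual, label='actual')
plt.plot(gru_pred, label='GRU forecast')
plt.title('GRU Forecast vs Actual')
plt.legend()
plt.show()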

Comparison of Model Performance

Performance Metric of Simple NN and GRU. Image from author.

A quick comparison of the performance metrics of the two deep learning models shows that the GRU performed significantly better than the simple Neural Network.
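A sketch of how both models can be scored with scikit-learn metrics (assuming the forecast variables from the sketches above; note that the two models were evaluated on different test splits, so the comparison is indicative rather than exact):

from sklearn.metrics import mean_squared_error, mean_absolute_error

print("Simple NN MSE:", mean_squared_error(valid_close_data[window_size:], nn_forecast))
print("Simple NN MAE:", mean_absolute_error(valid_close_data[window_size:], nn_forecast))
print("GRU MSE:", mean_squared_error(y_test_actual, gru_pred))
print("GRU MAE:", mean_absolute_error(y_test_actual, gru_pred))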

Anomaly Detection

The aim of detecting anomalies in a time series is to discover abnormal patterns at different time frames. The adtk library was created in 2019 by Arundo and follows the same fit-and-detect design as the scikit-learn library. For this project, we will use the Open Starbucks stock prices and identify meaningful historical patterns to detect anomalies such as outlier data points, spikes, and volatility shifts. You may refer to the link below for the documentation on this library.

Threshold Anomaly Detector

For outlier detection, ThresholdAD is employed to identify irregular data points within the time series. We can set a specific range of values to derive these outliers and highlight the points using the anomaly_tag argument inside the plotting parameters.

#Install and Import libraries
!pip install adtk
from adtk.data import validate_series
from adtk.visualization import plot
from adtk.detector import ThresholdAD, PersistAD, VolatilityShiftAD

#Use the Open stock prices, set the Date as the index, and validate
data = data.reset_index()[['Date', 'Open']]
data = data.set_index('Date')
data = validate_series(data)

def plot_threshold(data):
    threshold_detector = ThresholdAD(low=1, high=110)
    anomalies = threshold_detector.detect(data)
    plot(data, anomaly=anomalies, anomaly_color='red', anomaly_tag='marker')
    plt.title('Threshold Anomaly Detection')
    plt.show()


plot_threshold(data)
Threshold Anomaly Detector. Image from author.

Persist Anomaly Detector

For this anomaly detector, we aim to trace any significant decrease in stock prices over a long period of time. Using a 30-day window, we observe price drops during the global pandemic in early 2020 and again towards the end of 2021, when sales were impacted by COVID-19 restrictions.

def plot_persist(data):
    persist_detector = PersistAD(c=5, side='negative', window=30)
    anomalies = persist_detector.fit_detect(data)
    plot(data, anomaly=anomalies, anomaly_color='red')
    plt.title('Negative Persist Anomaly Detector')
    plt.show()

plot_persist(data)
Negative Persist Anomaly Detector. Image from author.

Source code for this project can be found in this GitHub gist.

Conclusion

The application of deep learning models to forecast stock prices can be more efficient and accurate than traditional methods such as ARIMA and GARCH. Note that regularization is required to prevent overfitting, and early stopping during training helps ensure that the model converges to its optimal value.

In anomaly detection, we are able to identify outliers in stock market trends driven by external events such as financial crises and the global pandemic. This offers forecast analysts a different perspective for interpreting trends and making informed decisions about investing in other types of assets.

If you enjoyed reading my content and found it valuable:

  1. Give this article a clap 👏 and follow me, Aziz Budiman, for stories and blogs on all things Data, Artificial Intelligence, and Coding Tips.
  2. Feedback and comments are welcome as this is a platform where we can learn from one another.
  3. Show your support and buy me a coffee perhaps? It's okay if you are unable to do so at this moment.
