Time Series Analysis and Forecasting: From Data Collection to Deployment using Flask and Docker (Part-II)
A beginner’s hands-on guide to an end-to-end machine learning project on time-series forecasting.
In Part 1, you learned how to analyze a problem, collect the data for it, and create the important visualizations to understand the data.
In this second part, we are going to learn about the following (and more fun) things:
- Model creation and training
- Model Evaluation, Tuning, and improvement
- API creation with FLASK
- Dockerizing Flask application
- Learning Outcomes
1. Model creation and training
There are many options for a time series model. ARIMA and SARIMAX are the most popular choices among autoregressive models, while FBProphet (just Prophet in newer versions) is based on an additive model, with good support for non-linear trends via built-in or custom seasonalities, as well as holiday effects.
We’ll be using fbprophet in this tutorial because it is simple and easy to use compared to autoregressive models such as ARIMA or SARIMAX, which require a lot of hyperparameter tuning and statistical testing before fitting the model. FBProphet also works well with data that has strong seasonal effects and several seasons of historical data (Documentation Link).
Installation
Please note that FBProphet is the older name and Prophet is the newer name of the library. If you install fbprophet, make sure to use fbprophet everywhere; if you install prophet, make sure to use prophet everywhere. Otherwise you’ll waste a day or two debugging, like me.
We will be using fbprophet in this article.
If you are using Windows, I would highly recommend using conda-forge for the installation instead of pip.
$ conda activate time_series
(time_series) $ conda install -c conda-forge fbprophet
If you are on a Linux-based system or macOS, you can use pip to install fbprophet.
$ conda activate time_series
(time_series) $ pip install fbprophet
You can confirm the installation via a simple REPL session.
(time_series) $ python
>>> import fbprophet
If this works, your installation is fine. Let’s now create a new Jupyter notebook named training.ipynb in the notebooks directory.
Pre-Processing
FBProphet has a weird rule (is it weird, or am I the only one who feels that way?): your dataframe must have exactly two columns, named “ds” for the date stamp and “y” for the value at that date stamp. And “ds” must be a column, not the index.
So let’s write some code to get the dataframe into the desired format.
# renaming for fbprophet
df.rename_axis('ds', inplace=True)
df.rename(columns={'US dollar':'y'}, inplace=True)
df.reset_index(inplace=True)
df.head()
Let’s create a simple model with default parameters, and fit it.
from fbprophet import Prophet
prophet_model = Prophet()
prophet_model.fit(df)
Your results may not be exactly the same as mine due to the stochastic nature of machine learning algorithms. If we want exactly the same results, we need to set some random seeds beforehand.
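For example, a minimal sketch of seeding (assuming NumPy’s generator is the relevant source of randomness here; Prophet simulates its uncertainty intervals with it):
import random
import numpy as np

# seed the RNGs before fitting so that repeated runs produce the same
# simulated uncertainty intervals (yhat_lower / yhat_upper)
random.seed(42)
np.random.seed(42)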
We can check how the model performs in the future by making a dataframe of future dates. FBProphet has this option built in.
future_dataset= prophet_model.make_future_dataframe(periods=15, freq='y') # Next 15 YEARS OF DATA
future_dataset.tail()
We can now perform the predictions using our model.
pred = prophet_model.predict(future_dataset)
pred[['ds','yhat', 'yhat_lower', 'yhat_upper']].head() # only useful columns
There are a lot of columns in FBProphet’s predictions, but we are only going to deal with the ones that are useful to us right now. Here we select four columns: the date (ds), the predicted value (yhat), the upper bound of the prediction (yhat_upper), and the lower bound of the prediction (yhat_lower).
We can simply call the plot function of the model to see the prediction behavior in the future.
prophet_model.plot(pred);
We can see that the model performs only mediocrely on the seen data and predicts just a simple straight line from 2012 onwards, which does not seem like a good representation of our dataset.
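Besides the forecast plot, it often helps to inspect the learned trend and seasonalities separately. FBProphet has a built-in plot_components function for this:
# decompose the forecast into its trend and seasonality component plots
prophet_model.plot_components(pred);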
2. Model Evaluation, HyperParameter Tuning, and improvement
As we saw in earlier sections, the long-run yearly average does have some kind of pattern, or at least it is not very irregular. So we can add a custom seasonality of 10 or 15 years and see how our model performs.
Let’s understand a few things before testing the model with custom seasonality.
The Fourier order is the number of terms in the partial Fourier sum used to model the seasonality; it determines how quickly the seasonality can change. If you choose a high Fourier order while the seasonality is not actually changing quickly, your model will overfit, and vice versa.
The default Fourier order for yearly seasonality is 10 according to the documentation of FBProphet.
So if we are going to add a custom seasonality of 10 years, the Fourier order would scale to 100, and for 15 years, to 150. Since we saw in the graphs above that our data does not have a quickly changing seasonality, we will use lower values than calculated so that the model does not overfit.
Let’s create some helper functions, so we can quickly change the parameters and see the results of the model.
import time
import matplotlib.pyplot as plt

def fb_prophet_function(data, future_years, seasonality_name, seasonality_val, seasonality_fourier, **params):
    """
    Trains an fbprophet model with the given hyperparameters and custom
    seasonality, predicts on the future dataset, plots the results, and
    returns the model.
    """
    start = time.time()
    prophet_model = Prophet(**params)
    prophet_model.add_seasonality(name=seasonality_name, period=seasonality_val, fourier_order=seasonality_fourier)
    prophet_model.fit(data)
    future_dataset = prophet_model.make_future_dataframe(periods=future_years, freq='y')
    pred = prophet_model.predict(future_dataset)
    prophet_model.plot(pred, figsize=(15, 7))
    plt.ylim(-500, 3000)
    plt.title(f"fourier order {seasonality_fourier}, seasonality time {seasonality_name}")
    plt.show()
    end = time.time()
    print(f"Total Execution Time: {end - start} seconds")
    return prophet_model
And a second function for plotting just the predictions for the last few years, given via the size parameter.
def plot_valid(validation_set, size, model):
    # predict on the validation set and plot actual vs predicted values
    pred = model.predict(validation_set)
    temp = df[-size:].copy().reset_index()
    temp['pred'] = pred['yhat']
    temp.set_index('ds')[['y', 'pred']].plot()
    plt.tight_layout()
Let’s now split off a small validation set to evaluate the model better.
training_set = df[:-1000]
validation_set = df[-1000:]  # last 1000 rows, i.e. from Jul 2018

# 15 years seasonality, additive, no other seasonality, lower fourier value
fifteen_years = fb_prophet_function(data=training_set, future_years=6,
                                    seasonality_name='15_years', seasonality_val=365*15,
                                    seasonality_fourier=100, seasonality_mode='additive')
The seasonality value defines the length of our seasonality period: for 1 year it is 365 days, and for 15 years it is 365*15 days.
The plot for the 15-year custom seasonality looks as follows.
The performance on the validation set would be:
plot_valid(validation_set, 1000, fifteen_years)
plt.title("Hyp parameters: 15_years seasonality, seasonality_fourier=100, seasonality_mode=additive\n prediction from Jul2018-Apr2022(from training set i.e validation set)");
Let’s now try a custom seasonality of 10 years. The Fourier order would scale to 100 (based on the default of 10 for a single year), but we will use 80 since the 10-year graph does not show a rapidly changing seasonality.
# 10 years seasonality, no other seasonality, additive, lower fourier
training_set = df[:-1000]
validation_set = df[-1000:]

ten_years_model = fb_prophet_function(data=training_set, future_years=6,
                                      seasonality_name='10_years', seasonality_val=365*10,
                                      seasonality_fourier=80, seasonality_mode='additive')
The plot is:
Let’s plot the predictions on the validation set.
plot_valid(validation_set, 1000, ten_years_model)
plt.title("Hyp parameters: 10_years seasonality, seasonality_fourier=80, seasonality_mode=additive\n prediction from Jul2018-Apr2022(from training set i.e validation set)");
The 10-year custom seasonality model seems to perform well, so we can save it. Note that if you are going to use a model in production, make sure to test it further with other metrics as well before saving and finalizing it.
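As a quick sketch of such a check (using scikit-learn, which is already in our environment), you could compute the mean absolute error and root mean squared error on the validation set:
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# compare the model's predictions against the held-out validation values
val_pred = ten_years_model.predict(validation_set)
mae = mean_absolute_error(validation_set['y'], val_pred['yhat'])
rmse = np.sqrt(mean_squared_error(validation_set['y'], val_pred['yhat']))
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")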
Similarly, there are other areas of improvement for the model, which we can decide on based on our visualizations. For example, in the first plot we can see that the trend changed after 2001, so instead of feeding the model the complete data, we could feed it only the data from 2001 onwards with some other suitable custom seasonality and Fourier order. I have left this as homework for you (a starting-point sketch follows below)! Let me know in the comments if you try it.
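If you want a starting point for that homework, here is a sketch; the seasonality period and Fourier order are placeholders you would tune yourself:
# train only on the post-2001 regime; the hyperparameters are guesses to tune
df_recent = df[df['ds'] >= '2001-01-01'].reset_index(drop=True)
recent_model = fb_prophet_function(data=df_recent[:-1000], future_years=6,
                                   seasonality_name='5_years', seasonality_val=365*5,
                                   seasonality_fourier=40, seasonality_mode='additive')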
Saving a model
We can use the pickle library to save the model; it is also convenient to load the model back via pickle later and make predictions with it.
import pickle

# saving the model in the models directory
with open('../models/fbprophet.pckl', 'wb') as fout:
    pickle.dump(ten_years_model, fout)
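Loading the model back is symmetric:
# load the saved model again for predictions
with open('../models/fbprophet.pckl', 'rb') as fin:
    loaded_model = pickle.load(fin)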
3. API Creation with Flask
Flask is a micro web framework written in Python that can be used to create web apps and APIs quickly and easily.
Enough of Jupyter notebooks; let’s now shift to scripts 🤗 to create our API in Flask. The code is structured so that you can easily add your own models to it and compare the performance of all the models.
Creating Utility classes for our application
Let’s create some helper classes that will help us in predicting the values. We are going to follow the Open/Closed design principle here, which states
Software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification.
This means that we can extend our class (e.g., via inheritance) without having to modify it. A good way to achieve this in Python is with abstract classes. Let’s move into the api/utils.py file and do the important imports.
import pickle
import datetime
from typing import List
import pandas as pd
from abc import ABC, abstractmethod
Let’s now create a base prediction class, and you can derive a prediction class for every different model architecture you apply to it.
class GoldPricePredictor(ABC):
    def __init__(self, model_name) -> None:
        """
        Loads the model from the given file
        """
        with open(f"models/{model_name}.pckl", "rb") as fin:
            try:
                self.model = pickle.load(fin)
            except (OSError, FileNotFoundError, TypeError):
                print("wrong path / model not available")
                exit(-1)

    def calculate_next_date(self, prev_date):
        """
        Calculates the next date
        date_format = yyyy-mm-dd
        """
        self.next_date = datetime.datetime(
            *map(int, prev_date.split("-"))
        ) + datetime.timedelta(days=1)  # next date

    def get_next_date(self, prev_date):
        try:
            return self.next_date.strftime("%y-%m-%d")
        except AttributeError:  # next_date was not calculated yet
            self.calculate_next_date(prev_date)
            return self.next_date.strftime("%y-%m-%d")

    @abstractmethod
    def predict(self, prev_date) -> List:
        pass

    @abstractmethod
    def preprocess_inputs(self, prev_date):
        pass

    @abstractmethod
    def postprocess_outputs(self, output_from_model) -> List:
        pass
We have created some simple methods (which are self-explanatory) and three abstract methods, which the derived classes have to implement. So if we have three separate models, e.g., LSTM, FBProphet, and ARIMA, we create three derived classes, each implementing these three methods according to its model’s requirements.
Let’s create the FBProphet Model Class now.
class FBProphetPredictor(GoldPricePredictor):
    def __init__(self) -> None:
        """
        Load the model from the file models/fbprophet.pckl
        """
        super().__init__("fbprophet")

    def preprocess_inputs(self, prev_date):
        """
        The model takes its input as a pandas dataframe with the
        day to be predicted in the "ds" column
        """
        self.calculate_next_date(prev_date)  # sets the self.next_date attribute
        next_date_series = pd.DataFrame(
            {"ds": pd.date_range(start=self.next_date, end=self.next_date)}
        )
        return next_date_series

    def postprocess_outputs(self, output_from_model) -> List:
        """
        Return the yhat column in list format
        """
        return output_from_model["yhat"].tolist()

    def predict(self, prev_date) -> List:
        next_date_series = self.preprocess_inputs(prev_date)  # preprocess input
        pred = self.model.predict(next_date_series)  # prediction
        pred = self.postprocess_outputs(pred)  # postprocess prediction
        return pred  # return prediction
Let’s look inside the code. As we have seen, the fbprophet model takes as input a dataframe with a “ds” column containing the date to be predicted, so in the preprocess_inputs function we create such a dataframe with the next date using pandas. In the postprocess_outputs function, we take the desired value from the prediction dataframe and convert it to a list. The predict function takes the date preceding the one we want to predict, preprocesses it via preprocess_inputs, gets the prediction from the model, and post-processes it.
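The auto_arima model referenced in the config file of the next section never gets its own class in this article, but a derived class for it could look like the following sketch (an assumption on my part: it presumes models/auto_arima.pckl holds a fitted pmdarima model whose predict(n_periods=...) returns the next values):
class ArimaPredictor(GoldPricePredictor):
    def __init__(self) -> None:
        """
        Hypothetical sketch: loads models/auto_arima.pckl, assumed to be
        a fitted pmdarima model exposing predict(n_periods=...)
        """
        super().__init__("auto_arima")

    def preprocess_inputs(self, prev_date):
        self.calculate_next_date(prev_date)  # only needed for get_next_date
        return 1  # an ARIMA model just needs the horizon: one step ahead

    def postprocess_outputs(self, output_from_model) -> List:
        return list(output_from_model)

    def predict(self, prev_date) -> List:
        n_periods = self.preprocess_inputs(prev_date)
        pred = self.model.predict(n_periods=n_periods)  # next-step forecast
        return self.postprocess_outputs(pred)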
Configuration file for our models
Let’s create a config file (api/cfg.py) holding all our models so that we can use it in our application. Every new model and its related class can be added to this config file for further use.
import utils

models = {
    "fbprophet": utils.FBProphetPredictor,
    "auto_arima": utils.ArimaPredictor,
}
Flask Application
As we all know, every Flask app has an app.py file in which you describe all the routes and related logic. Let’s create that.
import flask
from flask import jsonify, render_template, request

from cfg import models  # dict of all models from which to select

app = flask.Flask(__name__)
app.config["debug"] = True
Let’s now define a route, which has a basic UI for us to test the model performance quickly.
@app.route("/")
def home():
return render_template("index.html", models=list(models.keys()))
The index.html file returned via render_template lives in the api/templates directory. Let’s add some HTML and Jinja to it.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Home Page</title>
</head>
<body>
    <h2>Available Models</h2>
    <form action="{{ url_for('predict_gold') }}" method="post">
        <label for="model_val">Choose a model</label>
        <select id="model_val" name="model_name">
            {% for model in models %}
            <option value="{{model}}">{{model}}</option>
            {% endfor %}
        </select>
        <input type="text" name="date" placeholder="DATE:YYYY-MM-DD">
        <input type="submit" value="Submit">
    </form>
</body>
</html>
We have simply added a form where you can select one of the available models from the config file, enter a date, and get the predictions. This simple HTML page looks as follows when running the server.
Let’s now go back to api/app.py and add the functionality for predictions.
@app.route("/predict", methods=["POST"])
def predict_gold():
"""
Given the date, predict the gold price for next date
"""
model_name = request.form.get("model_name")
date_given = request.form.get("date")
model = models[model_name]() # get and initialize the model class from dictionary
pred = model.predict(date_given)return jsonify(
{
"given_date": date_given,
"next_date": model.get_next_date(date_given),
"price": pred,
},
)
Now you can test the /predict route and see your model’s predictions in the web app via the UI.
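You can also exercise this route without the UI by posting the same form fields with curl (assuming the development server is running locally on port 5000):
$ curl -X POST -d 'model_name=fbprophet' -d 'date=2022-04-21' 'http://127.0.0.1:5000/predict'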
Handling other types of requests with our API
As you are well aware, most APIs are consumed via requests, either with Postman or curl, or from within a programming language, such as the requests module in Python.
So we need to make sure that our API handles those too.
@app.route("/predict", methods=["POST"])
def predict_gold():
"""
Given the date, predict the gold price for next date
"""
# print(request.form.get)
try:
model_name = request.form.get("model_name")
date_given = request.form.get("date")
model = models[model_name]()
pred = model.predict(date_given)
except KeyError: # get value from curl header
model_name = request.headers.get("model_name")
date_given = request.headers.get("date")
model = models[model_name]()
pred = model.predict(date_given)return jsonify(
{
"given_date": date_given,
"next_date": model.get_next_date(date_given),
"price": pred,
},
)
We catch the KeyError exception, which tells us that the specified keys (model_name, date) were not available in the form, so they must be in the headers, i.e., the request came via curl, Postman, or a similar tool. (Strictly speaking, request.form.get returns None for a missing key, and the KeyError is raised by the models[model_name] lookup.) We then use Flask’s request.headers.get to read the values from the headers.
Now we can send a request via curl to test it.
$ curl -XPOST -H 'model_name: fbprophet' -H 'date: 2022-04-21' 'http://127.0.0.1:5000/predict'
You can also use Python’s requests module to access the API.
import requests
response = requests.post('http://127.0.0.1:5000/predict', headers={'model_name':'fbprophet', 'date': '2022-04-21'})
print(response.json())
The output is
{'given_date': '2022-04-21', 'next_date': '22-04-22', 'price': [1922.3042464629968]}
Note that I am calling this API from an environment that does not have fbprophet installed. This is a great advantage of creating an API: your end users do not need to install these packages; they just send a request to the API URL, i.e., the server, and get back the predicted response.
Let’s add a main block to the code to make sure the app runs on host 0.0.0.0. Binding to 0.0.0.0 is important, especially when you deploy the application.
if __name__ == "__main__":
    app.run(host="0.0.0.0", debug=True)
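Flask’s built-in server is meant for development only. Since gunicorn is already in our requirements, you could serve the app with it in production instead; this is just a sketch, and the api.app:app module path assumes you run it from the project root with api/ importable as a package:
(time_series) $ gunicorn --bind 0.0.0.0:5000 api.app:app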
4. Dockerizing Flask Application
Let’s now move to the final part of our tutorial. First, we need to understand the concept of Docker. Docker runs on top of the host OS and uses OS-level virtualization to run apps in packages known as containers.
So the containerized applications can run on Docker regardless of Operating System. It handles all the headaches of requirements installation and pre-requisites. All you need to do is to run the image, and the software will be running on your machine.
Requirements.txt
To containerize your application, you need to make sure that when your container runs, you have all the necessary packages installed in the container. Having a requirements.txt file is the best option in Python.
numpy
flask
wheel
pystan==2.19.1.1
fbprophet
scikit-learn
openpyxl
pandas
pmdarima
statsmodels
seaborn
matplotlib
gunicorn
One of the errors I faced was that, when installing fbprophet via requirements.txt, pip could not build it because its build dependencies (such as pystan) had not been installed yet. A good workaround is to install all the packages fbprophet needs beforehand. You can find Prophet’s requirements.txt in the original repo by Facebook.
Also if you are using fbprophet, make sure to add it as fbprophet in the requirements file, and if you are using prophet, make sure to add it as prophet in the requirements file.
I just copied its contents into a new requirements file named requirements_proph.txt.
To create an image, we need a Dockerfile. Let’s create a new file named Dockerfile (without any extension).
FROM python:3.7

RUN mkdir /app
WORKDIR /app
COPY . /app

RUN python -m pip install --upgrade pip setuptools wheel
RUN python -m pip install -r requirements_proph.txt
RUN python -m pip install -r requirements.txt

CMD ["python", "api/app.py"]
Let’s analyze it step by step.
We specify python:3.7 as our base image. Next, we run mkdir to create a directory named /app, change the working directory to /app, and copy all the contents of our local directory into /app in the container. Then we upgrade pip, setuptools, and wheel, install the requirements file copied from the FBProphet repo, and finally install our own requirements.txt. After that, we specify the command to run the app.py file (which already calls app.run() inside its if __name__ == "__main__" block).
You can now build the image via
$ sudo docker image build -t gold_price_prediction_api .
Now we can run the Docker image with the following command. Note the -p 5000:5000 flag, which publishes the container’s port 5000 on the host so that the Flask app is reachable.
$ sudo docker run -p 5000:5000 gold_price_prediction_api
That’s it; you can now go to http://localhost:5000/ and see your app served from the running container. You can deploy this container anywhere with ease, which we will discuss in a future article.
5. Learning Outcomes
In this article you have learned the following things:
- Time Series modeling with FBProphet
- Custom Seasonality with FBProphet
- Creating an API with Flask
- OPEN/CLOSED Design principle in Python
- Dockerizing your Flask Application for deployment
Overall, across both parts, you now have a good idea of the end-to-end pipeline of a simple machine learning problem, from data collection to dockerization. If you haven't read the first part, you can read it here.
Complete Source Code: Github
I hope you enjoyed the article; let me know in the comments what you think of it. I also hope you will implement some other models, like LSTM and ARIMA, and let me know in the comments how they perform.
Follow me on Medium for more articles, and connect with me on LinkedIn(I am mostly active on LinkedIn).