Realtime MSFT Stock price predictor using Azure ML

Github repository for the project: Click here

Syed Sohaib Uddin
11 min read · May 29, 2020

Stock markets are a pool of uncertainty: they plummet one moment and rocket the next. Recent developments show traders opting for methods and tools to predict the performance of stocks, most of them built around AI and ML concepts. In this blog, we will create an ML model that trains itself on real-time data from current market trends and predicts the stock value for any given time.

This article is also a part of the MSP Developer Stories initiative by the Microsoft Student Partners (India) program.

Project Overview

Goal: We will be making a web application that predicts the maximum value of MSFT stock at any given DateTime depending on real-time stock data.

The model will be built using Linear Regression in the Azure ML designer and trained on data from the yfinance API. The data will be downloaded locally and then uploaded to Azure Blob storage. We will also write a trigger using the Azure ML Python SDK that runs the training pipeline on every upload of new data to blob storage, so the model is always trained on the latest MSFT stock data. The pipeline will be deployed as a real-time inference pipeline that provides an endpoint for real-time predictions from the model. The entire process will be automated via a web application in Flask.

Let's get started. 🚀

Pre-requisites

  1. Knowledge of Python, REST API calls and machine learning.
  2. An active Azure subscription. You may use Azure for Students to get a subscription for free, or you can use an Azure free account.

Our route at a glance

  1. Downloading the MSFT stock database using the Yahoo finance API.
  2. Uploading the database to Azure Blob storage.
  3. Training the model on the data using Linear regression and creating a real-time inference endpoint via deployment on AKS Cluster.
  4. Triggering the ML pipeline run to retrain the model when DB in blob storage is modified.
  5. Building a web application that automates all the above processes and adds functionality to the user end.

Step 1: Downloading the MSFT stock database using the Yahoo finance API

Yahoo Finance provides financial news, data and commentary including stock quotes, press releases, financial reports, and original content on stock markets. The Yahoo Finance API made it possible to query finance summaries, stocks, quotes, movers, etc. It has since been deprecated, and many programs that relied on it stopped working. However, yfinance aims to solve this problem by offering a reliable, threaded, and Pythonic way to download historical market data from Yahoo! Finance. It is open-sourced on Github and can be found here.

  • Create a working directory/folder on your computer.
  • Before beginning, let's create a virtual environment in our directory to avoid conflicts with site-packages.
  • Open your terminal and execute the following in the same order.
dir> pip install virtualenv 
dir> virtualenv venv
  • This installs virtualenv and creates a virtual environment for all the dependencies. Activate the virtual environment by running the activate script inside venv\Scripts.
dir> venv\Scripts\activate
  • Now, clone the yfinance repository into your local directory and execute the following in your terminal.
pip install -r requirements.txt
  • Once all dependencies are successfully installed, create a new file in the root called fetchdb.py and paste the following code into it.
import yfinance as yf
from datetime import datetime, timedelta

# Date range: the last 7 days up to today
now = datetime.now()
date_N_days_ago = now - timedelta(days=7)

# Download 1-minute MSFT candles for that range
data_df = yf.download("MSFT",
                      start=date_N_days_ago.strftime("%Y-%m-%d"),
                      end=now.strftime("%Y-%m-%d"),
                      interval="1m")

# Save the dataset locally
data_df.to_csv('ds.csv')
  • The above code downloads the MSFT stock values at intervals of 1 min for the last 7 days into a file called ds.csv.
  • Execute the code by running python fetchdb.py in your terminal. On successful execution, you will find ds.csv in your root directory; a quick way to sanity-check the download is sketched below.
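Optionally, you can verify the dataset before moving on with a minimal pandas check (assuming pandas is available in your virtual environment; it is installed as a dependency of yfinance):

import pandas as pd

# Load the dataset produced by fetchdb.py and preview the last few 1-minute rows
df = pd.read_csv('ds.csv')
print(len(df), "rows downloaded")
print(df.tail())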

Step 2: Uploading the database to Azure Blob storage

Azure Blob storage is Microsoft’s object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. We will upload our dataset ds.csv to a blob and from there download it as a dataset into our ML pipeline.

In order to access blob storage, we need a storage account. When we create our Azure ML workspace, we have one created by default. So let’s create a workspace first.

  • Log in to your Azure portal and search for Machine Learning.
  • Navigate to the Azure ML page and click on add.
  • Build your workspace by filling in the details. Create a new resource group for all the Azure resources related to this project. The Basic edition should work fine for this project, but choose the Enterprise edition in case you want to run AutoML later.
  • Once your workspace is deployed, search for ‘storage accounts’ and navigate to the storage accounts page. Click on the storage account created under your ML workspace resource group and navigate to the main page.
  • On the main page, navigate to Storage explorer and add a folder (container) under ‘BLOB CONTAINERS’. This folder will be the container for your dataset.
  • Next, navigate to Access keys and copy the connection string under key1.
  • Open your CMD and set up an environment variable by executing the following. Remember to paste your connection string.
setx AZURE_STORAGE_CONNECTION_STRING "<your-conntn-string>"
  • Next, move back to your local directory and install the Azure blob storage client library.
pip install azure-storage-blob
  • Create a new file in the project root directory called upload_to_blob.py and paste the following code into it.
import os
from azure.storage.blob import BlobServiceClient

try:
    # Read the connection string from the environment variable set earlier
    connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')

    # Create the BlobServiceClient object which will be used to create blob and container clients
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)

    # Name of the container created earlier in blob storage
    container_name = "<your-containername-from-blob-storage>"

    # Path of the local file to upload
    local_path = "./"
    local_file_name = "ds.csv"
    upload_file_path = os.path.join(local_path, local_file_name)

    # Create a blob client using the local file name as the name for the blob
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)

    # Instantiate a ContainerClient for the same container
    container_client = blob_service_client.get_container_client(container_name)

    # Upload the file
    with open(upload_file_path, "rb") as data:
        blob_client.upload_blob(data)

except Exception as ex:
    print('Exception:')
    print(ex)

The above code uploads the file ds.csv as a blob into the container we created earlier.

  • Execute the code above and navigate to your storage account to see if the blob uploaded successfully.
  • Once the blob is uploaded, every subsequent refresh requires the existing blob to be deleted and then uploaded again with the latest data. Hence, add container_client.delete_blob("ds.csv") to the file just before
with open(upload_file_path, "rb") as data:
blob_client.upload_blob(data)
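As a side note (an assumption on my part, not part of the original setup): recent versions of azure-storage-blob also let you overwrite the existing blob in a single call, which avoids the separate delete step. Assuming the blob_client from the script above:

# Alternative: overwrite the existing blob instead of deleting it first
with open(upload_file_path, "rb") as data:
    blob_client.upload_blob(data, overwrite=True)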

Step 3: Training the model on the data using Linear regression and creating a real-time inference endpoint via deployment on AKS Cluster

In order to train the model, we need to create an experiment and then set up a pipeline.

  • Go to your Azure portal and navigate to the Machine Learning page. Click on your project and launch the studio. Choose your workspace and proceed.
  • Now, navigate to Datastores.
  • Create a new Datastore: enter a datastore name you like, choose the type as Azure Blob Storage and choose your subscription. Carefully choose your storage account and select the blob container you created earlier. Choose the authentication type as Account key and enter the account key from Access keys (under key1) in your storage account from Step 2. Click on create.

On success, you should see your datastore on the Datastore page.
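If you prefer to script this step instead of using the studio UI, a rough equivalent with the Azure ML Python SDK (v1) is sketched below. The datastore and dataset names, container name and account key are placeholders; this is only one possible way to register the same datastore and dataset.

from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()

# Register the blob container from Step 2 as a datastore (placeholder names/keys)
datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="mydatastore",
    container_name="<your-containername-from-blob-storage>",
    account_name="<your-storage-account-name>",
    account_key="<your-account-key>")

# Create a tabular dataset from ds.csv in that container and register it
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, 'ds.csv'))
dataset = dataset.register(workspace=ws, name="msft-stock-ds", create_new_version=True)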

  • Next, click on Datasets from the left pane and create a new dataset from your datastore. Enter your dataset name and select the type as Tabular. Choose the datastore you created above and browse to select your file ds.csv. In the settings, set ‘Column headers’ to ‘Use headers from first file’. For the schema, exclude the last two columns. Click on create.
  • Navigate to Pipelines and create a new pipeline. In the settings prompt on the right pane, click on ‘select compute target’ and create a new compute target.
  • Create the following pipeline flow and configure Split data, Linear Regression and Train the model blocks as described below:

Split data: Set ‘Fraction of rows in the first output dataset’ to 0.7.

Linear Regression: Set ‘Number of training epochs’ to 1000.

Train Model: This pipeline actually trains two models, one on ‘High’ and one on ‘Low’, even though only one of them can be deployed behind a real-time endpoint at a time. Click on each Train Model block and choose the ‘Label column’ as ‘High’ for one and ‘Low’ for the other.

Click on submit to run the pipeline and train the model.

  • Once the model is trained, click on the Evaluate Model block and view the evaluation results.
  • Now, create a real-time inference pipeline by first selecting one Train Model block and clicking on ‘Create real-time inference’. Once done, the real-time pipeline is ready to be deployed for that model.
  • Deploy the real-time inference pipeline on an inference cluster, i.e. an AKS cluster. Follow the prompts on the screen to do so.
  • On successful deployment, navigate to Endpoints > Real-time endpoints and test your model. Copy the Python code from the Consume tab and try running it from your local directory; a rough sketch of what that code typically looks like follows below.
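The exact snippet on the Consume tab depends on your pipeline, but calling an Azure ML real-time endpoint generally boils down to a POST with the scoring URI, the endpoint key and a JSON payload whose fields match the columns your pipeline expects. A minimal hedged sketch, with the URI, key and column values as placeholders:

import json
import requests

# Placeholders: take these from the endpoint's Consume tab
scoring_uri = "<your-scoring-uri>"
key = "<your-endpoint-key>"

# One row of input; the field names must match your dataset's columns
data = {"Inputs": {"WebServiceInput0": [
    {"Open": 183.2, "High": 183.6, "Low": 182.9, "Close": 183.1}
]}}

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())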

Step 4: Triggering the ML pipeline run to retrain the model when DB in blob storage is modified

Once the model is trained and ready, it needs to be retrained and redeployed regularly to ensure real-time predictions. Hence, we need to trigger the pipeline run externally from the web application.

  • Click on Notebooks from the menu bar, create a blank new notebook and run the following code in its cells.
import azureml.core
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PublishedPipeline
from azureml.core.experiment import Experiment
from azureml.core.datastore import Datastore

ws = Workspace.from_config()

# List the experiments in the workspace
experiments = Experiment.list(ws)
for experiment in experiments:
    print(experiment.name)

# List the published pipelines and their IDs
published_pipelines = PublishedPipeline.list(ws)
for published_pipeline in published_pipelines:
    print(f"{published_pipeline.name},'{published_pipeline.id}'")

#experiment_name = "<your-experiment-name>"
#pipeline_id = "<your-pipeline-id>"
  • After running the above code, pick the experiment name and pipeline ID from the output, update the last two lines accordingly, uncomment them and run the cell again. Follow that by running the code below in another cell.
from azureml.pipeline.core.schedule import ScheduleRecurrence, Schedule

datastore = Datastore(workspace=ws, name="mydatastore")

reactive_schedule = Schedule.create(ws, name="MyReactiveSchedule",
                                    description="Based on input file change.",
                                    pipeline_id=pipeline_id,
                                    experiment_name=experiment_name,
                                    datastore=datastore,
                                    data_path_parameter_name="input_data")
  • On success, you have the trigger set.
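If you later want to inspect or pause the trigger, the SDK can list the schedules registered on the workspace; a small hedged sketch:

from azureml.pipeline.core.schedule import Schedule

# List all schedules in the workspace and disable the reactive one
for schedule in Schedule.list(ws):
    print(schedule.name, schedule.id)
    if schedule.name == "MyReactiveSchedule":
        schedule.disable()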

Step 5: Building a web application that automates all the above processes and adds functionality to the user end

  • I have built a web application using Flask and integrated all the code above appropriately under the app route to home. Check out my repository here.
  • You can create a fresh new directory and clone the entire repo. Create a virtual environment and activate it.
  • Run pip install -r requirements.txt to install all the dependencies.
  • Make sure to edit your real-time inference endpoint in the views.py file before deploying.
  • Execute flask run in your terminal to serve the application locally. A minimal sketch of how the pieces fit together inside the app is shown below.
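The snippet below is not the actual views.py from the repository, just a hedged sketch of the overall flow: on each prediction request the app refreshes ds.csv, re-uploads it to blob storage (which fires the retraining trigger) and queries the real-time endpoint. Route, template and form-field names here are illustrative only.

import json
import requests
from flask import Flask, request, render_template

app = Flask(__name__)

SCORING_URI = "<your-scoring-uri>"   # from the endpoint's Consume tab
ENDPOINT_KEY = "<your-endpoint-key>"

@app.route("/", methods=["GET", "POST"])
def home():
    if request.method == "POST":
        # 1. Refresh ds.csv and push it to blob storage (re-using the scripts from
        #    Steps 1 and 2), which triggers the training pipeline via the schedule.
        # 2. Query the deployed real-time endpoint with the stock parameters.
        payload = {"Inputs": {"WebServiceInput0": [{
            "Open": float(request.form["open"]),
            "High": float(request.form["high"]),
            "Low": float(request.form["low"]),
            "Close": float(request.form["close"]),
        }]}}
        headers = {"Content-Type": "application/json",
                   "Authorization": f"Bearer {ENDPOINT_KEY}"}
        result = requests.post(SCORING_URI, data=json.dumps(payload), headers=headers).json()
        return render_template("index.html", prediction=result)
    return render_template("index.html", prediction=None)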

Results

The app predicts the max value, or ‘High’, for MSFT stock, since we deployed the real-time inference pipeline with the model trained on the ‘High’ column. The user can either use the stock parameters, i.e. Open, High, Close and Low from the current market watch, to predict the future max value of the MSFT stock, or provide their own inputs in order to ignore the current trend. The website also shows the current state of the market, i.e. open or closed, and shows the live value when the market is open. When the market is closed, the Open, High, Low and Close values are shown, and all predictions are based on the week-long database up to market close.

Live market watch

Let us request a prediction for the maximum value of the stock at 8.30 pm local time.

Setting up prediction

The results are below.

Prediction result comparison

The model predicts that the max stock value will not go above 183.51 at 8.30 pm, and that is exactly what happens :)

ML pipeline triggered by the blob

Whenever a user presses the submit button, the training pipeline is triggered in the ML studio due to the dataset update in the blob. However, the endpoint needs to be redeployed manually to serve the latest results. I have raised a GitHub issue requesting SDK support for this.

Anyone wishing to provide their own stock parameters to predict future trends can simply click ‘No’ in the radio box and enter the 4 corresponding values.

LIVE DEMO

The web application is hosted live; click here to check the live demo. However, if my monthly credits have run out, you may be out of luck.

Conclusion

Machine learning is very easy with Azure’s ML studio. It is a revolution for anyone who works on model training and spends days optimizing it, and it allows anyone with barely any experience in ML to train and use models.

As for this project, accuracy is always a matter of doubt in stock markets, but we did pretty well here; the MSE was very low. I forgot to snip the evaluation graph and now my credits have expired, but the accuracy was certainly satisfying.

I hope you liked this simple project.

Cheers:)

Just today, Microsoft’s Azure ML team replied to my GitHub issue regarding deployment of the real-time endpoint with the SDK. We had a discussion in a meeting and the update is on the way. 😉 🔥

Further Reading
