Machine Learning for Sales Forecasting: A Capstone Project with Columbia University

Francesca Lazzeri
Microsoft Azure
Published in
3 min readJan 5, 2020

This past semester we have been collaborating on a machine learning Capstone Project with Columbia University’s Master of Science in Applied Analytics: capstone projects are applied and experimental projects where students take what they have learned throughout the course of their graduate program and apply it to examine a specific area of study.

Capstone projects are specifically designed to encourage students to think critically, solve challenging data science problems, and develop analytical skills.

Two group of students built an end-to-end data science solution using Azure Machine Learning to accurately forecast sales. Azure Machine Learning is a cloud-based environment that you can use to train, deploy, automate, manage, and track ML models.

Azure Machine Learning can be used for any kind of machine learning, from classical machine learning to deep learning, supervised, and unsupervised learning. Whether you prefer to write Python or R code or zero-code/low-code options such as the designer, you can build, train, and track highly accurate machine learning and deep-learning models in an Azure Machine Learning Workspace.

To explore the solution developed by students at Columbia University, you can look at their Time-Series-Prediction repository on GitHub. In this article, we use an approach also used by Columbia University students, which is Automated Machine Learning (Automated ML or AutoML) to train, select, and operationalize a time-series forecasting model for multiple time-series. Make sure you have executed the configuration notebook before running this notebook.

Automated ML (as illustrated in Figure 1 below) is the process of automating the time consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity all while sustaining model quality.

Figure 1 — Automated ML process on Azure — Source: www.aka.ms/AutomatedMLDocs

The examples below use the Dominick’s Finer Foods data set from James M. Kilts Center, University of Chicago Booth School of Business, to forecast orange juice sales. In the rest of this article we will go through the following steps:

Figure — Time Series Forecasting Solution Template on the Cloud

To read all the details and Python code for this solution, see https://techcommunity.microsoft.com/t5/educator-developer-blog/machine-learning-for-sales-forecasting-a-capstone-project-with/ba-p/1091578

Resources to learn more

To learn more, you can read the following articles and notebooks:

--

--

Francesca Lazzeri
Microsoft Azure

Principal Data Scientist Director @Microsoft ~ Adjunct Professor @Columbia University ~ PhD