Getting started with Azure Machine Learning: build a digit recognizer
Azure is Microsoft’s cloud computing service for building, deploying and managing applications. Similar to Amazon’s AWS and Google’s Cloud Platform, Azure provides tools for a wide range of cloud services, from adding compute power to gathering insights from big data to deploying web applications. In this article, I will talk about the Machine Learning toolkit in Azure and point to a tutorial where you can build a model that recognizes hand-written digits and deploy a web application that predicts which digit a user drew on the screen. The final web application will look like this:
Machine learning is a subfield of computer science in which programs ‘learn’ from data rather than being explicitly programmed. The main purpose of machine learning is to create a statistical model from past data (referred to as training data) so that it can make predictions or classifications on unseen data (referred to as test data). For example, we can train a model on images of hand-written digits (0,1,2,…,9) to learn what each digit looks like and then test it on unseen images to evaluate its performance. A typical machine learning application pipeline looks like this:
We start out with raw data: in our case, images of digits with labels that tell us which digit each image contains. Then, the data is cleaned up and transformed for analysis. Often, this process of data wrangling and feature engineering is the most time-consuming part of the workflow. Next, machine learning models are trained on this clean data and evaluated to see which one performs best. Finally, the best-performing model is deployed in a real-world application.
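The train, evaluate and select steps of this pipeline can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic (one numeric feature, two classes) rather than digit images, and the two candidate "models" (`train_threshold` and `train_majority`) are hypothetical stand-ins chosen for brevity, not anything Azure provides.

```python
import random

def make_dataset(n, seed=0):
    """Synthetic stand-in for cleaned, labelled data: (feature, label) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 0 clusters around 2.0, class 1 around 6.0.
        feature = rng.gauss(2.0 if label == 0 else 6.0, 1.0)
        data.append((feature, label))
    return data

def split(data, test_fraction=0.25):
    """Hold out the last part of the data as unseen test data."""
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

def train_threshold(train):
    """Candidate 1: classify by the midpoint between the two class means."""
    means = []
    for cls in (0, 1):
        values = [x for x, y in train if y == cls]
        means.append(sum(values) / len(values))
    midpoint = (means[0] + means[1]) / 2
    return lambda x: 1 if x > midpoint else 0

def train_majority(train):
    """Candidate 2 (baseline): always predict the most common training label."""
    ones = sum(y for _, y in train)
    majority = 1 if ones * 2 >= len(train) else 0
    return lambda x: majority

def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

data = make_dataset(400)
train, test = split(data)
candidates = {
    "threshold": train_threshold(train),
    "majority": train_majority(train),
}
# Evaluate every candidate on the held-out test data and keep the best one.
scores = {name: accuracy(model, test) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, "-> deploy:", best_name)
```

The important design point is that the models only ever see `train` while fitting; `test` is used solely to compare them, which mirrors the training/test distinction described above.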
Azure Machine Learning provides a fully managed cloud service for building machine learning models and deploying the models as a web service. I have been using Azure for some time and these are the top features for me:
- Portable: Software and its dependencies can be annoying to manage and update. Azure ML Studio is fully managed by Microsoft, and your project/experiment can be accessed on the web from anywhere.
- Simple interface: The interface contains drag-and-drop components that are common to data science applications, so there is no need to program these modules yourself. There are many components, ranging from data wrangling to model evaluation, which helps you set up a pipeline quickly.
- External support for Python and R: This is a big one for me. Although a wide array of machine learning algorithms is already implemented in Azure, it helps to have the option of writing your own Python or R script for extra flexibility when needed.
- Deployment: It is very simple to set up a web service with the model developed in Azure and then call that from a front-end application using a REST API.
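To make the deployment point concrete, here is a minimal sketch of how a client might prepare a REST call to a deployed model using only Python's standard library. Everything service-specific is an assumption: `SERVICE_URL` and `API_KEY` are placeholders, and the JSON schema (an `Inputs` table named `input1` with columns `p01` to `p64`) is a guess at a typical request shape, so check the API help page of your own web service for the real one.

```python
import json
import urllib.request

# Placeholders: copy the real values from your web service's dashboard.
SERVICE_URL = "https://example.invalid/execute?api-version=2.0"
API_KEY = "YOUR_API_KEY"

def build_request(pixels, url=SERVICE_URL, api_key=API_KEY):
    """Build (but do not send) a POST request carrying one 64-pixel image."""
    if len(pixels) != 64:
        raise ValueError("expected 64 pixel values")
    # Assumed schema: one input table with hypothetical column names p01..p64.
    columns = [f"p{i:02d}" for i in range(1, 65)]
    body = {
        "Inputs": {
            "input1": {"ColumnNames": columns, "Values": [list(pixels)]}
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )

request = build_request([0] * 64)

# To actually call the service (needs a real URL and key):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it keeps the sketch runnable without network access and makes the payload easy to inspect before pointing it at a real endpoint.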
Personally, I feel that Azure is an excellent tool for data/business analysts since it provides a wide range of tools and is very simple to use. In terms of disadvantages, I would say that it is not the ideal tool for complex models, such as training a Long Short-Term Memory (LSTM) network for text classification, but that is a rare task for data analysts and can be achieved through external Python scripts.
I would also like to point to an excellent tutorial on Azure Machine Learning from the Microsoft team on GitHub:
This tutorial builds a machine learning model that recognizes hand-written digits with an accuracy of over 95%. The model trains on images of size 8 x 8, such as the one shown below:
Each image contains 64 pixels, with each pixel in grayscale ranging from 0 (white) to 15 (black). The digit labels range from 0 to 9, and the tutorial walks you through the process of training a Logistic Regression classifier that is around 97% accurate at recognizing hand-written digits. The model is then deployed as a web service, and a front-end application is created where the user can draw a digit in a grid box and ask the web service to predict which digit it is. The prediction is then returned to the user.
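To illustrate the data format, the snippet below turns a drawing on an 8 x 8 grid into the 64 grayscale features a model like this expects. It is a sketch under assumptions: `grid_to_features` is a hypothetical helper, and mapping every filled cell straight to 15 and every empty cell to 0 is a simplification (the tutorial's front end may well produce intermediate gray values).

```python
GRID_SIZE = 8        # images are 8 x 8
MAX_INTENSITY = 15   # darkest grayscale value (black)

def grid_to_features(grid):
    """Flatten an 8 x 8 grid of 0/1 cells into 64 grayscale values, row by row."""
    if len(grid) != GRID_SIZE or any(len(row) != GRID_SIZE for row in grid):
        raise ValueError("expected an 8 x 8 grid")
    # Assumption: a filled cell is fully black, an empty cell fully white.
    return [MAX_INTENSITY if cell else 0 for row in grid for cell in row]

# A crude hand-drawn "1": a vertical stroke down column 4.
drawing = [
    [1 if col == 4 else 0 for col in range(GRID_SIZE)]
    for _ in range(GRID_SIZE)
]

features = grid_to_features(drawing)
print(len(features))   # 64 values, one per pixel
print(features[:8])    # first row of the image
```

The resulting list is what would be sent to the web service as a single row of input, and the predicted label (0 to 9) would come back in the response.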
Acknowledgements: I would like to thank Simran Chaudhry and Sage Franch for inspiring me to start writing. I would also like to thank Jeff Prosise for contributing the GitHub tutorial that I linked to and learned so much from.