A Developer’s Perspective on Azure Machine Learning Studio

By Brian Rush, Senior Software Engineer, Capax Global

Published in Hitachi Solutions Braintrust by brianmrush, Mar 12, 2019


This article gives a brief background on what Microsoft Azure Machine Learning Studio (AMLS) is and, more importantly, looks at it from a developer’s perspective. I am going to show how developers can tap into AMLS from within applications, taking advantage of the strides being made in machine learning and applying those techniques directly in the apps we build. An important disclaimer: this article is in no way, shape, or form about data science. That topic is an entire discipline in itself. This article is about how we can leverage data science models within our applications.

A term you hear a lot is democratization of machine learning. AMLS is one example of that. AMLS offers access to many sophisticated ML models through a simple graphical UI (a loose analogy I have in my head is SSIS). It doesn’t require a Ph.D. in statistics to leverage some very sophisticated ML algorithms. Granted, there needs to be an understanding of which algorithms to use and when to use them, but we are not writing the code for, say, a linear regression analysis.

There is no doubt that advances in technology, specifically the cloud, are bringing disciplines together and widening access to them. That makes the things we do as developers easier, and it also brings a level of sophistication that was previously hard to achieve. Take, for example, distributing and scaling our applications worldwide. What was once extremely difficult is now almost as easy as pushing a button. AMLS is no different.

What is Azure Machine Learning Studio

AMLS is a collaborative, drag-and-drop, cloud-based tool you can use to build, test, and deploy predictive analytics solutions on data. One aspect that is directly useful for our applications is that AMLS can publish ML models as web services, which we can then consume from our custom applications or from BI tools such as Excel and Power BI.

The focus of this article is not on AMLS directly, but rather on how we can leverage the output of AMLS. Before we get into this, let me just mention a few key things about AMLS.

How to get started

Microsoft offers a free AMLS workspace. It is a great place for us as developers to put on a data scientist hat and try things out.

Sign up for a free account here.

I highly recommend signing up for a free workspace and tinkering. It’s a brave new world!

What kind of things can I do in Azure Machine Learning Studio

Through AMLS you get a visual workspace where you can build, test, and iterate on machine learning models. You can drag and drop datasets and proven, out-of-the-box analysis modules (think known algorithms such as linear regression) onto an interactive canvas. You then connect data and algorithms through a workflow to form machine learning experiments. You use AMLS to run your experiments, see and evaluate the results, and iterate. Once you are satisfied with your machine learning experiment, you can publish it as a web service, which makes it available to consumers such as our applications. The following diagram depicts the type of workflow from within AMLS.

What kind of business problems can Azure Machine Learning Studio help solve

In this day and age of Big Data, how can we leverage this data to help make important business decisions? Having the data, and running machine learning techniques and algorithms on it, can lead to important business solutions. Here are the types of questions that AMLS can help answer:

  1. I want to predict something. Examples include things like wanting to predict the demand for a product, future sales growth, or when equipment will need servicing. These all fall into the category of predicting things through machine learning via regression algorithms.
  2. I want to find unusual things. Examples include predicting a credit default, identifying credit card fraud, or detecting equipment failures. These all fall into the category of machine learning via anomaly detection algorithms.
  3. I want to find structure. Examples include segmentation of customers, segmentation of voters, predicting customer likes, or predicting which products are typically purchased together and by whom. These all fall into the category of machine learning via clustering algorithms.
  4. I want to predict between two things. Examples include questions such as: Is this tweet positive? Is this email positive? Which of two coupons results in the most sales? These all fall into the category of two-class classification algorithms.
  5. I want to predict between many things. Examples include what is the mood of this tweet, what service will a customer likely choose, or which sales promotions draw the most customers. These fall into the category of machine learning through multi-class classification algorithms.

These are just a few examples of the types of questions we can answer by leveraging data, AMLS, and machine learning techniques.

What are the tools that Azure Machine Learning Studio provides

To develop a predictive analysis model, you typically use data from one or more sources, transform and analyze that data through various data manipulation and statistical functions and algorithms, and generate a set of results. Developing a model like this is an iterative process. As you modify the various functions and their parameters, your results converge until you are satisfied that you have a trained, effective model. AMLS gives us the tools we need to complete the machine learning workflow. The major components of AMLS that help organize our machine learning experience are:

Projects — This helps us organize all the experiments, datasets, notebooks, and other resources representing a single machine learning project. Think of this as similar to a project or solution in Visual Studio.

Experiments — This is where we run the actual experiments like predicting sales figures within AMLS. Within our experiments we will incorporate items such as:

  • Datasets — Data uploaded or extracted in our experiments and incorporated into our machine learning models.
  • Modules — A set of code that can run independently and perform a machine learning task, given the required inputs. A module might contain a particular algorithm, or perform a task that is important in machine learning, such as missing value replacement or statistical analysis. This includes known algorithms and programming languages such as R and Python. More details can be found here.
  • Data flows. This is a visual representation of how our experiments flow. Think of this as similar to an SSIS package where we have data flow through an ETL process. Similarly, there is a machine learning process.

Web Services — From a developer perspective, this has the most use for us. This is where we deploy our experiments to web services so they can be consumed by our applications.

Notebooks — Jupyter notebooks that you have created and that can be incorporated into our experiments.

Datasets — Datasets that you have uploaded into AMLS and that are used in our experiments.

Trained Models — Models that you have trained in experiments and saved in Studio.

Settings — A collection of settings that you can use to configure your account and resources.

“This is all great but show me how all this really makes a difference to me as a developer and the applications I build….”

Don’t bore us, get to the chorus…

I completely agree with this sentiment. All of the above information is really just to provide a background and a brief understanding of what AMLS is. We have to start somewhere, right?…

Let's talk about a specific example of how this all can be incorporated into an application.

Let's imagine for a moment that we work for a bank and are taking applications for a credit card. Potential customers go to an application page, and we gather important information about them during the application process. Relevant information includes items such as age, education level, current salary, monthly rent or mortgage, marital status, number of dependents, and monthly bill amounts. This is all great information, and we can leverage it to align our customers with the right credit card. Maybe more importantly, we can predict how good a customer they will be by predicting their credit risk.

What if we could know right there, at the time of filling out an application, things like:

A) Is this applicant a high credit risk?

B) This applicant is likely a low credit risk so let’s pre-approve them

C) We are not sure if this is a real customer or not and they may be trying to game the application process

These are exactly the types of questions that AMLS can help us answer. We can build models and expose them to our applications to answer these questions. Imagine if, at the end of the application process, you could ask a web service in real time whether this user is a low credit risk. Knowing that answer would most likely shape the rest of the process. For example, if we predict the applicant is a low credit risk, pre-approve the card. If we predict the applicant is a high credit risk, funnel the application through a human validation process. If we detect that we have seen too many similar applications in a short time frame, flag the application as potential fraud. These are examples of the exact type of things we can now do in our applications by leveraging machine learning.
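The routing rules described above can be sketched in a few lines of JavaScript. This is only an illustration: the function name, the threshold values, and the idea of passing in a count of similar recent applications are assumptions made for the example, not something AMLS provides. Anything that is not clearly low risk is sent to a human here, which is one possible reading of the rules.

```javascript
// Hypothetical thresholds; tune them to the model and the bank's risk policy.
const LOW_RISK_MAX = 0.2;       // at or below this probability, pre-approve
const MAX_SIMILAR_RECENT = 3;   // more similar recent applications than this looks suspicious

/**
 * Decide what to do with a credit card application.
 * @param {number} riskProbability    predicted probability (0..1) that the applicant is a credit risk
 * @param {number} similarRecentCount similar applications seen in a short time frame
 * @returns {"FLAG_FRAUD" | "PRE_APPROVE" | "HUMAN_REVIEW"}
 */
function routeApplication(riskProbability, similarRecentCount) {
  if (!Number.isFinite(riskProbability) || riskProbability < 0 || riskProbability > 1) {
    throw new RangeError("riskProbability must be a number between 0 and 1");
  }
  // Check for possible fraud first, regardless of the predicted risk.
  if (similarRecentCount > MAX_SIMILAR_RECENT) return "FLAG_FRAUD";
  if (riskProbability <= LOW_RISK_MAX) return "PRE_APPROVE";
  // Everything else goes through a human validation process.
  return "HUMAN_REVIEW";
}

console.log(routeApplication(0.05, 0)); // PRE_APPROVE
console.log(routeApplication(0.85, 0)); // HUMAN_REVIEW
console.log(routeApplication(0.05, 7)); // FLAG_FRAUD
```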

Machine learning in action

AMLS is a great example of how easy it is to create an experiment in which we can predict credit risk using existing data. I don’t want to go into the specifics here as Azure already has a really useful tutorial. I highly recommend signing up for a free AMLS workspace and walking through this tutorial. This will give you a feel for:

  • Creating a Machine Learning Studio workspace
  • Uploading and leveraging existing data in a machine learning experiment
  • Training your machine learning experiment

The more important thing I want to cover here is that after the experiment has been created, we can publish that experiment to a web service that we can then leverage directly from our applications. This gives us the ability to predict right there, on the spot, if our credit card applicant is a credit risk.

Publish machine learning to web services

So now we are speaking the language of developers. In the ever-prevalent ecosystem of microservices and APIs, we can add web services that use machine learning. Let’s drill a bit further into this.

After walking through the credit risk machine learning experiment, we come to a spot where we want to expose this to our applications. In our specific case, the application process for a credit card.

The credit experiment should look something like the following in AMLS prior to publishing to a web service:

Inside of AMLS, the next step is to set this experiment as a predictive web service. We do that by selecting the Setup Web Service menu item:

Once the web service has been published, you will see something like the following within AMLS:

The key here is that we can see we have two new components within AMLS, web service input and output, respectively. I imagine you can see how our experiment is starting to take shape in the sense that we can send input through it and get output.

Our last step is to deploy this credit risk predictive model as a web service. To do that, select the deploy web service menu item. That should lead you to a screen that looks like this:

From here we can test our web service by entering in the required input data. Doing so will give us a predicted credit risk response by running the variable input through our machine learning model.

Another important piece of information from AMLS is the API key. This gives us what we need to connect to and consume this predictive web service from, say, a C# web application. In our case, that is the credit card application website. Once the web service is published, from within AMLS we will now see an entry that details our new service:

We can click on each web service to get all kinds of good information about it. For example, the expected inputs, the expected outputs, the key to access the service published within Azure, and more.
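To make the idea of expected inputs concrete, here is a small JavaScript sketch that turns an applicant record into a request body. Treat the details as assumptions: the column names are hypothetical stand-ins for whatever your service's API help page lists, and the `Inputs` / `input1` / `ColumnNames` / `Values` envelope with string values is the shape typically shown for Studio request-response services, so check it against your own service before relying on it.

```javascript
// Hypothetical column names; replace them with the inputs your web service expects.
const EXPECTED_COLUMNS = [
  "age", "educationLevel", "salary", "monthlyHousing",
  "maritalStatus", "dependents", "monthlyBills",
];

/**
 * Build a request body for the predictive web service from one applicant record.
 * Throws if any expected input is missing, so bad data never reaches the service.
 */
function buildRequestBody(applicant, columns = EXPECTED_COLUMNS) {
  const missing = columns.filter(
    (name) => applicant[name] === undefined || applicant[name] === null
  );
  if (missing.length > 0) {
    throw new Error(`Missing required input(s): ${missing.join(", ")}`);
  }
  return {
    Inputs: {
      // "input1" is the assumed name of the web service input.
      input1: {
        ColumnNames: columns,
        // One inner array per row to score; values are sent as strings (assumption).
        Values: [columns.map((name) => String(applicant[name]))],
      },
    },
    GlobalParameters: {},
  };
}

const exampleBody = buildRequestBody({
  age: 42, educationLevel: "bachelor", salary: 85000, monthlyHousing: 1800,
  maritalStatus: "married", dependents: 2, monthlyBills: 950,
});
console.log(JSON.stringify(exampleBody, null, 2));
```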

All told, the workflow we went through within AMLS looks something like the following:

As developers, we are probably going to be more focused on the consumption side of the predictive web service. The work done in the develop, iterate, and test phases is more than likely the realm of data scientists. However, with a good understanding of AMLS and of how, when, and which machine learning modules and algorithms to use, there is no reason we can’t do this ourselves. This goes back to my earlier point about how disciplines are coalescing.

Consume a predictive web service

This last section details how we can code our C# applications to consume web services published through AMLS. The key here is that this predictive credit risk web service is no different than any other REST API service we are all familiar with. In our example of the credit card application process, during the save of an application, we can create an API request to the credit risk web service, get a response, and act accordingly. In this specific example, we can get the probability of a credit risk based on the input parameters.

The following is a sample C# request to the service:

And, here’s a JavaScript example for the client-side version of interacting with our predictive web service:
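What follows is a minimal JavaScript sketch of such a call, not the article's original listing. It assumes the service accepts an HTTP POST with a JSON body and an `Authorization: Bearer <API key>` header, which is the pattern Studio web services typically document. The function name and the `fetchImpl` parameter (there so the call can be exercised without a network) are hypothetical, and the URL and key are placeholders you would copy from your own service's page in AMLS.

```javascript
/**
 * Send one scoring request to a published predictive web service.
 * @param {string} serviceUrl   request-response URL copied from the service's page (placeholder)
 * @param {string} apiKey       API key copied from the service's page (placeholder)
 * @param {object} requestBody  JSON-serializable input for the service
 * @param {Function} fetchImpl  fetch-compatible function; defaults to the global fetch
 * @returns {Promise<object>}   the parsed JSON response
 */
async function scoreApplicant(serviceUrl, apiKey, requestBody, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(serviceUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(requestBody),
  });
  if (!response.ok) {
    // Surface HTTP errors instead of trying to parse an error page as a score.
    throw new Error(`Scoring request failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (not executed here; the URL and key are placeholders):
// scoreApplicant("https://<your-service-url>", "<your-api-key>", requestBody)
//   .then((result) => console.log(result))
//   .catch((err) => console.error(err));
```

One design note: a call made directly from the browser exposes the API key to anyone who inspects the page, so in practice this request is usually made from server-side code or through a small proxy endpoint that holds the key.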

For our particular example, the key bit of information we are looking for, namely the credit risk probability, is in the response to our API request. It is in the field called Scored Probability, which represents the probability of a credit risk based on the input parameters, as scored by our predictive credit risk web service. Reading that value and acting on it is an example of how we can use machine learning within an application.
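As a rough illustration of reading that value, the sketch below pulls the probability out of a response object. The response shape (`Results` → `output1` → `value` with `ColumnNames` and `Values`) is an assumption based on how Studio web services typically return tables, and the sample data is made up. The column name defaults to the Scored Probability field named above; confirm the exact spelling on your service's API help page and pass it in if it differs.

```javascript
/**
 * Extract the scored probability for the first row of a service response.
 * columnName and outputName are assumptions; pass your service's actual names.
 */
function readScoredProbability(
  serviceResponse,
  columnName = "Scored Probability",
  outputName = "output1"
) {
  const table = serviceResponse?.Results?.[outputName]?.value;
  if (!table || !Array.isArray(table.ColumnNames) || !Array.isArray(table.Values)) {
    throw new Error(`Response has no table under Results.${outputName}.value`);
  }
  const index = table.ColumnNames.indexOf(columnName);
  if (index === -1) {
    throw new Error(`Column "${columnName}" not found in response`);
  }
  const firstRow = table.Values[0];
  if (!Array.isArray(firstRow)) {
    throw new Error("Response contains no scored rows");
  }
  // Values may arrive as strings, so convert before using the number.
  const probability = Number(firstRow[index]);
  if (!Number.isFinite(probability)) {
    throw new Error(`Column "${columnName}" does not hold a numeric value`);
  }
  return probability;
}

// Made-up sample response for illustration only.
const sampleResponse = {
  Results: {
    output1: {
      type: "table",
      value: {
        ColumnNames: ["Scored Labels", "Scored Probability"],
        Values: [["1", "0.87"]],
      },
    },
  },
};
console.log(readScoredProbability(sampleResponse)); // 0.87
```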

Overall, AMLS seems really powerful. The interface seems to pull in concepts similar to other tools such as SSIS, Azure Data Factory, and even Visual Studio. I am sure pure data scientists may say that AMLS does not have all the features they would typically use in their day-to-day jobs. That may be a fair point. Since I am not a data scientist, I can’t fully comment on that sentiment. That being said, features are being added frequently, and with time I bet this will be a full-fledged suite of machine learning technologies. Recent integrations with R, Python, and Jupyter Notebook indicate that catering to the data scientist crowd is at the forefront. Similar to Visual Studio, AMLS will evolve and mature with time. Already it seems extremely robust to me. I invite you to learn more about AMLS here.
