Painless Machine Learning with Azure

Learn Azure Machine Learning and start your professional data science career today.

John Paul Ada
Programmers — Developers


This is the written version of my talk on Azure Machine Learning during the Programmers, Developers Meetup/Conference on August 20, 2016 at Microsoft PH.

The video of my presentation and demo, courtesy of Sir Edison Tan of HubnobTV.

Bla bla bla. What is Machine Learning anyway?

Machine Learning is basically giving a machine the ability to: 1) learn and 2) create something based on what it has learned.

More accurately, Machine Learning gives a machine the ability to recognize patterns and reproduce those patterns.

Even more accurately, given a set of inputs and their matching outputs, it creates the algorithm (an equation) that turns one into the other.

SO..? What does that mean?

So… that means Machine Learning algorithms are generic algorithms that create algorithms for you.

Yep.

They are ALGORITHMS THAT CREATE ALGORITHMS.

BOOM.

Do you understand how amazing that is?

Let’s take facial detection, for example.

To do facial detection with Machine Learning, you give the machine pictures of faces and tell it that those are faces, then give it pictures of non-faces and tell it that those are not faces. It then “learns” what a face “looks like”, and when shown a picture of a face, it will recognize it as a face even if it has never seen that particular face before. The more you teach the machine, the better it gets at detecting faces.

Even if this is just a rather simple way of putting it while skipping some important steps such as feature extraction, this is basically what machine learning does. Notice that you didn’t have to create a special algorithm for detecting faces. Uh-huh. The machine learning algorithm creates the face detection algorithm for you. Then you can just save the created algorithm, also called a model [1], and voila! You can now use it for your various face detection needs.
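To make the "algorithms that create algorithms" idea concrete, here is a minimal sketch in Python. It is nowhere near face detection: it uses plain least squares on a handful of made-up numbers, and the function name `fit_line` is my own. The point is only that the learning code never contains the final equation; it works it out from example inputs and outputs and hands back a model.

```python
def fit_line(xs, ys):
    """Learn a slope and an intercept from example inputs and outputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for a single input variable.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x

    # The returned function is the "model": an equation nobody typed by hand.
    def model(x):
        return slope * x + intercept

    return model


# Made-up training examples that happen to follow y = 2x + 1.
inputs = [1, 2, 3, 4]
outputs = [3, 5, 7, 9]

model = fit_line(inputs, outputs)
prediction = model(10)  # an input the model was never shown
```

Swap in different examples and the same `fit_line` code produces a different model, which is the whole trick.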

Case Study: Employee Performance Prediction

This is one of my favorite Machine Learning applications because:

  1. It can be a good machine learning exercise for students; and
  2. It can be very useful for businesses that want to hire employees who perform effectively and efficiently.

Problem

We want to predict whether an employee will perform excellently or poorly.

Solution

This kind of problem is called a classification problem, wherein we are tasked to classify something. In this case, we have to classify an employee as either excellent or poor.

Like the facial detection example, we need to teach the machine that employees that perform excellently should be categorized as excellent and employees that perform poorly should be categorized as poor. To do that, we need to feed the machine some information about the employees.

For this problem, let’s suppose we have an existing employee database wherein each employee has the following fields:

  1. Employee Number (id)
  2. Name (name)
  3. Age (age)
  4. Marital Status (marital_status)
  5. Educational Attainment (education)
  6. General Weighted Average (gwa)
  7. Hiring Exam Score (exam_score)
  8. Hiring Interview Score (interview_score)
  9. Work Performance (performance)

I have prepared a spreadsheet of 100 employees here. Download the spreadsheet as a CSV file.
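If you are curious what that CSV looks like to a program, here is a rough sketch using only Python's standard library. The column names come from the list above, but the two rows, the way values are written (for example "single" or "excellent"), and the helper name `load_employees` are all assumptions of mine, not the contents of the actual spreadsheet.

```python
import csv
import io

# Hypothetical rows for illustration only; the real file has 100 employees.
SAMPLE_CSV = """id,name,age,marital_status,education,gwa,exam_score,interview_score,performance
1,Employee A,28,single,bachelor,1.75,88,90,excellent
2,Employee B,35,married,master,2.50,70,65,poor
"""

# Columns that should be treated as numbers instead of text.
NUMERIC_FIELDS = {
    "age": int,
    "gwa": float,
    "exam_score": float,
    "interview_score": float,
}


def load_employees(text):
    """Parse CSV text with a header row into a list of dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for field, cast in NUMERIC_FIELDS.items():
            row[field] = cast(row[field])
        rows.append(row)
    return rows


employees = load_employees(SAMPLE_CSV)
```

To read a downloaded file instead, you would pass an opened file to `csv.DictReader` in place of the `io.StringIO` wrapper.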

Using Azure Machine Learning Studio

A faster and easier way of applying Machine Learning to solve this problem involves the use of Azure Machine Learning Studio.

First things first, you need to sign in with your Microsoft Account. If you don’t have one, then make one. You’re missing out on one of the better things in life. After logging in, you enter the Workspace.

Importing Data

We’ll need the employee data CSV file that we downloaded earlier so let’s import that into our Workspace.

To import:

  1. Click on the Datasets tab.
  2. Click the big plus button at the bottom that says New. A window will slide up.
  3. Click the button that says From Local File. A modal that says Upload a new dataset will appear.
  4. Choose the file, enter whatever name you want in the name text box, and select Generic CSV with a header. Then click the check button. The file will then start uploading.
An empty Datasets tab.
Click the button that says From Local File.
Uploading a Generic CSV dataset.
The Datasets tab now lists the uploaded CSV dataset.

Creating an Experiment

Now that we have uploaded a dataset, we can now start applying Machine Learning to it by creating an Experiment. To create an Experiment in Azure Machine Learning Studio you:

  1. Click on the Experiments tab.
  2. Click the huge Plus button that says New on the bottom. A window will slide up.
  3. Click Blank Experiment. An experiment with the date as the title should appear.
  4. Rename your experiment.
An empty Experiments tab.
Choose Blank Experiment.
A newly created Experiment.
Experiment renamed to Employee Performance Prediction.

Building the Experiment

Now that we have our experiment, let’s set it up so that it solves our problem of predicting whether an employee will perform excellently or poorly.

First, we add our dataset to the experiment. Click on Saved Datasets -> My Datasets. Select the Sample Employee Database and drag it onto the main panel.

Select the Dataset from My Datasets under Saved Datasets.
Drag the dataset to the main panel.

Then we need to Split the dataset into a training set and testing set, because we want to test how well the machine predicts the actual outcome after we run the algorithm.

So we search for the Split module. Search Split on the search bar on the side and a couple of results should show up. On Data Transformation -> Sample and Split, drag the module that says Split Data into the main panel.

Split Data module on the main panel.

Now click on the Split Data module. On the side, the properties tab will show the properties of the module. Set the Splitting mode to Split Rows, the Fraction of rows to 0.8 and tick the randomized split check box.

What this does is place a random 80% of the employees on the left circle that says 1 and 20% on the right circle.
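In code, a randomized 80/20 row split could look something like the sketch below. This is my own approximation of the idea, not what the Split Data module runs internally; the function name `split_rows` and the fixed seed are assumptions, and plain numbers stand in for the 100 employee rows.

```python
import random


def split_rows(rows, fraction=0.8, seed=42):
    """Shuffle a copy of the rows, then cut it into two parts."""
    shuffled = list(rows)
    # A seeded generator makes the "random" split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 100 employees: just their employee numbers.
employee_ids = list(range(1, 101))

train_ids, test_ids = split_rows(employee_ids)
```

The first list plays the role of the left output circle (training) and the second plays the role of the right one (testing).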

Setting the Split Data module properties.

Connect the two modules and now we’re splitting the data into two parts.

We’re now ready to train/teach our machine. Select Machine Learning -> Train -> Train Model and drag it into the main panel.

Select Machine Learning -> Train -> Train Model.
Drag the Train Model module to the main panel.

From Machine Learning -> Initialize Model -> Classification, select Two-Class Boosted Decision Tree and drag it to the main panel.

From Machine Learning -> Initialize Model -> Classification, select Two-Class Boosted Decision Tree.
Drag the Two-Class Boosted Decision Tree module to the main panel.

The Train Model module needs the algorithm that will be trained and the data it will be trained with. So we connect the Two-Class Boosted Decision Tree module and the Split Data module to the Train Model module. We used the left output circle of the Split Data module to connect to the Train Model module. Why, you ask?

Because we want to train the algorithm with 80% of the employees and test it with the remaining 20%. If you remember, earlier we set the left circle of the Split Data module to deliver a random 80% of the employees.

Connecting the modules.

Now click the Train Model module and on the Properties tab, click Launch Column Selector.

Launching the Column Selector.

A modal will pop up and ask us to select a single column. This column is the column that will be predicted by the algorithm. In our case, we want to predict the performance so we select the performance column. Then click the check button to proceed.

Select the column to be predicted.
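So Train Model takes three things: an untrained algorithm, the training rows, and the column to predict. Here is a toy sketch of that shape in Python. To be clear, this is NOT the Two-Class Boosted Decision Tree; it is a single "decision stump" (one threshold on one column), which I am swapping in because it fits in a few lines. The function name `train_stump`, the choice of `exam_score`, and the four rows are all hypothetical.

```python
def train_stump(rows, label_column, feature, positive="excellent", negative="poor"):
    """Find the threshold on one numeric column that misclassifies the fewest rows."""
    best_threshold = None
    best_errors = len(rows) + 1
    for candidate in sorted({row[feature] for row in rows}):
        # Rule being tested: at or above the threshold means "positive".
        errors = sum(
            (row[feature] >= candidate) != (row[label_column] == positive)
            for row in rows
        )
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors

    # The trained model is just the rule with the learned threshold baked in.
    def model(row):
        return positive if row[feature] >= best_threshold else negative

    return model


# Hypothetical training rows, trimmed to the columns this toy example needs.
training_rows = [
    {"exam_score": 90, "performance": "excellent"},
    {"exam_score": 85, "performance": "excellent"},
    {"exam_score": 60, "performance": "poor"},
    {"exam_score": 55, "performance": "poor"},
]

model = train_stump(training_rows, label_column="performance", feature="exam_score")
```

Picking `label_column="performance"` here is the code equivalent of what we just did in the Column Selector.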

Now we’ll try out our setup. We need to use the Score Model module. Select the Score Model module from Machine Learning -> Score and drag it on to the main panel.

Now connect the Train Model module and the right output circle of the Split Data module to the Score Model module.

After connecting the modules, hit the Run button at the bottom of the screen. This will run the setup.

The Score Model module will use the model produced by the Train Model module to try to predict the performance of the remaining 20% of the employees.

Every module that finishes will have a check mark. The dataset will not have a check mark because it is not a runnable module.

After it finishes, click the output circle of the Score Model module and click Visualize.

The Scored Labels column is the predicted performance of the employee.

We can see here that it actually predicted the performances of the employees.
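Conceptually, scoring means running the trained model over rows it has not seen and attaching its answer to each row. Here is a small sketch of that idea. The `threshold_model` function is a hypothetical stand-in for a trained model, the three rows are invented, and I only reproduce the Scored Labels column mentioned above.

```python
def score_rows(model, rows):
    """Return copies of the rows with the model's prediction attached."""
    scored = []
    for row in rows:
        scored_row = dict(row)  # copy, so the original data stays untouched
        scored_row["Scored Labels"] = model(row)
        scored.append(scored_row)
    return scored


def threshold_model(row):
    """Hypothetical stand-in for a trained model."""
    return "excellent" if row["exam_score"] >= 80 else "poor"


# Invented held-out rows, trimmed to a few columns.
held_out = [
    {"id": 7, "exam_score": 91, "performance": "excellent"},
    {"id": 8, "exam_score": 62, "performance": "poor"},
    {"id": 9, "exam_score": 83, "performance": "poor"},
]

scored = score_rows(threshold_model, held_out)
```

Notice that the last row gets a Scored Labels value that disagrees with its actual performance. Mistakes like that are exactly what the next step measures.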

We want to know how well the algorithm predicted the outcomes, so we’ll use the Evaluate Model module, found under Machine Learning -> Evaluate.

Connect the Score Model module to the Evaluate Model module and then hit Run. After that, click the output circle of the Evaluate Model module and click Visualize.
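If you want a feel for where evaluation numbers come from, here is a hedged sketch that compares actual labels against predicted ones and computes accuracy, precision, and recall, which are common measures for a two-class problem. The function name `evaluate` and the five label pairs are made up for illustration; compare against whatever your own Visualize screen reports.

```python
def evaluate(actual, predicted, positive="excellent"):
    """Count correct and incorrect predictions, then summarize them."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == positive and p == positive for a, p in pairs)
    false_pos = sum(a != positive and p == positive for a, p in pairs)
    false_neg = sum(a == positive and p != positive for a, p in pairs)
    true_neg = sum(a != positive and p != positive for a, p in pairs)
    return {
        # Share of all predictions that were right.
        "accuracy": (true_pos + true_neg) / len(pairs),
        # Of the employees predicted excellent, how many really were.
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        # Of the truly excellent employees, how many the model found.
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }


# Made-up labels for five held-out employees.
actual = ["excellent", "excellent", "poor", "poor", "excellent"]
predicted = ["excellent", "poor", "poor", "excellent", "excellent"]

metrics = evaluate(actual, predicted)
```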

If we want other people to use this to predict their employees’ performance, we’ll need to set up a web service. Click Setup Web Service at the bottom. After the transformation, it will look like this:

Now to actually expose this web service and make it accessible to those who want to access our algorithm, click Deploy Web Service at the bottom. We will be redirected to a page that contains information about the web service, like the API key, the links to the API help pages, etc.

We can also test our API by clicking the blue Test button. It will ask us some sample details. Let’s enter some and click the OK button. The result will display on the bottom of the screen.

We can also test this using the Postman Chrome extension.

Header values for the Request.
Body of the request to be sent to the web service.
Response sent by the web service.

If you want to try it with real code, the API documentation for your web service also provides sample code that you can use to send requests to the service and process the responses it sends back. The samples provided are for C#, Python, and R.
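For a rough idea of what such a request looks like in Python, here is a sketch that only builds the request and does not send it. Treat every specific here as an assumption: the URL and API key are placeholders, the helper name `build_request` is mine, and the JSON layout (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) plus the bearer-token header are modeled on the kind of sample code the API help page shows. Your service's help page is the authority on the exact shape.

```python
import json
import urllib.request

# Placeholders: copy the real values from your web service's page.
SERVICE_URL = "https://example.invalid/your-web-service/execute"
API_KEY = "YOUR_API_KEY"


def build_request(employee, url=SERVICE_URL, api_key=API_KEY):
    """Wrap one employee's details in a JSON POST request."""
    columns = list(employee)
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": columns,
                # One inner list per employee; values are sent as strings here.
                "Values": [[str(employee[column]) for column in columns]],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Hypothetical employee details, using the column names from earlier.
request = build_request({
    "id": 101,
    "name": "New Hire",
    "age": 24,
    "marital_status": "single",
    "education": "bachelor",
    "gwa": 1.5,
    "exam_score": 92,
    "interview_score": 88,
    "performance": "",
})

# To actually call the service, you would do something like:
# response_body = urllib.request.urlopen(request).read()
```

These are the same header values and body you would enter by hand in Postman.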

I hope you learned something awesome today. Have fun playing around with Azure Machine Learning Studio! :D

[1] Because they are mathematical models — they are equations.

Also, if you liked this post, click the little green heart down below! Many thanks! :D

Personal Pages: http://johnpaulada.github.io
