Recently a post about the Predictive Power Score attracted the attention of many data scientists. Let's see what it is and how to use it in R.


In recent months, Florian Wetschoreck published a story on Towards Data Science's Medium channel that attracted the attention of many data scientists on LinkedIn, thanks to its very provocative title: "RIP correlation. Introducing the Predictive Power Score". Let's see what it is and how to use it in R.

Definition of Predictive Power Score

The Predictive Power Score (PPS) is a normalized index, ranging from 0 to 1, that tells us how well the variable x (be it numerical or categorical) can be used to predict the variable y (numerical or categorical). …
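A quick way to try the PPS in R is the ppsr package available on CRAN. The sketch below assumes ppsr is installed and uses the built-in iris dataset; check the package documentation for the exact options:

```r
# install.packages("ppsr")   # assumed: the 'ppsr' CRAN package
library(ppsr)

# PPS of a single predictor/target pair
ppsr::score(iris, x = "Sepal.Length", y = "Petal.Length")$pps

# Full PPS matrix for all variable pairs.
# Note: unlike correlation, the PPS is asymmetric,
# so score(x, y) and score(y, x) generally differ.
ppsr::score_matrix(iris)
```

The asymmetry is exactly what the original article leverages to argue the PPS can reveal relationships a correlation matrix hides.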



With Azure Machine Learning, you can easily submit your training script to various compute targets using an R or a Python estimator, which wraps run configuration information specifying the details for the execution of a script. During training, the entire folder containing the training script is copied to the target training cluster. A Docker base image that meets the requirements stated in the estimator is built and becomes the environment in which the training script is executed.

Even though the aforementioned estimator allows you to declare additional CRAN and GitHub packages, or Conda and pip packages, to load into the training environment, there may be some features your script needs that can't simply be declared using the usual estimator options. …
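As a sketch of the standard path, here is how an estimator is typically built with the azureml R SDK (azuremlsdk). The script name, compute target name and workspace are placeholders; the `custom_docker_image` parameter is the usual escape hatch when the declarative package options aren't enough:

```r
library(azuremlsdk)

ws <- load_workspace_from_config()          # assumes a config.json is present
exp <- experiment(ws, name = "my-experiment")

# Hypothetical names: 'train.R' script, 'cpu-cluster' compute target
est <- estimator(
  source_directory = ".",
  entry_script     = "train.R",
  compute_target   = "cpu-cluster",
  cran_packages    = list(cran_package("caret")),  # extra CRAN dependency
  # For requirements not expressible as packages, point to your own image:
  # custom_docker_image = "myregistry.azurecr.io/my-r-training:latest"
)

run <- submit_experiment(exp, est)
wait_for_run_completion(run, show_output = TRUE)
```

When even a custom image isn't declared, that is where the more manual environment setup the article hints at comes into play.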



In my previous article I showed how to install the Azure Machine Learning R SDK:

After verifying that the installation works, you need to log in to Azure Machine Learning and get a reference to the Workspace object, which will allow you to work with all the artifacts you create when you use Azure Machine Learning. The centrality of the Workspace is shown in the following taxonomy:

fig. 1 — Workspace taxonomy

This article will show you how to authenticate in Azure ML using different technologies in order to get the reference to a Workspace object.

Authentication Methods

You can authenticate in multiple…
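The two most common options can be sketched as follows with azuremlsdk (the workspace, subscription and resource group names are placeholders; the service-principal secret is assumed to be in an environment variable):

```r
library(azuremlsdk)

# 1) Interactive authentication (the default): a browser prompt opens
#    the first time you request the workspace.
ws <- get_workspace(
  name            = "my-workspace",
  subscription_id = "<subscription-id>",
  resource_group  = "my-resource-group"
)

# 2) Service principal authentication, suited to unattended/automated jobs.
sp_auth <- service_principal_authentication(
  tenant_id                  = "<tenant-id>",
  service_principal_id       = "<client-id>",
  service_principal_password = Sys.getenv("SP_SECRET")
)
ws <- get_workspace(
  name            = "my-workspace",
  auth            = sp_auth,
  subscription_id = "<subscription-id>",
  resource_group  = "my-resource-group"
)
```

If you keep a config.json in your project folder, `load_workspace_from_config()` avoids repeating the identifiers in every script.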


Azure Machine Learning R SDK

As you probably already know, Microsoft provides its Azure Machine Learning SDK for Python to build and run machine learning workflows, helping organizations use massive data sets and bring all the benefits of the Azure cloud to machine learning.

Although Microsoft initially invested in R as the preferred language for Advanced Analytics, introducing R Server and R Services in SQL Server 2016, it abruptly shifted its attention to Python, investing exclusively in it. This basically happened for the following reasons:

  • Python’s simple syntax and readability make the language accessible to non-programmers
  • The most popular machine learning and deep learning open source libraries (such as Pandas, scikit-learn, TensorFlow, PyTorch, etc.) are deeply used by the Python…



Some third-party applications ask for JSON files as input to import new data. An example is Splunk, a software platform to search, analyze and visualize machine-generated data gathered from websites, applications, sensors, devices, etc. If the JSON format is mandatory for sharing information and the data you need to analyze is stored in a database, you need to transform your data from a tabular format to JSON, following a JSON schema. If you give this task to developers, the first idea they usually follow is to develop an application (in C#, Java or whatever programming language) that connects to the source database, loads the data using an ORM (with possible performance issues due to the inability to write optimal SQL code), transforms it using appropriate libraries and then exports the output to a JSON file. …
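As one lighter-weight alternative to a full application, the tabular-to-JSON step can be done in a few lines of R with DBI and jsonlite. The DSN, table and column names below are hypothetical placeholders:

```r
library(DBI)       # database connectivity (here via the 'odbc' package)
library(jsonlite)

# Assumption: an ODBC DSN 'SourceDB' and a table 'Sales' exist
con   <- dbConnect(odbc::odbc(), dsn = "SourceDB")
sales <- dbGetQuery(con, "SELECT OrderID, Customer, Amount FROM Sales")
dbDisconnect(con)

# Serialize the data frame as a JSON array of row objects
json <- toJSON(sales, dataframe = "rows", pretty = TRUE)
writeLines(json, "sales.json")
```

If the source is SQL Server, the `FOR JSON` clause can produce the same output directly in T-SQL, without any client-side code at all.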


If you usually develop predictive models in R for your customers, you might need to provide them with a practical GUI to test their model. A really simple and convenient way is to provide users with as many sliders and/or combo boxes as there are input features in the model, plus a simple label showing the predicted value. The first obvious choice for an R developer would be a Shiny app. But if your customer's IT infrastructure is Microsoft-centric, a Power BI report could be the better choice.

Our Test Predictive Model

First of all, let's create an R model to test with the Power BI report. We'll use the mtcars dataset, extracted from the 1974 Motor Trend US magazine, which comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). After an in-depth analysis, which you can find here, the final model to fit will be the…
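As an illustration only (not necessarily the final model the referenced analysis arrives at), a simple linear model on mtcars can be fitted and saved so that a report can later load and score it:

```r
# Illustrative model: predict fuel consumption (mpg) from weight,
# quarter-mile time and transmission type
model <- lm(mpg ~ wt + qsec + am, data = mtcars)
summary(model)

# Persist the model so an external tool (e.g. a Power BI R visual)
# can load it and score new inputs coming from sliders/combo boxes
saveRDS(model, "mtcars_model.rds")
predict(model, newdata = data.frame(wt = 2.5, qsec = 17, am = 1))
```

The GUI then only needs to collect the three inputs and display the value returned by `predict()`.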



The new Azure Machine Learning Services are Python-based. If you want to operationalize predictive models developed in R on Azure, there isn’t a straightforward way to do that.

https://medium.com/microsoftazure/azure-machine-learning-for-r-practitioners-with-the-r-sdk-323454d338ae

Recently I read the following two articles:

Both of the above-mentioned methods are based on the same architecture, but both solutions are quite complex. They require the following technologies:


If you want to do Advanced Analytics or Predictive Modeling directly in SQL Server, you have been able to since version 2016, thanks to SQL Server Machine Learning Services. In particular, you can take advantage of the extensibility framework, running scripts in R, Python or Java (the latter only available from version 2019) directly inside a system stored procedure, sp_execute_external_script.

Looking at the sp_execute_external_script syntax, among other parameters it also accepts @input_data_1, which allows you to pass the input data used by the external script in the form of a T-SQL query. The first question you'll ask yourself is: "OK, the parameter name is in the form input_data_N, so I suppose multiple input data sets can be passed to the stored procedure!". And guess what? The answer is: "No, you can't!". Maybe Microsoft named this parameter that way so that in the future it will be possible to add multiple input data sets using input_data_2, input_data_3 and so on. …
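A minimal call looks like the following (the table name dbo.Products is a hypothetical placeholder); note how the single @input_data_1 query surfaces inside the R script as the InputDataSet data frame:

```sql
-- Run an R script over a single T-SQL input set
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        # InputDataSet / OutputDataSet are the default data frame names
        OutputDataSet <- data.frame(avg_price = mean(InputDataSet$price))
    ',
    @input_data_1 = N'SELECT price FROM dbo.Products',
    @input_data_1_name = N'InputDataSet'
WITH RESULT SETS ((avg_price FLOAT));
```

Any second data set has to be worked around, for example by serializing it into a script parameter or reading it from inside the R script itself.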



In one of my previous posts, here, I described how to evaluate regressions using the most commonly used metrics and plots. I took an experiment about modeling price elasticity as an example and, after analyzing the model with residual plots, it turned out there was a problem after the 1st of September in the test data set:

fig. 1 — The calendar_date variable vs residuals plot shows that something strange happens after the 1st of September

This post will show tools in R for better data exploration, a good way to prepare the data for high-performing predictive modeling. It's also the only way to try to explain the problem that emerged previously.

If you want to start from the data sets without running the experiment suggested above, you can get them here. …


In a Data Science project it's really important to get the most insight out of your data. There is a specific phase, the first one in the project, whose goal is data analysis: the Data Exploration phase. Among other kinds of analysis, one of the most interesting is the bivariate one, which examines the relationship between two variables. If the two variables are categorical, the most common plot used to analyze their relationship is the mosaic plot. At first sight it may appear a little confusing, and people unaware of some statistical concepts can miss the important information this plot can give us. …
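A mosaic plot needs nothing beyond base R; the sketch below uses the built-in Titanic contingency table to cross two categorical variables:

```r
# Collapse the 4-dimensional Titanic table to Sex (dim 2) vs Survived (dim 4)
tbl <- margin.table(Titanic, c(2, 4))

# shade = TRUE colors tiles by Pearson residuals, so cells that deviate
# most from independence stand out
mosaicplot(tbl,
           main  = "Survival by sex on the Titanic",
           shade = TRUE)
```

The tile widths encode the marginal distribution of the first variable, the heights the conditional distribution of the second; that is the "hidden" statistical reading the plot rewards.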

About

Luca Zavarella

Mentor & Technical Director at Lucient. Classical pianist in the free time.
