Microsoft R. Where, How, Why.
I know, a massive corporation encroaching on open source territory, it must be a trap, right? I’m yet to find one, so let’s have a quick look at the where, how and why of Microsoft R, and you can tell me if I’m being duped.
Microsoft R Open
Microsoft R Open (MRO) is the open source Microsoft variant of R. It is based on the CRAN R project with a number of enhancements. It is installed on a single workstation. This is not to be confused with, but easily is, Microsoft R Server (MRS), which is a commercial offering for serious R workloads. I will discuss Microsoft R Server next. Microsoft R Open works with all the CRAN libraries and adds some extra libraries of its own. So you are getting everything and a bit more.
Many R use cases require multiple complex calculations in a single model over large amounts of data. There are some computational limitations to CRAN R that Microsoft R Open goes a long way to solving.
Two of those limitations are:
1. CRAN R is single-threaded, making complex computations slow.
2. With CRAN R, data is loaded into RAM for computation, so the size of the data is limited by the amount of RAM on the user’s workstation.
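Microsoft R Open is multi-threaded. That is a very short sentence, but it is a huge advantage, as highlighted below. The multi-threading comes from the multi-threaded maths libraries (Intel MKL on Windows and Linux) that it is built to use, so the biggest gains are in matrix-heavy calculations.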
I will cover off the RAM limitation in the next section.
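To make the multi-threading point a bit more concrete, here is a rough sketch of the split-the-work-and-combine pattern. It is written in Python purely as a neutral illustration, it is not how Microsoft R Open is implemented, and the function names are made up for the example. One caveat: plain Python threads will not actually speed up a loop like this, so read it as a picture of the pattern rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(values):
    """Single-threaded baseline: one worker walks the whole list."""
    return sum(v * v for v in values)


def parallel_sum_of_squares(values, workers=4):
    """Split the list into chunks, hand each chunk to a worker, then combine."""
    # Ceiling division so every value lands in exactly one chunk.
    size = max(1, -(-len(values) // workers))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # The partial results are independent, so combining them is a plain sum.
    return sum(partials)
```

Both functions give the same answer for the same input. The only difference is that the second one can keep several workers busy at once, which is the idea behind using all your cores.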
You don’t have to use the Microsoft R Open client with Microsoft R Open. (Which is good because it is less than awesome.) You can use the client of your choice, such as the command line, RStudio or Visual Studio, and configure it to use your Microsoft R Open installation to get the extra libraries and multi-threaded goodness.
Microsoft R Server
Microsoft R Server is the commercial R server from Microsoft. It introduces a number of extra packages, including those that facilitate R at scale. ScaleR manages large amounts of data by using RAM and disk along with the parallel processing already mentioned. DistributedR is exactly what it sounds like: it allows Microsoft R Server to be distributed across multiple nodes to increase parallelism. This is getting a bit more techo than I hoped, so here is a nice diagram from Microsoft to show the value.
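The RAM-plus-disk idea is easier to see in a few lines of code. This is a minimal sketch in Python, not ScaleR’s API or its actual implementation: it averages one column while only ever holding a small chunk of rows in memory. The function name, column name and sample data are all hypothetical.

```python
import csv
import io
from itertools import islice


def chunked_mean(rows, column, chunk_size=1000):
    """Average one numeric column, holding at most chunk_size rows in memory."""
    reader = csv.DictReader(rows)
    total, count = 0.0, 0
    while True:
        # Pull the next chunk from the source; earlier chunks are already discarded.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical data; in practice `rows` would be an open file handle on disk.
sample = io.StringIO("amount\n10\n20\n30\n40\n")
print(chunked_mean(sample, "amount", chunk_size=2))  # prints 25.0
```

Because only running totals are kept between chunks, the file can be far larger than RAM. Each chunk could also be handed to a different worker or node, which is where the parallel side comes in.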
So here is the only gotcha. If you want to run R at scale you have to pay for a Microsoft R Server license. Well, that is pretty fair I reckon, PLUS there is a good chance your organisation has this as part of Microsoft SQL Server licensing. *Big disclaimer about me not being a Microsoft licensing expert, this does not constitute an offer or guarantee and all that stuff.
R in Power BI
Microsoft Power BI has two ways of exploiting R. Neither of these needs Microsoft R Open. Power BI will go for a look around and try to find an R server: it could be CRAN, could be Microsoft R Open, could be something else. In this case my installation of Power BI found my Microsoft R for SQL Server R server. More about that later.
The two ways to use R in Power BI are:
1. An R Script Visual within Power BI to create an R visualisation based on data in a Power BI model.
2. An R Script as a data source, which creates a data frame within Power BI with R already applied.
So even in Power BI, Microsoft do not lock you into Microsoft R Open. Suspiciously un-corporate-like. How about Microsoft Azure?
Microsoft Azure Machine Learning
Azure ML has a large number of statistical and predictive functions built in. Azure ML also allows R and Python to be included in the model, but Microsoft do restrict the R functions that can be used in Azure ML, as some could be damaging in a cloud environment.
Other than the simple drag and drop interface, the big advantage of Azure ML is that it is incredibly simple to make the machine learning ‘experiment’ a web service that can then be used in applications and websites. This could be called ‘operationalising data science’. Outside of Azure ML, creating a web service from traditional R or Python, or operationalising it, can be a bit difficult.
Here again users can choose to run their R script using CRAN R or Microsoft R Open.
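For a feel of what ‘operationalising’ involves when you do it by hand, here is a minimal sketch using only the Python standard library. It has nothing to do with how Azure ML builds its web services. The `predict` function is a hypothetical stand-in for a trained model, and the feature names, port and address are assumptions for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    return 0.5 * features["x1"] + 2.0 * features["x2"]


def score_payload(body):
    """Turn a JSON request body into a JSON response body."""
    features = json.loads(body)
    return json.dumps({"score": predict(features)}).encode("utf-8")


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client said it sent.
        length = int(self.headers.get("Content-Length", 0))
        response = score_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Start the service on an assumed local port; blocks until interrupted."""
    HTTPServer(("localhost", port), ScoreHandler).serve_forever()
```

Calling `serve()` and posting `{"x1": 2, "x2": 1}` to `http://localhost:8000/` would return a JSON score. Even this toy version leaves out authentication, input validation, scaling and deployment, which is the fiddly part that a hosted service takes off your hands.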
SQL Server R Services
SQL Server R Services is very exciting. It allows R to be used within a SQL Server stored procedure. So an R expert can create some code and send it via email to a DBA, and the DBA pastes that into a SQL Server stored procedure. This could be called ‘productionising data science’. This method can be used as part of standard data flows within a database or data warehouse. When new data comes in, the R stored procedure is run and the output is updated.
As a data and visualisation person I find the graphs available in R like BI Back to the Future. Very 1980s. But I get it, R is about important sciencey things, this is not a beauty contest. Now we can have both. Have data available in SQL Server that has already had R applied, and display it using your visualisation and/or reporting tool of choice.
Where & How
Desktop with Microsoft R Open.
Cloud with Azure ML, Azure App Service (a web service created in Azure ML or other), Power BI, Azure VM running SQL Server, Azure Machine Learning VM.
On premises with SQL Server R Services, or Microsoft R Server running on traditional single server hardware or in a cluster.
Scale beyond RAM
Microsoft R Open: Open source, use all your cores for processing, pass on the client.
Microsoft R Server: Go parallel and scale up your R as large as you want.
Power BI: Good for ad-hoc use of R and for displaying R graphs as part of a dashboard.
Azure ML: Good for exploiting Machine Learning in other applications via a web service.
SQL Server R Services: Great for embedding R into repeating business processes within a SQL Server environment.
I’m David Myall. Analytics Practice Manager at DWS.