Data Science for developers: You have a Predictive Model… now what? Expose and Consume an R Model
Now is when the fun begins… on the previous post we created a basic predictive model, at this stage we have a functional model that would help us predict an estimated number of rentals for a given date. Now… how do we use it from our application? How can we call the ‘predict’ method from our NodeJS / C# / Python / Java / <your language here> application?
Let’s see few options
Option 1 — Re-code the R algorithm into your preferred language
You can implement the math behind the R model on your preferred language — hehe… you’re fun… no way
Option 2 — Expose the algorithm as a service
ok… doable, we can easily do it on-premises (with R Server) or in Azure, we even have a browser friendly ‘Machine Learning Studio’ that we can use to build experiments and expose them as service endpoints. This is not the case for this demo though :)
Let’s imagine that either: A — we can’t upload this data to the cloud (for any reason) or B — It would be a PIA (pain in the…) to keep that data up to date once it’s in the cloud (ETL Hell). So… let’s jump to the option 3.
Option 3 — Use the SQL Server R services
et voilá! one of the new SQL Server 2016 features is ‘R Services’, we can have the R script in the database itself, so no need to move data around, no problems with managing data bigger than your memory, and no problems with the traditional limitations of a (non-enterprise) R server. We’re going to bring the algorithm to the data :)
So let’s do it!
btw… the scripts that I reference below are part of the demo application used at this Build 2016 session, you can find the full app on GitHub, thanks to the terrific PlainConcepts team (ibonilm, pablodoval, and more! ) in collaboration with Tara Shankar and Pablo Castro.
Get the tools
you’ll need to download SQL Server 2016 (Developer edition is ok, now it’s free but you’d need to create a free DevEssentials account) and also enable R Services (just check steps 1 and 2).
Once you have R services working you will need to create a database and a table with the sample data we used on the previous post.
Creating and hosting the model
Well… it’s as easy as having the R script we created inside an stored procedure :) That stored procedure will execute the script that generates the predictive model and serialize that binary model into a table. You just need to run that stored procedure weekly or monthly to make sure that you keep your model trained with the latest data.
Apart from creating the stored procedure, you will need to create an additional table ‘Models’ with one column ‘Serialized’ : VARBINARY(MAX)
CREATE PROCEDURE TrainRentalModel AS
BEGIN
DECLARE @q NVARCHAR(MAX) = N'SELECT * FROM RentalFeatures ORDER BY Year, Month, Day'
DECLARE @s NVARCHAR(MAX) = N'
rentals = InputDataSet
rentals$FHoliday = factor(rentals$Holiday)
rentals$FSnow = factor(rentals$Snow)
rentals$FWeekDay = factor(rentals$WeekDay)
model = rpart(RentalCount ~ Month + Day + FWeekDay + FHoliday + FSnow, rentals)
serialized = data.frame(model = as.raw(serialize(model, NULL)))' BEGIN TRY
BEGIN TRANSACTION
DELETE FROM Models
INSERT INTO Models
EXEC sp_execute_external_script @language = N'R', @script = @s, @input_data_1 = @q, @output_data_1_name = N'serialized'
COMMIT
END TRY
BEGIN CATCH
PRINT 'Failed to recompute model, rolling back to previous trained model'
ROLLBACK
END CATCH
END
see?
Exposing the ‘prediction’ endpoint
…and yes! this is just another stored procedure! It will de-serialize and execute the model with the parameters fed from the application layer.
CREATE PROCEDURE PredictRentals @q NVARCHAR(MAX) AS
BEGIN
DECLARE @serialized VARBINARY(MAX) = (SELECT TOP 1 Serialized FROM Models)
DECLARE @s NVARCHAR(MAX) = N'
rentals = InputDataSet
rentals$FHoliday = factor(rentals$Holiday)
rentals$FSnow = factor(rentals$Snow)
rentals$FWeekDay = factor(rentals$WeekDay)
model = unserialize(as.raw(m))
p = predict(model, rentals)
OutputDataSet = data.frame(p)'
EXEC sp_execute_external_script @language = N'R', @script = @s, @input_data_1 = @q, @params = N'@m VARBINARY(MAX)', @m = @serialized
WITH RESULT SETS ((Predicted FLOAT))
END
This is what we see if we execute the stored procedure directly from the management studio or Visual Studio. It predicts 638.
Connecting the app to the database
Easy! now you just need to connect your app to the database and call a stored procedure, that should be straightforward. Here you can see how it’s done from the .NET application used for the session. It creates a SqlCommand to invoke the stored procedure and sends a string with a query as the parameter.
string sqlInput =
string.Format("SELECT CONVERT(INT, {0}) AS Month, CONVERT(INT, {1}) AS Day, CONVERT(INT, {2}) AS WeekDay, CONVERT(BIT, {3}) AS Snow, CONVERT(BIT, {4}) AS Holiday",
date.Month,
date.Day,
1 + (int)date.DayOfWeek,
snowedDayBefore ? 1 : 0,
holiday ? 1 : 0);
SqlCommand cmd = new SqlCommand();
cmd.CommandText = "PredictRentals";
cmd.CommandType = System.Data.CommandType.StoredProcedure; cmd.Parameters.Add(new SqlParameter("@q", sqlInput)); object r = await ExecuteSqlServerCommandAsync(cmd);
done!
Now every time that the function is called you will be able to show an estimated prediction for your users, you would just need to invest some time on keeping the model up to date.
Happy Hacking!
.ds