Deploying Machine Learning Models for Ruby applications using Sklearn-porter
Data science and Machine Learning are hot topics recently in the IT industry, but I personally find it extremely exciting to see what computers can really do! Data wrangling and insights can provide a lot of fun (or is it only me that feels this way?), building Machine Learning models based on the data is amazing, but it’s only the moment you connect the two — this model and your application, that can apply the prediction to the user’s interaction, that is the most amazing experience of all!
Simple application — “How Much”
“How Much” is a Machine Learning supported project for predicting the income level class. It distinguishes between classes of “<= 50K” and “> 50K” in US Dollars. The model is trained using Adults income dataset and is integrated with really simple Rails application using Sklearn-porter project, that generated native Ruby code.
Please note, the model’s accuracy is around 83% but is based on data gathered in 1994, so will not be very accurate for Today’s answers, nevertheless, it was great experience and fun project to build!
Train the Model
The full research, engineering and choosing features for the model, and then searching for the best model and parameters is described here, but the summary of findings are described below:
- Most correlated to the target variable features were:
2. These have been selected for training the models, which accuracy was:
| Model | Best accuracy |
| Decision Tree | 82.0% |
| Random Forest | 82.5% |
| KNN | 82.1% |
3. The best performing model was based on the
RandomForest algorithm, and this one will be deployed.
Deploy the Model
Sklearn-Porter is able to generate native Ruby code, which will be used to deploy the trained model:
This would generate a class with the following interface:
That could be used as follows:
Integrate with the rest of the application
The features required for performing the prediction are expected to be passed in a specific order and the predicted value will be either
1 (for “<= 50K” and “> 50K” respectively), so for convenience - these can be wrapped in another method.
Given the prediction will be performed based on values obtained from the form submitted by the user, the helper method can look like the following:
Even though the Machine Learning model has been trained on really old data (1994 was 25 years ago, when writing this post in 2019) and will most likely not be accurate for submissions of data reflecting Today’s circumstances — this still was great exercise and an amazing experience!
Now, when retrospectively considering where I spent most of my time when developing this simple app — I had to put more effort in building the frontend of the application, rather than coming up with the Machine Learning engine powering it…