Making Machine Learning RAZOR-Sharp with Ensemble Modeling

Razorthink Inc · Published in Razorthink AI · Jan 17, 2019

The rationale for ensemble modeling in artificial intelligence is as straightforward as the concept is effective.

Just like an orchestra, ensemble modeling combines the power of individual models to create better results

Instead of building multiple machine learning models and choosing the one that seems most accurate, why not combine the predictive power of each of those individual models into a composite that yields even more reliable predictions?

Ensemble modeling does exactly that: it enables organizations to aggregate the predictive capabilities of hundreds or even thousands of machine learning models to accomplish a single task. It’s a powerful technique used by seasoned data scientists to increase the number of predictors, features, and variables available for predicting data-driven outcomes.

It’s also responsible for enhancing the supervised learning capabilities of a growing number of use cases, many of which involve complex, unstructured data — such as analyzing video footage for security purposes or refining text analytics.

The two most common methods for ensemble modeling are called bagging and boosting. Bagging primarily reduces the variance of machine learning models, at the cost of a small increase in bias; boosting primarily reduces bias, at the risk of added variance. Both of these techniques are instrumental in implementing effective ensemble modeling and are crucial in helping organizations maximize the output of classic machine learning.

Bagging

The most widespread example of bagging is Random Forest, a machine learning technique that has evolved over the past two decades, in which the predictions of multiple models are synthesized to optimize accuracy. These models are typically decision trees; a large group of these trees makes up the forest.
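A minimal Random Forest sketch, assuming scikit-learn and its bundled Iris dataset (neither is named in this article):

```python
# A minimal Random Forest sketch using scikit-learn (an assumed library;
# the article does not prescribe a specific toolkit).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A forest of 100 decision trees, each trained on a bootstrap sample;
# their individual votes are aggregated into a single prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```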

Bagging makes it possible to build & run multiple models in parallel

Bagging involves creating several independent models from essentially the same dataset. Building these models in parallel gives data scientists two advantages: the models can be produced rapidly, and each can be trained on a different part (a bootstrap sample) of the same dataset.

Different deep learning models, for example, could focus on different predictors, features, or even different hypotheses, so that each model stays independent. When stacked together, their predictive outputs can be pooled into a model far more powerful than any of them on its own.
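A minimal sketch of bagging itself, assuming scikit-learn's BaggingClassifier and a synthetic dataset (both illustrative choices, not prescribed by the article):

```python
# Sketch of bagging with scikit-learn's BaggingClassifier (assumed library).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# 50 trees, each fit on a different bootstrap sample of the same dataset;
# n_jobs=-1 builds them in parallel. Predictions are pooled by majority vote.
# (In scikit-learn < 1.2 the keyword is base_estimator, not estimator.)
bagger = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    n_jobs=-1,
    random_state=0,
)
bagger.fit(X, y)
print("Training accuracy:", bagger.score(X, y))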

With bagging-style ensembles, users can also combine different types of models, such as decision trees, linear regression, or binary classifiers, into a composite that benefits from all of these approaches.
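Strictly speaking, scikit-learn implements this kind of heterogeneous combination as a voting ensemble rather than bagging proper; a minimal sketch, again assuming scikit-learn:

```python
# Sketch of a heterogeneous ensemble via soft voting (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Three different model families contribute to one composite prediction;
# "soft" voting averages their predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier()),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
ensemble.fit(X, y)
print("First five predictions:", ensemble.predict(X[:5]))
```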

Boosting

Gradient boosting has gained immense popularity within the past five years, partly in response to perceived limitations of bagging. Although bagging makes it possible to build many models in parallel with one another, there’s a risk that some are too similar, wasting data scientists’ valuable time by contributing little to the overall model’s predictive capabilities.

In boosting, models are created sequentially instead of independently

Gradient boosting is an alternative approach in which models are created sequentially instead of independently. With this method, data scientists build and deploy one model, observe how it performs, and then use those results to construct the next model.

It matters less whether the first model is good; what matters is the information gained from it. That information is used to improve the performance of the next model, then the next, and so forth, so that each successive model improves on its predecessors’ predictions.
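A toy sketch of this sequential idea, assuming squared-error loss so that each new model simply fits the residual errors of the ensemble built so far (hand-rolled for illustration, not a production implementation):

```python
# Toy gradient-boosting sketch for regression (illustrative assumption:
# squared-error loss, so each stage fits the current residuals).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)  # start from a trivial constant model
trees = []

for _ in range(100):
    residual = y - prediction           # where the ensemble is still wrong
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)               # the next model targets those errors
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Final training MSE:", np.mean((y - prediction) ** 2))
```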

Gradient boosting has drawbacks of its own, so a refinement, Extreme Gradient Boosting (XGBoost), has come to prominence in the last couple of years because of its scalability and performance.
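A minimal sketch, assuming the open-source xgboost Python package and its scikit-learn-style wrapper (an assumption; the article names only the technique):

```python
# Minimal XGBoost sketch (assumes the xgboost package is installed:
# pip install xgboost).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Regularized gradient boosting engineered for speed and scale.
model = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=4)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```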

Tangible Business Value

The business value of these two forms of ensemble modeling is considerable, especially with noisy, unstructured data. Organizations attempting to secure ATMs via video footage, for example, can use the multiplicity of models in ensembling to concentrate on different aspects of this task. For instance, some models can concentrate on facial recognition, others can home in on the movements of users’ hands near the keypad, and still others can focus on different aspects of body language.

Ensemble modeling also becomes critical when tackling the ever-evolving patterns of fraud carried out in organizations of all types.

The same approach can improve text analytics that involve ensemble modeling on top of Optical Character Recognition, a real-world use case implemented on the Razorthink AI platform.

Ensemble modeling isn’t limited to the use cases mentioned above. It can be applied to any analytics task in which specific models should focus on a certain part of a particular problem.

By combining the predictive power of any number and type of models, ensemble modeling makes machine learning’s predictions faster and more reliable.
