Panoramic

Panoramic is an enterprise SaaS company that provides the world’s most successful brands with the tools they need to ingest and model marketing data into meaningful insights.

Technical Deep Dive: Random Forests

10 min read · Aug 3, 2019


Random Forests are among the most popular machine learning models used by data scientists today. Yet how they are actually implemented, and the variety of use cases they can be applied to, are often overlooked.

While this article will focus on the inner workings of Random Forests, we’ll start off by exploring the main problems this model solves.

The Bias-Variance Tradeoff

One of the central tenets of statistics and machine learning is the Bias-Variance tradeoff: as a machine learning model's complexity increases, its bias (the average difference between its predictions and the true values) tends to decrease, while the variance of its predictions increases. For many models, this means we can decompose the overall error as

Total Error = Bias² + Variance + Irreducible Error

Decomposition of overall error into three components: 1) bias (squared), 2) variance, and 3) irreducible error.
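Before turning to real data, the tradeoff is easy to see numerically on a toy problem. The sketch below (an illustration of the concept, not from the original article; it uses synthetic noisy sine-curve data and plain NumPy) fits polynomials of increasing degree: training error keeps falling as complexity grows, while test error eventually stops improving.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: noisy samples of a sine curve.
x_train = np.sort(rng.uniform(-1.0, 1.0, 30))
y_train = np.sin(3 * x_train) + rng.normal(0.0, 0.3, 30)
x_test = np.sort(rng.uniform(-1.0, 1.0, 30))
y_test = np.sin(3 * x_test) + rng.normal(0.0, 0.3, 30)

def poly_mse(degree):
    """Fit a polynomial of `degree` on the training set; return (train, test) MSE."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Low degree = high bias; very high degree = high variance (overfitting).
errors = {d: poly_mse(d) for d in (1, 3, 10)}
for degree, (train_mse, test_mse) in errors.items():
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Because each higher-degree polynomial class contains the lower-degree ones, the training error can only shrink as complexity increases; the test error is what reveals when added complexity has turned into variance.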

To illustrate some of the more technical concepts, we will utilize a Kaggle sample sales conversion dataset of Facebook ad campaigns contributed by an anonymous organization. Let’s use this dataset to see how the breakdown of bias/variance affects the quality of our insights by exploring its relationship with model complexity. We’ll perform some initial preprocessing to arrive at our features (X) and target array (y):
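The original preprocessing code did not survive extraction, so here is a minimal sketch of the kind of step described. It assumes a pandas DataFrame with columns resembling the Kaggle sales conversion dataset (the in-memory rows below are made up for illustration; in practice you would read the downloaded CSV), one-hot encodes the categorical fields, and treats approved conversions as the target:

```python
import pandas as pd

# Hypothetical stand-in rows mimicking the Kaggle sales conversion dataset's
# schema; in practice you would load the CSV, e.g. pd.read_csv(...).
df = pd.DataFrame({
    "age": ["30-34", "35-39", "30-34", "45-49"],
    "gender": ["M", "F", "F", "M"],
    "interest": [15, 16, 20, 28],
    "Impressions": [7350, 17861, 4259, 4133],
    "Clicks": [1, 2, 0, 1],
    "Spent": [1.43, 1.82, 0.00, 1.25],
    "Approved_Conversion": [1, 0, 0, 1],
})

# One-hot encode the categorical columns so tree ensembles can consume them,
# and split off the target column.
X = pd.get_dummies(df.drop(columns=["Approved_Conversion"]),
                   columns=["age", "gender"])
y = df["Approved_Conversion"].values

print(X.shape, y.shape)
```

(Decision trees can split on raw category codes, but one-hot encoding keeps the feature matrix compatible with scikit-learn estimators, which expect numeric input.)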



Written by Yu Chen

Software engineer focused on ML and distributed systems
