Random Forest vs Logistic Regression in Python

Asel Mendis
bitgrit Data Science Publication
6 min read · Jan 23, 2020


Random forest and logistic regression are two of the most heavily used machine learning techniques in the industry.

These two techniques are simple and powerful, making them an absolute must in every data scientist’s arsenal. It is especially important to understand logistic regression because it underpins neural networks: a network with a single unit and a sigmoid activation is exactly logistic regression. So if you are learning neural networks, logistic regression is a good starting point. This tutorial is a concise technical walkthrough complete with code explanations.
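To make the connection to neural networks concrete, here is a minimal sketch of logistic regression as a single "neuron": a linear combination of inputs passed through a sigmoid activation. The weights and data below are purely illustrative, not fitted to anything real.

```python
import numpy as np

def sigmoid(z):
    # The logistic (sigmoid) function maps any real value into (0, 1),
    # so the output can be read as a probability.
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # A logistic regression model is one "neuron": a weighted sum of
    # the inputs plus a bias, passed through a sigmoid activation.
    return sigmoid(X @ w + b)

# Illustrative inputs and parameters (hypothetical, not learned)
X = np.array([[0.5, 1.2], [-1.0, 0.3]])
w = np.array([0.8, -0.4])
b = 0.1
probs = predict_proba(X, w, b)
```

Training then amounts to choosing `w` and `b` that minimize a loss (cross-entropy), which is the same gradient-based fitting used for full neural networks.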

Random Forest

The random forest algorithm is made up of an ensemble of decision trees that are independent of each other; each tree predicts the outcome variable using its own set of rules, and the forest combines those predictions by majority vote (or averaging, for regression). Decision trees are the basic building blocks of a random forest.
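The ensemble idea above can be sketched in a few lines with scikit-learn; the synthetic dataset and hyperparameters here are illustrative, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A synthetic binary classification problem, just for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 100 decision trees, each fit on a bootstrap sample of the data and a
# random subset of features; the forest predicts by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# The fitted trees are independent estimators inside the ensemble
print(len(forest.estimators_))
```

Because each tree sees a different bootstrap sample and feature subset, their errors are largely uncorrelated, which is why the vote of the forest is more stable than any single tree.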

To understand random forest, we first need to understand how decision trees work. A decision tree is a method of approximating discrete and continuous target values that is robust to noise and outliers. A tree is represented as a set of if-then rules, and each leaf can carry the probability of an outcome among the training examples that reach it.
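The if-then-rule view of a decision tree can be seen directly in scikit-learn, which can render a fitted tree as nested rules. This sketch uses the classic Iris dataset and a shallow tree purely for readability.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree so the printed rule set stays small and readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders the fitted tree as nested if-then rules,
# one threshold test per internal node
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each path from the root to a leaf reads as one if-then rule, which is why decision trees are considered highly interpretable.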

Decision trees are known among data science practitioners to be highly interpretable and powerful, both for classifying categories and for regression when linear regression fails. The reason they can be so powerful — especially for regression — is that they can map non-linear relationships within the data. Since most real-world data does not follow a linear…
