DECISION TREE AND RANDOM FOREST CLASSIFIERS IN ML

Jivanjot
3 min read · Jul 17, 2023


A decision tree classifier is a supervised machine learning algorithm that handles both classification and regression tasks. It is a hierarchical, tree-like structure made up of a root node, internal nodes, and leaves. Internal nodes represent tests on features, leaves represent the final results where we can't split any further, and the root node is where the splitting for the problem statement begins.
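As a tiny illustration (the features and labels here are made up), a decision tree is essentially a hierarchy of if/else tests: the root node asks the first question, internal nodes ask follow-up questions, and leaves return the final answer:

```python
def classify(sample):
    """A hand-written two-level decision tree (illustrative only)."""
    # Root node: split on the 'outlook' feature
    if sample["outlook"] == "sunny":
        # Internal node: further split on 'humidity'
        if sample["humidity"] > 70:
            return "stay in"       # leaf
        return "play outside"      # leaf
    return "play outside"          # leaf

print(classify({"outlook": "sunny", "humidity": 80}))  # -> stay in
```

A learning algorithm like CART builds this kind of structure automatically by choosing which feature to test at each node.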

We select the best attribute for splitting based on information gain, which is computed from entropy, a measure of the randomness or disorder within the dataset.

The goal is to decrease this randomness in the dataset with every split.

Let’s understand what this means.

Let’s say you and your family are deciding whether to order pizza or burgers. Out of 7 members, 4 chose pizza and 3 chose burgers. This is a state of high uncertainty, or entropy, because the decision hinges on a difference of only one vote.
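We can put a number on that uncertainty. A minimal sketch of Shannon entropy applied to the pizza/burger vote (4 vs. 3 is almost a coin flip, so it lands close to the maximum of 1 bit):

```python
import math

def entropy(counts):
    """Shannon entropy of a class distribution, in bits."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# 4 votes for pizza, 3 for burger: close to maximum uncertainty
print(round(entropy([4, 3]), 3))  # -> 0.985

# If all 7 had voted pizza, there would be no uncertainty at all
print(round(abs(entropy([7, 0])), 3))  # -> 0.0
```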

If we want to find out how much entropy decreased from the parent node to its children after a split, we use a metric called information gain.
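A minimal sketch of information gain, reusing the pizza/burger numbers: it is the parent's entropy minus the weighted average entropy of the children, so a split that perfectly separates the two groups recovers all of the parent's uncertainty as gain.

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# A perfect split puts the 4 pizza votes in one child and the 3 burger
# votes in the other, so the gain equals the parent's full ~0.985 bits.
gain = information_gain([4, 3], [[4, 0], [0, 3]])
print(round(gain, 3))  # -> 0.985
```

The decision tree algorithm evaluates candidate splits this way and picks the one with the highest gain.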

A decision tree is a powerful algorithm because it can be used for both classification and regression. But if we want to improve the model's accuracy even further, we can use a random forest classifier.

Ensemble Learning Technique: It relies on using multiple models rather than a single model. It includes boosting and bagging.

Random Forest Classifier: It uses the concept of bagging (bootstrap aggregating), which means each model is trained on a random subset of the training data sampled with replacement, rather than on the entire dataset. The output is decided by majority: for classification we take the mode of the individual predictions, and for regression we take their mean.
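The two halves of bagging (bootstrap sampling, then aggregating the votes) can be sketched in a few lines of stdlib Python. The predictions below are invented stand-ins for what individual trees might output:

```python
import random
from statistics import mode, mean

random.seed(0)
data = list(range(20))  # stand-in for a training set of 20 rows

# Bootstrap step: each model gets its own sample, drawn WITH replacement,
# so the same row can appear multiple times in one sample.
bootstrap_samples = [random.choices(data, k=len(data)) for _ in range(3)]

# Aggregation step: classification takes the mode of the trees' votes...
classification_votes = ["pizza", "burger", "pizza"]
print(mode(classification_votes))  # -> pizza

# ...while regression takes the mean of the trees' outputs.
regression_outputs = [2.0, 3.0, 4.0]
print(mean(regression_outputs))    # -> 3.0
```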

We can understand this with an example. Suppose you want to move to a different country and need to decide which one. You would take advice from friends, family, and acquaintances in different countries, each weighing features such as cost of living, taxes, salary, and quality of life. Each person's reasoning works like a single decision tree, and the final decision is made by the majority of answers.

A random forest classifier is better than a single decision tree because averaging over many trees improves accuracy and reduces overfitting.

Similarly, we also have boosting, where we use the entire dataset but build the models sequentially rather than in parallel, so that every decision tree learns from the mistakes of the previous one.
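The "learn from the previous model's mistakes" loop can be sketched with a deliberately toy weak learner. Real gradient boosting fits a small decision tree to the residuals at each step; here the weak learner just predicts the mean residual, which is enough to show the sequential correction at work:

```python
y = [3.0, 5.0, 7.0]            # regression targets
predictions = [0.0] * len(y)   # the ensemble starts out knowing nothing
learning_rate = 0.5

for step in range(10):
    # Each new learner is fit to the mistakes (residuals) of the ensemble so far
    residuals = [yi - pi for yi, pi in zip(y, predictions)]
    # Toy weak learner: predicts the mean residual (real boosting fits a tree)
    correction = sum(residuals) / len(residuals)
    predictions = [p + learning_rate * correction for p in predictions]

print([round(p, 2) for p in predictions])  # converging toward mean(y) = 5.0
```

Each pass shrinks the remaining error, which is exactly the sequential behaviour that distinguishes boosting from the parallel, independent trees of bagging.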

ADVANTAGES :

1. It works well with both continuous and categorical values.

2. It can improve the accuracy and efficiency of a model.

3. It can be used for both classification and regression problems.

DISADVANTAGES:

1. Training takes a long time because of the multiple trees, which makes it slow.

And hey, if you found this information helpful, then don't forget to share, clap, and follow.

Happy reading :)
