What is the use of a decision tree in machine learning?

digistackedu academy
Mar 17, 2022



A decision tree is a fundamental algorithm in machine learning, and professionals use it to make statistical decisions for their clients. I hope you already have a basic understanding of what machine learning is. A decision tree is a predictive modelling technique that can solve both regression and classification problems. Let us first understand the difference between regression and classification.

In the regression technique, we predict a future numeric value based on past values. For example, suppose you have a data set where you want to find the price of a house based on the number of bedrooms and the area of the house. Here, price is the dependent variable, and the number of bedrooms and the area of the house are the independent variables.

y (price) = m1 * (number of bedrooms) + m2 * (area of house) + b
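Below is a minimal sketch of this house-price regression, assuming scikit-learn is available; the numbers and the DecisionTreeRegressor settings are invented purely for illustration.

```python
# A minimal sketch of the house-price regression example with scikit-learn.
# The data points below are made up for illustration only.
from sklearn.tree import DecisionTreeRegressor

# Each row: [number of bedrooms, area of house in sq. ft.]
X = [[2, 800], [3, 1200], [3, 1500], [4, 2000], [5, 2600]]
y = [150000, 220000, 260000, 340000, 430000]   # price (dependent variable)

model = DecisionTreeRegressor(max_depth=3, random_state=0)
model.fit(X, y)

# Predict the price of a 3-bedroom, 1400 sq. ft. house
print(model.predict([[3, 1400]]))
```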

Classification-based models are also used to make predictions, but the major difference between regression and classification is the type of the "y" (dependent) variable: in classification, the y variable is always categorical. For example, you may want to predict whether a loan for an employee will be approved or not based on credit score and income.
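A similar sketch for the loan-approval example, again with invented numbers and assuming scikit-learn's DecisionTreeClassifier; note that the prediction is a category, not a number.

```python
# A minimal sketch of the loan-approval classification example.
# The data is invented for illustration only.
from sklearn.tree import DecisionTreeClassifier

# Each row: [credit score, monthly income]
X = [[580, 25000], [620, 30000], [700, 45000], [750, 60000], [800, 80000]]
y = ["not approved", "not approved", "approved", "approved", "approved"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

print(clf.predict([[680, 40000]]))   # categorical prediction
```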

How do we decide which feature should be located at the root node?

The fundamental idea behind constructing a decision tree is this: we pass all the data into the decision tree model, and the model calculates which feature (column) in the data set is the most valuable using the entropy and information gain formulas. For example, if the employed column has the highest information gain, we place that column at the root of the tree.

Entropy is the randomness or impurity in the data set. When you load all the data into the decision tree model, the impurity or entropy is very high. The decision tree then divides the data into two parts so that it can reduce the entropy in the data set and can be used for predictions. Information gain is the entropy-reduction formula: it measures the decrease in entropy after a split.
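The sketch below writes these two formulas in plain Python: entropy as the negative sum of p * log2(p) over the class proportions, and information gain as the parent entropy minus the weighted entropy of each split. The employed and loan-approved columns are made up for illustration.

```python
# A small sketch of the entropy and information-gain calculations.
# The "employed" and "loan_approved" columns are invented for illustration.
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum(p * log2(p)) over each class proportion p."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Parent entropy minus the weighted entropy of each split."""
    total = len(labels)
    weighted = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        weighted += (len(subset) / total) * entropy(subset)
    return entropy(labels) - weighted

employed      = ["yes", "yes", "no", "no", "yes", "no"]
loan_approved = ["yes", "yes", "no", "no", "yes", "yes"]

print(entropy(loan_approved))                      # impurity before the split
print(information_gain(employed, loan_approved))   # entropy reduction from splitting on "employed"
```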

How do we divide the tree?

We have already selected our root node based on the highest information gain. Now, say the employed column has two types of values, "yes" or "no", so we make two branches of the tree: "no" on the left side and "yes" on the right side. We then check the information gain of the remaining columns within the "no" and "yes" branches respectively. Suppose we find that the credit score column has the highest information gain with respect to "no"; then credit score becomes the next split under the "no" branch. This is the way we divide the tree until we reach a leaf node, the final node where we get our answer or prediction.
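To see this splitting in action, the sketch below fits scikit-learn's DecisionTreeClassifier on a tiny invented data set and prints the branches it chose with export_text. Encoding employed as 1/0 and the particular credit scores are assumptions made only so the printed tree mirrors the story above.

```python
# A sketch that prints the splits a fitted tree chose, so you can see which
# feature ended up at the root and how the branches continue to leaf nodes.
# The tiny data set below is invented for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: employed (1 = yes, 0 = no), credit score
X = [[1, 500], [1, 560], [1, 620], [0, 530], [0, 590], [0, 700]]
y = ["approved", "approved", "approved", "rejected", "rejected", "approved"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Each indented line is one branch; lines ending in "class: ..." are leaf nodes.
print(export_text(clf, feature_names=["employed", "credit_score"]))
```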

Benefits of a decision tree

  1. A decision tree can handle missing data or null values.
  2. No need to clean the data.
  3. No need to specify the distribution of the variables.
  4. No need to scale the data.
  5. Decision trees are very simple to explain.

Disadvantages of a decision tree

  1. If you change the data, the decision tree can change its structure.
  2. Decision tree calculations can be very complex.
  3. It takes a lot of time to train the model if the data is huge.

Conclusion:

I hope you are now able to understand the role of decision tree models in machine learning. The next part covers the random forest algorithm, one of the best machine learning algorithms, which works on ensemble learning and uses multiple decision trees to make combined decisions for the client.

Thanks, hope this will help.
Team Digistackedu.
