To Tree or Not to Tree? That Is The Decision.

Andre Williams
Published in Analytics Vidhya
6 min read · Mar 7, 2020
Source: https://fineartamerica.com/featured/colorful-tree-painting-brahaman-dhumsi.html

What if I asked you to name an animal that has feathers but cannot fly? What if I asked you to predict whether Elon Musk would’ve survived the Titanic? How sure are you that you’d give the right answer? And what steps would you go through in your head to work it out?

My guess is that you come up with a solution by asking yourself a series of leading questions that narrow things down until you reach an answer. Congratulations! You’re on the verge of creating and implementing your own decision tree algorithm in your head. Conceptually, this is very similar to how decision trees work. In fact, we use informal decision trees every day to make judgments in our day-to-day lives.

Decision Trees

A decision tree is a versatile supervised machine learning algorithm that can be used for both classification and regression tasks. It gives a graphical representation of the possible outcomes of a decision based on certain conditions. The idea comes from the tree, a fundamental data structure in computer science. A tree has a single root node, and each node may have zero or more child nodes. In some cases, the maximum number of children per node and the maximum depth of the tree are limited by the type of data the tree represents.

Properties

  • Decision trees are a type of supervised learning algorithm, used most often for classification
  • Decision trees build their models in the form of a tree structure
  • Each branch node represents a choice between a number of alternatives, and each leaf node represents a decision

A common example of a tree is an XML document. “The top-level document element is the root node, and each tag found within that is a child. Each of those tags may have children, and so on. At each node, the type of tag, and any attributes, constitutes the data for that node.” In such a tree, the hierarchy and order of the nodes are well defined, and an important part of the data itself. Another good example is the outline of a paper. The entire outline itself is a root node containing each of the top-level bullet points, each of which may contain one or more sub-bullets, and so on. The file storage system on most disks is also a tree structure.
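To make the tree idea concrete, here is a minimal sketch of the paper-outline example as a Python data structure. The `Node` class and the `depth` and `count_leaves` helpers are names I made up for illustration; they are not from any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a tree: a label plus zero or more child nodes."""
    label: str
    children: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    """Number of levels in the tree, counting this node as level 1."""
    if not node.children:
        return 1
    return 1 + max(depth(child) for child in node.children)


def count_leaves(node: Node) -> int:
    """A leaf is a node with no children."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


# The outline itself is the root; top-level bullets are its children,
# and sub-bullets are children of those.
outline = Node("Outline", [
    Node("Introduction"),
    Node("Methods", [Node("Data"), Node("Model")]),
    Node("Conclusion"),
])

print(depth(outline))         # 3
print(count_leaves(outline))  # 4
```

The same shape works for an XML document or a file system: only the data stored at each node changes.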


Why the name Decision Tree?

Well, it starts at the root node and then branches off into a number of possible solutions, just like an actual tree. The decision tree can grow bigger and bigger with new solutions and possibilities that carry different weights, just as one branch of a real tree might weigh more than another.

Source: https://www.slideshare.net/jaseelashajahan/decision-trees-91553243

Examples of Modern Day Decision Trees

Source: https://mc.ai/machine-learning-algorithms-decision-trees/
  1. Phone Menus: When you call an international bank or telephone service, the automated operator will say, “Press 1 for English, press 2 for Spanish, press 3 for French, press 4 for Mandarin, and press # to repeat these options.” Follow-up questions are then asked based on your reason for calling. The company’s goal isn’t to bore the customer but to route them to the right person in the right department, which improves customer satisfaction.
Source: https://nuacom.ie/call-routing-for-business-phone-systems/

  2. Corporate Structures: “Corporate structures also lend themselves well to trees. In a classical management hierarchy, a President may have one or more vice presidents, each of whom is in charge of several managers, each of whom presides over several employees.”

Decision Trees vs Binary Search Trees

A binary search tree is a fundamental data structure that’s useful for storing data that can easily be looked up later. The beauty of the data structure is that it has an average-case lookup time of O(log n). At each node in the tree, you ask whether the number you’re looking for is greater than (higher) or less than (lower) the number stored at that node, and you move right or left accordingly until you find the item you’re looking for.
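Here is a minimal sketch of that higher-or-lower lookup. The `BSTNode`, `insert`, and `contains` names are my own for illustration, and the tree is not self-balancing, so the log n behavior only holds when the keys arrive in a reasonably mixed order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BSTNode:
    key: int
    left: Optional["BSTNode"] = None
    right: Optional["BSTNode"] = None


def insert(root: Optional[BSTNode], key: int) -> BSTNode:
    """Insert a key, keeping smaller keys on the left and larger on the right."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored


def contains(root: Optional[BSTNode], target: int) -> bool:
    """Walk down the tree, going left if the target is lower, right if higher."""
    node = root
    while node is not None:
        if target == node.key:
            return True
        node = node.left if target < node.key else node.right
    return False


root = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, key)

print(contains(root, 60))  # True
print(contains(root, 65))  # False
```

Each comparison discards one whole subtree, which is where the logarithmic lookup time comes from.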

In contrast, a binary decision tree in machine learning maps an input space of data to an output space of classes. For example, suppose a company wants to know whether an online user is going to buy its product. It would collect data on screen time, click time, how expensive the items the user is looking at are, whether the user is a returning customer, and so on. The output boils down to a simple yes or no.

Each node in the decision tree asks a binary question about the data. Is the product on sale? Is it still in stock? Based on the answer to each question, you traverse down the tree until you reach a leaf, which ends with either the customer buying or the customer not buying.
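As a rough sketch of that traversal, here is a tiny hand-written decision tree for the purchase example. Everything in it is hypothetical: the `Question` class, the feature names (`on_sale`, `returning`, `screen_time_s`), and the 120-second threshold are invented for illustration, whereas a real decision tree would learn its questions from data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Visitor = Dict[str, Union[bool, float]]


@dataclass
class Question:
    """An internal node: a yes/no test plus the branch to follow for each answer."""
    text: str
    test: Callable[[Visitor], bool]
    yes: Union["Question", str]  # a leaf is just a string label
    no: Union["Question", str]


def decide(node: Union[Question, str], visitor: Visitor) -> str:
    """Follow yes/no answers from the root until a leaf label is reached."""
    while isinstance(node, Question):
        node = node.yes if node.test(visitor) else node.no
    return node


# Hypothetical questions and thresholds, chosen by hand.
tree = Question(
    text="Is the product on sale?",
    test=lambda v: bool(v["on_sale"]),
    yes=Question(
        text="Is this a returning customer?",
        test=lambda v: bool(v["returning"]),
        yes="buys",
        no="does not buy",
    ),
    no=Question(
        text="Did they spend more than 120 seconds on the page?",
        test=lambda v: v["screen_time_s"] > 120,
        yes="buys",
        no="does not buy",
    ),
)

print(decide(tree, {"on_sale": True, "returning": True, "screen_time_s": 30}))
print(decide(tree, {"on_sale": False, "returning": True, "screen_time_s": 45}))
```

The first visitor ends at a "buys" leaf and the second at a "does not buy" leaf, each after answering exactly two questions.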

Photo by Jens Lelie on Unsplash

Ultimately, the binary decision tree is a tool for making higher-level decisions with less human bias. The algorithm asks a series of questions to give you an answer.

Understanding Decision Trees


The beauty of decision trees is that they’re easy to understand because of their graphical representation. They also tend to need less data cleaning than many other models: splits depend on how values are ordered rather than on their scale, so outliers in the input features have relatively little influence, and some implementations can work with missing values. Decision trees can also handle both numerical and categorical data. Within the decision tree methodology, there are several different algorithms to choose from. In this article, the focus is on CART, the Classification And Regression Tree algorithm.
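For a feel of how CART picks its questions: classification trees are usually grown by choosing, at each node, the split that leaves the two child groups as pure as possible, and Gini impurity is the usual measure of purity. The sketch below is my own illustration with made-up labels, not code from a library; `gini` and `weighted_gini` are hypothetical helper names.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 0.0 for a pure group, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left: Sequence[str], right: Sequence[str]) -> float:
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = ["yes", "yes", "yes", "no", "no", "no"]

# Candidate question A separates the classes perfectly.
split_a = (["yes", "yes", "yes"], ["no", "no", "no"])
# Candidate question B leaves both children mixed.
split_b = (["yes", "yes", "no"], ["yes", "no", "no"])

print(gini(parent))                        # 0.5
print(weighted_gini(*split_a))             # 0.0
print(round(weighted_gini(*split_b), 3))   # 0.444
```

A tree builder following this idea would prefer question A, because it lowers the impurity the most.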

The Implementation of A Decision Tree

Enough chatter and theory. I’m a proponent of learning by doing. The world belongs to those who take action! Let's build a decision tree to predict the likelihood of someone surviving the Titanic, given the Titanic training data.

Import Our Dependencies

Missing Value Identification
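A minimal sketch of this step, assuming pandas and a handful of made-up rows that borrow the column names of the Kaggle Titanic data (`Survived`, `Pclass`, `Sex`, `Age`, `Embarked`). The rows are invented for illustration, and filling with the median and the mode is just one common choice, not necessarily the right one for the real data.

```python
import numpy as np
import pandas as pd

# Made-up rows with Kaggle-style Titanic column names (an assumption).
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 2, 1, 3],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "Embarked": ["S", "C", "S", None, "S", "Q"],
})

# Count the missing values in every column and show the ones that have any.
missing = train.isnull().sum()
print(missing[missing > 0])

# One common fix: median for numeric columns, most frequent value for categories.
train["Age"] = train["Age"].fillna(train["Age"].median())
train["Embarked"] = train["Embarked"].fillna(train["Embarked"].mode()[0])

print(train.isnull().sum().sum())  # total missing values left
```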

Handling Categorical Data
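Tree implementations such as scikit-learn's expect numeric inputs, so text columns need to be encoded first. Here is a minimal sketch, assuming pandas and a few made-up rows with Kaggle-style Titanic column names; the 0/1 mapping for `Sex` is an arbitrary choice of mine.

```python
import pandas as pd

# Made-up rows with Kaggle-style Titanic column names (an assumption).
train = pd.DataFrame({
    "Pclass":   [3, 1, 3, 2],
    "Sex":      ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
})

# A two-valued category can simply be mapped to 0 and 1.
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})

# A category with more than two values gets one indicator column per value.
train = pd.get_dummies(train, columns=["Embarked"], prefix="Embarked")

print(train.columns.tolist())
```

One-hot encoding avoids implying an order between the ports of embarkation, which a plain 0/1/2 mapping would do.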

Training Results
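A minimal sketch of the training step, assuming scikit-learn's `DecisionTreeClassifier`. Because the real Titanic file is not bundled here, the sketch generates a synthetic stand-in with a hypothetical helper, `make_toy_passengers`, so the scores it prints will not match the figures quoted for the real data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_toy_passengers(n=400, seed=0):
    """Synthetic stand-in for cleaned Titanic features (NOT the real data)."""
    rng = np.random.default_rng(seed)
    sex = rng.integers(0, 2, n)       # 0 = male, 1 = female
    pclass = rng.integers(1, 4, n)    # ticket class 1, 2 or 3
    age = rng.uniform(1, 70, n)
    # Toy rule: women and first-class passengers survive, with 15% label noise.
    survived = ((sex == 1) | (pclass == 1)).astype(int)
    noise = rng.random(n) < 0.15
    y = np.where(noise, 1 - survived, survived)
    X = np.column_stack([sex, pclass, age])
    return X, y


X, y = make_toy_passengers()

# Hold out a quarter of the rows to check the model on data it has not seen.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_val)
accuracy = accuracy_score(y_val, pred)
precision = precision_score(y_val, pred)
print(f"accuracy: {accuracy:.2f}, precision: {precision:.2f}")
```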

It looks like our model is 79% accurate, with a precision score of 79%. Let’s do some parameter tuning and visualize our model.

Parameter Tuning
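One common way to tune a tree is a cross-validated grid search over its size limits. The sketch below assumes scikit-learn's `GridSearchCV`; the grid values are arbitrary starting points of mine, and the data is a synthetic stand-in produced by a hypothetical `make_toy_passengers` helper rather than the real Titanic file.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def make_toy_passengers(n=400, seed=0):
    """Synthetic stand-in for cleaned Titanic features (NOT the real data)."""
    rng = np.random.default_rng(seed)
    sex = rng.integers(0, 2, n)       # 0 = male, 1 = female
    pclass = rng.integers(1, 4, n)    # ticket class 1, 2 or 3
    age = rng.uniform(1, 70, n)
    survived = ((sex == 1) | (pclass == 1)).astype(int)
    noise = rng.random(n) < 0.15      # 15% label noise
    y = np.where(noise, 1 - survived, survived)
    return np.column_stack([sex, pclass, age]), y


X, y = make_toy_passengers()

# Limiting depth and leaf size keeps the tree from memorizing noise.
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_leaf": [1, 5, 20],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,                 # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.2f}")
```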

Classification Analysis Results
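A minimal sketch of how such a report can be produced, assuming scikit-learn's `confusion_matrix` and `classification_report`. The data is again a synthetic stand-in from a hypothetical `make_toy_passengers` helper, and `max_depth=3` is an arbitrary choice, so the numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_toy_passengers(n=400, seed=0):
    """Synthetic stand-in for cleaned Titanic features (NOT the real data)."""
    rng = np.random.default_rng(seed)
    sex = rng.integers(0, 2, n)       # 0 = male, 1 = female
    pclass = rng.integers(1, 4, n)    # ticket class 1, 2 or 3
    age = rng.uniform(1, 70, n)
    survived = ((sex == 1) | (pclass == 1)).astype(int)
    noise = rng.random(n) < 0.15      # 15% label noise
    y = np.where(noise, 1 - survived, survived)
    return np.column_stack([sex, pclass, age]), y


X, y = make_toy_passengers()
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_val)

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_val, pred)
print(cm)

# Precision, recall and F1 for each class (label 0 = died, 1 = survived).
report = classification_report(y_val, pred, target_names=["died", "survived"])
print(report)
```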

Visualize Tree With Graph
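One way to get a graph of the tree is scikit-learn's `export_graphviz`, which writes the fitted tree in Graphviz DOT format. The sketch below only builds the DOT text; rendering it to an image needs Graphviz itself, which I am assuming you have installed separately. The data is a synthetic stand-in from a hypothetical `make_toy_passengers` helper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz


def make_toy_passengers(n=400, seed=0):
    """Synthetic stand-in for cleaned Titanic features (NOT the real data)."""
    rng = np.random.default_rng(seed)
    sex = rng.integers(0, 2, n)       # 0 = male, 1 = female
    pclass = rng.integers(1, 4, n)    # ticket class 1, 2 or 3
    age = rng.uniform(1, 70, n)
    survived = ((sex == 1) | (pclass == 1)).astype(int)
    noise = rng.random(n) < 0.15      # 15% label noise
    y = np.where(noise, 1 - survived, survived)
    return np.column_stack([sex, pclass, age]), y


X, y = make_toy_passengers()
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# With out_file=None the DOT source is returned as a string instead of
# being written to disk.
dot = export_graphviz(
    model,
    out_file=None,
    feature_names=["Sex", "Pclass", "Age"],
    class_names=["died", "survived"],
    filled=True,    # color nodes by majority class
    rounded=True,
)

print(dot[:200])  # first part of the DOT source
```

Saving `dot` to a file and running it through a Graphviz renderer produces the familiar box-and-arrow picture of the tree.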

Visualize the Tree
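If you just want to read the learned rules without any plotting tools, scikit-learn's `export_text` prints the tree as indented text. This is a sketch on a synthetic stand-in for the Titanic data (the hypothetical `make_toy_passengers` helper), with `max_depth=2` picked only to keep the printout short.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text


def make_toy_passengers(n=400, seed=0):
    """Synthetic stand-in for cleaned Titanic features (NOT the real data)."""
    rng = np.random.default_rng(seed)
    sex = rng.integers(0, 2, n)       # 0 = male, 1 = female
    pclass = rng.integers(1, 4, n)    # ticket class 1, 2 or 3
    age = rng.uniform(1, 70, n)
    survived = ((sex == 1) | (pclass == 1)).astype(int)
    noise = rng.random(n) < 0.15      # 15% label noise
    y = np.where(noise, 1 - survived, survived)
    return np.column_stack([sex, pclass, age]), y


X, y = make_toy_passengers()
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each indented line is one question; lines ending in "class: ..." are leaves.
rules = export_text(model, feature_names=["Sex", "Pclass", "Age"])
print(rules)
```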

Make Predictions on Our Testing Dataset

Let's test our model by applying it to our test dataset!
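A minimal sketch of the prediction step, assuming scikit-learn. Both the training rows and the five "test" passengers come from a hypothetical `make_toy_passengers` helper that generates synthetic data, so the predictions are illustrative and say nothing about real passengers.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def make_toy_passengers(n=400, seed=0):
    """Synthetic stand-in for cleaned Titanic features (NOT the real data)."""
    rng = np.random.default_rng(seed)
    sex = rng.integers(0, 2, n)       # 0 = male, 1 = female
    pclass = rng.integers(1, 4, n)    # ticket class 1, 2 or 3
    age = rng.uniform(1, 70, n)
    survived = ((sex == 1) | (pclass == 1)).astype(int)
    noise = rng.random(n) < 0.15      # 15% label noise
    y = np.where(noise, 1 - survived, survived)
    return np.column_stack([sex, pclass, age]), y


# Fit on the training rows.
X_train, y_train = make_toy_passengers(n=400, seed=0)
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Unseen passengers: a test set has features but no survival labels,
# so the generated labels are deliberately thrown away.
X_test, _ = make_toy_passengers(n=5, seed=1)

predictions = model.predict(X_test)               # 0 = died, 1 = survived
probabilities = model.predict_proba(X_test)[:, 1]  # estimated P(survived)

for row, label, prob in zip(X_test, predictions, probabilities):
    print(row, "->", label, f"(p={prob:.2f})")
```

The test rows must go through exactly the same cleaning and encoding steps as the training rows before they reach `predict`.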

Conclusion

Congratulations on implementing your own decision tree on the Titanic dataset! Decision trees can also be used for more serious tasks. Companies use them every day to help make informed, data-driven decisions; hence, people interact with decision tree algorithms every day. Decision trees have many different applications, such as predicting the weather, identifying an animal, or predicting the likelihood of someone dying on a ship, as in the Titanic example in this article. I hope the practical implementation of a decision tree and its real-world applications serve you well in applying the concept. For a more theoretical explanation, please visit some of the sources listed below.

Theoretical Explanations for Decision Trees

Sources For Article:



Andre is a data scientist in the Bay Area who loves sharing content and making the complex simple.