Decision Trees — Day 10 #100DaysOfMLCode
Saving Data
You have a phone with a limited data plan that only sends messages in binary. Every night you want to tell your friend where to meet you, which could be your house, the park, the movies, or your favorite restaurant. You could send 00 for your house, 01 for the park, 10 for the restaurant, or 11 for the movies. But if the destinations aren't equally likely, this fixed two-bit code wastes your data plan.
You see movies only 10% of the time you go out. There is a 40% chance you will meet at your house, a 20% chance you will go to the park, and a 30% chance of going out to eat. Since your house is the most likely destination, it gets the shortest code: a single 0. If you aren't staying in, the next decision is whether or not you will get food. If yes, you send a 10. If not, you need to decide whether you are going to the park. A yes is 110 and a no is 111 (to the movies!).
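Written out as code, that chain of yes/no decisions is a prefix code: no codeword is the start of another, so your friend can always tell where one message ends. Here is a minimal sketch (the function name and structure are my own):

```python
# A sketch of the decision chain above as an encoder.
# Each extra yes/no question adds one bit to the code.
def encode(destination: str) -> str:
    """Return the variable-length code for a destination."""
    if destination == "house":        # most likely, so one bit
        return "0"
    if destination == "restaurant":   # going out for food
        return "10"
    if destination == "park":
        return "110"
    return "111"                      # to the movies!

print(encode("house"))   # 0
print(encode("movies"))  # 111
```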
Most of the time you will send just a 0: one bit instead of the original two for 00. Only rarely will you send the three-bit 111 for the movies. Over time this will save you data.
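A quick sanity check of the savings, using the probabilities above (a sketch; the numbers are just the ones from this example):

```python
from math import log2

probs = {"house": 0.4, "restaurant": 0.3, "park": 0.2, "movies": 0.1}
code = {"house": "0", "restaurant": "10", "park": "110", "movies": "111"}

# Average bits per message under the variable-length code
avg_bits = sum(p * len(code[place]) for place, p in probs.items())
print(f"variable-length code: {avg_bits:.2f} bits")  # 1.90

# The fixed code always costs 2 bits per message

# Shannon entropy of the distribution: the floor no code can beat
entropy = -sum(p * log2(p) for p in probs.values())
print(f"entropy: {entropy:.2f} bits")                # 1.85
```

The 1.9-bit average sits just above the 1.85-bit entropy floor, which is why this particular code is hard to improve on. That same notion of entropy is what decision trees use to pick their splits.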
Making Choices
The data in this graph is not linearly separable, but we can draw decision lines that isolate the green points from the red ones. The goal is to get the most information out of every decision.
We could place the first split at y = 1. This would guarantee that everything below y = 1 is green, but we would still have to make decisions about all the points above that line. We could instead use y = 6, which tells us that everything above y = 6 is red. More data points sit above y = 6 than below y = 1, so this split settles more of the data at once and is the better first choice. Then we can split at x = 4: if a point is below y = 6 and to the left of x = 4, it is green. If it is not, we have another choice to make, and our last split can be at x = 1. Now we have a decision tree for separating the red and green data.
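Here is a minimal sketch of learning such a tree with scikit-learn. The points below are invented to roughly match the regions just described (the original plot isn't reproduced here), and the learned split order may differ from the hand-picked one. Setting criterion="entropy" makes each split maximize information gain, the "most information per decision" idea above:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up points roughly matching the description:
# red above y = 6, green below y = 6 and left of x = 4,
# red below y = 6 and right of x = 4
X = [
    [2, 8], [5, 7], [7, 9], [3, 10],    # red, above y = 6
    [1, 2], [2, 4], [3, 1], [0.5, 5],   # green
    [6, 3], [8, 5], [5, 2],             # red, lower right
]
y = ["red"] * 4 + ["green"] * 4 + ["red"] * 3

# "entropy" chooses the split with the highest information gain
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X, y)

# Print the learned tree as nested if/else rules
print(export_text(clf, feature_names=["x", "y"]))
```

export_text prints the thresholds the tree chose, so you can compare them with the hand-picked splits at y = 6 and x = 4.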

Udacity — Intro to Machine Learning, Sebastian Thrun and Katie Malone
Coursera — Machine Learning, Andrew Ng
Machine Learning: An Algorithmic Perspective, Stephen Marsland (2015)
Claude Shannon — Father of the Information Age, https://www.youtube.com/watch?v=z2Whj_nL-x8&t=203s
Information entropy | Journey into information theory | Computer Science | Khan Academy
