New to Machine Learning? Avoid these three mistakes

Common pitfalls when learning from data

James Faghmous

--

Machine learning (ML) is one of the hottest fields in data science. Ever since ML entered the mainstream through Amazon, Netflix, and Facebook, people have been giddy about what they can learn from their data. However, modern machine learning (i.e., not the theoretical statistical learning that emerged in the 70s) is very much an evolving field, and despite its many successes we are still learning what exactly ML can do for data practitioners. I gave a talk on this topic earlier this fall at Northwestern University, and I wanted to share these cautionary tales with a wider audience.

Machine learning is a field of computer science in which algorithms improve their performance at a certain task as they observe more data. To do so, an algorithm selects the hypothesis that best explains the data at hand, in the hope that this hypothesis will generalize to future (unseen) data. Take the left panel of the figure in the header: the crosses denote the observed data projected in a two-dimensional space, in this case house prices and the corresponding sizes in square meters. The blue line is the algorithm’s best hypothesis to explain the observed data. It states: “there is a linear relationship between the price and size of a house. As the house’s size increases, so does its price in…
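To make the idea of "selecting a hypothesis" concrete, here is a minimal sketch of fitting a straight line to house sizes and prices. The numbers are made up for illustration and are not the data behind the figure. The helper names `fit_line` and `predict` are hypothetical. Ordinary least squares is one common way to pick the line, assumed here rather than taken from the article.

```python
# Hypothetical observations: house sizes (square meters) and prices
# (arbitrary currency units). These values are invented for illustration.
sizes = [50.0, 70.0, 90.0, 110.0, 130.0]
prices = [150.0, 200.0, 260.0, 310.0, 370.0]


def fit_line(xs, ys):
    """Fit price = slope * size + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(size, slope, intercept):
    """Apply the fitted hypothesis to a (possibly unseen) house size."""
    return slope * size + intercept


slope, intercept = fit_line(sizes, prices)
print(f"hypothesis: price = {slope:.2f} * size + {intercept:.2f}")
print(f"predicted price for a 100 m^2 house: {predict(100.0, slope, intercept):.2f}")
```

The fitted line is the hypothesis, and `predict` is how it would be applied to a house the algorithm has not seen. Whether that prediction is any good depends on how well the linear assumption generalizes beyond the observed data.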

--

James Faghmous

@nomadic_mind. Sometimes the difference between success and failure is the same as between = and ==. Living is in the details.