Part I: Machine Learning: An Introduction

Arunank Divya Sharan
Published in DataDreamsDragons
Nov 6, 2018 · 4 min read

Hi everyone,

This is the first post in a series of many that will soon come on anything and everything around data science. So, grab a double espresso and join me on this ride! :)

There are two quotes that have always stuck with me.

  • “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” — Antoine de Saint-Exupéry (French Writer)
  • “80% of the effects come from 20% of the causes.” — Vilfredo Pareto (Italian Economist)

I have tried to model every endeavour of mine on these two principles: simplicity and prioritisation of key factors. Data Science is no different!

Keep things simple! Prioritize!

This and the following write-ups will be an effort to reach out and explain the core aspects of Machine Learning after removing all the noise and clutter. The goal is to keep these posts a light read while providing enough context for you to understand what is going on. Wherever possible, I will use data science jargon in a general context to help build an intuitive ‘feel’ for it.

So, in this journey, our goal will be to behave like a well-fitted machine learning model where neither unnecessary details are added (No Overfitting!) nor important details are left out (No Underfitting!).

Occasionally, external links will be added for those interested in going deeper. For everyone else, rest assured that the content in these posts will be sufficient to get a thorough understanding of ML and to answer interview questions.

The questions most frequently asked by anyone starting out in this field are:

  • What is Machine Learning? (From now on, I will refer to Machine Learning as ML because I am lazy!)
  • How is it different from AI? Is it AI? (This question will be addressed in a separate post. Remember, No Overfitting!)
  • Why, where and when do we use ML?

Let us tackle the first question: What is ML?

If you search long enough, you will realize that there are so many definitions, all given in different contexts and for different use cases. Some are technical, others more generic.

To keep things simple, I like this definition -

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” — Tom Mitchell (American Computer Scientist)

It is formal enough to give a taste of the engineering perspective while being generic enough to provide an intuitive ‘feel’.

The three key aspects are:

  • Task (T): This can be framed in terms of what we are looking to achieve, say, classification or regression.
  • Experience (E): This can be framed as every observation used to train a model and every mistake the model makes. Evaluating the cost function and updating the weights can be thought of as a ‘Learning Experience’ for the model. (Friends, Indians, Countrymen! I come to explain, not to confuse you!)
  • Performance (P): This can be framed as a metric used to assess how well our model is achieving the desired target. For a classification task, it may be accuracy or F-score. (Explanation is coming!)
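Mitchell’s definition can be made concrete with a toy sketch. Everything below (the threshold classifier, the function names) is my own illustrative construction, not from any library: the task T is classifying numbers as “big” or “small”, the experience E is a set of labelled examples, and the performance P is accuracy.

```python
# Toy illustration of Mitchell's T/E/P framing (illustrative names, not a real library).
# Task T: classify numbers as "big" (>= threshold) or "small".
# Experience E: labelled examples the model sees during training.
# Performance P: accuracy on those examples.

def accuracy(threshold, data):
    """P: fraction of examples the threshold classifies correctly."""
    return sum((x >= threshold) == label for x, label in data) / len(data)

def train(data, candidates):
    """E: the model 'learns' by picking the threshold that performs best on the data."""
    return max(candidates, key=lambda t: accuracy(t, data))

# Labelled examples: (value, is_big)
data = [(1, False), (2, False), (3, False), (7, True), (8, True), (9, True)]

untrained = 0                                 # arbitrary initial threshold
learned = train(data, candidates=range(10))   # threshold chosen from experience

print(accuracy(untrained, data))  # 0.5  -- poor performance P before experience
print(accuracy(learned, data))    # 1.0  -- performance improves with experience E
```

Note how the sentence “performance at T, measured by P, improves with experience E” maps directly onto the two printed numbers.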

The Why, When and Where of ML

Now, let us tackle the other frequently asked questions about ML. At a generic level, these are the main conditions under which it is suitable to use ML:

  • There is no mathematical solution easily available. The problem is too complex for a traditional solution. So, instead of completing a PhD in mathematics and becoming a maths wizard, we use an ML approach, which detects the patterns and does the heavy lifting for us.
  • There is a decent amount of data present. Without a decent amount of data, it is difficult to use ML.
  • There is a pattern in the data. If the data is completely random, it is difficult for ML algorithms to achieve any predictive power.
  • The problem has so many different variations that it is not feasible to write rules for all the cases. An ML approach helps us find patterns that generalize better to unseen data, and the model can adapt as the data changes. A rule-based system, by contrast, may have to be rewritten from scratch if the incoming data changes drastically.
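The “let the data do the heavy lifting” idea above can be sketched in a few lines. This is my own toy example (the data, the hidden pattern y = 2x + 1, and the function name are all assumptions): instead of hand-coding a formula, we fit a line from observations using least squares and then predict on an unseen input.

```python
# Toy sketch: learn a pattern from data instead of hand-writing rules.
# Fit a line y = a*x + b by ordinary least squares (single feature, pure Python).

def fit_line(xs, ys):
    """Ordinary least squares for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Experience: observations generated by the hidden pattern y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

a, b = fit_line(xs, ys)
print(a, b)        # recovers the pattern: a = 2, b = 1
print(a * 10 + b)  # generalizes to the unseen input x = 10 -> 21
```

The point is not the arithmetic but the workflow: the “rule” (the slope and intercept) is extracted from data, so when the data changes, we refit rather than rewrite.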

Let us stop here and keep the post at this digestible length. It is best to avoid the trap of TL;DR!

Based on my experience in interviews and in interacting with experienced people working in the industry, the content above is all you will need to tackle generic interview questions discussing ML and its use cases at a meta-level. In the second post, we will start with the algorithms.

Just as good ML model building requires data to be cleaned and pre-processed, you need to pre-process yourself: be prepared with the prerequisites mentioned below before you move on to the next post, on algorithms.

If you are really serious about learning ML, please go through:

  • Pages 1 to 41 of Introduction to Statistical Learning in R.
  • This covers Chapter 1: Introduction and Chapter 2: Sections 2.1 and 2.2. It is fine if you cannot fully understand the Bias-Variance Tradeoff in Section 2.2; it will be discussed later.

See you there, soon! Ciao, ciao!
