Demystified — Machine Learning

Yash Gupta
Data Science Simplified
5 min read · Oct 2, 2020

Good morning everyone!

We see numbers everywhere. In any setting, a set of numbers makes sense to us, be it the price of a product or your height. We often measure these metrics to get an understanding of things. My name is Yash and I’m just starting out with learning more about statistics and numbers and finding out their underlying meaning. In this article, I’ll try to make Machine Learning easier to understand for you!
P.S. No experience with coding, statistics, analytics, or math is required.

In complete honesty, data can be vast and scary, and thus analysts are needed to tell us more about it. Data Science and Machine Learning have come a long way over the last decade, and analytics has progressed towards becoming a science. The beauty of Data Science is in its simplicity and the power it holds. While Data Science is self-explanatory, Machine Learning is still like the Bermuda Triangle to some.

So what really is Machine Learning? It’s the second most amazing thing in Data Science after Deep Learning! It is often mistaken for Artificial Intelligence, so let’s understand what it actually is.

And off we go!

Consider that you have to get yourself a chocolate! Let’s say a milk chocolate worth $2. You are a teenager who, along with 50 of your classmates, wants to understand whether a person (a teenager, in this case) will buy the chocolate or not, based on some metrics.

A collection of chocolates put together in small cubes
Chocolates (reference picture)

You note things such as age, height, weight, BMI, pocket money, the price of the chocolate, the number of friends you have, the pocket money of your best friend, number of siblings, medical history, recent purchases of chocolates, marks in an exam, followers on Instagram, and how often you get your pocket money in a week, for yourself and all the other people you are studying (classmates, in this case).

Now you must be wondering: how do my followers on Instagram affect my purchase decision? Well, we won’t really know whether they affect it unless we study the data and analyze it for the presence of a pattern. Data can sometimes tell us things that we never would’ve expected.

When your metrics are in place, you study them to find whether they have an impact on the outcome (the purchase decision) and keep only the metrics that do make an impact, to avoid things that don’t really matter. You also try to remove any unknown values and outliers or anomalies, which in simple terms would be something like an old person’s data slipping into entries that are supposed to describe you and your friends, who are all teenagers.
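The cleaning step described above can be sketched in a few lines of plain Python. This is only an illustration: the rows, the column names (age, pocket_money, bought) and the 13 to 19 age range used to define a "teenager" are assumptions made up for this example, not data from a real survey.

```python
# Hypothetical survey rows. None marks an unknown value.
raw_rows = [
    {"age": 15, "pocket_money": 10.0, "bought": True},
    {"age": 16, "pocket_money": None, "bought": False},  # unknown value
    {"age": 14, "pocket_money": 8.0, "bought": True},
    {"age": 67, "pocket_money": 40.0, "bought": False},  # not a teenager: an outlier
    {"age": 17, "pocket_money": 12.5, "bought": False},
]


def is_complete(row):
    """True when no metric in the row is unknown."""
    return all(value is not None for value in row.values())


def is_teenager(row):
    """True when the age falls in the range we decided to study (assumed 13 to 19)."""
    return 13 <= row["age"] <= 19


# Keep only complete rows that belong to the group being studied.
clean_rows = [row for row in raw_rows if is_complete(row) and is_teenager(row)]

print(len(raw_rows), "rows collected,", len(clean_rows), "rows kept")
```

In practice the rule for what counts as an outlier depends on the question being asked, so the age check here is just one possible choice.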

Given a list of metrics and how their combination affects the purchase decision, you might look at your best friend and predict whether he/she will make the purchase based on his/her metrics and how they fit the data you have. We then try to explain the same things to the machine and teach it that when the metrics appear in these combinations, this is the likely outcome. The data used to teach the machine is termed the Training Data.

The machine then tries to keep these patterns in its memory so that it can make predictions in the future on data it has not seen before (which is termed the Test Data).

Now to test your model, you go on to analyze what the outcome would be for the teenagers in your neighborhood, based on the data that you have gathered. You might do it for 1 person... or possibly 100 teenagers in the neighborhood, after you spend a day pondering over a lot of parameters and how they fit into your analysis.

This is where Machine Learning comes into the picture. The machine revisits its memory, takes the patterns that it already knows of, fits in the new data and tries predicting the outcomes for it. The best part? It can do so for as many new entries as you give it within a matter of seconds! It will estimate how likely someone is to purchase the chocolate based on what it learnt about teenagers from you and your friends and their lifestyles (Training Data).
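Here is a minimal sketch of that learn-then-predict idea in Python using Scikit-Learn. The numbers are made up, only two metrics are used to keep it small, and a decision tree is just one of many models that could be picked here; none of this is meant as the one correct way to do it.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up Training Data: one row per classmate.
# Columns: [weekly pocket money in dollars, chocolates bought last month]
X_train = [[2, 0], [3, 0], [4, 1], [5, 0], [10, 2], [12, 3], [15, 2], [20, 4]]
# Outcome for each classmate: 1 = bought the chocolate, 0 = did not.
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

# "Teaching the machine": the model stores the patterns it finds in the data.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# New, unseen teenagers from the neighborhood (same two columns).
X_new = [[1, 0], [18, 3]]
predictions = model.predict(X_new)

for metrics, outcome in zip(X_new, predictions):
    print(metrics, "->", "buys" if outcome == 1 else "does not buy")
```

The same predict call works whether X_new holds 1 row or 100 rows, which is why the machine can handle many new entries so quickly.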

Machine Learning uses Supervised or Unsupervised Learning algorithms. In Supervised Learning, you give the machine both the metrics (age, height, weight in the example) and the outcome (the purchase decision), and it learns how the two are related. In Unsupervised Learning, no outcome is specified, and the machine tries to club together similar people based only on the patterns it finds in the metrics.
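To see the difference side by side, here is a small Python sketch using Scikit-Learn. The data is invented, and the two models (logistic regression for the supervised case, k-means clustering for the unsupervised case) are simply common choices assumed for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Made-up metrics: [weekly pocket money in dollars, chocolates bought last month]
X = [[2, 0], [3, 0], [4, 1], [15, 2], [18, 3], [20, 4]]

# Supervised Learning: the outcome is given for every row
# (1 = bought the chocolate, 0 = did not).
y = [0, 0, 0, 1, 1, 1]
classifier = LogisticRegression()
classifier.fit(X, y)
supervised_guess = classifier.predict([[17, 3]])
print("Predicted outcome for a new teenager:", supervised_guess[0])

# Unsupervised Learning: no outcome is given. The machine only groups
# similar rows together. The group numbers themselves carry no meaning.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(X)
print("Group assigned to each row:", list(groups))
```

Notice that the clustering step never sees y. It only discovers that the first three rows look alike and the last three rows look alike.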

Many algorithms are available for this. Some predict continuous variables like price or height, while others predict categorical variables like the purchase decision or other yes/no questions. Some well-known ML algorithms and techniques include:
Linear Regression/Multiple Linear Regression, Logistic Regression, Clustering, Decision Trees, Random Forests, Gradient Boosting, Extreme Gradient Boosting (XGBoost), Support Vector Machines, Principal Component Analysis, Recommender Systems, Natural Language Processing etc.

Understanding these then leads you towards building a Neural Network or a Deep Learning model (which serves as the basis for much of modern AI).
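To make the distinction between continuous and categorical predictions concrete, here is a short Python sketch with Scikit-Learn. The pocket money and spending figures are invented, and Linear Regression and Logistic Regression are used only because they are the simplest entries in the list above.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up data: weekly pocket money in dollars for six teenagers.
pocket_money = [[2], [4], [6], [8], [10], [12]]

# Continuous target: dollars spent on chocolate per week.
# (Invented so that spend is exactly 0.5 * pocket money + 1.)
spend = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
regressor = LinearRegression().fit(pocket_money, spend)
predicted_spend = regressor.predict([[14]])[0]
print("Predicted weekly spend:", round(predicted_spend, 2))

# Categorical target: did the teenager buy the chocolate? (1 = yes, 0 = no)
bought = [0, 0, 0, 1, 1, 1]
classifier = LogisticRegression().fit(pocket_money, bought)
predicted_class = classifier.predict([[14]])[0]
print("Predicted purchase decision:", "yes" if predicted_class == 1 else "no")
```

The first model answers "how much?" with a number, while the second answers "yes or no?" with a category.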

The entire process of ML mainly depends on the kind of nutrients you feed to the model for its fitness, or rather on “Clean Data”. Preparing clean data is often said to make up around 70% of the work in predictive analytics. The performance of an ML model is judged with measures like accuracy and precision, which tell you how well it predicted the Positive/Negative outcomes in a set of data.
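Accuracy and precision are simple enough to work out by hand. The sketch below does so in plain Python for ten made-up predictions, where 1 means "bought" (Positive) and 0 means "did not buy" (Negative). Accuracy is the share of all predictions that were right, and precision is the share of predicted Positives that were actually Positive.

```python
# Made-up results for ten teenagers.
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

pairs = list(zip(actual, predicted))

# Predicted "bought" and the teenager really did buy.
true_positives = sum(1 for a, p in pairs if a == 1 and p == 1)
# Predicted "bought" but the teenager did not buy.
false_positives = sum(1 for a, p in pairs if a == 0 and p == 1)
# Every prediction that matched reality, Positive or Negative.
correct = sum(1 for a, p in pairs if a == p)

accuracy = correct / len(actual)
precision = true_positives / (true_positives + false_positives)

print("Accuracy:", accuracy)    # 8 of 10 predictions were right
print("Precision:", precision)  # 3 of 4 predicted buyers really bought
```

A model can score well on one measure and poorly on the other, which is why both are usually checked together.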

A set of series and movie recommendations on Netflix.
Netflix Movie Recommendations (file photo)

Some places where ML is used include recommendations of products on Amazon and movies on Netflix, predicting the price of a commodity based on its related products, customer segmentation in an unknown market, and testing samples in the pharmaceutical industry.

Machine Learning is the next big development in analytics after spreadsheets. It is not difficult to understand but indeed hard to master. With this in your arsenal, you can own predictive analytics.

With open source languages like Python available to us today, which come with open source (in other words, free to use) libraries like Scikit-Learn, it becomes simple to use ML in our day-to-day life to analyze datasets and predict/forecast outcomes. We hope that you now have a general idea of how ML works, and if it is of interest to you, do go ahead and harness its power ASAP.

Stay tuned with us as we chart out paths on how you can get into coding and demystify other concepts related to Data Science and Coding. Thank you for reading this all the way to the end.

Yash Gupta
Data Science Simplified

Lead Analyst at Lognormal Analytics and self-taught Data Scientist! Connect with me at - https://www.linkedin.com/in/yash-gupta-dss