Machine Learning 101 — An Intro

Dhruv Kapoor
Published in Analytics Vidhya
5 min read · Apr 7, 2020

What is Machine Learning?
Machine Learning, a subset of Artificial Intelligence, is the branch of Computer Science that gives computers the ability to learn from data, much as humans learn from experience, without being explicitly programmed for each task. Machine Learning techniques are used to find underlying trends and patterns in data that would be difficult to uncover otherwise.

“Computers are able to see, hear and learn. Welcome to the future.” — Dave Waters

Applications

Over the past few decades, Machine Learning has been applied in various disciplines. Here are some examples of Machine Learning applications which we see in our daily lives:

  • Virtual Assistants — Siri, Alexa and Google Assistant are virtual assistants developed by Apple, Amazon and Google respectively. They are prevalent in modern-day devices and rely on speech recognition to turn spoken commands into text. Moreover, these assistants can now understand a variety of languages and commands, which makes them all the more appealing to us.
  • Recommendation Systems — Streaming services such as Netflix, Prime Video, YouTube and Spotify employ powerful recommendation engines that suggest new content to users, typically based on their search history and/or preferences.
  • Fraud Detection — Banks and Financial Services often set up systems that help in detecting and preventing fraud. Such systems previously made use of Rule-based Techniques which contained human-made rules. Now, companies are switching to Machine Learning Models which can learn on their own and improve with more time and data.
  • Self-driving Cars — Such vehicles have made immense progress over the last few years. They use Computer Vision to navigate with minimal human intervention. Tesla’s Autopilot, a driver-assistance feature, is a well-known example.
  • Healthcare — Hospitals, Medical Practices, and Researchers were among the first to embrace the Artificial Intelligence Wave and introduce it into the mainstream. The medical fraternity now uses Machine Learning techniques extensively to help develop medicines and vaccines (even for COVID-19!), as well as to predict the onset of certain diseases.
  • Retail — Many companies use historical data and take note of trends to offer their customers the best prices and deals, run special schemes and manage inventory. Amazon Go is a good example of retail stores and technology coming together.

Important Terms

Before we dive into the nuances and peculiarities of Machine Learning techniques, let us familiarize ourselves with some common and useful terms:

  • Dataset — The data which we wish to analyze so as to extract meaningful information and discover underlying trends and patterns. It is usually divided into a training set and a test set.
  • Features — The measurable attributes present in a dataset that can provide insights and be used for analysis.
  • Independent Variables — Variables/attributes which are controlled inputs and whose values determine the output.
  • Dependent Variables — Also known as target variables, these represent the outputs, whose values change as the values of the independent variables change.
  • Model — The representation learned by applying an algorithm to our data, which we use to understand the data, draw inferences and make predictions.
  • Training Set — It is the portion of data on which we train our Machine Learning model to make predictions.
  • Test Set — It is the remaining portion of data, which is not present in the Training Set, that is used to determine the correctness of the Machine Learning model we have trained.
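To see how these terms fit together, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the dataset is made up (house size as the single feature, price as the target, generated so that price = 3 * size + 50), and the helper names `train_test_split`, `fit_line` and `mean_absolute_error` are hypothetical, not taken from any particular library.

```python
import random

# Hypothetical dataset: (house size in square metres, price in thousands).
# The numbers are made up so that price = 3 * size + 50 exactly.
# size is the feature (independent variable), price is the target (dependent variable).
dataset = [(size, 3 * size + 50) for size in range(40, 140, 10)]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Least-squares fit of target = slope * feature + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, rows):
    """Average absolute gap between predicted and actual targets."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


training_set, test_set = train_test_split(dataset)    # 80% / 20% split
model = fit_line(training_set)                        # learn from the training set only
test_error = mean_absolute_error(model, test_set)     # judge on data the model never saw
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, test error={test_error:.4f}")
```

Because the made-up data lies exactly on a line, the error on the test set should come out at essentially zero. Real data is noisy, so the test error is what tells us how well the model generalizes.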

Diving into Machine Learning

Image via Abdul Wahid

Machine Learning can be broadly classified into 3 categories:

  1. Supervised Learning: We use this technique when our data is labelled, i.e. when the outputs are known to us, either as continuous numeric values (Regression) or as one of a set of categories (Classification). With the help of labelled data, our model learns the relationship between the independent and dependent variables. As a result, a well-trained model can make reasonably accurate predictions on never-seen-before data.
    For example, if we wish to determine whether or not an email is spam, then we can train a model using a Classification Algorithm to do so. Similarly, if we wish to determine the price of a house in a certain area based on various factors, then we can apply a Regression Algorithm to our data.
  2. Unsupervised Learning: Here the learning algorithm is given a dataset and asked to extract whatever structure it finds useful and important. Since our output is not known to us beforehand, our data is said to be unlabelled. A major challenge in such techniques is determining whether or not the output obtained is correct, since we do not know what to expect. Dimensionality Reduction, Association Rule Mining and Clustering Algorithms are examples of this technique.
    For example, if we are given text data which contains news articles then we can use a clustering algorithm to determine the topics and keywords which are mentioned.
  3. Reinforcement Learning: It is used to solve interactive, sequential decision-making problems, where the data observed up to time t is used to determine the action to be taken at time t+1. Every correct action fetches a reward, while an incorrect action leads to a penalty. Thus, the main aim is to take suitable actions in a given situation so as to obtain the maximum total reward possible.
    For example, if we are training a robot to travel through a maze and our robot makes a correct move then it is rewarded. On the other hand, if the robot makes an incorrect move and gets stuck then a penalty is incurred. After multiple attempts, the robot should learn the ideal path to travel through the maze, thereby reaping the maximum reward.
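The maze example can be sketched with tabular Q-learning, one common Reinforcement Learning algorithm (the text above does not name a specific one, so this choice is an assumption). The maze layout, the reward values (+10 for the exit, -1 for bumping into a wall, -0.1 for an ordinary move) and the learning settings are all hypothetical, picked only to keep the example small.

```python
import random

# Hypothetical 3x4 maze: S = start, G = goal, # = wall, . = free cell.
MAZE = ["S.#.",
        ".#..",
        "...G"]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
START, GOAL = (0, 0), (2, 3)


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col
    in_bounds = 0 <= row < len(MAZE) and 0 <= col < len(MAZE[0])
    if not in_bounds or MAZE[row][col] == "#":
        return state, -1.0, False       # penalty: bumped into a wall or the edge
    if (row, col) == GOAL:
        return GOAL, 10.0, True         # reward: reached the exit
    return (row, col), -0.1, False      # small cost for every ordinary move


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table mapping (state, action) to estimated long-term reward."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(100):            # cap on moves per attempt
            if rng.random() < epsilon:  # explore: try a random move
                action = rng.choice(list(ACTIONS))
            else:                       # exploit: take the best move known so far
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(
                q.get((next_state, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            # Q-learning update: nudge the estimate towards reward + discounted future value.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_moves=20):
    """Follow the highest-valued action from the start until the goal is reached."""
    state, moves = START, []
    while state != GOAL and len(moves) < max_moves:
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        moves.append(action)
        state, _, _ = step(state, action)
    return moves


q_table = train()
path = greedy_path(q_table)
print(path)
```

After enough attempts, the greedy path should settle on the shortest route through this small maze (two moves down, then three to the right), which is the behaviour the rewards and penalties are meant to encourage.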
Animation via ArcBotics
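Returning to the unsupervised category, the clustering idea can be sketched with a bare-bones k-means in plain Python. This is only an illustration under stated assumptions: each news article is reduced to two made-up keyword counts (sports words, politics words), the function names are hypothetical, and the centroids are naively initialized from the first k points, which real implementations usually avoid.

```python
# Hypothetical articles, each described by (sports-word count, politics-word count).
# No labels are given: the algorithm has to discover the two groups on its own.
articles = [(9, 1), (1, 9), (8, 2), (2, 8), (10, 0), (0, 10)]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive init: the first k points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids


labels, centroids = k_means(articles, k=2)
print(labels)     # cluster index assigned to each article
print(centroids)  # centre of each discovered cluster
```

The algorithm only returns cluster indices. Deciding that one cluster means "sports" and the other "politics" is still up to us, which is exactly the evaluation challenge mentioned for unsupervised techniques.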

In my next blog post, I’ll be covering some essential libraries and tools that we will make use of in our journey.
Once again, thanks for sticking by and stay tuned for more!
