First Step in Machine Learning

Bhartendu Dubey

Published in Analytics Vidhya · 5 min read · Jan 1, 2020

Today, Machine Learning is one of the most influential, fascinating, and powerful technologies around us.

Let’s start with three basic terms that one should be familiar with:

Data
Information
Knowledge

Defining Data, Information, Knowledge — Image courtesy : http://www.knowledge-management-tools.net

What is Machine Learning?

We are surrounded by a huge amount of data, from the photos on Instagram to the songs on Spotify. This data keeps growing with time, and once it is organized as information, we can analyze it to predict values and find patterns.

Machine learning is a tool that helps us turn information into knowledge. Machine learning techniques automatically find hidden patterns in complex data, and those patterns can then be used to predict future events, support complex decision making, and produce forecasts.

The applications of Machine Learning are immense in day-to-day life.

Examples:

Face detection: Smartphones detecting faces while taking photos or unlocking themselves.
Fraud detection: Banks use Machine Learning to detect fraudulent transactions in real time.
Time series analysis: ML is used to forecast stock market prices.
Recommendations: Social media platforms (like Facebook, LinkedIn, etc.) recommend friends to us. E-commerce sites like Amazon & Flipkart recommend products based on our browsing history.

Defining the Process

In general, Machine Learning can be applied to a real-life problem by following these steps:

Step 1: Define an ML problem & propose a solution.

Step 2: Construct the dataset.

Step 3: Transform the data.

Step 4: Train the model.

Step 5: Use the model to make predictions.

Hence, we can define Machine Learning as the process of training a piece of software, called a model, to make useful predictions that help in forecasting future actions/activities.
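To make the five steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the fraud problem, the made-up transaction amounts, and the very simple "model" (a single threshold). Real projects would use far more data and a proper library.

```python
# Step 1: define the problem (hypothetical): flag a transaction as
# fraudulent (1) or legitimate (0) from its amount alone.

# Step 2: construct the dataset. These (amount, label) pairs are made up.
raw_data = [(20.0, 0), (35.0, 0), (50.0, 0), (900.0, 1), (1200.0, 1), (1500.0, 1)]

# Step 3: transform the data by scaling amounts into the range [0, 1].
amounts = [amount for amount, _ in raw_data]
low, high = min(amounts), max(amounts)

def scale(amount):
    return (amount - low) / (high - low)

features = [scale(amount) for amount, _ in raw_data]
labels = [label for _, label in raw_data]

# Step 4: train the model. Here the "model" is a single threshold placed
# halfway between the average scaled amount of each class.
def mean(values):
    return sum(values) / len(values)

legit_mean = mean([x for x, y in zip(features, labels) if y == 0])
fraud_mean = mean([x for x, y in zip(features, labels) if y == 1])
threshold = (legit_mean + fraud_mean) / 2

# Step 5: use the model to make predictions on unseen amounts.
def predict(amount):
    return 1 if scale(amount) > threshold else 0

print(predict(40.0), predict(1100.0))  # prints: 0 1
```

The point of the sketch is the shape of the workflow, not the model itself: each step feeds the next, and only Step 4 would change if we swapped in a more capable algorithm.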

Types of Learning

We have 3 broad classifications:

Supervised Learning
Unsupervised Learning
Reinforcement Learning

Supervised Learning

Here, the goal is to learn a mapping between a set of inputs & outputs. We feed features & their corresponding labels into an algorithm (this process is called training), which determines the relationship between the features & their labels.

Example: Suppose we have the temperature from a “weather forecast” as input & the “no. of visitors to the beach” as output. Here, the goal is to learn a mapping that describes the relationship between temperature & no. of visitors.
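The beach example can be sketched as a straight-line fit. This is only one possible way to learn such a mapping (ordinary least squares), and the temperatures and visitor counts below are invented for illustration.

```python
# Hypothetical training data: temperature in °C (feature) and
# number of beach visitors (label).
temperatures = [20.0, 25.0, 30.0, 35.0]
visitors = [100.0, 200.0, 300.0, 400.0]

def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Training: learn the relationship between features and labels.
slope, intercept = fit_line(temperatures, visitors)

def predict_visitors(temperature):
    return slope * temperature + intercept

print(predict_visitors(28.0))  # prints: 260.0
```

Training happens once in `fit_line`; after that, `predict_visitors` can be called for any temperature the model has never seen.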

Unsupervised Learning

Here, the aim is to identify meaningful patterns in the data. The model has no hints about how to categorize each piece of data & it must infer its own rules for doing so.

Example: sorting coins of different colors into separate piles. Here, no prior information is provided about the categories, so the model forms clusters based on the ‘color’ property of the coins.
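As a rough sketch of the coin example, the code below groups coins by color using k-means, one common clustering technique (our choice for illustration; the text above does not name a specific algorithm). The RGB color values are made up, and the starting centroids are picked with a simple farthest-first rule so the result is repeatable.

```python
# Hypothetical coin colors as (red, green, blue) values. No labels are given.
coins = [
    (212, 175, 55), (205, 170, 60),    # gold-looking
    (192, 192, 192), (200, 200, 205),  # silver-looking
    (184, 115, 51), (180, 110, 60),    # copper-looking
]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first_centroids(points, k):
    """Start from the first point, then keep adding the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(next_point)
    return centroids

def k_means(points, k, iterations=10):
    centroids = farthest_first_centroids(points, k)
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points]
        # Move each centroid to the average of the points assigned to it.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

print(k_means(coins, 3))  # prints: [0, 0, 1, 1, 2, 2]
```

Note that the cluster numbers 0, 1, and 2 carry no meaning by themselves; the model only tells us which coins belong together.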

Reinforcement Learning

Here, the machine is trained to make specific decisions. It is exposed to an environment where it trains itself continually using trial and error. The machine learns from past experience and tries to capture the best possible knowledge to make accurate decisions.

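Example: the shortest path problem, where the machine tries different routes and learns from experience which one costs the least.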
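One way to sketch the shortest path example is tabular Q-learning, a common reinforcement learning technique (again our choice for illustration, not something the text prescribes). The road map, the costs, and the parameter values below are all hypothetical. The agent wanders randomly, receives a negative reward equal to the cost of each road, and gradually learns which choice is best at each node.

```python
import random

# Hypothetical road map: graph[node][neighbour] is the travel cost.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
    "D": {},  # goal: no outgoing roads
}
START, GOAL = "A", "D"

def train(episodes=500, learning_rate=0.5, discount=1.0, seed=0):
    rng = random.Random(seed)
    # q[(node, next_node)] estimates the total reward of taking that road.
    q = {(node, nxt): 0.0 for node in graph for nxt in graph[node]}
    for _ in range(episodes):
        state = START
        while state != GOAL:
            action = rng.choice(sorted(graph[state]))  # explore by trial and error
            reward = -graph[state][action]             # cheaper roads are better
            best_next = max((q[(action, a)] for a in graph[action]), default=0.0)
            target = reward + discount * best_next
            q[(state, action)] += learning_rate * (target - q[(state, action)])
            state = action
    return q

def greedy_path(q):
    """Follow the best-known road from each node until the goal is reached."""
    path, state = [START], START
    while state != GOAL:
        state = max(graph[state], key=lambda nxt: q[(state, nxt)])
        path.append(state)
    return path

q_table = train()
print(greedy_path(q_table))  # prints: ['A', 'B', 'C', 'D']
```

The learned route A, B, C, D costs 3, which beats both A, C, D (cost 5) and A, B, D (cost 6), even though the agent was never told the costs in advance.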

Examples of Various ML problems

Classification: Picking 1 of N labels.

Example: cat, dog, house, or bear.
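A tiny sketch of picking 1 of N labels, using a nearest-neighbour rule (one simple classification technique among many). The animals' weights and heights are invented, and only three of the labels above are used to keep the data small.

```python
# Hypothetical labelled examples: (weight in kg, height in cm) -> label.
training_data = [
    ((4.0, 25.0), "cat"), ((5.0, 23.0), "cat"),
    ((30.0, 60.0), "dog"), ((25.0, 55.0), "dog"),
    ((300.0, 150.0), "bear"), ((250.0, 140.0), "bear"),
]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def classify(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(training_data, key=lambda item: squared_distance(item[0], features))
    return label

print(classify((28.0, 58.0)))   # prints: dog
print(classify((4.5, 24.0)))    # prints: cat
print(classify((280.0, 145.0))) # prints: bear
```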

Regression: It is a statistical tool used to determine the strength of the relationship between a dependent variable and a series of other changing variables (known as independent variables). In ML, it is used to predict a numeric value.

Example: the amount of rainfall (independent variable) and the crop yield (dependent variable). We use the value of one variable to forecast the value of the other.
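To illustrate what "strength of the relationship" can mean in practice, the sketch below computes the Pearson correlation coefficient between rainfall and crop yield. The numbers are made up, and Pearson's r is just one common measure of strength; a value close to 1 means the two variables rise together almost in a straight line.

```python
import math

# Hypothetical observations: rainfall in cm and crop yield in tonnes per hectare.
rainfall = [50.0, 60.0, 70.0, 80.0, 90.0]
crop_yield = [2.0, 2.5, 2.9, 3.6, 4.0]

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

r = pearson_r(rainfall, crop_yield)
print(round(r, 3))
```

With data this strongly related, rainfall alone would be a useful predictor of yield; with r near 0, it would not.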

Clustering: Grouping of similar entities.

Example: Document analysis (to organize documents into different themes).

Association Rule Learning: It infers likely association patterns in data.

Example: If you buy toothpaste, how likely are you to also buy a toothbrush?

Now, a question may come to mind: isn’t this similar to automation? To answer that, let’s look at the difference between the two.
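The toothpaste question can be answered with two standard measures from association rule learning: support (how often items appear together) and confidence (how often the second item appears given the first). The shopping baskets below are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per customer.
baskets = [
    {"toothpaste", "toothbrush"},
    {"toothpaste", "toothbrush", "soap"},
    {"toothpaste", "soap"},
    {"soap", "shampoo"},
    {"toothbrush"},
]

def count_containing(items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket)

def support(items):
    """Fraction of all baskets that contain every item in `items`."""
    return count_containing(items) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return count_containing(antecedent | consequent) / count_containing(antecedent)

print(support({"toothpaste", "toothbrush"}))       # prints: 0.4
print(confidence({"toothpaste"}, {"toothbrush"}))  # 2 of the 3 toothpaste baskets
```

Here two of the three customers who bought toothpaste also bought a toothbrush, so the rule "toothpaste implies toothbrush" has a confidence of about 67%.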

How is machine learning different from automation?

Let us take the example of e-mail to understand it. Automating flows in our mailbox requires us to define the rules, and these rules act in the same manner every time. Machine learning, on the other hand, helps machines learn from past data/experience & change their decisions/performance accordingly. Spam detection in our mailboxes is driven by machine learning; hence, it continues to evolve with time.

Therefore, we can say that the only relation between the two is that ML enables better automation.
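The contrast can be sketched in a few lines. The fixed rule and the toy word-counting filter below are both hypothetical and far simpler than any real spam filter; they only show that a rule stays the same while a learned model changes its decisions as it sees more labelled mail.

```python
from collections import Counter

# Automation: a hand-written rule that behaves the same way every time.
def rule_based_is_spam(message):
    return "lottery" in message.lower()

# Machine learning (toy version): word counts learned from labelled examples.
class LearnedSpamFilter:
    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()

    def learn(self, message, is_spam):
        target = self.spam_words if is_spam else self.ham_words
        target.update(message.lower().split())

    def is_spam(self, message):
        words = message.lower().split()
        spam_score = sum(self.spam_words[w] for w in words)
        ham_score = sum(self.ham_words[w] for w in words)
        return spam_score > ham_score

spam_filter = LearnedSpamFilter()
spam_filter.learn("win the lottery now", True)
spam_filter.learn("meeting agenda for monday", False)

new_message = "claim your free prize"
before = spam_filter.is_spam(new_message)  # False: none of these words were seen yet

# A user marks a similar e-mail as spam, and the filter learns from it.
spam_filter.learn("free prize inside claim today", True)
after = spam_filter.is_spam(new_message)   # True: the filter has adapted

print(rule_based_is_spam(new_message), before, after)  # prints: False False True
```

The rule keeps missing the new kind of spam until someone rewrites it by hand, while the learned filter catches it after one more labelled example.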

Tools?

There are several tools and languages used for machine learning. The choice of tools differs from person to person, as it depends on our needs & scale of operations. A few of the most widely used are listed below:

Languages

Databases

Visualization tools

I personally code in Python & use Jupyter, Spyder, and Google Colab for practice and development purposes.

Resources:

To begin your journey in Machine Learning, you can refer to the ML cheat sheet.

link: https://drive.google.com/file/d/1QtAhf0Y0fzg4dUeF8ppILBPNDbrKeMDh/view?usp=drivesdk