Quick Intro to Machine Learning

Dec 29, 2022

What is machine learning?

Simply put, machine learning means that a computer learns from data to produce useful outputs on its own, without being explicitly programmed with the rules for doing so.

Here are two common machine learning definitions:

[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed. — Arthur Samuel, 1959
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. — Tom Mitchell, 1997

So what do these definitions mean?

I am not going to explain the first definition because it’s pretty self-explanatory. But I am going to delve deeper into the second definition and provide a neat example.

Essentially, the second definition tells us that a computer is learning when its performance P on a task T improves with experience E.

In simpler terms, a computer is learning if it gets better at a task as it is given more data.

Here is an example:

Let’s say a program is used to identify dogs (Task T). Every time the program correctly identifies a dog, it gets a reward. If it doesn’t, it gets a penalty. The program’s performance at this task can be measured by the fraction of cases it identifies correctly (Performance P). The dataset is Experience E, because that is what the program learns from.
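To make T, P, and E concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the animal weights are made up, a single weight feature stands in for real images, and the "learner" simply compares a weight to the average dog and non-dog weights it has seen so far.

```python
# Task T: decide whether an animal is a dog, given its weight in kg.
# Experience E: labeled examples of (weight_kg, is_dog). All values are made up.
experience = [
    (3.0, True), (6.0, False), (20.0, True), (4.0, False),
    (30.0, True), (5.0, False), (25.0, True), (4.5, False),
]

# A separate set of examples used only to measure performance.
test_set = [
    (28.0, True), (22.0, True), (15.0, True), (18.0, True),
    (4.0, False), (5.0, False), (3.5, False), (5.5, False),
]


def train(examples):
    """Learn the average weight of dogs and non-dogs, return a predictor."""
    dog_weights = [w for w, is_dog in examples if is_dog]
    other_weights = [w for w, is_dog in examples if not is_dog]
    dog_mean = sum(dog_weights) / len(dog_weights)
    other_mean = sum(other_weights) / len(other_weights)
    # Predict "dog" when the weight is closer to the average dog weight.
    return lambda w: abs(w - dog_mean) < abs(w - other_mean)


def accuracy(model, examples):
    """Performance P: fraction of examples the model gets right."""
    correct = sum(model(w) == is_dog for w, is_dog in examples)
    return correct / len(examples)


# Measure P after the program has seen 2, 4, and 8 examples of E.
sizes = (2, 4, 8)
scores = [accuracy(train(experience[:n]), test_set) for n in sizes]
for n, score in zip(sizes, scores):
    print(f"examples seen: {n}, accuracy: {score:.2f}")
```

With only two examples the program is misled by one small dog and one large cat. Once it has seen more data, its accuracy on the test set goes up, which is exactly what the definition means by learning.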

Real-world examples of machine learning:

spam filters
medical imaging
recommendations (such as movies and YouTube videos)
speech recognition
chatbots

Types of Machine Learning Systems

Machine learning systems fall into several categories, depending on how they learn and what they are used for. The two I am going to talk about are Supervised Learning and Unsupervised Learning.

Supervised Learning

In a supervised learning system, the computer is trained on training data (data used to create a model) made up of features and labels. The labels are the solutions, that is, the correct output for each example. Essentially, you are training the computer on worked examples that come with their answers. In a sense, it’s like studying from practice questions that come with an answer key.

Key terms:

labels: the solutions, or the y-values
features: the attributes, or the x-values
example: a set of features together with its label (x, y)
training data: the data used to create the model, which is then tested and put into production
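In code, these terms usually map onto a very simple layout. The sketch below is only an illustration: the feature names and fruit measurements are made up, and real projects often store the same information in arrays or data frames instead of plain lists.

```python
# Each example pairs a set of features (x) with its label (y).
# The feature names and values are hypothetical.
examples = [
    ({"redness": 0.90, "roughness": 0.10}, "apple"),
    ({"redness": 0.45, "roughness": 0.90}, "orange"),
    ({"redness": 0.80, "roughness": 0.20}, "apple"),
]

# The training data is often split into the x-values and the y-values.
features = [x for x, _ in examples]
labels = [y for _, y in examples]

for x, y in zip(features, labels):
    print(x, "->", y)
```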

So when is this helpful?

Supervised learning is especially helpful for classification (predicting a category) and for predicting numeric values (often called regression).

Examples of classification:

Spam filters (classifying spam email)
Image classification
Assigning data into categories (apples vs. oranges)
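Here is a rough sketch of what a classifier for the apples vs. oranges case could look like. It uses a nearest-neighbor rule, which is just one simple technique I picked for illustration, and the two features (redness and roughness, both on a 0 to 1 scale) and their values are made up.

```python
import math

# Labeled training examples: ((redness, roughness), label). Values are made up.
training_examples = [
    ((0.90, 0.10), "apple"),
    ((0.80, 0.20), "apple"),
    ((0.85, 0.15), "apple"),
    ((0.50, 0.80), "orange"),
    ((0.40, 0.70), "orange"),
    ((0.45, 0.90), "orange"),
]


def classify(features):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        training_examples,
        key=lambda example: math.dist(example[0], features),
    )
    return nearest_label


print(classify((0.82, 0.12)))  # a red, smooth fruit
print(classify((0.48, 0.75)))  # a less red, rough fruit
```

The important part is that the labels in the training data are what let the program name the category of a fruit it has never seen.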

Examples of prediction:

House prices
Grocery prices
Stock prices
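For prediction, the label is a number instead of a category. As a minimal sketch, the code below fits a straight line to house sizes and prices using ordinary least squares. The sizes and prices are made up, and a single feature is an assumption to keep the example short.

```python
# Features (x): house size in square meters. Labels (y): price in thousands.
# All numbers are made up for illustration.
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [172.0, 228.0, 291.0, 349.0]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Predict the price of a house the model has not seen."""
    return slope * size + intercept


print(f"predicted price for 100 square meters: {predict_price(100.0):.1f}")
```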

Unsupervised Learning

Unlike in supervised learning, the training data in unsupervised learning is unlabeled. The computer tries to find structure in the data without being given the solutions. Because of this, unsupervised systems suit a narrower range of tasks, but that doesn’t mean the tasks they can perform are unimportant.

So when is this helpful?

Unsupervised learning is especially helpful for clustering. The computer looks at the features in the data and groups similar examples together into clusters. For example, if you wanted to understand the demographics of your audience on YouTube, you could run a clustering algorithm to group your viewers based on their age, gender, minutes watched, etc.

Examples of clustering:

Clustering organisms by kingdom
Clustering an audience by hours watched
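To give a feel for the audience example, here is a small sketch of k-means, one common clustering algorithm. The viewers (age, hours watched per week) are made up, and starting from the first k points is a simplification I chose so the result is repeatable. Real implementations usually pick starting points more carefully and scale the features first.

```python
import math

# Unlabeled data: (age, hours watched per week). Values are made up.
viewers = [(18, 20), (20, 22), (19, 18), (45, 3), (50, 5), (48, 4)]


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly updating cluster centers."""
    centroids = list(points[:k])  # simplification: start from the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


labels, centroids = k_means(viewers, k=2)
for viewer, label in zip(viewers, labels):
    print(viewer, "-> cluster", label)
```

Notice that no labels were given. The algorithm separates the younger, heavy watchers from the older, light watchers purely from the features.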

Overview of Machine Learning Systems:

Labels are the solutions to the dataset
Training data is the data used to train and build your model
Supervised models use labeled data (they have labels)
Unsupervised models use unlabeled data

Things To Look Out For

As great as machine learning systems can be for various tasks, you can run into serious trouble if you neglect the fundamentals that make a good machine learning model.

Simplifying data

You want to simplify your data without losing too much information. One way you can do this is by merging correlated features. For example, older cars tend to have higher mileage, so it might be a good idea to merge the mileage and age of a car into one feature.
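Here is one possible way to do that merge, as a sketch. The car data is made up, the 0.9 threshold is an arbitrary choice of mine, and averaging the standardized values into a single "wear" feature is just one simple option. Techniques such as principal component analysis are a common alternative.

```python
import statistics

# Made-up data for five cars.
ages_years = [1, 3, 5, 8, 10]
mileage_km = [15_000, 40_000, 70_000, 120_000, 150_000]

MERGE_THRESHOLD = 0.9  # hypothetical cutoff for "strongly correlated"


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)
    ) / len(xs)
    return covariance / (statistics.pstdev(xs) * statistics.pstdev(ys))


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean, std = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / std for v in values]


correlation = pearson(ages_years, mileage_km)
wear = None
if correlation > MERGE_THRESHOLD:
    # Merge the two features into one by averaging their standardized values.
    wear = [
        (a + m) / 2
        for a, m in zip(standardize(ages_years), standardize(mileage_km))
    ]

print(f"correlation: {correlation:.3f}")
print("merged wear feature:", wear)
```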

Detecting anomalies

You want to detect and remove anomalies such as outliers, along with irrelevant information, as they can hurt your machine learning model by making the data unrepresentative.
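A simple way to spot outliers in a single feature is the interquartile range (IQR) rule, sketched below. The minutes-watched values are made up, and the multiplier of 1.5 is a common convention rather than a fixed rule, so treat both as assumptions.

```python
import statistics


def split_outliers(values, k=1.5):
    """Separate values into (kept, outliers) using the IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Made-up minutes watched per viewer; one value is clearly unusual.
minutes_watched = [12, 15, 14, 13, 16, 15, 14, 300]
kept, outliers = split_outliers(minutes_watched)

print("kept:", kept)
print("outliers:", outliers)
```

Whether a flagged value should actually be removed is still a judgment call. Sometimes the unusual value is a data entry error, and sometimes it is the most interesting point in the dataset.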

Overfitting and Underfitting

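Before putting your model into production, you should check whether it is overfitting or underfitting the data. Overfitting means the model has learned the training data too closely, including its noise and quirks, so it performs well on the training data but poorly on new data. It is a bit like overgeneralizing or stereotyping from a handful of examples. Underfitting is the opposite: the model is too simple to capture the underlying pattern, so it performs poorly even on the training data. Overfitting generally occurs when the model is too complex relative to the amount and noisiness of the data, and underfitting occurs when the model is too simple for the complexity of the data. Both are troublesome because the model will not generalize well to new data. In these cases, the model should be modified by changing its parameters or even by switching to a different kind of system.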
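One practical way to check for both problems is to compare the error on the training data with the error on data the model has not seen. The sketch below does this for three toy models. All of it is made up for illustration: the data follows y = 2x with an alternating +1/-1 "noise" on the training labels, the test labels are left noise-free, and the three models (a constant, a straight line, and a memorizer that copies the nearest training label) are stand-ins for too simple, about right, and too complex.

```python
def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs, ys):
    """Too simple: always predict the average label."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """About right for this data: a least squares straight line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Too complex: copy the label of the nearest training point, noise and all."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


# Training data: y = 2x plus an alternating +1 / -1 disturbance (made up).
train_x = [float(i) for i in range(20)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# Unseen data: points between the training points, with noise-free labels.
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x for x in test_x]

results = {}
for name, fit in [
    ("constant", fit_constant),
    ("line", fit_line),
    ("memorizer", fit_memorizer),
]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    train_error, test_error = results[name]
    print(f"{name:>9}: train error {train_error:.3f}, test error {test_error:.3f}")
```

The constant model has a high error everywhere, which is the signature of underfitting. The memorizer has zero error on the training data but a much higher error than the line on unseen data, which is the signature of overfitting.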

Overall, if you look out for unrepresentative data and overfitting, and generally apply good statistical practice such as simplifying your data and gathering more of it, you will be able to create good machine learning models.

Key Takeaways 🔑

Machine Learning is a vast field.
Supervised learning uses labeled data, whereas unsupervised learning uses unlabeled data.
Supervised learning is generally used for predictions and classification.
Unsupervised learning is used for clustering and visualization.
Make sure the data is simplified by removing irrelevant features without losing too much information.
Have representative data.
Avoid overfitting and underfitting the data.
