MSc Data Science Lecture Notes. Lecture 1: Introduction to Machine Learning

Ivan Reznikov, PhD
6 min read · Jan 21, 2024

--

My name is Dr. Ivan Reznikov.
I’m teaching MSc Data Science at Middlesex University Dubai.
This article is part of the series with my brief lecture notes.

· Introduction to Machine Learning
· Terminology
· Machine Learning Types
  · Supervised learning
  · Unsupervised learning
  · Reinforcement learning
· Applications

Introduction to Machine Learning

Machine learning (ML) is reshaping our world.
Let’s take a short journey down the memory road.

History of AI in Healthcare

In healthcare 30 years ago (early 1990s), medical diagnosis relied primarily on doctors' experience and knowledge; medical imaging was analyzed manually, and diagnosis times were long. About 20 years ago (early 2000s), computer-aided diagnosis slowly started to pick up, and electronic health records began to be adopted, though with limited analytical capabilities. Ten years ago (early 2010s), the use of electronic health records grew significantly, now with some analytics, and artificial intelligence (AI) began to appear in diagnostic imaging, though it was not yet widespread. Today, AI-driven diagnostics can identify patterns in data beyond human capability, improving accuracy in areas like cancer detection, and in some cases AI algorithms analyze medical images more quickly and accurately than human radiologists.

A similar situation can be seen in other industries.

But how is machine learning different from traditional programming?
In conventional programming, we feed rules and data into a computer to get answers: if case A happens, act according to scenario A; otherwise, apply scenario B.

In contrast, machine learning uses data and answers to discover the rules: given examples of the conditions under which case A previously occurred, the model learns when to apply it and returns something like "the score for applying case A is 0.92."
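
To make the contrast concrete, here is a minimal sketch in Python (the data, feature names, and numbers are invented purely for illustration): the first function is a traditional hand-coded rule, while the scikit-learn model is handed data and answers and produces a score for "case A" on its own.

```python
# Traditional programming: we write the rule ourselves.
def will_buy_rule(age: int, visits: int) -> bool:
    # hand-coded rule: if case A, act on scenario A; otherwise scenario B
    return visits > 3 and age < 40

# Machine learning: we provide data and answers; the model discovers the rule.
from sklearn.linear_model import LogisticRegression

X = [[25, 5], [40, 1], [30, 7], [55, 2], [22, 6], [48, 1]]  # features: [age, visits]
y = [1, 0, 1, 0, 1, 0]                                      # answers: bought or not

model = LogisticRegression().fit(X, y)
# the learned "rule" returns a score, e.g. "the score to apply case A is 0.92"
print(model.predict_proba([[28, 4]])[0, 1])
```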

The beauty of machine learning lies in its adaptability. Unlike static code, machine learning models “evolve” as they consume more and more data. For example, ML is used in predictive maintenance for industries, where algorithms analyze data from machinery to predict failures before they occur, significantly reducing downtime and maintenance costs.

Terminology

The data field is a terminological mess: statistics, data science, machine learning, artificial intelligence, etc. The term "machine learning" refers to algorithms that let a machine learn from data on its own.
The term "data science" primarily relates to the "science of dealing with data." These terms certainly intersect, but they are not the same.

Machine Learning Types

There are numerous machine learning algorithms. Most of the time, they are organized into a taxonomy similar to the following:

  • Supervised learning (Classification, Regression)
  • Unsupervised learning (Clustering, Dimensionality Reduction)
  • Reinforcement learning (Model-based, Model-free)
Types of Machine Learning Algorithms

Before we look into the types closer, let’s take a minute to look into data that can be used for machine learning model training.

Data is most often either “labeled” or “unlabeled.” Label usually refers to the part of the data that denotes the outcome or target variable. For example, it might be price, time, yes/no, some category, etc. If the data doesn’t contain a label, such data is called unlabeled (duh :).

Labels in Data

In the table above, which could represent a slice of user-session data, a possible target column is "Buy Action": whether the user will buy an item or not. With this column present, the data can be used for both supervised and unsupervised learning. But imagine the table without it: there would be nothing to predict, and with the label column absent, only unsupervised learning can be performed.
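
For illustration, here is how such a table might look in pandas (the column names and values are made up): with the "buy_action" column present, supervised learning is possible; drop it, and only unsupervised learning remains.

```python
import pandas as pd

# a made-up slice of user-session data; "buy_action" plays the role of the label
sessions = pd.DataFrame({
    "pages_viewed":    [3, 12, 1, 7],
    "session_minutes": [2.5, 14.0, 0.8, 9.1],
    "returning_user":  [0, 1, 0, 1],
    "buy_action":      [0, 1, 0, 1],   # target column: did the user buy?
})

X = sessions.drop(columns="buy_action")  # features
y = sessions["buy_action"]               # labels -> supervised learning is possible
# without the "buy_action" column, only X remains -> unsupervised learning only
```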

Supervised learning

Regression and Classification

Supervised learning is the cornerstone of machine learning. For most of the course, we'll be covering various supervised ML algorithms.
Supervised learning involves training a model on a labeled dataset. For example, in medical diagnostics, algorithms may be trained on images labeled "healthy" or "diseased" (in practice the labels are 0 and 1, but we'll cover that later). The algorithm learns to identify disease patterns much as a child learns from examples and corrections.
Tasks where you need to identify a category out of two or more classes (options, categories) are called classification. There are also tasks where you need to predict a continuous numeric value: price, time, distance, etc. These are called regression; a classic example is predicting the selling price of a house based on various features.
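
As a quick illustration of both task types, here is a minimal scikit-learn sketch. The bundled breast-cancer dataset stands in for the "healthy vs. diseased" classification example, and the bundled diabetes dataset (predicting a continuous disease-progression score) stands in for regression, since a house-price table is not shipped with scikit-learn.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: predict a category (malignant vs benign, encoded as 0/1)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Regression: predict a continuous number (a disease-progression score)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
reg = LinearRegression().fit(X_train, y_train)
print("regression R^2:", reg.score(X_test, y_test))
```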

Unsupervised learning

Unsupervised learning explores the unknown, finding patterns in data without predefined labels. It’s like an explorer charting unmarked territories. A captivating example is market segmentation in business. By analyzing customer data without pre-set categories, unsupervised algorithms can uncover natural groupings based on purchasing behavior, driving targeted marketing strategies.

The two main unsupervised tasks are clustering and dimensionality reduction. We will cover these topics in more depth as the course progresses, but let’s get a brief introduction.

Clustering is an unsupervised learning technique used to group similar data points. It is widely used for pattern discovery: identifying categories in datasets where the categories are not known in advance. There are multiple clustering algorithms, but their common goal is to group data so that points within a cluster are more similar to each other than to points in other clusters.
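
Here is a minimal clustering sketch with synthetic "customer" data invented for the purpose: k-means is given no labels at all, yet it recovers the two spending segments on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# two made-up features per customer: monthly spend and number of visits
low_spenders  = rng.normal(loc=[20, 2],   scale=[5, 1],  size=(50, 2))
high_spenders = rng.normal(loc=[200, 12], scale=[30, 3], size=(50, 2))
customers = np.vstack([low_spenders, high_spenders])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.cluster_centers_)  # the two discovered segments
print(kmeans.labels_[:5])       # cluster assignments for the first few customers
```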

Labeled and Unlabeled Data

Dimensionality reduction is a machine learning technique for reducing the number of columns (features) in the data. Not all columns in a typical table are important, and some features can be combined. Imagine a table with columns such as price, price_in_alternative_currency, height_in_feet, height_in_meters, etc. Most probably, all these columns can be combined into one (yes, one, not two as it might look at first glance, assuming price and height are themselves strongly correlated) to reduce noise and the number of dimensions.
Imagine dimensionality reduction as an archiving process.
The classic linear method is principal component analysis (PCA). Autoencoding is another popular method, often used for dimensionality reduction. The key is that the compressed representation in an autoencoder has lower dimensionality than the input data, forcing the autoencoder to capture the most essential features.

Autoencoding
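
Below is a minimal dimensionality-reduction sketch for the redundant-column example above, using PCA (an autoencoder compresses data in the same spirit, but with a neural network). The data is made up, and height is assumed to drive price, so the four columns really carry about one dimension of information.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
height_m = rng.uniform(1.0, 3.0, size=200)
# assumption for illustration: price is largely determined by height, plus noise
price_usd = 400 * height_m + rng.normal(0, 20, size=200)

# four columns that carry far less than four dimensions of information
X = np.column_stack([
    price_usd,
    price_usd * 3.67,   # price_in_alternative_currency (fixed exchange rate)
    height_m,
    height_m * 3.281,   # height_in_feet
])

X_scaled = StandardScaler().fit_transform(X)
print(PCA().fit(X_scaled).explained_variance_ratio_)  # ~all variance in one component
X_reduced = PCA(n_components=1).fit_transform(X_scaled)
print(X.shape, "->", X_reduced.shape)                 # (200, 4) -> (200, 1)
```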

Reinforcement learning

Reinforcement learning (RL) is an advanced topic of machine learning. RL teaches algorithms to make a sequence of decisions by rewarding desired outcomes. A striking example is its application in robotics, where robots learn complex tasks like navigating through an obstacle course, adapting and improving with each attempt, akin to a child learning to walk.
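
A full robotics example is out of scope, but the core idea fits in a few lines. Here is a minimal tabular Q-learning sketch on a toy five-cell corridor (the states, rewards, and hyperparameters are invented for illustration): the agent starts knowing nothing and, by trial, error, and reward, learns to walk right toward the goal.

```python
import numpy as np

n_states, goal = 5, 4
actions = [-1, +1]                     # step left or step right
Q = np.zeros((n_states, len(actions)))
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

for episode in range(200):
    state = 0
    while state != goal:
        # epsilon-greedy: mostly exploit what we know, sometimes explore
        a = rng.integers(2) if rng.random() < epsilon else int(Q[state].argmax())
        next_state = min(max(state + actions[a], 0), n_states - 1)
        reward = 1.0 if next_state == goal else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted future value
        Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
        state = next_state

print(Q.round(2))
print([actions[int(Q[s].argmax())] for s in range(goal)])  # greedy policy: all +1 (go right)
```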

Applications

Machine learning's real-world applications are vast and varied. Decision trees are employed in finance to assess credit risk by analyzing customer data. Convolutional neural networks are widely applied in medical imaging for early detection of diseases like cancer. E-commerce platforms use recommender systems to offer personalized product recommendations, increasing sales and customer satisfaction.

In the wake of the recent ChatGPT-related developments, creativity in machine learning is pushing boundaries. Algorithms can now write texts, compose music, and create art, blurring the lines between human and machine creativity.

This is the end of the lecture and part 1/24 of the MSc Data Science course.

Check out the next lectures: Lecture 2, regarding statistics, and Lecture 3, regarding Data Structures and Algorithms.

Clap and follow me, as this motivates me to write new parts and articles :) Plus, you'll get notified when a new part is published.
