MSc Notes: Machine Learning

Saurabh Pandey
Aug 16, 2022

What comes to mind when you hear the term “machine learning”? Probably a multitasking humanoid robot, as typically portrayed in films and other mass media. That future may (apparently) arrive one day but, fortunately or unfortunately, we are not there yet. Over the last decade in particular, the ML research community and funding agencies have invested heavily in both task-specific and general-purpose AI R&D and applications.

Note: Through this and related articles in this series, I hope to record and share, in simplified form, what I learned during and after the Data Science with Business Programme at the University of Exeter, UK, so that you can use this resource to quickly look up or grasp concepts of varying complexity. It may interest you if you are an ML/DL practitioner or researcher who, like me, needs a reliable go-to reference every now and then, or a student trying to build a solid intuition for the key concepts in this field.

WHAT is Machine Learning?

Machine learning is the science (and art) of programming computers so that they can learn from data. By learning from data rather than following explicitly programmed rules, machines can get much better at a given task. Here is my personal favourite among the classic definitions:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)
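To make Mitchell’s definition concrete, here is a minimal Python sketch under made-up assumptions: the task T is deciding whether an integer reading from 0 to 99 is “high” (the hidden rule, x >= 60, is invented for illustration), the experience E is a growing list of labelled examples, and the performance measure P is accuracy over all 100 possible readings. The hypothetical learner `fit_threshold` simply places a decision threshold halfway between the largest “low” example and the smallest “high” example it has seen so far.

```python
# Task T (hypothetical): label an integer reading x in 0..99 as "high" (1) or "low" (0).
# The hidden rule x >= 60 is invented for this sketch; the learner never sees it directly.
def true_label(x: int) -> int:
    return 1 if x >= 60 else 0

# Experience E: labelled examples, revealed to the learner a few at a time.
EXPERIENCE = [(10, 0), (90, 1), (30, 0), (75, 1), (50, 0),
              (65, 1), (55, 0), (62, 1), (58, 0)]

def fit_threshold(examples):
    """Place the threshold halfway between the largest 'low' and the smallest 'high' example."""
    lows = [x for x, y in examples if y == 0]
    highs = [x for x, y in examples if y == 1]
    return (max(lows) + min(highs)) / 2

def accuracy(threshold):
    """Performance measure P: fraction of all 100 readings classified correctly."""
    readings = range(100)
    correct = sum((1 if x >= threshold else 0) == true_label(x) for x in readings)
    return correct / len(readings)

# Give the learner more and more experience and measure P each time.
for n in (2, 4, 6, 9):
    t = fit_threshold(EXPERIENCE[:n])
    print(f"E = {n} examples -> threshold {t:.1f}, P = {accuracy(t):.2f}")
```

With this particular data, P rises from 0.90 after 2 examples to 0.93, 0.98 and finally 1.00 after 9 examples, which is exactly the “improves with experience E” part of the definition. Real learners rarely improve this neatly, but the idea is the same.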

The spam filter, which has improved the lives of hundreds of millions of people, was arguably the first ML application to truly take off, back in the 1990s, and today it works so well that you rarely need to mark an email as spam yourself. It was followed by countless ML applications that now quietly power many of the features and products we use daily, such as voice search and smarter recommendations.

WHY and WHEN to use Machine Learning?

Below are scenarios in which using ML instead of more conventional approaches (symbolic or rule-based AI programming) can be a game-changer.

  • Solving High Complexity Problems: problems for which no known algorithm gives a good solution. Speech recognition is one example. One could hardcode an algorithm that uses pitch or sound intensity to distinguish between two words, but given varying accents, noisy environments and so on, such a technique will not scale. Arguably, the best solution is an algorithm that learns on its own from a large number of example recordings of each word.
  • Countering Evolving or Fluctuating Environments: traditional rule-based programs cannot adequately account for diverse or evolving scenarios. For instance, unlike software with pre-established rules, an ML-based spam filter can run as an online learning system that keeps picking up new phrases from the emails users mark as spam.
  • Uncovering Hidden Big Data Patterns: by inspecting what trained models have learned, we can reveal unexpected relationships or new patterns and gain a clearer understanding of the problem. Applying ML to large amounts of data to discover patterns that were not initially obvious is known as data mining.
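Here is a minimal sketch of the online spam filter idea, assuming a tiny multinomial naive Bayes model (a common textbook choice for spam filtering, not necessarily what any production filter uses). The class and method names (`OnlineSpamFilter`, `learn`, `spam_probability`) and the example emails are hypothetical. The point is that each email a user marks updates the model’s word counts immediately, with no retraining from scratch and no hand-written rule.

```python
import math
from collections import Counter

class OnlineSpamFilter:
    """Tiny multinomial naive Bayes spam filter that learns one email at a time (hypothetical)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}  # word frequencies per class
        self.email_counts = {True: 0, False: 0}                 # number of emails per class

    def learn(self, text: str, is_spam: bool) -> None:
        """Online update: fold a single labelled email (e.g. one a user just flagged) into the counts."""
        self.word_counts[is_spam].update(text.lower().split())
        self.email_counts[is_spam] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | text) with add-one (Laplace) smoothing."""
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        total_emails = sum(self.email_counts.values())
        log_scores = {}
        for label in (True, False):
            # Smoothed class prior.
            log_p = math.log((self.email_counts[label] + 1) / (total_emails + 2))
            # The extra +1 reserves probability mass for words never seen in training.
            denom = sum(self.word_counts[label].values()) + vocab_size + 1
            for word in text.lower().split():
                log_p += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = log_p
        # Normalise the two log scores into a probability without underflow.
        top = max(log_scores.values())
        spam = math.exp(log_scores[True] - top)
        ham = math.exp(log_scores[False] - top)
        return spam / (spam + ham)

spam_filter = OnlineSpamFilter()
spam_filter.learn("meeting agenda for monday", is_spam=False)
spam_filter.learn("win a free prize now", is_spam=True)

message = "claim your crypto bonus"
before = spam_filter.spam_probability(message)
spam_filter.learn("crypto bonus waiting claim today", is_spam=True)  # a user flags a new kind of spam
after = spam_filter.spam_probability(message)
print(f"P(spam) before: {before:.2f}, after: {after:.2f}")
```

Because none of the message’s words have been seen yet, its spam probability starts below 0.5; after a single user report containing similar phrasing, it rises above 0.5. A rule-based filter would instead need someone to write a new rule by hand.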

HOW to apply Machine Learning (Project Pipeline)?

The steps below outline what goes into an end-to-end Machine Learning project:

  • Big picture or Business Understanding
  • Sourcing Data
  • Exploratory Data Analysis (EDA) and Visualisations
  • Data Preparation
  • Building and Testing Model(s): Algorithm Selection and Training
  • Fine-tuning the Model(s)
  • Presenting Solution
  • Launching, Monitoring, and Maintaining the System
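The middle of this pipeline (sourcing data through presenting a result) can be sketched in plain Python. Everything here is a simplifying assumption for illustration: the data set is synthetic (a reading `x` from 0 to 99, labelled 1 when `x >= 50`), the split is deterministic rather than random, and the model is a 1-D k-nearest-neighbours classifier whose `k` is chosen on a validation set. The function names are hypothetical, and business understanding, launch and monitoring are not captured in code.

```python
from collections import Counter

# Sourcing data (hypothetical): a reading x in 0..99, labelled 1 when x >= 50.
def source_data():
    return [(x, 1 if x >= 50 else 0) for x in range(100)]

# EDA: a quick look at the class balance before modelling.
def explore(rows):
    return Counter(label for _, label in rows)

# Data preparation: deterministic train / validation / test split (60 / 20 / 20 rows).
def split(rows):
    train = [r for r in rows if r[0] % 5 in (2, 3, 4)]
    val = [r for r in rows if r[0] % 5 == 1]
    test = [r for r in rows if r[0] % 5 == 0]
    return train, val, test

# Building the model: 1-D k-nearest-neighbours with a majority vote.
def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda r: abs(r[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, rows, k):
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)

def run_pipeline(candidate_ks=(1, 3, 5, 7)):
    rows = source_data()
    print("class balance:", dict(explore(rows)))
    train, val, test = split(rows)
    # Fine-tuning: choose k using the validation set only.
    best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))
    # Presenting: report accuracy on the untouched test set.
    return best_k, accuracy(train, test, best_k)

best_k, test_acc = run_pipeline()
print(f"chosen k={best_k}, test accuracy={test_acc:.2f}")
```

The key design choice is that `k` is tuned on the validation set only, so the test accuracy reported at the end remains an honest estimate of how the model would perform on unseen data.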

Other crucial aspects of how Machine Learning applications are developed and maintained include:

  • Tech stack(s) & MLOps
  • Team structure

Note: I will link dedicated articles that delve deeper into each bulleted topic (above and throughout this article) in due course.

Machine Learning Application Examples

Below are some ML applications, each paired with the type of task or Machine Learning algorithm(s) typically used for it (please don’t worry about what each algorithm does just yet; we’ll get to that soon):

  • Autonomously classifying products on a production line using image analysis (Image Classification/CNN)
  • Detecting malignancies from brain scans (Semantic segmentation, i.e. classifying each pixel/CNN)
  • Categorising news articles automatically (Text Classification/NLP)
  • Automatically summarising lengthy documents (Text Summarisation/NLP)
  • Chatbot or Personal Assistant (employs NLU)
  • Forecasting a company’s revenue based on a variety of performance metrics (Regression task/RNN, CNN, Transformer etc.)
  • Detecting credit card fraud (Anomaly detection)
  • Segmenting clients based on their purchases so that a distinct marketing strategy can be designed for each segment (Clustering)
  • Using a clear and insightful diagram to represent a complex, high-dimensional dataset (Data visualisation involving dimensionality reduction)
  • Online product recommendation based on past purchases (Recommender System)
  • Intelligent game bot (Reinforcement Learning)

Machine Learning Problem Categories

  • Supervised Learning (Classification, Regression)
  • Unsupervised, Semi-Supervised, and Self-Supervised Learning (Clustering, Dimensionality Reduction, Association Rule Mining, Anomaly and Novelty Detection e.g. using Gaussian Mixture Models; Autoencoders, VAEs, GANs)
  • Reinforcement Learning
  • Batch Learning
  • Online Learning
  • Instance-based Learning
  • Model-based Learning

The categories above are not mutually exclusive and can be combined. For instance, a deep neural network trained on many product images and their corresponding product classes can automatically categorise products on a production line using image analysis. That makes it a supervised, model-based learning system; if it keeps learning from newly labelled images while in production, it is also an online system, whereas if it is trained once and then deployed, it is a batch learning system.
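The instance-based versus model-based distinction can be illustrated with a small sketch on five hypothetical (x, y) points. The instance-based learner (a k-nearest-neighbours regressor with k = 2) keeps every training example and averages the closest targets at prediction time; the model-based learner fits a straight line by ordinary least squares and afterwards needs only its two parameters. The data and helper names (`knn_regress`, `fit_line`) are made up for this example.

```python
# Hypothetical training data: (feature x, target y) pairs.
DATA = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

def knn_regress(data, x, k=2):
    """Instance-based: keep all examples and average the targets of the k closest ones."""
    nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def fit_line(data):
    """Model-based: summarise the data as y = slope * x + intercept (ordinary least squares)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = fit_line(DATA)  # after fitting, the training data is no longer needed
x_new = 6.0
print(f"instance-based (k=2): {knn_regress(DATA, x_new):.2f}")
print(f"model-based line:     {slope * x_new + intercept:.2f}")
```

For x = 6, outside the training range, the k-NN prediction is about 8.95 because it can only average targets it has already stored, whereas the fitted line (slope 1.99, intercept 0.05) extrapolates to about 11.99.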

Machine Learning Challenges

Since an ML project typically involves choosing a learning algorithm and training it on some data, the two things that can intuitively go wrong are the “data” and the “algorithm”.

Bad Data Examples:

  • Insufficient Training Data or Very High Dimensional Data
  • Non-representative Data or Sampling Bias or Data Mismatch
  • Poor Quality Data: with errors, outliers, and noise
  • Irrelevant Features

Bad Algorithm Examples:

  • Overfitting the Training Data
  • Underfitting the Training Data
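Overfitting and underfitting can be reproduced on a tiny, fully deterministic data set. In this sketch (all data invented) the true rule is x >= 10, two training labels are deliberately flipped to act as noise, and three models of different flexibility are compared: 1-nearest-neighbour, which memorises every training point including the noise; 3-nearest-neighbours; and a constant model that always predicts the most common training label. The helper names are hypothetical.

```python
from collections import Counter

def true_label(x):
    return 1 if x >= 10 else 0

# Hypothetical data: even readings for training, with two labels flipped as noise;
# odd readings for testing, with clean labels.
NOISY = {4, 14}
TRAIN = [(x, 1 - true_label(x) if x in NOISY else true_label(x)) for x in range(0, 20, 2)]
TEST = [(x, true_label(x)) for x in range(1, 20, 2)]

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (distance ties keep list order)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def majority_predict(train, x):
    """A deliberately too-simple model: ignore x and return the most common training label."""
    return Counter(y for _, y in train).most_common(1)[0][0]

def accuracy(predict, rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

models = {
    "1-NN (memorises noise)": lambda x: knn_predict(TRAIN, x, 1),
    "3-NN": lambda x: knn_predict(TRAIN, x, 3),
    "majority class (ignores x)": lambda x: majority_predict(TRAIN, x),
}
for name, predict in models.items():
    print(f"{name:28s} train={accuracy(predict, TRAIN):.1f} test={accuracy(predict, TEST):.1f}")
```

On this data the 1-NN model scores 1.0 on the training set but only 0.8 on the test set (overfitting: it reproduces the flipped labels), the constant model scores 0.5 on both (underfitting: it is too simple to capture the rule), and 3-NN scores 0.8 on training but 1.0 on test because voting over neighbours smooths out the noise.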

Note: I will continue to build, aggregate, and link content in this article so that it can become the finest go-to resource for anything related to Machine Learning. I’d also love to hear from you and engage with you along this journey.


Saurabh Pandey

Data Science and Business graduate. Here to share value I discover or create.