A Beginner’s Guide for Getting Started with Machine Learning

Tanvi Penumudy
Published in Analytics Vidhya
8 min read · Dec 29, 2020

If you’ve landed on this article, chances are you’ve been wondering what Machine Learning is all about, or perhaps how to get started with it. Don’t worry, we’ll cover all of this in the next few minutes!

Image Source: Thermo Fisher Scientific — Machine Learning is a subset of AI

Lately, it seems that every time you open your browser or scroll through your news feed, someone is writing about machine learning, its impact on humankind or the latest advances in AI. What’s all this buzz about? Have you ever wondered how technologies ranging from virtual assistants to self-driving cars* and robots actually work?

*For more on Self-Driving Cars, check out my previous articles —
A Beginner’s Guide to Reinforcement Learning and its Basic Implementation from Scratch, Solving the Self-Driving Cab Problem without Reinforcement Learning

You’re in the right place to get all your questions answered!

What is AI?

Programs that behave externally like humans? Programs that operate internally as humans do? Computational systems that behave intelligently?

Artificial intelligence (AI) is a wide-ranging branch of Computer Science concerned with building smart machines capable of performing tasks that typically require human intelligence (Source: Wikipedia).

Machine Learning is a subset of Artificial Intelligence!

Machine Learning: A Precise Introduction

“Learning is any process by which a system improves performance from experience.” ~Herbert Simon

Machine Learning is an application of Artificial Intelligence (AI) that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed (Source: Expert.ai).

Machine learning is nothing new. In fact, its history dates back to 1950, when Alan Turing proposed the ‘Turing test’ to determine whether a computer has real intelligence. The idea was: “To be called intelligent, a machine must produce responses that are indistinguishable from those of a human.”

Image Source: Zeolearn — A Brief Timeline of Machine Learning

As a human, and as a user of technology, you complete certain tasks that require you to make an important decision or classify something. For instance, when you read your inbox in the morning, you decide to mark that ‘Win a Free Car if You Click Here’ email as spam. How would a computer know to do the same thing?

Machine learning consists of algorithms that teach computers to perform tasks that human beings do naturally every day, such as this one! The first attempts at artificial intelligence involved teaching a computer by writing a rule or a set of rules.

Image Source: InterSystems — Traditional Programming vs Machine learning

If we wanted to teach a computer to make recommendations based on the weather, then we might write a rule that said — IF the weather is cloudy AND the chance of rainfall is greater than 50%, THEN suggest taking an umbrella.
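Written as code, the hand-crafted rule above could look like the following minimal sketch. The function name `suggest_umbrella` and its inputs (a weather label and a rain probability between 0 and 1) are hypothetical choices for illustration; the logic is simply the rule as stated, with the 50% cut-off chosen by the programmer.

```python
def suggest_umbrella(weather: str, rain_chance: float) -> bool:
    """Hand-written rule: IF cloudy AND chance of rain > 50% THEN suggest an umbrella.

    `rain_chance` is assumed to be a probability between 0.0 and 1.0 (hypothetical format).
    """
    return weather == "cloudy" and rain_chance > 0.5


print(suggest_umbrella("cloudy", 0.7))  # True
print(suggest_umbrella("sunny", 0.7))   # False
print(suggest_umbrella("cloudy", 0.3))  # False
```

Nothing in this function tells us how often its suggestion is actually right.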

The problem with this approach, used in traditional expert systems, is that we don’t know how much confidence to place in the rule. Is it right 50% of the time? More often, or less?

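(You may also refer to the diagram above illustrating Traditional Programming vs Machine Learning. In short, with traditional programming a person writes the rules and the computer applies them to data; with machine learning the computer is given data along with the expected answers and learns the rules itself.)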
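To see the difference in practice, the sketch below learns the rain-chance cut-off from examples with known answers instead of having it written by hand, and reports how often each rule is right. It assumes a small hypothetical dataset of (chance of rain, did it rain?) pairs invented purely for illustration, uses a one-feature decision stump (about the simplest supervised learner), and ignores the ‘cloudy’ condition to keep things short.

```python
# Hypothetical labelled examples: (chance of rain, did it actually rain?)
examples = [
    (0.10, False), (0.25, False), (0.30, True), (0.40, False),
    (0.55, False), (0.60, False), (0.70, True), (0.75, True),
    (0.85, True), (0.90, True), (0.95, True),
]


def accuracy(threshold, data):
    """Fraction of examples where 'predict rain if chance > threshold' is correct."""
    correct = sum((chance > threshold) == rained for chance, rained in data)
    return correct / len(data)


def learn_threshold(data):
    """Decision stump: try each observed chance as the cut-off and keep the most accurate."""
    candidates = sorted({chance for chance, _ in data})
    return max(candidates, key=lambda t: accuracy(t, data))


best = learn_threshold(examples)
print(f"learned threshold: {best:.2f}, training accuracy: {accuracy(best, examples):.0%}")
# learned threshold: 0.60, training accuracy: 91%
print(f"hand-written 0.50 rule accuracy: {accuracy(0.50, examples):.0%}")
# hand-written 0.50 rule accuracy: 73%
```

Because the accuracy here is measured on the same examples the threshold was learned from, it is optimistic; in practice you would evaluate on held-out data.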

Types of Machine Learning

Depending on the problem, machine learning methods can be classified into three major categories:

Unsupervised Learning

  • Hindsight — Descriptive Analysis (What happened?)
  • No Labels
  • No Feedback
  • Find Hidden Structure in Data

Supervised Learning

  • Insight — Predictive Analysis (What will happen?)
  • Labelled Data
  • Direct Feedback
  • Predict Outcome/Future

Reinforcement Learning*

  • Foresight — Prescriptive Analysis (How can we make it happen?)
  • Decision Process
  • Reward System
  • Learn Series of Actions

Uses of Machine Learning

Unsupervised Learning

Algorithms Used — K-means Clustering, Hierarchical Clustering, Dimensionality Reduction, etc.

  • Organize Computing Clusters
  • Social Network Analysis
  • Market Segmentation
  • Astronomical Data Analysis
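To make ‘find hidden structure in data’ concrete, here is a minimal k-means clustering sketch in plain Python; note that no labels are involved. The six 2-D points are hypothetical (think of two measurements per customer in a market-segmentation setting), and the first k points serve as initial centroids so the result is reproducible. Library implementations such as scikit-learn’s `KMeans` use smarter initialisation (k-means++) and several restarts. `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means: alternate an assignment step and a centroid-update step."""
    centroids = [tuple(p) for p in points[:k]]  # naive, deterministic initialisation
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabelled 2-D points
points = [(1.0, 1.0), (8.0, 9.0), (1.5, 2.0), (8.5, 8.0), (1.2, 0.8), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```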

Supervised Learning

Algorithms Used — Linear Regression, Logistic Regression, Decision Trees, Random Forests, KNN, SVM, Naive Bayes, etc.

  • Stock Prediction Problem
  • House Price Prediction
  • Cancer Diagnosis (Malignant or Benign)
  • Weather Forecasting
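Here is a from-scratch sketch of one of the listed supervised algorithms, k-nearest neighbours (KNN), applied to a malignant/benign style classification. The two features and all six labelled samples are hypothetical, invented for illustration; real datasets (such as the breast cancer dataset available through scikit-learn’s `load_breast_cancer`) have many more features, which usually need to be scaled before distances are meaningful.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest labelled samples.

    `train` is a list of (features, label) pairs; features are equal-length tuples.
    """
    # Keep the k training samples closest to the query (Euclidean distance).
    neighbours = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    # Majority vote among their labels.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled data: ((size in cm, irregularity score 0-10), label)
train = [
    ((1.0, 2.0), "benign"), ((1.5, 1.0), "benign"), ((2.0, 3.0), "benign"),
    ((4.5, 7.0), "malignant"), ((5.0, 8.5), "malignant"), ((6.0, 7.5), "malignant"),
]

print(knn_predict(train, (1.8, 2.5)))  # benign
print(knn_predict(train, (5.2, 7.9)))  # malignant
```

With two classes, an odd k avoids tied votes.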

Reinforcement Learning*

Algorithms Used — Q-Learning, SARSA, DQN, DDPG, etc.

  • Credit Assignment Problem
  • Game Playing
  • Robot in a Maze
  • Balancing a Pole
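The first algorithm listed, Q-learning, can be sketched on a toy version of ‘robot in a maze’: a corridor of five cells where the robot starts at the left end and receives a reward of 1 only when it reaches the rightmost cell. The environment, the reward and the hyperparameters (learning rate `ALPHA`, discount `GAMMA`, exploration rate `EPSILON`, number of episodes) are illustrative assumptions, not values from this article.

```python
import random

N_STATES = 5                 # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)           # move left, move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
EPISODES = 200


def step(state, action):
    """Apply an action, staying inside the corridor; reward 1 only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)  # seeded so every run gives the same result
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(EPISODES):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

After training, the greedy policy should choose +1 (move right) in every non-goal cell: the robot learns a series of actions from the reward signal alone.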

Machine learning has made dramatic improvements in the past few years, but on many tasks it is still far from human-level performance, and it often still needs human assistance to complete its task.

Getting Started with Machine Learning

Requirements and Libraries

Python (3.9 at the time of writing) is the most popular language for Machine Learning. Prefer Python 3 over Python 2, which reached its end of life in January 2020. A GPU and more RAM help when you are handling large amounts of data or running computationally heavy workloads; otherwise, they are not required.

Here’s a list of useful libraries for Machine Learning —

  • Pandas: Popular Python library for data analysis. It provides many built-in methods for grouping, combining and filtering data.

Blog: Statistical Analysis in Python using Pandas

  • NumPy: Popular Python library for matrix processing and for handling multi-dimensional arrays.

Blog: Getting Familiar with Numpy

  • Matplotlib: Popular Python library for data visualization.

Blog: Data Visualization using Python Part-I

  • Seaborn: Yet another popular Python library for data visualization.

Blog: Data Visualization using Python Part-II

  • Scikit-learn: Popular library for getting started with Machine Learning, with built-in implementations of most Supervised and Unsupervised algorithms.
  • SciPy: Contains modules for optimization, linear algebra, integration and statistics, and is popular for Machine Learning.

Related Blogs (Using Scipy): Mathematics for Machine Learning Part-1, Mathematics for Machine Learning Part-2, Mathematics for Machine Learning Part-3

  • OpenCV: An open-source computer vision and machine learning software library for Computer Vision and Image Processing.

Blog: Computer Vision and Image Processing with OpenCV

  • TensorFlow: Library for high-performance numerical computation, developed by Google and widely used for Deep Learning applications.
  • Keras: High-level neural networks API capable of running on top of TensorFlow, CNTK, or Theano.
  • PyTorch: Deep learning framework with an extensive ecosystem of tools and libraries for Computer Vision, NLP and many other Machine Learning applications.
  • Theano: Library used to define, evaluate and optimize mathematical expressions involving multi-dimensional arrays efficiently.
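Before following along, it can help to check which of the libraries above are installed in your environment. The sketch below uses only the standard library: `importlib.util.find_spec` looks a module up without importing it. Several packages are imported under a name that differs from their pip package name (for example, scikit-learn is imported as `sklearn` and OpenCV as `cv2`, assuming the usual `opencv-python` package); the mapping below reflects those common import names.

```python
import importlib.util

# Display name -> import name (pip package names can differ,
# e.g. scikit-learn -> sklearn, opencv-python -> cv2, PyTorch -> torch).
LIBRARIES = {
    "Pandas": "pandas",
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "Scikit-learn": "sklearn",
    "SciPy": "scipy",
    "OpenCV": "cv2",
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "PyTorch": "torch",
    "Theano": "theano",
}


def installed_libraries(libraries=LIBRARIES):
    """Map each library's display name to True if its module can be found."""
    return {name: importlib.util.find_spec(module) is not None
            for name, module in libraries.items()}


for name, available in installed_libraries().items():
    print(f"{name:<13} {'installed' if available else 'missing'}")
```

Any library reported as missing can then be installed with pip (in Colab, with the `!pip install` command shown later in this article).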

For more implementations using these libraries, refer to my GitHub repositories:
https://github.com/tanvipenumudy/AI-ML-Weekly-Challenges
https://github.com/tanvipenumudy/Deep-Learning-Labs
https://github.com/tanvipenumudy/Winter-Internship-Internity

Here’s a list of useful IDEs/Platforms/Editors/Environments —

Getting Started with Google Colaboratory

Colaboratory, or ‘Colab’ for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to Machine Learning, Data Analysis and Education (Source: Google Research docs).

Getting Started with Colab—Snapshot 1

As soon as you open Colab, a dialog pops up with the following tabs:

  • Examples — Contains a number of notebooks of various examples.
  • Recent — Notebooks that you have recently worked with.
  • Google Drive — Notebooks stored in your Google Drive.
  • GitHub — You can add Notebooks from your GitHub Repositories after connecting your Colab account with your GitHub account.
  • Upload — Upload from your local directory.

Otherwise, you can create a new notebook by clicking ‘New Python3/Python2 Notebook’.

Getting Started with Colab — Snapshot 2

A new notebook is created with a default name such as ‘UntitledX’, which you can rename in the top-left corner of the document. You can also change the runtime type from the ‘Runtime’ drop-down menu (Runtime > Change runtime type) to None, GPU or TPU (None by default). In some versions, you can also switch between Python 3 and Python 2.

Getting Started with Colab — Snapshot 3

Most Python and Jupyter Notebook commands work in Google Colaboratory.

Click the +Code or +Text button to add a new code or text cell to the notebook.

Commonly used Commands on Google Colab

Here’s a list of commonly used commands on Google Colaboratory —

Command to install a Python library (confined to the current runtime):

!pip install PythonLibrary

Command to download a file from a link (confined to the current runtime):

!wget link

Command to upload a local file (confined to the current runtime):

from google.colab import files
uploaded = files.upload()  # opens a file-upload dialog in the notebook
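`files.upload()` returns a dictionary mapping each uploaded file name to its contents as bytes. The sketch below shows one way to turn an uploaded CSV file into rows using only the standard library. The helper name `read_uploaded_csv` and the `weather.csv` contents are hypothetical, and a hand-made dictionary stands in for a real upload so the example also runs outside Colab.

```python
import csv
import io


def read_uploaded_csv(uploaded, filename):
    """Parse one uploaded CSV file into a list of row dictionaries.

    `uploaded` has the shape returned by files.upload(): {file name: file bytes}.
    """
    text = uploaded[filename].decode("utf-8")  # assumes the file is UTF-8 encoded
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for `uploaded = files.upload()` so the example runs outside Colab.
uploaded = {"weather.csv": b"day,rain_chance\nMon,0.7\nTue,0.2\n"}
rows = read_uploaded_csv(uploaded, "weather.csv")
print(rows)
```

If pandas is available, `pd.read_csv(io.BytesIO(uploaded["weather.csv"]))` is a common one-line alternative that returns a DataFrame instead.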

Additional Topic: Google Sheets for Data Mining

In simple words, Data Mining is the process of extracting useful information from a large set of raw data. It involves analysing patterns in large batches of data using one or more software tools. To segment the data and evaluate the probability of future events, data mining uses sophisticated mathematical algorithms.
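As a small illustration of ‘analysing data patterns’, the sketch below counts which pairs of items most often appear together across a set of transactions, the first step of frequent-itemset mining as used in market-basket analysis. The transactions are hypothetical; dedicated algorithms such as Apriori or FP-Growth scale this idea to large datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def frequent_pairs(baskets, min_support=2):
    """Return item pairs that appear together in at least `min_support` baskets."""
    counts = Counter(
        pair
        for basket in baskets
        # Sort items so every pair is counted in one canonical order.
        for pair in combinations(sorted(basket), 2)
    )
    return {pair: n for pair, n in counts.items() if n >= min_support}


for pair, n in sorted(frequent_pairs(transactions).items(), key=lambda kv: -kv[1]):
    print(pair, n)
```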

Image Source: MicroStrategy — Data Mining Explained

Data mining has applications in many fields, such as science and research, and it helps businesses get closer to their objectives and make better decisions. By applying data mining, businesses can learn more about their customers, develop more effective strategies across business functions, and in turn use their resources more efficiently.

Image Source: Zapier — Illustration of Google Sheets

Google Sheets is an online spreadsheet app that lets you create and format spreadsheets collaboratively with your team in real time. Let’s now look at how Google Sheets can be used for Data Mining:

  • Helps in Collaboration
  • Add-ons for Text Analysis
  • Add-ons for Text Mining
  • Power Tools
  • Finding Fuzzy Matches
  • Google Analytics
  • Supports Socrative Data Mining

Add-ons for Google Sheets bring statistics and data analysis functionality right into Google Sheets, avoiding the need to download data to a separate statistics application.

Instead, you simply select the variables you want to analyze and run the entire analysis in one go.

“Machine intelligence is the last invention that humanity will ever need to make.” ~Nick Bostrom

Additional Resources

Do check out all my blogs on various Machine Learning algorithms, implementations from scratch, mini projects, libraries and the mathematics behind Machine Learning.
