A Beginner’s Guide for Getting Started with Machine Learning
If you’ve landed on this article, chances are that you’ve been wondering what Machine Learning is all about or how to get started. Don’t worry, we’ll cover all of this in the next few minutes!
Lately, it seems that every time you open your browser or casually scroll through your news feed, someone is writing about machine learning, its impact on humankind or the advancements in AI. What’s all this buzz about? Have you ever wondered how technologies ranging from Virtual Assistant Solutions to self-driving cars* and robots actually work?
*For more on Self-Driving Cars, check out my previous articles —
A Beginner’s Guide to Reinforcement Learning and its Basic Implementation from Scratch, Solving the Self-Driving Cab Problem without Reinforcement Learning
Do not worry! You’re at the right place to get all your questions answered!
What is AI?
Programs that behave externally like humans? Programs that operate internally as humans do? Computational systems that behave intelligently?
Artificial intelligence (AI) is a wide-ranging branch of Computer Science concerned with building smart machines capable of performing tasks that typically require human intelligence — Source: Wikipedia.
Machine Learning is a subset of Artificial Intelligence!
Machine Learning: A Precise Introduction
“Learning is any process by which a system improves performance from experience.” ~Herbert Simon
Machine Learning is an application of Artificial Intelligence (AI) that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed—Source: Expert.ai
Machine learning is nothing new. Its history, in fact, dates back to 1950, when Alan Turing proposed the ‘Turing test’ to determine whether a computer had real intelligence or not. The idea was: “To be called intelligent, a machine must produce responses that are indistinguishable from those of a human.”
As a human, and as a user of technology, you complete certain tasks that require you to make an important decision or classify something. For instance, when you read your inbox in the morning, you decide to mark that ‘Win a Free Car if You Click Here’ email as spam. How would a computer know to do the same thing?
Machine learning comprises algorithms that teach computers to perform tasks that human beings do naturally on a daily basis, such as this one! The first attempts at artificial intelligence involved teaching a computer by writing a rule or a set of rules.
If we wanted to teach a computer to make recommendations based on the weather, then we might write a rule that said — IF the weather is cloudy AND the chance of rainfall is greater than 50%, THEN suggest taking an umbrella.
The problem with this approach used in traditional expert systems, however, is that we don’t know how much confidence to place on the rule. Is it right 50% of the time? Less or More?
(You may also refer to the diagram above, which illustrates Traditional Programming vs Machine Learning.)
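To make the question of confidence concrete, here is a minimal sketch that writes the umbrella rule by hand and then measures how often it would have been right. The weather log is hypothetical data made up for illustration, and the function names are assumptions, not part of any library.

```python
# Hypothetical weather log: (is_cloudy, chance_of_rain_percent, it_actually_rained)
observations = [
    (True, 80, True),
    (True, 60, True),
    (True, 55, False),
    (True, 30, False),
    (False, 70, True),
    (False, 10, False),
    (True, 90, True),
    (False, 20, False),
]


def suggest_umbrella(is_cloudy, rain_chance):
    """Hand-written rule from the text: cloudy AND chance of rain above 50%."""
    return is_cloudy and rain_chance > 50


def rule_accuracy(data):
    """Fraction of observations where the rule's advice matched what happened."""
    correct = sum(
        suggest_umbrella(cloudy, chance) == rained
        for cloudy, chance, rained in data
    )
    return correct / len(data)


accuracy = rule_accuracy(observations)
print(f"The rule was right {accuracy:.0%} of the time")
```

On this made-up log the rule is right for 6 of the 8 days. A learning algorithm goes one step further: instead of only scoring a fixed rule, it uses such labelled examples to choose the rule itself.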
Types of Machine Learning
Depending on the context of the problem, machine learning algorithms can be classified into three major categories:
Unsupervised Learning
- Hindsight — Descriptive Analysis (What happened?)
- No Labels
- No Feedback
- Find Hidden Structure in Data
Supervised Learning
- Insight — Predictive Analysis (What will happen?)
- Labelled Data
- Direct Feedback
- Predict Outcome/Future
Reinforcement Learning*
- Foresight — Prescriptive Analysis (How can we make it happen?)
- Decision Process
- Reward System
- Learn Series of Actions
Uses of Machine Learning
Unsupervised Learning
Algorithms Used — K-means Clustering, Hierarchical Clustering, Dimensionality Reduction, etc.
- Organize Computing Clusters
- Social Network Analysis
- Market Segmentation
- Astronomical Data Analysis
Related Blogs:
- Everything you need to know about K-Means Clustering
- Crime Data Pattern Analysis and Visualization using K-means Clustering
- Image Segmentation using K-means Clustering from Scratch
- Extracting Dominant Colours from an Image using K-means Clustering from Scratch
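As a rough illustration of how K-means finds hidden structure without labels, the sketch below clusters one-dimensional values in plain Python. The spending figures and the starting centroids are hypothetical, and a real project would more likely use a library implementation; this is only meant to show the assign-then-update loop.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend of eight customers (market segmentation).
spend = [12, 15, 14, 90, 95, 100, 13, 92]
centroids, clusters = k_means_1d(spend, centroids=[0, 50])
print(centroids)
print(clusters)
```

No labels were supplied, yet the customers fall into a low-spend group and a high-spend group. The result depends on the starting centroids, which is why practical implementations usually try several initializations.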
Supervised Learning
Algorithms Used — Regression, Classification, Decision Tree, Random Forest, KNN, SVM, Naive Bayes, etc.
- Stock Prediction Problem
- House Price Prediction
- Cancer Prognosis (Malignant or Benign)
- Weather Forecasting
Related Blogs:
- Everything You Need to Know About Linear Regression
- House Price Prediction using Linear Regression from Scratch
- A Comprehensive Guide to Logistic Regression
- Logistic Regression From Scratch
- Decision Trees for Dummies
- Decision Tree From Scratch
- Random Forest: Simplified
- Random Forest from Scratch
- A Beginner’s Guide to KNN and MNIST Handwritten Digits Recognition using KNN from Scratch
- Celebrity Face Recognition using KNN from Scratch
- Face Detection and Recognition using OpenCV and KNN from Scratch
- A Beginner’s Introduction to SVM
- SVM From Scratch
- A Machine Learning Roadmap to Naive Bayes
- Naive Bayes From Scratch
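For a feel of what learning from labelled data looks like, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The house areas and prices are hypothetical numbers chosen to lie on a straight line, and `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled data: house area (sq. ft) and price (in thousands).
areas = [1000, 1500, 2000, 2500]
prices = [200, 300, 400, 500]

slope, intercept = fit_line(areas, prices)
predicted = slope * 1800 + intercept  # predict the price of an unseen 1800 sq. ft house
print(f"Predicted price: {predicted:.1f}")
```

The labels (known prices) provide the direct feedback mentioned earlier: the line is chosen to minimize the squared error against them, and is then used to predict an outcome for an input it has never seen.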
Reinforcement Learning*
Algorithms Used — Q-Learning, SARSA, DQN, DDPG, etc.
- Credit Assignment Problem
- Game Playing
- Robot in a Maze
- Balancing a Pole
*Related Blogs:
- A Beginner’s Guide to Reinforcement Learning and its Basic Implementation from Scratch
- Solving the Self-Driving Cab Problem without Reinforcement Learning
- Introduction to Q-Learning for the Self-Driving Cab Problem
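The reward-driven loop behind Q-Learning can be sketched on a toy version of the robot-in-a-maze problem. Everything here is an assumption made for illustration: the maze is a corridor of five cells, the only reward is 1 for reaching the last cell, and the learning rate and discount factor are arbitrary choices.

```python
import random

N_STATES = 5          # cells 0..4 in a corridor; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor


def step(state, action):
    """Move within the corridor; reward 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(2)  # explore at random; Q-learning is off-policy
            next_state, reward = step(state, ACTIONS[a])
            # Move the estimate towards reward + discounted best future value.
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-goal cells.
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

Nobody tells the agent that moving right is correct. It only receives a reward at the goal, and the update rule propagates that reward backwards until moving right has the higher value in every cell.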
Machine learning has made dramatic improvements in the past few years, but we are still very far from reaching human-level performance. Many times, it still requires the assistance of a human to complete its task.
Getting Started with Machine Learning
Requirements and Libraries
Python (latest release at the time of writing: 3.9.0) is the most widely preferred language for Machine Learning, and Python3 is generally preferred over Python2. A GPU and more RAM are usually preferred when you’re handling huge amounts of data or when a greater amount of processing is required; otherwise, they aren’t needed.
Here’s a list of useful libraries for Machine Learning —
- Pandas — Popular Python library for data analysis. It provides many inbuilt methods for grouping, combining and filtering data.
Blog: Statistical Analysis in Python using Pandas
- Numpy — Popular Python library for matrix processing and for handling multi-dimensional arrays.
Blog: Getting Familiar with Numpy
- Matplotlib — Popular Python library for data-visualization.
Blog: Data Visualization using Python Part-I
- Seaborn — Yet another popular Python Library for data-visualization.
Blog: Data Visualization using Python Part-II
- Scikit-learn — Popular library for getting started with Machine Learning; it has inbuilt functions for most Supervised and Unsupervised algorithms.
- Scipy — Contains different modules for optimization, linear algebra, integration and statistics; popular for Machine Learning.
Related Blogs (Using Scipy): Mathematics for Machine Learning Part-1, Mathematics for Machine Learning Part-2, Mathematics for Machine Learning Part-3
- OpenCV — An open-source computer vision and machine learning software library, widely used for Image Processing.
Blog: Computer Vision and Image Processing with OpenCV
- TensorFlow — Library for high-performance numerical computation, developed by Google. Widely employed for Deep Learning Applications.
- Keras — A high-level neural networks API capable of running on top of TensorFlow, CNTK, or Theano.
- PyTorch — Offers an extensive choice of tools and libraries that support Computer Vision, NLP and many other Machine Learning programs.
- Theano — Library that is used to define, evaluate and optimize mathematical expressions involving multi-dimensional arrays in an efficient manner.
For more implementations using these libraries, refer to my GitHub Repos:
https://github.com/tanvipenumudy/AI-ML-Weekly-Challenges
https://github.com/tanvipenumudy/Deep-Learning-Labs
https://github.com/tanvipenumudy/Winter-Internship-Internity
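Before using the libraries listed above, it can help to check which of them are already available in your environment. The sketch below does this with the standard library only. The mapping from display names to import names reflects the usual module names (for example, Scikit-learn is imported as `sklearn` and OpenCV as `cv2`), and `installed_libraries` is an illustrative helper.

```python
from importlib.util import find_spec

# Import names differ from the display names for a few libraries.
IMPORT_NAMES = {
    "Pandas": "pandas",
    "Numpy": "numpy",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "Scikit-learn": "sklearn",
    "Scipy": "scipy",
    "OpenCV": "cv2",
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "PyTorch": "torch",
    "Theano": "theano",
}


def installed_libraries():
    """Map each library to True if it can be imported in this environment."""
    return {
        name: find_spec(module) is not None
        for name, module in IMPORT_NAMES.items()
    }


for name, available in installed_libraries().items():
    print(f"{name:<14}{'installed' if available else 'missing'}")
```

Anything reported as missing can then be installed with pip before you start.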
Here’s a list of useful IDEs/Platforms/Editors/Environments —
Getting Started with Google Colaboratory
Colaboratory or ‘Colab’ for short, is a product from Google Research. Colab allows anybody to write and execute any arbitrary Python code through any browser and is especially well suited to Machine Learning, Data Analysis and Education — Source: Google Research Docs.
As soon as you open Colab, a menu pops up that contains the following tabs:
- Examples — Contains a number of notebooks of various examples.
- Recent — Notebooks that you have recently worked with.
- Google Drive — Notebooks stored in your Google Drive.
- GitHub — You can add Notebooks from your GitHub Repositories after connecting your Colab account with your GitHub account.
- Upload — Upload from your local directory.
Otherwise, you may create a new Notebook by clicking on ‘New Python3/Python2 Notebook’.
A new file is initialized as ‘UntitledX’; you may change the file’s name in the top left corner of the document. You can also change your Runtime Type/Environment from the ‘Runtime’ drop-down menu to None, GPU or TPU (it is None by default). In some versions, it is also possible to toggle between Python3 and Python2.
All Python/Jupyter Notebook commands are functional on Google Colaboratory.
By clicking on the +Code or +Text icon, you may initialize a new Code or Text Cell in the Notebook.
Commonly used Commands on Google Colab
Here’s a list of commonly used commands on Google Colaboratory —
Command for installing any python library (Only confined to that runtime)
! pip install PythonLibrary
Command to import files from a link (Only confined to that runtime)
!wget link
Command to upload any local file (Only confined to that runtime)
from google.colab import files
uploaded = files.upload()
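In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes. As a hedged sketch of what you might do next, the helper below parses such a dictionary as CSV using only the standard library. The helper name and the file `houses.csv` are hypothetical, and a stand-in dictionary replaces the real upload so the example also runs outside Colab.

```python
import csv
import io


def parse_uploaded_csv(uploaded):
    """Turn {filename: raw bytes} into {filename: list of row dictionaries}."""
    tables = {}
    for filename, content in uploaded.items():
        text = content.decode("utf-8")  # assumes the files are UTF-8 encoded
        tables[filename] = list(csv.DictReader(io.StringIO(text)))
    return tables


# In Colab, `uploaded` would come from files.upload(). Here a stand-in
# dictionary with a hypothetical file name is used instead.
uploaded = {"houses.csv": b"area,price\n1000,200\n1500,300\n"}

tables = parse_uploaded_csv(uploaded)
print(tables["houses.csv"][0]["price"])
```

Note that the csv module keeps every value as a string, so numeric columns need to be converted before analysis.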
Additional Topic: Google Sheets for Data Mining
In simple words, Data Mining is defined as a process used to extract usable data from a larger set of raw data. It involves analysing patterns in large batches of data using one or more software tools. For segmenting the data and evaluating the probability of future events, data mining uses sophisticated mathematical algorithms.
Data mining has applications in multiple fields, such as science and research. It also helps businesses get closer to their objectives and make better decisions: they can learn more about their customers, develop more effective strategies for various business functions and, in turn, use resources in a more optimal and insightful manner.
Google Sheets is an online spreadsheet app that lets you create and format spreadsheets collaboratively with your team in real-time. Let’s now understand how Google Sheets could be utilized for Data Mining.
- Helps in Collaboration
- Add-ons for Text Analysis
- Add-ons for Text Mining
- Power Tools
- Finding Fuzzy Matches
- Google Analytics
- Supports Socrative Data Mining
The inbuilt add-ons within Google Sheets provide statistics and data analysis functionality right in Google Sheets, avoiding the need to download data to a separate customized statistics application.
Instead, you only have to select the variables you want to analyze and run the entire analysis in a single go.
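The idea behind finding fuzzy matches, mentioned in the list above, can be sketched outside a spreadsheet as well. The example below uses Python’s standard `difflib` module to map misspelled entries to a known list. It is not how any Google Sheets add-on is implemented; the customer names, the helper `fuzzy_lookup` and the 0.6 cutoff are assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical list of correctly spelled customer names.
known_customers = ["Google", "Microsoft", "Amazon", "Netflix"]


def fuzzy_lookup(name, choices, cutoff=0.6):
    """Return the closest known spelling of `name`, or None if nothing is similar enough."""
    lowered = {choice.lower(): choice for choice in choices}
    matches = get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


for entry in ["Gogle", "amazn", "Tesla"]:
    print(entry, "->", fuzzy_lookup(entry, known_customers))
```

Cleaning up near-duplicate spellings like this is a common first step before counting or grouping records, since otherwise the same customer is treated as several different ones.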
“Machine intelligence is the last invention that humanity will ever need to make.” ~Nick Bostrom
Additional Resources
Do check out all my blogs on various Machine Learning Algorithms, Implementations from Scratch, Mini Projects, Libraries and the Mathematics behind Machine Learning.