Supervised learning with scikit-learn (Part 2)-Classification

5 min readMay 12, 2023

Part 2: Classification (Supervised Learning)

1- What is Machine Learning?

Machine learning is the science and art of giving computers the ability to learn to make decisions from data without being explicitly programmed. For example, your computer can learn to predict whether an email is spam or not spam given its content and sender. Another example: your computer can learn to cluster, say, Wikipedia entries, into different categories based on the words they contain. It could then assign any new Wikipedia article to one of the existing clusters. Notice that, in the first example, we are trying to predict a particular class label, that is, spam or not spam. In the second example, there is no such label. When there are labels present, we call it supervised learning. When there are no labels present, we call it unsupervised learning.

2. Unsupervised learning?

Unsupervised learning, in essence, is the machine learning task of uncovering hidden patterns and structures from unlabeled data. For example, a business may wish to group its customers into distinct categories based on their purchasing behavior without knowing in advance what these categories may be. This is known as clustering, one branch of unsupervised learning.

3-Reinforcement learning

There is also reinforcement learning, in which machines or software agents interact with an environment. Reinforcement agents are able to automatically figure out how to optimize their behavior given a system of rewards and punishments. Reinforcement learning draws inspiration from behavioral psychology and has applications in many fields, such as, economics, genetics, as well as game playing. In 2016, reinforcement learning was used to train Google DeepMind’s AlphaGo, which was the first computer program to beat the world champion in Go.

4. Supervised learning explanation

In supervised learning, we have several data points or samples, described using predictor variables or features and a target variable. Our data is commonly represented in a table structure such as the one you see here, in which there is a row for each data point and a column for each feature. Here, we see the iris dataset: each row represents measurements of a different flower and each column is a particular kind of measurement, like the width and length of a certain part of the flower. The aim of supervised learning is to build a model that is able to predict the target variable, here the particular species of a flower, given the predictor variables, here the physical measurements. If the target variable consists of categories, like ‘click’ or ‘no click’, ‘spam’ or ‘not spam’, or different species of flowers, we call the learning task classification. Alternatively, if the target is a continuously varying variable, for example, the price of a house, it is a regression task. In this chapter, we will focus on classification. In the following, on regression.

4. Naming conventions

A note on naming conventions: out in the wild, you will find that what we call a feature, others may call a predictor variable or independent variable, and what we call the target variable, others may call dependent variable or response variable.

5. Supervised learning more explanation

The goal of supervised learning is frequently to either automate a time-consuming or expensive manual task, such as a doctor’s diagnosis or to make predictions about the future, say whether a customer will click on an ad or not. For supervised learning, you need labeled data and there are many ways to get it: you can get historical data, which already has labels that you are interested in; you can perform experiments to get labeled data, such as A/B-testing to see how many clicks you get; or you can also crowdsourced labeling data which, like reCAPTCHA does for text recognition. In any case, the goal is to learn from data for which the right output is known, so that we can make predictions on new data for which we don’t know the output.

6. Supervised learning in Python

There are many ways to perform supervised learning in Python. In this course, we will use scikit-learn, or sklearn, one of the most popular and user-friendly machine-learning libraries for Python. It also integrates very well with the SciPy stack, including libraries such as NumPy. There are a number of other ML libraries out there, such as TensorFlow and Keras, which are well worth checking out once you got the basics down.

7- What is scikit -Learn?

What is it:In simple terms, Scikit Learn is an open source and one of the most useful libraries for machine learning in Python. It has tools for predictive data analysis [6]. Scikit learn is a library that is written in Python and built upon Scipy, Matplotlib and Numpy provides a set of useful and efficient tools for machine learning and statistical modeling including regression, classification, clustering, predictive data analysis and dimensionality reduction etc and known as the most robust and useful library for Machine Learning [6].

Background: A developer named David Cournapeau originally released scikit-learn as a student in 2007. The open source community quickly adopted it and has updated it numerous times over the years [5]. This library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib [2].The library provides a unified API (Application Programming Interface) for practitioners to ease the use of machine learning algorithms with only writing a few lines to accomplish the predictive or classification task [3].The package is written heavily in python, and it incorporates C++ libraries like LibSVM and LibLinear for support vector machines and generalized linear model implementation [3]

Features: The packages in Scikit-learn focus on modeling data.

Scikit-learn includes every core machine learning algorithm, among them vector machines, random forests, gradient boosting, k-means clustering, and DBSCAN.
It was designed to work seamlessly with NumPy and SciPy (both described below) for data cleaning, preparation, and calculation.
It has modules for loading data as well as splitting it into training and test sets.
It supports feature extraction for text and image data.

Please Follow coursesteach to see latest updates on this story

If you want to learn more about these topics: Python, Machine Learning Data Science, Statistic For Machine learning, Linear Algebra for Machine learning Computer Vision and Research

Then Login and Enroll in Coursesteach to get fantastic content in the data field.

References

1- 1_28_2020_Supervised_learning_with_Sklearn.ipynb

2-Scikit Learn Tutorial

3-Scikit-Learn: A silver bullet for basic machine learning

4–Machine Learning with scikit-learning (Datacamp)

5-Essential Python Libraries for Machine Learning and Data Science