Classification: Basic things you need to know

Published in WiCDS

2 min read · Jan 7, 2021

Data scientists are often faced with a problem that requires an automated decision. Classification is often considered one of the most important forms of prediction.

In statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Classification is considered an instance of supervised learning, i.e., learning where a training set of correctly identified observations is available. The corresponding unsupervised procedure is known as clustering and involves grouping data into categories based on some measure of inherent similarity or distance.

An algorithm that implements classification, especially in a concrete implementation, is known as a classifier. The term “classifier” sometimes also refers to the mathematical function, implemented by a classification algorithm, that maps input data to a category. Most algorithms return a probability score (propensity) of belonging to the class of interest. A sliding cutoff can then be used to convert the propensity score to a decision.
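To make the idea of a propensity score concrete, here is a minimal sketch of a classifier's scoring function, assuming a logistic model with two features. The weights and bias are hypothetical values chosen for illustration; a real model would learn them from the training set.

```python
import math

# Hypothetical weights and bias (assumed for illustration only);
# a real classifier would estimate these from labeled training data.
WEIGHTS = [0.8, -0.4]
BIAS = -0.2


def propensity(features):
    """Return the estimated probability of belonging to the class of interest."""
    # Linear score: bias plus the weighted sum of the feature values.
    score = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    # The logistic (sigmoid) function maps any real score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


p = propensity([2.0, 1.0])
print(round(p, 3))
```

The function returns a number between 0 and 1 rather than a category; turning it into a decision still requires a cutoff.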

The general approach is as follows:
1. Establish a cutoff probability for the class of interest above which we consider a record as belonging to that class.
2. Estimate (with any model) the probability that a record belongs to the class of interest.
3. If that probability is above the cutoff probability, assign the new record to the class of interest.
The higher the cutoff, the fewer records predicted as 1 — that is, belonging to the class of interest. The lower the cutoff, the more records predicted as 1.
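
The steps above can be sketched in a few lines of Python. The propensity scores here are made-up numbers standing in for the output of some fitted model, and `classify` is a hypothetical helper name, not a library function.

```python
def classify(probabilities, cutoff=0.5):
    """Assign 1 (class of interest) when the probability is above the cutoff, else 0."""
    return [1 if p > cutoff else 0 for p in probabilities]


# Hypothetical propensity scores, as if estimated by any model (step 2).
scores = [0.10, 0.35, 0.55, 0.80, 0.95]

# Steps 1 and 3: pick a cutoff, then assign records whose score exceeds it.
low_cutoff = classify(scores, cutoff=0.3)
high_cutoff = classify(scores, cutoff=0.7)

print(low_cutoff, sum(low_cutoff))    # [0, 1, 1, 1, 1] 4
print(high_cutoff, sum(high_cutoff))  # [0, 0, 0, 1, 1] 2
```

Raising the cutoff from 0.3 to 0.7 cuts the number of records predicted as 1 from four to two, which is the trade-off described above.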

In statistics, where classification is often done with logistic regression or a similar procedure, the properties of observations are termed explanatory variables (or independent variables, regressors, etc.), and the categories to be predicted are known as outcomes, which are considered to be possible values of the dependent variable. In machine learning, the observations are often known as instances, the explanatory variables are termed features (grouped into a feature vector), and the possible categories to be predicted are classes.

Application domains:
1. Computer vision
2. Medical imaging and medical image analysis
3. Optical character recognition
4. Video tracking
5. Drug discovery and development
6. Toxicogenomics
7. Quantitative structure-activity relationship
8. Geostatistics
9. Speech recognition
10. Handwriting recognition
11. Biometric identification
12. Biological classification
13. Statistical natural language processing
14. Document classification
15. Internet search engines
16. Credit scoring
17. Pattern recognition
18. Recommender systems
19. Microarray classification

Types of classification in Machine Learning:

1. Binary Classification
2. Multi-Class Classification
3. Multi-Label Classification
4. Imbalanced Classification
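
One way to tell these types apart is by the shape of the labels. The sketch below uses small made-up label sets (all values and the `minority_share` helper are assumptions for illustration) and follows the common usage of each term.

```python
from collections import Counter

# Binary: each record gets exactly one of two classes.
binary_labels = [0, 1, 1, 0]

# Multi-class: each record gets exactly one of three or more classes.
multiclass_labels = ["cat", "dog", "bird", "dog"]

# Multi-label: each record may carry zero, one, or several labels at once.
multilabel_labels = [{"news", "sports"}, {"politics"}, set(), {"news"}]

# Imbalanced: the classes are far from equally represented.
imbalanced_labels = [0] * 98 + [1] * 2


def minority_share(labels):
    """Return the fraction of records that belong to the rarest class."""
    counts = Counter(labels)
    return min(counts.values()) / len(labels)


print(minority_share(binary_labels))
print(minority_share(imbalanced_labels))
```

Imbalance describes the class distribution rather than the label format, so a binary or multi-class problem can also be an imbalanced one.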

Sources: 1. Wikipedia 2. Practical Statistics for Data Scientists

Written by pooja