Machine Learning Classification

Sabiha Ali
6 min readFeb 10, 2024

--

Image from Google

Machine learning algorithms can be classified based on the level of supervision required into 4 main categories as Supervised, Unsupervised, Semi supervised and Reinforcement learning.

Supervised

When discussing supervised learning, we typically categorize it into two main types: Regression and Classification. In supervised learning, we have input data along with corresponding output labels. The goal is to establish the relationship between this input and output through the use of algorithms. Once the model is trained on this data, it can provide predictions for new inputs based on the learned patterns.

For instance, consider a dataset containing information about individuals and their job placements, including factors like age, IQ score, and whether they secured a job or not. The task of a machine learning algorithm in this scenario is to identify a mathematical relationship between the input variables (such as IQ and age) and the output variable (job placement). Subsequently, when provided with details of a new candidate, like an IQ of 67 and age 25, the algorithm can predict the likelihood of job placement.

Regression and Classification

The choice between using a regression or classification model wothin supervised learning depends on the nature of the desired output. In regression models, the goal is to predict a continuous value within a certain range. This is suitable when predicting quantities like salary based on input variables. On the other hand, classification models are used when the output falls into distinct categories. For example, determining whether a candidate will get a job (yes or no) based on their attributes.

To illustrate further, if the algorithm is tasked with analyzing images to count the number of traffic lights present, it would utilize a regression model since the output is a numeric value representing the count. Conversely, if the objective is to detect the presence or absence of a traffic light in an image, a classification model would be employed to categorize images as containing a traffic light or not.

In summary, supervised learning involves training algorithms to make predictions based on labeled data, with regression models used for predicting continuous values and classification models for categorizing data into distinct classes or categories.

Image from Google

Unsupervised

Unsupervised machine learning involves extracting insights from data without labeled outcomes. Consider a scenario with a dataset of 10,000 individuals, their IQ scores, and ages, yet lacking information on their job placements. In such cases, predicting outcomes is unfeasible due to the absence of the variable to be predicted. However, unsupervised learning methods like clustering, dimensionality reduction, anomaly detection, and association rule learning can still yield meaningful insights from the data.

Clustering

Clustering endeavors to group similar data points together, enabling the segregation of data into distinct clusters. This process aids in identifying patterns or behavioral trends within the dataset, facilitating subsequent labeling or deeper understanding of the data’s structure. Moreover, clustering is not limited to two-dimensional data and can effectively handle multi-dimensional datasets, enhancing its versatility and applicability.

Image from Google

Dimensionality reduction

Dimensionality reduction techniques aim to streamline datasets by reducing the number of features while preserving essential relationships. By eliminating redundant or irrelevant features, dimensionality reduction simplifies analysis and visualization, particularly beneficial when dealing with complex datasets with numerous input variables. For instance, merging features like the number of bedrooms and bathrooms into a single metric like square footage can streamline the dataset without sacrificing predictive accuracy.

Image from Google

Anomaly detection

Anomaly detection, as its name suggests, identifies unusual patterns or outliers within data. This capability is invaluable in various domains, such as detecting fraudulent transactions in credit card data or identifying defects in manufacturing processes. By flagging anomalies, this technique helps maintain data integrity and enables timely intervention to address potential issues.

Image from Google

Association rule

Association rule learning focuses on uncovering relationships between variables within large datasets. For instance, in a retail setting, this method might reveal patterns in the arrangement of items on shelves, highlighting associations between products that inform strategic merchandising decisions. By elucidating these relationships, association rule learning offers valuable insights into consumer behavior and preferences.

In summary, unsupervised machine learning techniques provide valuable tools for extracting insights and uncovering patterns within data, even in the absence of labeled outcomes. Whether through clustering, dimensionality reduction, anomaly detection, or association rule learning, these methods empower analysts to derive meaningful conclusions and drive informed decision-making processes.

Semi supervised learning

Semi-supervised learning occupies a unique space between supervised and unsupervised learning paradigms. In semi-supervised learning, only a fraction of the data is labeled, while the majority remains unlabeled. This approach harnesses both labeled and unlabeled data to train models, enabling them to make predictions or identify patterns more effectively.

The fundamental premise driving semi-supervised learning is the scarcity and expense of labeled data. Annotated data often necessitates significant human effort and time, contrasting with the abundance and accessibility of unlabeled data. By leveraging both labeled and unlabeled samples, semi-supervised learning optimizes the utility of available resources, enhancing model performance.

Consider a practical example: imagine organizing a collection of family photos on a computer. Initially, these images are unlabeled. As you identify individuals in the pictures — let’s say, recognizing your dad — the system starts to categorize them accordingly. By iteratively assigning labels based on user input, the system refines its clustering algorithm, grouping similar images together. This iterative process embodies the essence of semi-supervised learning, as it utilizes both labeled (identified individuals) and unlabeled (unidentified individuals) data to improve the organization and categorization of the photo collection.

In summary, semi-supervised learning represents a strategic compromise between the resource-intensive nature of supervised learning and the exploratory potential of unsupervised learning. By capitalizing on the combination of labeled and unlabeled data, this approach maximizes the efficiency and effectiveness of machine learning models, enabling them to uncover meaningful insights and make accurate predictions in various applications.

Image from Google

Reinforcement Learning

Reinforcement learning represents a paradigm shift in machine learning where traditional input-output datasets are replaced by an interactive framework where an agent learns to make decisions through direct interaction with an environment. In this setup, the agent is analogous to an individual navigating a new city without prior knowledge of its workings. Just as a newcomer to a city learns through trial and error, the agent learns by experimenting with different actions and observing the outcomes.

The crux of reinforcement learning lies in the feedback mechanism provided to the agent. Instead of labeled data, the agent receives feedback in the form of rewards or penalties based on the outcomes of its actions. Much like how an individual in a new city gauges the success of their actions through the responses they receive from the environment, the agent adjusts its behavior to maximize cumulative rewards over time.

This process mirrors the adaptive nature of human learning, where mistakes are opportunities for growth. Through repeated interactions with the environment, the agent refines its decision-making strategies, gradually uncovering optimal policies to achieve its goals. Just as a newcomer to a city gradually learns the nuances of its streets and social dynamics, the agent acquires a deeper understanding of the environment it operates in.

Ultimately, reinforcement learning empowers machines to autonomously learn and adapt to complex environments without explicit guidance. By leveraging trial and error alongside feedback mechanisms, agents can navigate uncertain terrains, optimize resource allocation, and achieve predefined objectives. Much like the journey of exploration and discovery undertaken by a newcomer to a city, reinforcement learning embodies the essence of learning through experience and interaction.

Image from Goolgle

Stay tuned for more on Machine Learning and Cloud

Sabiha Ali, Solutions Architect, ScaleCapacity.

--

--