An Introduction to Classification Models

Sean Gahagan
3 min read · Oct 8, 2022

My last note looked at the sigmoid function, an important way for models to turn inputs into an output between 0 and 1 for things like prediction and classification.

This week, we’ll look at classification models.

Classification: “What’s that thing?”

Classification models aim to determine what something is (which class of thing is it?).

To show how they work, we’ll look at a simple example with 2 features of the thing we’re trying to classify (x1 and x2) and 3 classes/types of things that it could be (class #1, class #2, or class #3).

Our classification model will have three different hypothesis functions (one for each class), each giving the probability that something belongs to that function's class.

The first hypothesis function calculates the probability that something is class #1, the second calculates the probability that it's class #2, and the third calculates the probability that it's class #3.

Each of these functions is then trained separately to predict whether something belongs to its specific class of things.
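
For example, here's a minimal sketch of that separate training step using scikit-learn's LogisticRegression (the feature array X and the labels y below are made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training data: each row is one thing's two features (x1, x2),
# and y says which of the three classes that thing belongs to.
X = np.array([[0.2, 1.1], [1.4, 0.3], [0.9, 2.0],
              [2.1, 0.1], [0.1, 0.2], [1.8, 1.9]])
y = np.array([1, 2, 3, 2, 1, 3])

# Train one hypothesis function per class: relabel the data as
# "this class vs. everything else," then fit a logistic model to it.
models = {cls: LogisticRegression().fit(X, (y == cls).astype(int))
          for cls in (1, 2, 3)}
```

Each fitted model now answers only one question: how likely is it that this thing belongs to my class?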

Once each hypothesis function is trained, we can use the set of them together to predict which class a new thing belongs to.

To do this:

  1. Each hypothesis function takes the features of the new thing and outputs the probability that it belongs to that function's class.
  2. The model then predicts the class whose hypothesis function gave the highest probability.

Let’s say hypothesis function #1 calculates a 41% probability that the new thing belongs to class #1, hypothesis function #2 calculates a 73% probability that the new thing belongs to class #2, and hypothesis function #3 calculates a 22% probability that the new thing belongs to class #3. Because hypothesis function #2 calculated the highest probability based on the features of the new thing, our model will predict that the new thing belongs to class #2.
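
To make those two steps concrete, here's a minimal sketch in Python with NumPy. The weights are invented for illustration; in practice they would come from training each hypothesis function as described above:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters (bias, w1, w2) for each class's
# hypothesis function. In practice these come from training.
weights = {
    "class_1": np.array([-0.5, 1.2, -0.8]),
    "class_2": np.array([0.3, -0.4, 1.5]),
    "class_3": np.array([-1.0, 0.2, 0.1]),
}

# A new thing described by its two features: x1 = 0.5, x2 = 1.0.
x = np.array([1.0, 0.5, 1.0])  # the leading 1.0 multiplies the bias term

# Step 1: each hypothesis function outputs a probability for its class.
probs = {cls: sigmoid(w @ x) for cls, w in weights.items()}

# Step 2: pick the class whose hypothesis function gives the highest probability.
prediction = max(probs, key=probs.get)
print(probs, "->", prediction)
```

With these particular weights, hypothesis function #2 produces the highest probability, so the sketch predicts class #2, just like the worked example above.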

Recall and Precision

In developing classification models, two key metrics are recall and precision.

Recall is the number of true positives divided by the number of all actual positives (true positives + false negatives). In other words, recall measures what percentage of the things that truly belong to a class your model identifies as belonging to it. You can imagine situations where you would want high recall, like classifying faulty parts in aircraft manufacturing.

Precision is the number of true positives divided by the number of true positives plus false positives. In other words, precision measures what percentage of the things your model classified as belonging to a certain class actually belong to that class.

Both metrics are usually important. If you focus only on recall, you could just build a model that always outputs a 100% probability: it would never miss a faulty aircraft part, but you would also never build an airplane, because every part would be classified as faulty.
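
Here's a quick sketch of both metrics in the aircraft-part scenario (the counts are invented for illustration):

```python
def recall(tp, fn):
    """Of all the actual positives, what fraction did the model catch?"""
    return tp / (tp + fn)

def precision(tp, fp):
    """Of everything the model flagged as positive, what fraction really was?"""
    return tp / (tp + fp)

# A plausible faulty-part classifier: catches most faults, with some false alarms.
print(recall(tp=90, fn=10), precision(tp=90, fp=30))    # 0.9 0.75

# The "always flag it" model from above: out of 1,000 parts with 100 real
# faults, it flags all 1,000. Perfect recall, terrible precision.
print(recall(tp=100, fn=0), precision(tp=100, fp=900))  # 1.0 0.1
```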

Classification models with more inputs and outputs

In our illustrative example above, we only used 2 features in our classification model, but (and you probably saw this coming) classification models in machine learning can use arbitrarily many features.

Lastly, in our example, the model used a distinct hypothesis function for each class of thing, and each function gave only one output (the probability that something belongs to that function's specific class). Later on, when we look at neural networks, you'll see that a single neural network can have multiple outputs, so it can do the job of multiple separate functions by having one output for class #1, another output for class #2, and so on.
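
As a small preview of that idea, here's a sketch of a softmax function, which many multi-output networks use in their final layer to turn one raw score per class into one probability per class (the scores below are made up):

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtracting the max keeps exp() numerically stable
    return e / e.sum()

# Made-up raw scores from a network's final layer: one per class,
# produced in a single pass instead of by three separate functions.
scores = np.array([0.4, 1.9, -0.6])
print(softmax(scores))  # roughly [0.17, 0.77, 0.06]
```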

Up Next:

My next note will look at the problem of overfitting machine learning models and a best practice to avoid it.

Past Posts in this Series:

  1. Towards a High-Level Understanding of Machine Learning
  2. Building Intuition around Supervised Machine Learning with Gradient Descent
  3. Helping Supervised Learning Models Learn Better & Faster
  4. The Sigmoid function as a conceptual introduction to activation and hypothesis functions
