The Sigmoid function as a conceptual introduction to activation and hypothesis functions

Sean Gahagan
2 min read · Oct 2, 2022


My last note covered a few ways to make gradient descent more efficient so that ML models can learn faster, how to check whether gradient descent is working, and why the normal equation can sometimes be a good shortcut for training models.

This note focuses on just one new key idea: the sigmoid function.

What is the Sigmoid function?

The sigmoid function (also known as the logistic function) takes any real-valued input and outputs a value between 0 and 1. For large positive inputs, it outputs a number close to 1; for large negative inputs, it outputs a number close to 0. At an input of 0, it outputs exactly 0.5 (or 50%, read as a probability).
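
Here's a minimal Python sketch of that behavior (the function name and sample inputs are just for illustration):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(10))   # ~0.99995 -> large positive input gives a value near 1
print(sigmoid(-10))  # ~0.00005 -> large negative input gives a value near 0
print(sigmoid(0))    # 0.5      -> an input of 0 gives exactly 0.5
```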

Where is it used?

Its value to machine learning models comes from its ability to take a weighted sum of inputs (parameters θ multiplied by their respective features x) and translate it into a number between 0 and 1, which can be interpreted as a probability.
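
As a sketch of that idea (the parameter and feature values below are made up purely for illustration), a sigmoid-based hypothesis function looks like this:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def hypothesis(theta, x):
    """h(x) = sigmoid(theta . x): a weighted sum of features squashed into (0, 1)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical parameters and features, purely to show the shape of the computation
theta = [-1.5, 0.8, 2.0]
x = [1.0, 2.0, 0.2]          # x[0] = 1 serves as the usual intercept/bias term
print(hypothesis(theta, x))  # ~0.62, read as a 62% probability
```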

A couple of important uses of this function are in the hypothesis functions of:

  • Classification models — producing the probability that something is (1) or is not (0) the thing you want to classify. For example, is this a picture of a car or not?
  • Prediction models — producing the probability that an event will occur. For example, will a given storm turn into a hurricane before it makes landfall next week?

Here, it’s also good to know that the sigmoid function is often used as the function within the nodes of a neural network. When used this way, it’s called an activation function (related to the idea of whether or not a neuron in your brain fires). More on that in the future.

The sigmoid function is used less today than it was in the earlier days of ML (pre-2011). Today, Rectified Linear Units (“ReLUs”) are used more frequently as activation functions, for many reasons, including that they make models less computationally expensive. Nevertheless, the sigmoid function provides a good conceptual foundation for the role of these functions.
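
As a rough sketch of the contrast (the sample inputs are arbitrary), a ReLU is just a max, while the sigmoid requires an exponential:

```python
import math

def sigmoid(z):
    """Smooth squashing into (0, 1); needs an exponential."""
    return 1 / (1 + math.exp(-z))

def relu(z):
    """Rectified Linear Unit: max(0, z), so it's very cheap to compute."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  relu={relu(z):.1f}")
```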

Up next

In my next note, I’ll share more on classification models.

Past posts in this series:

  1. Towards a High-Level Understanding of Machine Learning
  2. Building Intuition around Supervised Machine Learning with Gradient Descent
  3. Helping Supervised Learning Models Learn Better & Faster
