Sigmoid as my new “42”?

Sabrina Palis
Udacity PyTorch Challengers
3 min read · Dec 19, 2018


A PyTorch Scholarship Challenge student quip

Do you have a favorite number? I sure do. A whole lot of them. I like those that sound funny, like an old French movie (33), those that used to make you feel wealthy just by pronouncing them (1 000 000), and those that make you feel lucky, like 13 or 8. And then there was that one number, somewhere between magic and geekiness, in The Hitchhiker's Guide to the Galaxy:

“Forty-two!” yelled Loonquawl. “Is that all you’ve got to show for seven and a half million years’ work?”

“I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you’ve never actually known what the question is.”

“42”, according to Deep Thought, is the one number from which all meaning (“the meaning of life, the universe, and everything”) could be derived. It doesn’t seem grand or special, quite bathetic really. You could put it on a T-shirt and nobody would get scared. Disturbingly enough, in the nineties, astronomers in Cambridge measuring the Hubble constant (that is, how quickly objects in the universe are flying apart) considered for some time that the average of their measurements was 42.

I really loved “42” until I was selected for the PyTorch Scholarship Challenge and all my ideas about numbers got shattered. Learning how to build deep neural networks with PyTorch blew my mind. The way to everything is not a number but a function. Bye bye “42”, it’s all sigmoid baby!

So what’s sigmoid?

The sigmoid function, also called the sigmoidal curve or logistic function, is defined as:

σ(x) = 1 / (1 + e^(−x))

During the PyTorch Scholarship Challenge course, we learnt how replacing the perceptron’s step activation function with a sigmoid changed the output from discrete to continuous. While the perceptron with a step function returned only zeros and ones, the perceptron with the sigmoid activation returned a smooth continuous range of values between 0 and 1, providing us with probabilities.
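
Here is a minimal sketch of that difference in PyTorch (the input values below are made up purely for illustration): a hard step returns only 0 or 1, while torch.sigmoid squashes the same inputs into a smooth range between 0 and 1.

```python
import torch

# A few raw perceptron outputs (weighted sums), made-up values for illustration
z = torch.tensor([-3.0, -0.5, 0.0, 0.5, 3.0])

# Step activation: hard 0/1 decisions
step = (z >= 0).float()
print(step)   # tensor([0., 0., 1., 1., 1.])

# Sigmoid activation: smooth values in (0, 1), usable as probabilities
probs = torch.sigmoid(z)
print(probs)  # tensor([0.0474, 0.3775, 0.5000, 0.6225, 0.9526])
```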

The sigmoid function is not just one lesson topic studied among others while learning about perceptrons. It actually played an important role in neural network research by making it possible to prove the Universal Approximation Theorem.

Three works published in 1989 by Ken-Ichi Funahashi, Kurt Hornik, and George Cybenko proved that multilayer feedforward networks are universal approximators, using superpositions of sigmoidal functions to approximate a given map with finite precision (Efe 2011).

In 1991, Kurt Hornik showed that it is not the specific activation function chosen but rather the multilayer feedforward architecture itself that makes neural networks universal approximators. Basically, if you have a function in the form of a list of inputs and outputs, there is a neural network that, given those inputs, will approximate those outputs with great precision. And it doesn’t have to be a sigmoid function.
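
As a toy illustration of that idea (not part of the course material; the target function, network size, and training settings below are arbitrary choices), a tiny feedforward network with one hidden layer of sigmoid units can be fit to a simple curve such as sin(x):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up "list of inputs and outputs": x and its target sin(x)
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.sin(x)

# One hidden layer of sigmoidal units, as in the 1989 results
model = nn.Sequential(
    nn.Linear(1, 32),
    nn.Sigmoid(),
    nn.Linear(32, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())  # the mean squared error shrinks as the approximation improves
```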

Other functions include softmax, tanh, and the most widely used activation function in the world, ReLU, which stands for Rectified Linear Unit.
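
For a quick feel of how these activations differ, here is a small comparison on a made-up tensor:

```python
import torch

z = torch.tensor([-2.0, 0.0, 2.0])

print(torch.sigmoid(z))         # squashed into (0, 1)
print(torch.tanh(z))            # squashed into (-1, 1)
print(torch.relu(z))            # negatives clipped to 0
print(torch.softmax(z, dim=0))  # values in (0, 1) that sum to 1
```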

But since I started my neural networks learning journey with the sigmoid and its pretty derivative, it is the one function that will remain my new “42”.
