From Y=MX+C to Deep Neural Network — Part 1.

Olaniyi Yusuf
5 min read · May 15, 2023


image from pixabay.com

Table of Contents

  1. Introduction.
  2. The Human Brain and Neural Network.
  3. Steps in Building a Neural Network.
  4. What is an Activation Function?
  5. Conclusion.

Introduction

Have you ever wondered how machines can learn to perform complex tasks, like recognizing faces or predicting stock prices? It all starts with a basic mathematical concept — the equation of a straight line. While this may seem daunting, don’t worry — even if you’re not a math whiz, you can still understand the basics of building a machine learning model. In this write-up series, we’ll break down the process in a way that’s easy to follow and show you how you can get started with deep neural networks. Whether you’re a complete beginner or have some experience with machine learning, this article will help you gain a better understanding of this exciting field.

The Human Brain and Neural Network

A neuron in the human brain from pixabay

“I’m not trying to make a model of how the brain works. I’m looking at the brain and saying, ‘This thing works, and if we want to make something else that works, we should sort of look to it for inspiration.’” — GEOFFREY HINTON

According to Geoffrey Hinton, a pioneer of neural networks, neural networks are inspired by the way the human brain is organized, but they are not exact copies of it. While the human brain is far more complicated, neural networks are built to process information through connected layers, much like neurons do. It’s important to keep in mind that not all neural networks are built the same way; some, like Convolutional Neural Networks (CNNs), are designed to resemble other biological structures, such as the eyes. A neural network’s layers include:

  • Input layer: A pass-through layer that accepts data for processing.
  • Hidden layer(s): Layers that analyze and identify patterns in the input data, much like how the human brain processes information. Each hidden layer can detect increasingly complex features of the input data.
  • Output layer: The final layer of a neural network, which produces a prediction based on the input and the patterns identified by the hidden layers.

This layered structure allows neural networks to perform complex tasks such as image recognition and language processing.
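To make this layered structure concrete, here is a minimal sketch in Python with NumPy (the library used later in this series). The layer sizes and random inputs are arbitrary choices for illustration, and the function applied in the hidden layer is an activation function, which is explained later in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layer: a pass-through holding 3 features for one example.
x = rng.normal(size=(3,))

# Hidden layer: 4 neurons, each with its own weights and a bias.
W1 = rng.normal(size=(4, 3))
b1 = np.zeros(4)
hidden = np.maximum(0, W1 @ x + b1)  # weighted sum + bias, then an activation

# Output layer: a single neuron producing the final prediction.
W2 = rng.normal(size=(1, 4))
b2 = np.zeros(1)
prediction = W2 @ hidden + b2

print(prediction)
```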

Steps in Building a Neural Network

When training a neural network, two main stages are involved: forward propagation and backward propagation.

Forward propagation involves processing input data through the network’s layers, including computing a weighted sum of the inputs and adding a bias term. This computation can be represented by the linear equation y = wx + b, where “w” represents the weights, “x” represents the input, and “b” represents the bias. As the input data moves through the layers, the network identifies patterns in the data and makes a prediction or classification based on those patterns.

Backward propagation, also known as backpropagation, involves adjusting the weights and biases based on the error between the predicted output and the actual output. This error is calculated using a loss function, which is a measure of how well the network is performing on a given task. By computing the gradients of the loss function with respect to the weights and biases, the network can adjust these parameters to improve its accuracy over time.

These two stages are essential for training a neural network and getting it to perform well on specific tasks such as image recognition or language processing.
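As a rough illustration of the two stages, the sketch below trains a single linear unit (y = wx + b) with a mean squared error loss and plain gradient descent. The toy data, learning rate, and number of steps are made-up values for demonstration; real networks stack many such units into layers.

```python
import numpy as np

# Toy data generated from y = 3x + 1 (values chosen for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_true = 3.0 * x + 1.0

w, b = 0.0, 0.0          # start with arbitrary weight and bias
learning_rate = 0.05

for step in range(500):
    # Forward propagation: weighted sum of the input plus a bias.
    y_pred = w * x + b

    # Loss function: mean squared error between prediction and target.
    error = y_pred - y_true
    loss = np.mean(error ** 2)

    # Backward propagation: gradients of the loss w.r.t. w and b.
    dw = np.mean(2 * error * x)
    db = np.mean(2 * error)

    # Update the parameters to reduce the loss.
    w -= learning_rate * dw
    b -= learning_rate * db

print(w, b, loss)  # w and b move toward 3 and 1 as the loss shrinks
```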

Let’s take a deeper dive into forward propagation:

Recall the equation of a straight line, y = mx + c, which implies that y depends on x: if x changes, y will also change, but how much y changes for the same value of x depends on the gradient m. For example, consider the lines below:

y = 4x + 2 and y = 2x + 2

At x = 5, the first line returns y = 22 while the second returns y = 12, which shows the impact of the gradient (i.e. dy/dx in calculus, m in the equation of a straight line).

In neural networks, the gradient (m) and y-intercept (c) are replaced with the weight (w) and bias (b), respectively. To understand this better: a bias is an additional constant added to the output of a layer, much like a pre-defined condition in real life, while a weight is the rate at which the output value (y) changes with respect to the input value (x). Our first equation therefore becomes y = wx + b.
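Here is a quick sketch of the same idea in Python: the two straight lines from above, and the identical computation rewritten with the neural-network names weight and bias. The numbers are the ones used in the example.

```python
# The two straight lines from the example above, evaluated at x = 5.
def line(m, c, x):
    return m * x + c

print(line(4, 2, 5))  # y = 4x + 2 -> 22
print(line(2, 2, 5))  # y = 2x + 2 -> 12

# The same computation with neural-network naming: weight (w) and bias (b).
def neuron(w, b, x):
    return w * x + b

print(neuron(4, 2, 5))  # identical result: the gradient became the weight,
print(neuron(2, 2, 5))  # and the y-intercept became the bias
```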

What is an Activation Function?

In machine learning, problems are generally divided into two major categories: classification and regression. A classification problem requires the output to be a class label (for example, true or false), while a regression problem requires the output to be a numerical value. However, the output of the formula y = wx + b is an unbounded numerical value, which is not ideal for a true-or-false prediction.

An activation function is a mathematical function applied to a linear output such as wx + b to produce a non-linear output. Activation functions not only introduce this non-linearity but also improve the performance of the model. Various types of activation functions are available, each with unique properties and advantages. Here are some of the most commonly used ones, with a short code sketch after the list:

  • Sigmoid function: This function returns a value between 0 and 1, making it ideal for binary classification problems. The equation for the sigmoid function is: σ(x) = 1 / (1 + e^-x)
  • ReLU (Rectified Linear Unit): This function sets all negative values to zero and leaves all positive values unchanged. This is one of the most popular activation functions and is commonly used in deep learning. The equation for ReLU is: f(x) = max(0, x)
  • Tanh (Hyperbolic Tangent): This function returns a value between -1 and 1. Because its output is centred around zero, it is often preferred over the sigmoid in hidden layers. The equation for Tanh is: tanh(x) = (e^x - e^-x) / (e^x + e^-x)
  • Leaky ReLU: This function is similar to ReLU, but it allows a small, non-zero gradient for negative input values, preventing the “dying ReLU” problem. The equation for Leaky ReLU is: f(x) = max(αx, x), where α is a small constant (e.g., 0.01)
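Below is a minimal NumPy sketch of the four activation functions listed above; the input values are arbitrary examples chosen to show positive, negative, and zero inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes any value into (0, 1)

def relu(x):
    return np.maximum(0, x)                # zeroes out negative values

def tanh(x):
    return np.tanh(x)                      # squashes any value into (-1, 1)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope for negative values

z = np.array([-2.0, -0.5, 0.0, 1.5])       # example pre-activation values
print(sigmoid(z), relu(z), tanh(z), leaky_relu(z), sep="\n")
```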

Conclusion

In the concluding part of this write-up series, which you can find here, I explained the implementation process of building a neural network from scratch using the Python NumPy library. To further enhance your understanding, I have also provided a notebook where you can explore a practical implementation of the concepts discussed in this series. You can access the notebook here.

Thank you for taking the time to read through this series. I appreciate your support, and I would love to hear your feedback. If you found it helpful, please consider liking, commenting, sharing, and following me for more insightful content. Feel free to connect with me via LinkedIn, Twitter, or email me at olaniyiyusuf2000@gmail.com for job recommendations, projects, or potential collaborations. I’m excited to continue this learning journey with you!
