Deep Learning Basics — Part 7 — Feed Forward Neural Networks (FFNN)

Sasirekha Cota
6 min read · Dec 7, 2023


A Feedforward Neural Network (FNN) is a type of artificial neural network in which information moves in only one direction: from the input layer, through any hidden layers, to the output layer. It is the simplest kind of artificial neural network. The term “feedforward” refers to the fact that the connections between units form no cycles or loops, unlike in more complex network types such as recurrent neural networks (RNNs).

This straightforward design makes FNNs well-suited for tasks that require simple, one-way processing of data, including pattern recognition and predictive modelling.

History and Evolution

The origins of Feedforward Neural Networks (FNNs) can be traced back to the work of Warren McCulloch and Walter Pitts in 1943. Their model, known as the McCulloch-Pitts neuron, laid the foundation for understanding how simple processing units could perform logical operations.

The concept of FNNs gained significant traction when Frank Rosenblatt introduced the Perceptron in 1957. The Perceptron (https://link.medium.com/vkOHE3WLkFb), a single-layer network with binary inputs and outputs, could learn simple, linearly separable relationships. While sometimes regarded as an early extreme learning machine, the Perceptron was not yet a deep learning network.

The Group Method of Data Handling (GMDH), developed by Alexey G. Ivakhnenko in 1965, was based on polynomial regression and aimed at identifying non-linear relationships between input and output variables. The Multilayer GMDH variant explicitly builds a layered architecture with hidden layers to learn complex relationships, while the Polynomial GMDH variant uses polynomial activation functions in those hidden layers to achieve non-linearity. This structure closely resembles a Feedforward Neural Network (FFNN) with hidden layers.

Marvin Minsky and Seymour Papert (1969) highlighted the limitations of single-layer networks in handling non-linear relationships, which contributed to a decline in neural network research for several years. But their work also triggered research on multi-layer models.

In the late 1980s, the revival of FNNs was fueled by two key developments:

1. Multi-Layer Perceptrons (MLPs): These networks introduced additional layers of neurons, allowing them to learn more complex relationships.

2. Backpropagation algorithm: Popularized by David Rumelhart, Geoffrey Hinton, and Ronald J. Williams in 1986, backpropagation provided a powerful tool for training MLPs by efficiently calculating the gradient of the error function with respect to the network weights.

FNNs represent a fundamental building block of deep learning and paved the way for more complex and powerful architectures.

Multi-layer Perceptron (MLP)

An MLP typically consists of the following layers:

· Input layer: receives the input data.

· Hidden layers: process the information and learn complex representations.

· Output layer: produces the final output based on the processed information.

A Multilayer Perceptron (MLP) is fully connected, in the sense that each neuron in one layer connects (with an associated weight) to every neuron in the following layer. The number of hidden layers and the number of neurons in each layer are hyperparameters that influence the model’s capacity and complexity.

Source: https://www.researchgate.net/figure/Schematic-of-a-multilayer-perceptron-network-MLP_fig6_336308377

Each neuron in an MLP applies a non-linear activation function, often a sigmoid or ReLU, to introduce non-linearity into the network. This allows MLPs to learn complex relationships between input and output variables, including data that is not linearly separable.
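To make this concrete, here is a minimal sketch of an MLP forward pass in NumPy. The layer sizes (4 inputs, 8 hidden units, 1 output), the weight initialization, and the ReLU/sigmoid choices are illustrative assumptions, not a prescribed design.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fully connected layers: every input feeds every hidden unit, and so on.
W1, b1 = rng.normal(scale=0.5, size=(4, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)   # hidden -> output

def forward(x):
    h = relu(x @ W1 + b1)        # hidden layer: weighted sum + non-linearity
    return sigmoid(h @ W2 + b2)  # output layer: value in (0, 1)

x = rng.normal(size=(1, 4))      # one example with 4 features
print(forward(x))
```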

Backpropagation

Backpropagation, short for “backward propagation of errors,” is a critical component in training and refining neural network models. Its role is to adjust the network’s internal connections, known as weights, in order to minimize the “error” it makes during the learning process.

The term “back-propagating error correction” was introduced in 1962 by Frank Rosenblatt. But the modern method, which uses gradient descent or variants such as stochastic gradient descent, was popularized by David E. Rumelhart and others in 1986.

The backpropagation process relies on two key concepts:

1. Loss function: This function measures the discrepancy between the network’s output and the desired outcome. It essentially quantifies how “wrong” the network is in its predictions.

2. Gradient of the loss function: This represents the direction and magnitude of change required to minimize the error. It tells us how much and in what direction we need to adjust the weights to improve the network’s performance.

The gradient of the loss function, denoted as ∇L, is a vector containing the partial derivatives of the loss function (L) with respect to each of the parameters (weights and biases) in the neural network. It acts as a guide for optimizing neural networks.

It points in the direction of steepest increase of the loss function. By adjusting the network parameters in the opposite direction (the negative gradient), the loss can be decreased, leading to improved model performance.
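As a small illustration, a single gradient-descent update might look like the sketch below. The learning rate and the scalar weight/gradient values are made up for clarity.

```python
# One gradient-descent update: step against the gradient.
def gradient_step(param, grad, learning_rate=0.01):
    return param - learning_rate * grad

w = 0.5        # current weight
dL_dw = 2.0    # gradient of the loss w.r.t. w (positive: loss rises as w rises)
w = gradient_step(w, dL_dw)
print(w)       # 0.48 -- the weight moved in the negative-gradient direction
```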

Loss functions measure the error on individual examples, while cost functions summarize overall performance across the training set.
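A tiny sketch of that distinction, assuming a squared-error loss (the predictions and targets are made-up values):

```python
import numpy as np

def squared_error(prediction, target):
    return (prediction - target) ** 2            # loss for one example

predictions = np.array([0.8, 0.2, 0.6])
targets     = np.array([1.0, 0.0, 1.0])

per_example_loss = squared_error(predictions, targets)  # [0.04, 0.04, 0.16]
cost = per_example_loss.mean()                          # 0.08: overall performance
print(per_example_loss, cost)
```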

Each element of the gradient represents the rate of change of the loss function with respect to the corresponding parameter. In simpler terms, it tells us how much the loss will increase or decrease if we slightly change the value of that specific parameter. Typically, a larger error produces a larger gradient, and therefore a larger adjustment to the associated parameter.

Source: https://www.analyticsvidhya.com/blog/2023/01/gradient-descent-vs-backpropagation-whats-the-difference/

Backpropagation utilizes the chain rule of calculus to efficiently calculate this gradient. This rule allows us to propagate the error backward through the network, layer by layer, ultimately determining how each weight contributes to the overall error.

The chain rule plays a crucial role in making backpropagation a practical and efficient algorithm for training neural networks. It works as follows:

1. The loss function is viewed as a chain of nested functions, where each function represents the output of a layer in the network.

2. The chain rule provides a formula to calculate the derivative of a composite function by differentiating each inner function and multiplying the results.

3. Backpropagation uses the chain rule recursively, starting from the output layer and working backwards towards the input layer. At each layer, it calculates the derivative of the loss function with respect to the outputs of that layer, and then propagates this derivative back to the inputs of that layer, using the weights and activation functions.

This process allows the algorithm to efficiently compute the gradient for each weight, informing how much and in what direction to adjust them to reduce the overall error.
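To see the chain rule at work on the smallest possible case, the sketch below differentiates a squared-error loss through a single sigmoid neuron. All values (input, target, weight, bias) are made up for illustration.

```python
import math

x, t = 1.5, 1.0      # input and target
w, b = 0.4, 0.1      # weight and bias

z = w * x + b                     # pre-activation
y = 1.0 / (1.0 + math.exp(-z))    # sigmoid activation
L = (y - t) ** 2                  # loss

# Chain rule: dL/dw = dL/dy * dy/dz * dz/dw  (each factor is a local derivative)
dL_dy = 2.0 * (y - t)
dy_dz = y * (1.0 - y)             # derivative of the sigmoid
dz_dw = x
dL_dw = dL_dy * dy_dz * dz_dw
print(dL_dw)                      # negative here: increasing w would reduce the loss
```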

Typically, FFNN training starts from random weight initialization. Then, the iterative process of forward pass (passing data through the network), loss calculation, and backward pass (error propagation and weight updates) continues until the network achieves a satisfactory level of accuracy or reaches a pre-defined stopping point.
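Putting the pieces together, here is a compact, illustrative training loop for a one-hidden-layer MLP: random initialization, forward pass, loss, backward pass via the chain rule, and gradient-descent weight updates. The architecture, random data, and hyperparameters are assumptions chosen only to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 4))                        # 32 examples, 4 features
y = rng.integers(0, 2, size=(32, 1)).astype(float)  # binary targets

W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros((1, 1))
lr = 0.1

for _ in range(1000):
    # Forward pass
    h = np.maximum(0.0, X @ W1 + b1)                # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))        # sigmoid output
    loss = np.mean((p - y) ** 2)                    # cost over the whole batch

    # Backward pass: propagate the error layer by layer
    dp  = 2.0 * (p - y) / len(X)                    # dLoss/dOutput
    dz2 = dp * p * (1.0 - p)                        # through the output sigmoid
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dh  = dz2 @ W2.T                                # into the hidden layer
    dz1 = dh * (h > 0)                              # through the ReLU
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

    # Gradient-descent weight update (negative gradient direction)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(loss)  # the cost should have decreased over the iterations
```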

FNNs today

FNNs often serve as building blocks for more complex architectures. While more complex deep learning architectures like convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have emerged, FNNs are still used in real-life implementations.

FNNs remain relevant today due to their simplicity, interpretability and efficiency. They are versatile and can be used for a wide range of tasks, including classification and regression. Some examples of real-world FNN applications:

· Spam filtering: FNNs are used to classify emails as spam or not spam.

· Fraud detection: FNNs are used to analyse financial transactions to identify fraudulent activity.

· Medical diagnosis: FNNs are used to analyse medical images and data to assist in diagnosis.

· Credit scoring: FNNs are used to assess the creditworthiness of individuals and businesses.

· Time series forecasting: FNNs can predict future values of time series data, such as stock prices and energy consumption.

· eCommerce: FNNs are used in recommendation systems that suggest products to customers based on their browsing and purchasing history.

· Cybersecurity: FNNs are used to detect malicious activities or anomalies.

· Text Classification: FNNs are used in NLP tasks such as sentiment analysis.

· Marketing: FNNs are used to segment customers and predict customer churn.

FNN research is still active, with researchers exploring new architectures and training algorithms to improve their performance and capabilities.

In 2021, MLP-Mixer emerged as a game-changer in image classification. This simple architecture, built from two kinds of Multi-Layer Perceptron (MLP) blocks (one mixing information across image patches and one mixing features within each patch), with skip connections and layer normalization, and with strictly feedforward information flow (no loops), surprised the research community by achieving performance comparable to the more complex and powerful Vision Transformers (ViTs) on large benchmarks like ImageNet.
