Basic Understanding of Neural Network Structure

Sarita, PhD
3 min read · Oct 3, 2023


A neural network is composed of layers of interconnected nodes (neurons) organized into three primary types of layers: the input layer, hidden layers, and the output layer.

  1. Input Layer: The input layer consists of neurons representing the features of the input data. Each neuron corresponds to a feature, and its value represents the feature’s value.
  2. Hidden Layers: Between the input and output layers, there may be one or more hidden layers. These layers perform complex computations on the input data. Each neuron in a hidden layer receives inputs from all neurons in the previous layer, applies a weighted sum, adds a bias term, and passes the result through an activation function.
  3. Output Layer: The output layer produces the final prediction or result. The number of neurons in this layer depends on the nature of the problem: a binary classification task typically uses a single neuron outputting the probability of the positive class, while a multi-class task uses one neuron per class.
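To make this structure concrete, here is a minimal NumPy sketch that allocates the weights and biases for a small network. The layer sizes (4 input features, one hidden layer of 8 neurons, 1 output neuron) are illustrative assumptions, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative layer sizes (assumptions for this sketch, not requirements):
# 4 input features, one hidden layer of 8 neurons, 1 output neuron.
n_in, n_hidden, n_out = 4, 8, 1

# One weight matrix and one bias vector per layer transition.
W1 = rng.normal(0.0, 0.1, size=(n_hidden, n_in))   # input -> hidden weights
b1 = np.zeros(n_hidden)                            # hidden-layer biases
W2 = rng.normal(0.0, 0.1, size=(n_out, n_hidden))  # hidden -> output weights
b2 = np.zeros(n_out)                               # output-layer biases
```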


Connection Weights and Activation Functions:

  • Connection Weights (W): These represent the strengths of connections between neurons. Each connection from neuron A to neuron B has an associated weight, denoted W_AB. These weights are learned during training and determine how strongly neuron A’s output influences neuron B.
  • Bias Terms (b): Each neuron also has a bias term (b) associated with it. The bias allows the neuron to shift its output. Bias terms are also learned during training.
  • Activation Function (σ): Each neuron applies an activation function to the weighted sum of its inputs plus the bias. Common activation functions include the sigmoid function, ReLU (Rectified Linear Unit), and tanh (hyperbolic tangent). The activation function introduces non-linearity, allowing the network to model complex relationships.
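The three activation functions named above are each a one-liner in NumPy; this sketch simply implements their textbook definitions.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into (0, 1); often used for probabilities.
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged and zeroes out negatives.
    return np.maximum(0.0, z)

def tanh(z):
    # Squashes any real input into (-1, 1), centered at zero.
    return np.tanh(z)
```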

Forward Pass (Inference):

During the forward pass (inference), data is propagated through the network as follows:

  1. Input Propagation: Input data is assigned to the input layer’s neurons.
  2. Hidden Layer Computation: Each neuron in the hidden layers computes the weighted sum of its inputs and adds its bias term:
  • Weighted Sum (Z): Z_i = Σ_j (W_ij · X_j) + b_i, where W_ij is the weight from the j-th neuron in the previous layer to the current neuron i, X_j is the output of that j-th neuron, and b_i is the bias of the current neuron.
  • Activation (A): A_i = σ(Z_i), where σ is the activation function.
  3. Output Layer Computation: The output layer neurons perform the same computation as the hidden layers, producing the final predictions or values; a runnable sketch of the full forward pass appears below.
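Putting the steps together, a minimal forward pass for the two-layer network sketched earlier might look like this. The choice of ReLU in the hidden layer and a sigmoid at the output is an illustrative assumption.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    # Hidden layer: weighted sum Z = W·x + b, then activation A = σ(Z).
    z1 = W1 @ x + b1
    a1 = np.maximum(0.0, z1)              # ReLU in the hidden layer
    # Output layer: the same computation; sigmoid yields a probability.
    z2 = W2 @ a1 + b2
    return 1.0 / (1.0 + np.exp(-z2))
```

With the parameters from the earlier sketch, `forward(np.ones(4), W1, b1, W2, b2)` returns a single probability.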

Loss Function:

After the forward pass, the network’s output is compared to the actual target values using a loss function (also called a cost function or objective function). The choice of loss function depends on the task, but common ones include mean squared error (MSE) for regression tasks and cross-entropy for classification tasks.
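Both loss functions mentioned above are short to write down. This sketch implements MSE and binary cross-entropy for NumPy arrays; the clipping epsilon is a standard numerical-stability guard, not part of the definitions.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared gap, typical for regression.
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy for binary targets; clipping guards against log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))
```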

Backward Pass (Backpropagation):

During the backward pass (backpropagation), the network learns from its mistakes and updates its weights to minimize the loss. The steps involved are:

  1. Gradient Computation: Calculate the gradients of the loss with respect to the weights and biases using the chain rule of calculus. The gradient represents how much a change in a weight will affect the loss.
  2. Weight Updates: Adjust the weights and biases in the direction that minimizes the loss by subtracting the gradient multiplied by a learning rate (α). This update step follows an optimization algorithm like stochastic gradient descent (SGD).
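As an illustration of both steps, the sketch below backpropagates through the same two-layer network by hand and applies one SGD update. It assumes a sigmoid output with binary cross-entropy loss, for which the chain rule conveniently simplifies the output-layer gradient to (ŷ − y).

```python
import numpy as np

def backprop_and_update(x, y, W1, b1, W2, b2, lr=0.01):
    # Forward pass (same two-layer network as in the earlier sketches).
    z1 = W1 @ x + b1
    a1 = np.maximum(0.0, z1)
    z2 = W2 @ a1 + b2
    y_hat = 1.0 / (1.0 + np.exp(-z2))

    # Backward pass: apply the chain rule layer by layer.
    # With a sigmoid output and cross-entropy loss, dL/dZ2 = y_hat - y.
    dz2 = y_hat - y
    dW2 = np.outer(dz2, a1)
    db2 = dz2
    dz1 = (W2.T @ dz2) * (z1 > 0)   # ReLU derivative: 1 where z1 > 0, else 0
    dW1 = np.outer(dz1, x)
    db1 = dz1

    # SGD update: step every parameter against its gradient (in place).
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```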

Optimization:

Optimization algorithms like SGD, Adam, or RMSprop are used to update weights and biases efficiently during training. The learning rate (α) controls the size of each weight update, and other hyperparameters can be tuned to achieve better convergence.
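As one concrete example, a single Adam update can be sketched in a few lines. The hyperparameter defaults shown (β1 = 0.9, β2 = 0.999, ε = 1e-8) are the commonly cited ones; m and v are per-parameter running averages, and t is the 1-based step count.

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # m and v track running averages of the gradient and its square.
    m[:] = beta1 * m + (1.0 - beta1) * g
    v[:] = beta2 * v + (1.0 - beta2) * g ** 2
    m_hat = m / (1.0 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)   # bias-corrected second moment
    p -= lr * m_hat / (np.sqrt(v_hat) + eps)   # in-place parameter update
```

Adam adapts the effective step size per parameter, which is why it often converges with less learning-rate tuning than plain SGD.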

In summary, neural networks are composed of interconnected neurons organized into layers, with each neuron applying a weighted sum plus bias and an activation function. The network learns from data by adjusting weights and biases through backpropagation and optimization algorithms, aiming to minimize a defined loss function. This process allows neural networks to make predictions and solve various tasks.
