Artificial Neural Networks — Explained

Data Overload
4 min read · Oct 20, 2022


Artificial neural networks are a hot topic in data science, so I would like to devote my first blog post to them. Below is a summary of the topic for readers who are new to it. I hope you enjoy it!

This story was written with the assistance of an AI writing program.

History

Warren S. McCulloch and Walter Pitts (1943) laid the foundations of neural networks. They introduced a mathematical model that theoretically acts like a neuron: it consists of one or more inputs, a processor, and a single output.

Later, in 1957, Frank Rosenblatt built on this model to create the perceptron, the first trainable neural network as we now know it. [Figure: an example of a three-layer neural network]

Structure

An artificial neural network is a complex machine learning method. It consists of neurons, and every neuron has incoming and outgoing connections that make the whole thing a network. There are different kinds of connections (e.g. feedforward and recurrent) and different types of networks, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks.

A neuron is a function that takes one or more values and produces a single output value. It consists of weights, a bias, and an activation function: the inputs are multiplied by the weights and summed (just as in linear regression), the bias is added, and the activation function is applied to the result.

The goal of training a neural network is to find the set of weights and biases that can transform the input data into the desired output.
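To make this concrete, here is a minimal sketch of a single neuron in plain Python. The weights, bias, and the choice of sigmoid as the activation are illustrative assumptions, not values from any trained model.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: multiply inputs by weights,
    sum them up, add the bias, then apply a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Example: two inputs feeding one neuron
y = neuron([1.0, 2.0], weights=[0.5, -0.3], bias=0.1)
```

Training adjusts `weights` and `bias` so that, across many examples, `y` moves toward the desired output.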

Depending on the complexity of the problem, training a neural network can take a long time, because each training step involves a long chain of computations through all the layers.

We need multiple neurons because a single neuron outputs only one value, while real problems usually involve many interacting factors (high complexity). Changing the values of different neurons in the same layer affects the output y in different ways.

There may be more than one layer between the input and output layers; these are called hidden layers. If every neuron in a layer has a connection (drawn as an arrow) to every neuron in the subsequent layer, that layer is called a dense layer.
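The layer structure can be sketched as follows. This is a toy forward pass with two inputs, one hidden layer of three neurons, and one output neuron; all weights and biases here are made-up illustrative numbers.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """A dense layer: every input connects to every neuron.
    weights[j] holds the incoming weights of neuron j."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

x = [0.5, -1.0]                                  # input layer (2 values)
hidden = dense_layer(x,                          # hidden layer (3 neurons)
                     [[0.1, 0.4], [-0.2, 0.3], [0.7, -0.5]],
                     [0.0, 0.1, -0.1])
output = dense_layer(hidden,                     # output layer (1 neuron)
                     [[0.3, -0.6, 0.2]],
                     [0.05])
```

Stacking layers like this is what lets the network combine many factors into one prediction.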

Activation Functions

An activation function is a non-linear function such as the sigmoid, ReLU, tanh, or softmax. Different layers may use different activation functions. Without a non-linear activation, a perceptron is just a simple linear regression.

Some common activation functions include:

  • Sigmoid function: This function maps input values to a range between 0 and 1.
  • Rectified linear unit (ReLU): This function outputs the input value if it is positive, and 0 if it is negative.
  • Hyperbolic tangent (tanh): This is similar to the sigmoid function, but maps input values to a range between -1 and 1.
  • Softmax function: This is a generalization of the sigmoid function that is used for multi-class classification problems.

Choosing the appropriate activation function can significantly impact the performance of a neural network.
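The four functions listed above are short enough to write out directly. This is a plain-Python sketch using only the standard library; real frameworks provide optimized versions of each.

```python
import math

def sigmoid(z):
    # maps any real number into the range (0, 1)
    return 1 / (1 + math.exp(-z))

def relu(z):
    # passes positive values through unchanged, outputs 0 for negatives
    return max(0.0, z)

def tanh(z):
    # like sigmoid, but maps into the range (-1, 1)
    return math.tanh(z)

def softmax(zs):
    # generalizes sigmoid to a vector: outputs probabilities summing to 1
    m = max(zs)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `sigmoid(0)` is 0.5, `relu(-2)` is 0, and `softmax([1, 2, 3])` returns three probabilities that sum to 1, with the largest assigned to the input 3.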


If you found this article useful, please give it a clap and share it with others.

Thank you!
