DEEP LEARNING: ARTIFICIAL NEURAL NETWORKS

Ariel Jumba
3 min read · Oct 2, 2022


Neural networks are perhaps among the most complex machine learning algorithms. They are modelled on the human brain, which consists of cells (neurons) that process various external signals into final outputs. Signals are fed into a sensory processor that determines the threshold and weight to attach to each signal, and ultimately combines the most important inputs into one final output.

Neural networks can be used to understand various social, political or economic phenomena, in Natural Language Processing, and in device automation, among other applications. Artificial neurons work in a similar way to biological ones. However, they differ from biological neurons in that their output is not binary but falls within a range, e.g. -inf to +inf, -5 to +5, etc.

Input data is fed into a processor that applies an activation function to yield the desired outcome. The process rests on three elements: the activation function, the network topology and the training of the dataset, which is the core element. Raw data hits the input nodes, an activation function is fired, weights are assigned, and the result reaches an output node that transforms the data into the final output. We will examine this process in detail.
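
As a rough illustration, here is a minimal sketch of a single artificial neuron in Python. The input values, weights and bias are invented for the example, and sigmoid is just one possible activation function.

```python
import numpy as np

def sigmoid(z):
    """Squash a weighted sum into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs and weights for one neuron
x = np.array([0.5, -1.2, 3.0])   # raw input features
w = np.array([0.4, 0.1, -0.6])   # one weight per input
b = 0.2                          # bias (threshold) term

z = np.dot(w, x) + b             # weighted sum of the inputs
output = sigmoid(z)              # the activation function fires
print(output)
```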

The activation function transforms the raw data into output at each node. This is an iterative process: initial weights are assigned to the raw data depending on the features, and updated weights are calculated with each iteration.
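
To make the idea concrete, here are three activation functions commonly seen in practice, sketched in Python; note how each maps the same inputs into a different output range.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # output in [0, +inf)

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
for fn in (sigmoid, tanh, relu):
    print(fn.__name__, fn(z))
```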

Standardizing or normalizing the data is key in this process to avoid squashing, where a great many input values are concentrated over a small range of output, e.g. when dealing with noisy data or when comparing GDP and population. It is important to check the data distribution at this stage to determine which function to use for standardization or normalization. Best practice is to normalize the data so that input ranges are scaled to a narrow range around zero. This also helps avoid overfitting.
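
Below is a sketch of two common rescaling approaches; the GDP and population figures are invented purely for illustration.

```python
import numpy as np

# Hypothetical columns on wildly different scales (GDP vs population)
gdp = np.array([1.2e12, 3.4e11, 8.9e12, 2.1e12])
population = np.array([5.0e6, 1.1e7, 3.3e8, 6.7e7])

def standardize(x):
    """Rescale to mean 0 and standard deviation 1 (z-score)."""
    return (x - x.mean()) / x.std()

def normalize(x):
    """Rescale to the [0, 1] range (min-max)."""
    return (x - x.min()) / (x.max() - x.min())

print(standardize(gdp))
print(normalize(population))
```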

The network topology is equally important. The network consists of input nodes and output nodes organized in layers. A single-layer network is used for very simple processes; a multilayer network adds hidden layers of nodes between input and output. In a fully connected network, each output node is fed by all the nodes in the preceding layer, and the same applies to each hidden node.
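
One way to picture such a topology is as one weight matrix per layer. The sketch below assumes a fully connected network with four inputs, one hidden layer of three nodes and a single output; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 4, 3, 1   # arbitrary sizes for illustration

# Every hidden node is fed by all input nodes, and the output node by all
# hidden nodes, so each layer is one full weight matrix plus a bias vector.
W1 = rng.normal(size=(n_hidden, n_inputs))    # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_outputs, n_hidden))   # hidden -> output weights
b2 = np.zeros(n_outputs)

print(W1.shape, W2.shape)   # (3, 4) (1, 3)
```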

The number of input nodes is determined by the features in the input data, while the number of output nodes is determined mostly by the desired output, since this is an artificial model rather than a biological one. Networks with multiple hidden layers are known as deep neural networks, and their study is known as deep learning.

The direction in which information travels also comes into play at this point. Some networks are unidirectional and are known as feedforward networks: information moves from node to node in a forward direction only, whether the network is single-layered or multilayered.
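
A minimal sketch of one forward pass through such a network, again with invented sizes and values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # input -> hidden
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # hidden -> output

x = np.array([0.1, 0.4, -0.2, 0.7])             # one input example

# Information flows strictly forward: input -> hidden -> output
hidden = sigmoid(W1 @ x + b1)
output = sigmoid(W2 @ hidden + b2)
print(output)
```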

There are also recurrent networks, in which information can move in any direction. However, these are rarely used.

Training the data is the most complex part. This is where the weights are calculated, strengthening or weakening the relationships between the various neurons. One advantage of this algorithm is that it can model very complex data patterns; the trade-off is that it takes a huge toll on the processor and can be very slow if the network topology is complex in nature.

Backpropagation is the method mainly used in training. It alternates between a forward phase and a backward phase. In the forward phase, information moves from node to node, applying the current weights along the way, until the final node is reached; no model evaluation or improvement of model performance happens here.

The backward phase compares the target value in the training data with the forward-phase output to determine the error. The error is then corrected by moving backwards through the network and reallocating weights (and, where needed, adding hidden nodes), which reduces the error on future passes.
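
The sketch below ties the two phases together on the classic XOR toy problem; the data, network size and learning rate are illustrative choices, not requirements of the method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy training data: the XOR problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
lr = 0.5                                             # learning rate (assumed)

for epoch in range(10000):
    # Forward phase: apply the current weights layer by layer.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward phase: compare the output with the target to get the error,
    error = output - y

    # then push the error back through the network to get weight gradients.
    d_output = error * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

    # Reallocate (update) the weights so the error shrinks on future passes.
    W2 -= lr * hidden.T @ d_output
    b2 -= lr * d_output.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0, keepdims=True)

print(output.round(2))   # should approach [[0], [1], [1], [0]]
```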

When walking along a street, it is easy to tell whether a building houses a bank simply by observing its architecture or surroundings; with experience, we learn which cues matter most. This example can help illustrate how weights are reallocated when fixing errors, through a process known as gradient descent.
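
In essence, gradient descent nudges each weight a small step in the direction that most reduces the error. A one-parameter sketch, where the error function and step size are made up for illustration:

```python
# A minimal, one-parameter sketch of gradient descent.
def error(w):
    return (w - 3.0) ** 2          # error surface, lowest at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)         # slope of the error surface at w

w = 0.0       # initial weight
lr = 0.1      # learning rate (step size)

for step in range(50):
    w -= lr * gradient(w)          # step downhill, against the gradient

print(w, error(w))                 # w ends up close to 3
```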
