Introduction To Artificial Intelligence — Neural Networks

Ilija Mihajlovic
Apr 18, 2019 · 14 min read

Inspired by the structure of the brain, artificial neural networks (ANN) are the answer to making computers more human like and help machines reason more like humans.

They are based on the neural structure of the brain. The brain basically learns from experience. It is natural proof that some problems that are beyond the scope of current computers are indeed solvable by small energy efficient packages.

Human Neurons

To understand how artificial neural networks work let’s first briefly look at the human ones. The exact workings of the human brain are still a mystery. Yet, some aspects of this amazing processor are known. In particular, the most basic element of the human brain is a specific type of cell which, unlike the rest of the body, doesn’t appear to regenerate. Because this type of cell is the only part of the body that isn’t slowly replaced, it is assumed that these cells are what provides us with our abilities to remember, think, and apply previous experiences to our every action. These cells, all 100 billion of them, are known as neurons. Each of these neurons can connect with up to 200,000 other neurons, although 1,000 to 10,000 is typical.

Credit For The Image Goes To: Knowing Neurons

The power of the human mind comes from the sheer numbers of these basic components and the multiple connections between them. It also comes from genetic programming and learning.

The individual neurons are complicated. They have a myriad of parts, sub-systems, and control mechanisms. They convey information via a host of electrochemical pathways. There are over one hundred different classes of neurons, depending on the classification method used. Together these neurons and their connections form a process which is not binary, not stable, and not synchronous. In short, it is nothing like the currently available electronic computers, or even artificial neural networks.

These artificial neural networks try to replicate only the most basic elements of this complicated, versatile, and powerful organism. They do it in a primitive way. But for the software engineer who is trying to solve problems, neural computing was never about replicating human brains.

Artificial Neurons and How They Work

The inventor of the first neurocomputer, Dr. Robert Hecht-Nielsen, defines a neural network as:

“…a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs.”

Neural networks are typically organized in layers. Layers are made up of a number of interconnected ‘nodes’ which contain an ‘activation function’. Patterns are presented to the network via the ‘input layer’, which communicates to one or more ‘hidden layers’ where the actual processing is done via a system of weighted ‘connections’. The hidden layers then link to an ‘output layer’ where the answer is output as shown in the image below:

Credit For The Image Goes To: Knowing Neurons

Most ANNs contain some form of ‘learning rule’ which modifies the weights of the connections according to the input patterns that it is presented with. In a sense, ANNs learn by example as do their biological counterparts; a child learns to recognize dogs from examples of dogs.

How artificial neural networks learn stuff

In the same way that we learn from experience in our lives as mentioned above, neural networks require data to learn. In most cases, the more data that can be thrown at a neural network, the more accurate it will become. Think of it like any task you do over and over. Over time, you gradually get more efficient and make fewer mistakes.

When researchers or computer scientists set out to train a neural network, they typically divide their data into three sets. First is a training set, which helps the network establish the various weights between its nodes. After this, they fine-tune it using a validation data set. Finally, they’ll use a test set to see if it can successfully turn the input into the desired output.

Credit For The Image Goes To:

During the training and supervisory stage, the ANN is taught what to look for and what its output should be, using Yes/No question types with binary numbers.

Let’s see at an example, a bank that wants to detect credit card fraud on time may have four input units fed with these questions: (1) Is the transaction in a different country from the user’s resident country? (2) Is the website the card is being used at affiliated with companies or countries on the bank’s watch list? (3) Is the transaction amount larger than $2,000? (4) Is the name on the transaction bill the same as the name of the cardholder? The bank wants the “fraud detected” responses to be Yes Yes Yes No, which in binary format would be 1 1 1 0. If the network’s actual output is 1 0 1 0, it adjusts its results until it delivers an output that coincides with 1 1 1 0. After training, the computer system can alert the bank of pending fraudulent transactions, saving the bank lots of money

Types of neural networks

Neural networks are sometimes described in terms of their depth, including how many layers they have between input and output, or the model’s so-called hidden layers. This is why the term neural network is used almost synonymously with deep learning. They can also be described by the number of hidden nodes the model has or in terms of how many inputs and outputs each node has. Variations on the classic neural network design allow various forms of forward and backward propagation of information among tiers.

The simplest variant is the feed-forward neural network. This type of artificial neural network algorithm passes information straight through from input to processing nodes to outputs. It may or may not have hidden node layers, making their functioning more interpretable.

More complex are recurrent neural networks. These deep learning algorithms save the output of processing nodes and feed the result back into the model. This is how the model is said to learn.

Convolutional neural networks are popular today, particularly in the realm of image recognition. This specific type of neural network algorithm has been used in many of the most advanced applications of AI including facial recognition, text digitization and natural language processing.

There are several strategies for learning, such as:

Supervised Learning

Essentially, a strategy that involves a teacher that is smarter than the network itself. For example, let’s take the facial recognition example. The teacher shows the network a bunch of faces, and the teacher already knows the name associated with each face. The network makes its guesses, then the teacher provides the network with the answers. The network can then compare its answers to the known “correct” ones and make adjustments according to its errors.

Unsupervised Learning

Required when there isn’t an example data set with known answers. Imagine searching for a hidden pattern in a data set. An application of this is clustering, i.e. dividing a set of elements into groups according to some unknown pattern. We won’t be looking at any examples of unsupervised learning in this chapter, as this strategy is less relevant for our examples.

Reinforcement Learning

A strategy built on observation. Think of a little mouse running through a maze. If it turns left, it gets a piece of cheese; if it turns right, it receives a little shock. (Don’t worry, this is just a pretend mouse.) Presumably, the mouse will learn over time to turn left. Its neural network makes a decision with an outcome (turn left or right) and observes its environment (yum or ouch). If the observation is negative, the network can adjust its weights in order to make a different decision the next time. Reinforcement learning is common in robotics. At time t, the robot performs a task and observes the results. Did it crash into a wall or fall off a table? Or is it unharmed? We’ll look at reinforcement learning in the context of our simulated steering vehicles.

Neural Network Terminology


A unit often refers to a nonlinear activation function (such as the logistic sigmoid function) in a neural network layer that transforms the input data. The units in the input/ hidden/ output layers are referred to as input/ hidden/ output units. A unit typically has multiple incoming and outgoing connections. Complex units such as long short-term memory (LSTM) units have multiple activation functions with a distinct layout of connections to the nonlinear activation functions, or maxout units, which compute the final output over an array of nonlinearly transformed input values. Pooling, convolution, and other input transforming functions are usually not referred to as units.

Artificial Neuron

The terms neuron or artificial neuron are equivalent to a unit, but imply a close connection to a biological neuron. However, deep learning does not have much to do with neurobiology and the human brain. On a micro level, the term neuron is used to explain deep learning as a mimicry of the human brain. On a macro level, Artificial Intelligence can be thought of as the simulation of human level intelligence using machines. Biological neurons are however now believed to be more similar to entire multilayer perceptrons than to a single unit/ artificial neuron in a neural network. Connectionist models of human perception and cognition utilize artificial neural networks. These connectionist models of the brain as neural nets formed of neurons and their synapses are different from the classical view (computationalism) that human cognition is more similar to symbolic computation in digital computers. Relational Networks and Neural Turing Machines are provided as evidence that cognition models of connectionism and computationalism need not be at odds and can coexist.

Activation Function

An activation function, or transfer function, applies a transformation on weighted input data (matrix multiplication between input data and weights). The function can be either linear or nonlinear. Units differ from transfer functions in their increased level of complexity. A unit can have multiple transfer functions (LSTM units) or a more complex structure (maxout units).

The features of 1000 layers of pure linear transformations can be reproduced by a single layer (because a chain of matrix multiplication can always be represented by a single matrix multiplication). A non-linear transformation, however, can create new, increasingly complex relationships. These functions are therefore very important in deep learning, to create increasingly complex features with every layer. Examples of nonlinear activation functions include logistic sigmoid, Tanh, and ReLU functions.


A layer is the highest-level building block in machine learning. The first, middle, and last layers of a neural network are called the input layer, hidden layer, and output layer respectively. The term hidden layer comes from its output not being visible, or hidden, as a network output. A simple three-layer neural net has one hidden layer while the term deep neural net implies multiple hidden layers. Each neural layer contains neurons, or nodes, and the nodes of one layer are connected to those of the next. The connections between nodes are associated with weights that are dependent on the relationship between the nodes. The weights are adjusted so as to minimize the cost function by back-propagating the errors through the layers. The cost function is a measure of how close the output of the neural network algorithm is to the expected output. The error backpropagation to minimize the cost is done using optimization algorithms such as stochastic gradient descent, batch gradient descent, or mini-batch gradient descent algorithms. Stochastic gradient descent is a statistical approximation of the optimal change in gradient that produces the cost minima. The rate of change of the weights in the direction of the gradient is referred to as the learning rate. A low learning rate corresponds to slower/ more reliable training while a high rate corresponds to quicker/ less reliable training that might not converge on an optimal solution.

A layer is a container that usually receives weighted input, transforms it with a set of mostly nonlinear functions and then passes these values as output to the next layer in the neural net. A layer is usually uniform, that is it only contains one type of activation function, pooling, convolution etc. so that it can be easily compared to other parts of the neural network.

Neural Networks in Practice

Given this description of neural networks and how they work, what real world applications are they suited for? Neural networks have broad applicability to real world business problems. In fact, they have already been successfully applied in many industries.

Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:

  • Sales forecasting
  • Industrial process control
  • Customer research
  • Data validation
  • Risk management
  • Target marketing

But to give you some more specific examples; ANN are also used in the following specific paradigms: recognition of speakers in communications; diagnosis of hepatitis; recovery of telecommunications from faulty software; interpretation of multimeaning Chinese words; undersea mine detection; texture analysis; three-dimensional object recognition; hand-written word recognition; and facial recognition.

Neural networks in medicine

Artificial Neural Networks (ANN) are currently a ‘hot’ research area in medicine and it is believed that they will receive extensive application to biomedical systems in the next few years. At the moment, the research is mostly on modelling parts of the human body and recognizing diseases from various scans (e.g. cardiograms, CAT scans, ultrasonic scans, etc.).

Neural networks are ideal in recognizing diseases using scans since there is no need to provide a specific algorithm on how to identify the disease. Neural networks learn by example so the details of how to recognize the disease are not needed. What is needed is a set of examples that are representative of all the variations of the disease. The quantity of examples is not as important as the ‘quantity’. The examples need to be selected very carefully if the system is to perform reliably and efficiently.

Modelling and Diagnosing the Cardiovascular System

Neural Networks are used experimentally to model the human cardiovascular system. Diagnosis can be achieved by building a model of the cardiovascular system of an individual and comparing it with the real time physiological measurements taken from the patient. If this routine is carried out regularly, potential harmful medical conditions can be detected at an early stage and thus make the process of combating the disease much easier.

A model of an individual’s cardiovascular system must mimic the relationship among physiological variables (i.e., heart rate, systolic and diastolic blood pressures, and breathing rate) at different physical activity levels. If a model is adapted to an individual, then it becomes a model of the physical condition of that individual. The simulator will have to be able to adapt to the features of any individual without the supervision of an expert. This calls for a neural network.

Another reason that justifies the use of ANN technology, is the ability of ANNs to provide sensor fusion which is the combining of values from several different sensors. Sensor fusion enables the ANNs to learn complex relationships among the individual sensor values, which would otherwise be lost if the values were individually analysed. In medical modelling and diagnosis, this implies that even though each sensor in a set may be sensitive only to a specific physiological variable, ANNs are capable of detecting complex medical conditions by fusing the data from the individual biomedical sensors.

Electronic noses

ANNs are used experimentally to implement electronic noses. Electronic noses have several potential applications in telemedicine. Telemedicine is the practice of medicine over long distances via a communication link. The electronic nose would identify odours in the remote surgical environment. These identified odours would then be electronically transmitted to another site where an door generation system would recreate them. Because the sense of smell can be an important sense to the surgeon, telesmell would enhance telepresent surgery.

Neural Networks in business

Business is a diverted field with several general areas of specialisation such as accounting or financial analysis. Almost any neural network application would fit into one business area or financial analysis.

There is some potential for using neural networks for business purposes, including resource allocation and scheduling. There is also a strong potential for using neural networks for database mining, that is, searching for patterns implicit within the explicitly stored information in databases. Most of the funded work in this area is classified as proprietary. Thus, it is not possible to report on the full extent of the work going on. Most work is applying neural networks, such as the Hopfield-Tank network for optimization and scheduling.


There is a marketing application which has been integrated with a neural network system. The Airline Marketing Tactician (a trademark abbreviated as AMT) is a computer system made of various intelligent technologies including expert systems. A feedforward neural network is integrated with the AMT and was trained using back-propagation to assist the marketing control of airline seat allocations. The adaptive neural approach was amenable to rule expression. Additionaly, the application’s environment changed rapidly and constantly, which required a continuously adaptive solution. The system is used to monitor and recommend booking advice for each departure. Such information has a direct impact on the profitability of an airline and can provide a technological advantage for users of the system. [Hutchison & Stephens, 1987]

While it is significant that neural networks have been applied to this problem, it is also important to see that this intelligent technology can be integrated with expert systems and other approaches to make a functional system. Neural networks were used to discover the influence of undefined interactions by the various variables. While these interactions were not defined, they were used by the neural system to develop useful conclusions. It is also noteworthy to see that neural networks can influence the bottom line.

Credit Evaluation

The HNC company, founded by Robert Hecht-Nielsen, has developed several neural network applications. One of them is the Credit Scoring system which increase the profitability of the existing model up to 27%. The HNC neural systems were also applied to mortgage screening. A neural network automated mortgage insurance underwritting system was developed by the Nestor Company. This system was trained with 5048 applications of which 2597 were certified. The data related to property and borrower qualifications. In a conservative mode the system agreed on the underwriters on 97% of the cases. In the liberal model the system agreed 84% of the cases. This is system run on an Apollo DN3000 and used 250K memory while processing a case file in approximately 1 sec.


The computing world has a lot to gain from neural networks. Their ability to learn by example makes them very flexible and powerful. Furthermore there is no need to devise an algorithm in order to perform a specific task; i.e. there is no need to understand the internal mechanisms of that task. They are also very well suited for real time systems because of their fast responseand computational times which are due to their parallel architecture.

Neural networks also contribute to other areas of research such as neurology and psychology. They are regularly used to model parts of living organisms and to investigate the internal mechanisms of the brain.

Perhaps the most exciting aspect of neural networks is the possibility that some day ‘consious’ networks might be produced. There is a number of scientists arguing that conciousness is a ‘mechanical’ property and that ‘consious’ neural networks are a realistic possibility.

Finally, I would like to state that even though neural networks have a huge potential we will only get the best of them when they are intergrated with computing, AI, fuzzy logic and related subjects.

Thanks for reading!😄 🙌

Ilija Mihajlovic

Written by

Self-taught iOS developer and computer science graduate, with a passion for machine learning and computer vision.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade