A Non-Technical Introduction to AI: Part 2

Unlocking the Secrets of Deep Learning: An Exploration of Neural Networks for the non-tech-savvy traveler.

Manu Mulaveesala
13 min read · May 25, 2023

Introduction

In the first part of our series, we began our heroic journey into the vast world of Machine Learning and AI. We touched on the different types of standard machine learning models and we performed in-depth explorations of supervised and unsupervised machine learning tasks. If you’ve missed this, don’t worry! You can catch up right here: A Non-Technical Introduction to AI: Part 1. Now, in this next stage of our adventure, we’re going to unveil the secrets of the ‘brains’ behind Deep Learning — the Neural Networks. We’re going to break down the techy terms you’ve heard like “neural nets,” “deep layers,” “neurons,” and others, into everyday language.

Deep Learning Demystified

Deep learning, a sub-field of machine learning, draws inspiration from the workings of the human brain to teach machines to adapt to ever-changing conditions or data. The goal is to craft a similar system to our own brain, capable of learning, adapting, and solving complex puzzles! Deep Learning employs artificial neural networks, which are layers of interlinked nodes (or “neurons”) that collaboratively analyze data and make choices. Picture a neuron as a sort of input/output station: you have certain inputs coming in and certain outputs going out. Inside the neuron, calculations are performed (“number-crunching”) to refine the network (or model) as it learns over time. Keep in mind, unlike standard computation tasks, learning involves fine-tuning the system to interpret the data or understand its surroundings in the best way possible.

Much like our brains, signals can flow through the neuron in two directions: forward or backward. Each forward “fire” of the neuron takes us from input to output, gauging how snugly our model “fits” our data. The gap between our model’s perceived interpretation of the data and the actual interpretation is a general yardstick of how well our model “fits” — the so-called model’s loss.

After the forward pass, a response travels backward through the network (the backward pass) and is used to adjust the values stored inside each neuron in order to minimize the loss. Each time a signal passes through a neuron, it is used to fine-tune that neuron's values so the network can better predict the correct output for a given input.
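For the curious reader, here is a minimal sketch of a single neuron's forward pass in Python. The specific inputs, weights, and bias below are made-up illustrative numbers, not anything from a real trained network:

```python
import math

# A single artificial neuron: a weighted sum of its inputs, then a "squashing" step.
def neuron_forward(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))   # sigmoid squashes the output into (0, 1)

# Two inputs flow in, one number flows out.
output = neuron_forward([0.5, 0.2], [0.4, -0.6], 0.1)
print(round(output, 3))
```

During training, the backward pass would nudge those weight values so the output moves closer to the answer we wanted.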

Photo by Tyler Nix on Unsplash

Let’s back up and think of a simple analogy that involves something everyone loves: coffee. A neuron is like a busy barista at your favorite coffee shop, taking orders (inputs) and serving drinks (outputs). Inside the barista’s mind, calculations (number-crunching on the different orders for different customers) help refine their skills (model — ability to be a great barista) over time. Orders go through a two-step process: first, the barista makes the drinks (forward pass) to see how well they achieve customer satisfaction (loss); and second, they receive customer feedback (backward pass) to adjust their technique, minimizing mistakes and perfecting their craft.

Neurons to Neural Networks

Now that we’ve examined how an individual neuron works, let’s take a look at what happens in a connected web of these neurons — a Neural Network. Remember that neural networks (NNs) are inspired by the human brain’s structure and processing capabilities, weaving these functional units of neurons together into a whole that is greater than the sum of its parts. NNs consist of multiple interconnected layers, with each layer containing a set of neurons. These neurons are the building blocks of the network, responsible for receiving input, processing it, and generating output. Let’s dive deeper into the layers that make up a neural network.

  1. The Input Layer is the first layer in the neural network, responsible for receiving raw data from the outside world. This layer acts as a gateway, converting the data into a format that can be processed by the subsequent layers. The input layer contains as many neurons as there are features in the input data.
  2. Hidden Layers are found between the input and output layers. They are responsible for processing and transforming the data received from the input layer, extracting useful patterns and features that can be used to make predictions. The number and configuration of hidden layers vary depending on the complexity of the problem and the neural network’s design.
  3. The Output Layer is the final layer in the neural network. It receives the processed information from the hidden layers and converts it into a format that can be interpreted by the outside world. The output layer typically has as many neurons as there are classes or categories in the problem being solved.
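To make those three layers concrete, here is a tiny sketch of data flowing from an input layer through one hidden layer to an output layer. All the weight and bias numbers are arbitrary, chosen just to show the shape of the computation:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# One layer = a list of neurons; each neuron holds one weight per input plus a bias.
def layer_forward(inputs, weights, biases):
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

inputs = [1.0, 0.5]                                                      # input layer: one value per feature
hidden = layer_forward(inputs, [[0.2, -0.4], [0.7, 0.1]], [0.0, -0.1])   # hidden layer: 2 neurons
output = layer_forward(hidden, [[0.5, -0.3]], [0.2])                     # output layer: 1 neuron
print(len(inputs), len(hidden), len(output))
```

Notice that each layer is just the same neuron calculation repeated, with the previous layer's outputs as the next layer's inputs.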

The Neural Network Gossip Party

Since analogies are one of the best ways to learn any complex topic, let’s make the complex algorithm of a neural network more tangible, and think of neurons like a bunch of chatty gossipers at a party, spreading information through whispers and hand gestures. These gossipers represent individual neurons, and the whispers and hand gestures symbolize the connections between them. The more they communicate, the more information gets passed around, and the better they become at making sense of the world (in theory).

Example of a Neural Network with an input layer, hidden layers, and an output layer

Now, imagine we try to replicate this whole gossiping process on a computer. That’s essentially what neural networks are — digital versions of our brain’s gossipers, designed to process and learn from the information in a similar way.

Layers of Gossipers: From Input to Output

A neural network is made up of layers, just like a gossip-laden party would have different groups of people huddled together, exchanging whispers. The first layer, called the input layer, is where the gossip (or data) enters the network. It’s like someone just walked into the party and started sharing the latest scoop.

The input layer passes the information to the next group of gossipers, known as the hidden layers. These hidden layers are where the real magic happens, as the gossipers in these layers chat and share information amongst themselves, gradually refining and understanding the message. This process goes on until the information reaches the final group, called the output layer, which delivers the final decision or prediction — like the party’s master gossiper who summarizes the hot news for everyone to hear.

The real challenge is whether the “gossiper network” actually has a good handle on the information in this game of “telephone.” The final results, especially early in the neural network’s training, may be incorrect outputs that completely misconstrue the inputs to the model (fake news in the gossip world).

Therefore, we need successive iterations (or repetitions) of training so the network learns to spread the information as accurately as possible. This analogy can help us understand how a “network” works by thinking about how a social network of individuals spreads information, though if we are being real for a second, honesty may be a far cry from the highest priority for any real human gossiper network.

Now that we know a little more about the “players” in the neural network, you may ask yourself, “How does this ‘learning process’ actually take place?”

Unraveling the Neural Network

We mentioned how neural networks are able to “learn” from successive iterations, and in this section, we will dive deeper into how this process actually works. We’ll need to define a few more terms before we can tie it all together.

Each input node (neuron) connects to neurons in the next layer of the network, and those neurons connect to neurons in later layers, until we reach the output of the neural network. There is a term used to describe the strength of the connection between neurons: the weights. In our earlier analogy, the weights can be thought of as the strength of the relationship between two gossipers in the social network.

Why wouldn’t all the connections be weighted equally? Because the weights are what give the neural network its adaptability: over time, the network learns to give more weight to certain connections and less to others, depending on the features that matter most. There are some “leaders” and “key influencers” within the network that affect certain layers or neurons more than others.

A bias is an additional parameter that helps a neuron make more accurate predictions. Both weights and biases get updated during the training process to improve network performance. The main headline to remember: the goal of the “training game” in a neural network is to calculate the optimal values of the weights and biases that produce the best predictions from the network.
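To see why weights matter, here is a toy illustration of the same “message” flowing over a strong connection and a weak one. The weight and bias values are illustrative, not learned:

```python
# Two "gossipers" passing on the same message with different connection strengths.
message = 0.8
strong_tie, weak_tie = 0.9, 0.1
bias = 0.05  # a small nudge the receiving neuron adds regardless of its inputs

print(round(message * strong_tie + bias, 2))  # influential connection: 0.77
print(round(message * weak_tie + bias, 2))    # weak connection: 0.13
```

The same input produces very different contributions downstream, which is exactly the adaptability that training exploits.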

Activation Functions and Neurons

Neurons in a computer program (called a neural network) take in information, process it, and pass it on. Each neuron has a special function called an activation function, which decides how to process the input it gets. There are several types of activation functions like sigmoid, ReLU, and softmax. These functions help the network learn complex patterns and make sure the output values stay within a certain range.

Activation functions serve two main purposes: introducing non-linearity into the network and normalizing the output of each neuron. Why non-linearity? Because it allows the neural network to learn complex patterns, unlike the simple regression models we saw in the previous article. Meanwhile, normalization ensures that the output values remain within a certain range.
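Here are the three activation functions mentioned above, written out using their standard textbook definitions:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))       # squashes any number into (0, 1)

def relu(x):
    return max(0.0, x)                  # passes positives through, zeroes out negatives

def softmax(values):
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]    # turns a list of scores into probabilities summing to 1

print(relu(-2.0), relu(3.0))                     # 0.0 3.0
print(round(sum(softmax([1.0, 2.0, 3.0])), 6))   # 1.0
```

Sigmoid and softmax handle the normalization role, while ReLU's sharp "kink" at zero is one simple way of introducing non-linearity.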

Training a Neural Network

Training a neural network is about adjusting it to get the least amount of error. This process involves two steps — forward propagation and backpropagation. In forward propagation, input data is passed through the network to generate predictions. In backpropagation, the difference between predictions and actual output is used to adjust the network. The network is trained with labeled data (which has known input-output pairs). The difference between predicted and actual output gives the loss, and the aim is to minimize this loss.

We minimize this loss by adjusting the weights and biases, repeating the forward and backward passes over and over. The entire purpose of the training exercise is to find the optimal weights and biases.
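The forward-then-backward loop above can be sketched on the smallest possible example: a one-weight “network” learning the rule y = 2x. Real networks backpropagate through many layers, and the data and learning rate here are illustrative:

```python
# Labeled training data: (input, known output) pairs for the rule y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0           # start with a bad guess for the weight
lr = 0.05         # learning rate: how big each adjustment is

for epoch in range(200):
    for x, y in data:
        pred = w * x                 # forward pass: generate a prediction
        error = pred - y             # how far off we are (this drives the loss)
        w -= lr * error * x          # backward step: nudge the weight to shrink the error

print(round(w, 3))  # close to 2.0
```

Each pass through the data shrinks the gap between predictions and known answers, which is exactly the loss-minimization described above.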

Optimization Techniques

So, how do we find the optimal weights and biases? Several optimization techniques have been developed to speed up this process and improve the accuracy of the final model by minimizing the loss function. The most popular is called Gradient Descent, and more advanced variants include Stochastic Gradient Descent, Momentum, RMSprop, and Adam. These advanced techniques use different strategies to make the model converge faster and avoid getting stuck.

Gradient descent is the most basic optimization algorithm: the model charts a path to the “bottom of the valley” by slowly converging on the lowest point of a multi-dimensional landscape.
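That valley can be sketched in one dimension. Below, the “landscape” is the made-up function f(w) = (w − 3)², whose lowest point sits at w = 3, and the starting point and step size are arbitrary:

```python
# Gradient descent on a simple one-dimensional "valley": f(w) = (w - 3)**2.
w = 10.0          # start somewhere up the slope
step = 0.1        # learning rate

for _ in range(100):
    gradient = 2 * (w - 3)    # slope of the valley at our current position
    w -= step * gradient      # take a small step downhill

print(round(w, 4))  # approaches 3.0, the bottom of the valley
```

A real network does the same thing, except the landscape has millions of dimensions, one per weight and bias.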

Putting it all Together: Neural Network City

Think of a neural network as a bustling city where cars represent data. The city’s roads, or synapses, are constantly abuzz with these cars (information) moving from point A to point B (inputs to outputs).

This city is divided into several districts (neural layers), each serving a unique function. The input layer is like the city’s entrance, where cars (data) enter, get processed, and then are dispatched to the next district. The hidden layers, situated in the heart of the city, work together to further analyze and process this data, identifying useful patterns and features. In the end, the output layer, similar to the city’s exit, generates a decision or prediction based on the processed data.

At each junction (neuron), there are traffic lights (activation functions) regulating the flow of cars. These traffic lights control if and how much data (cars) can move from one road (synapse) to another. Some traffic lights might only allow a certain number of cars to pass, while others might halt the flow completely under specific conditions.

The objective is to optimize the traffic flow (train the neural network) to ensure cars (data) reach their destinations (desired outputs) efficiently. For this, city planners (optimization algorithms) continuously monitor and adjust the settings of traffic lights (weights and biases). These planners work with a city map (loss function) that highlights traffic-heavy areas to help identify which traffic lights need tweaks.

Training the neural network is akin to an urban planner’s trial-and-error method. They tweak traffic light settings and observe how it influences the overall traffic flow in the city. The process continues until they find the most efficient flow possible, reducing traffic jams (minimizing the loss function) and ensuring cars reach their destinations (outputs) accurately and quickly.

In conclusion, a neural network is like a complex city with data as cars that are moving through interconnected roads (synapses). Neurons act like junctions, and activation functions are traffic lights controlling the flow. Training a neural network involves adjusting these traffic lights (weights and biases) to optimize traffic flow and achieve desired outcomes.

Deep Learning in the Real World

The main difference between any old “neural network” and a Deep Learning neural network is the presence of more hidden layers in the network. The power of deep learning lies in its ability to extract intricate patterns and representations from vast amounts of data. Deep learning has gained significant attention in recent years due to its remarkable success in solving problems that were once thought to be impossible for computers. Some notable achievements include:

  1. Image and Speech Recognition: Deep learning algorithms have surpassed human performance in tasks like image and speech recognition, which has led to the development of more advanced virtual assistants and autonomous systems. Google Home and Apple’s Siri are popular industry examples of Speech Recognition.
  2. Generative Models: Deep learning can generate realistic images, audio, and text based on learned patterns, enabling applications like AI-generated art and music, as well as more sophisticated chatbots. Check out DALL·E 2 by OpenAI if you want to play around with generating images from a prompt. I’ve played around quite a bit with this, and it deserves a post of its own! Here is the link to try it out yourself; you just might be surprised by what it comes up with!
  3. Reinforcement Learning: Deep learning has empowered reinforcement learning algorithms to tackle more complex tasks, such as teaching computers to play games like Go and Chess at a superhuman level.
  4. Question and Answer Dialogs (Chatbots): ChatGPT has made many headlines for both its wins and shortcomings, but the reality is that ChatGPT and tools like it are here to stay. Engineers are already using ChatGPT to scaffold coding frameworks and save hours of time with code. The chatbot can even be used to write blog posts or marketing material! In fact, in the near future, you may not even be able to tell whether an AI or a human has generated a certain piece of content, and that can be a little bit scary…we’ll talk more about ethics in an upcoming portion of this series.

Challenges and Future Directions

Despite deep learning’s many successes, several challenges remain. Some of the most pressing issues include the interpretability of deep learning models, the reliance on large amounts of labeled data for training, and the computational resources required for training large models. Researchers are actively working on developing techniques to make deep learning models more interpretable, such as explainable AI (XAI).

Deep learning typically requires a massive amount of data, which in turn requires a massive amount of training, AKA computational resources. Therefore, efforts are also being made to develop methods that require less labeled data, like unsupervised learning, semi-supervised learning, and transfer learning. Additionally, advancements in hardware and the development of more efficient algorithms are helping to reduce the computational requirements for deep learning.

Conclusion

Deep Learning has taken the world by storm because it has revolutionized the field of Artificial Intelligence, giving it the capability to solve very “challenging” obstacles like Object Recognition (see: Autonomous Vehicles), Human Language Response (see: ChatGPT and QA Dialog services), and more. Neural networks, as we have seen, are the foundation of deep learning, aggregating many small patterns in the data to tackle more complex machine tasks. These powerful models are capable of recognizing patterns, making predictions, and solving a wide range of problems. The breakthroughs and applications of deep learning in various industries demonstrate its potential to transform our world. In the next part of our series, we will explore some of the key driving factors that have allowed ChatGPT to take center stage in the AI world by learning more about Natural Language Processing and an introduction to Large Language Models. Stay tuned and hit the green button to subscribe for future posts and easy-to-understand topics on AI and Machine Learning.

Author’s Footnote: Portions of this article were supplemented by Generative AI tools in its writing. This was used for outline mapping of common concepts and for certain selected definitions. The Author accepts sole responsibility for the content expressed in this writing.
