Machine Learning (ML) and Neural Networks (NN)… An Intuitive Walkthrough

Aayush Grover
The Innostation Publication
19 min read · Oct 21, 2021


If I had a dollar for every time I’ve heard the words “Artificial Intelligence,” “Machine Learning,” and “Deep Learning,” I’d be at a resort in Hawaii, sipping lemonade and contemplating the very meaning of life. These words are essentially buzzwords… attention grabbers. They make anyone sound smart, even without actually knowing what they’re talking about. However, this begs the following questions: “Why are these words thrown around so much? What do they actually mean? And how can we even BEGIN to understand them?” Hopefully, by the end of this article, we won’t need to ask these questions anymore. Instead, we might ask: “What am I going to do today to make real, tangible change within our world using Artificial Intelligence?”

Introductory Definitions:

Artificial Intelligence (AI):

The use of computer systems to simulate human intelligence; contrasts with natural intelligence in key ways, such as the fact that artificial intelligence is developed while natural intelligence is evolved.

Machine Learning (ML):

The use of computer systems to learn and adapt without explicitly coded instructions; primarily utilized through statistical models and a machine’s ability to draw inferences and analyze patterns that present themselves in data.

Neural Networks (NN):

The use of computer systems to mimic Biological Neural Networks (BNN); often utilized through a series of algorithms used to discover relationships between information, analyzed in an approach that mimics the inner workings of the human brain. A multi-layered Neural Network is referred to as a Deep Neural Network, lending itself to Deep Learning (DL).

I provided these definitions to multiple different people and got the exact same response each time… “I was able to understand absolutely nothing from that.” To be honest, I can’t really blame them. There’s no doubt that these definitions present themselves in a way that’s incredibly difficult to decode and understand. Simply reading them adds very little value to our understanding of these incredibly complex fields. This article aims to take a deeper dive into these definitions to achieve an intuitive, fundamental understanding of the inner workings of artificial intelligence, machine learning, and neural networks, as well as their relationships with one another.

Let’s Begin With The Basics…

Machine learning has many different facets, and many different ways of functioning. Hopefully, by the end of this article, your perception of ML will shift from one that associates it with magic to one that’s oriented around mathematics and logic. To strengthen our definition of machine learning: ML consists of various generic algorithms that can be used to analyze data, rather than feeding the code your own logic. The idea here is for the machine to use this data and build its own logic. One of the most common examples of machine learning is a “classification” problem, where a machine analyzes a set of data to be able to recognize human handwriting of the numbers 0 through 9. To understand this intuitively, we can feed a machine a data set of thousands of images of numbers from 0 through 9. Our algorithm would then differentiate between different patterns in this data and begin to recognize these differences through training. Training refers to feeding our algorithm with data that it’s able to analyze. After thousands of images of training data are fed into our algorithm, we’re able to input new images of handwritten numbers and get an output (a number between 0 and 9), ultimately solving our classification problem! This example is an incredibly stripped-down version of what’s actually going on, disregarding weights, biases, neural networks with convolutional layers, and types of ML algorithms. That being said, it’s important to have an intuitive and basic understanding before delving into technical and complex details. In fact, this example will be explained in-depth within this article.
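To make the train-then-predict workflow concrete, here’s a minimal sketch using scikit-learn’s built-in handwritten-digit dataset. The dataset, classifier choice, and all parameters here are my own illustrative assumptions, not something from the article, but the flow is exactly the one described above: feed in labelled images, train, then classify digits the model has never seen.

```python
# A minimal sketch of the digit-classification idea, assuming scikit-learn
# is available. Its built-in digits dataset has 1797 small 8x8 images of
# handwritten digits 0-9, already labelled.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=5000)  # a simple classifier
model.fit(X_train, y_train)                # "training" on labelled images

print(model.score(X_test, y_test))         # accuracy on unseen digits
```

Even this tiny model typically classifies well over 90% of the held-out digits correctly, which is the intuition the article builds on before introducing weights, biases, and convolutional layers.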

Two Types Of Machine Learning Algorithms

There are two primary kinds of machine learning algorithms — Supervised Learning, and Unsupervised Learning. Most machine learning algorithms fall under the umbrella of one of these two types of algorithms.

Supervised Learning

Supervised Learning is a category of machine learning algorithms based upon the idea of taking a labelled data set and trying to learn the relationships within it. To consolidate this through an analogy, let’s say our machine was shown 1000 different models of cars, each labelled with a price. We also provided various data points such as production costs, tariffs, reviews, etc. After thousands of iterations, our machine can eventually analyze trends and create logic for the correlation between these individual data points and the final cost of the car. Eventually, we’ll be able to feed in data for a car that’s never been seen, and our algorithm can use past data to predict its cost. By letting our algorithm sort out the relationship between the type of car and its cost, we’re able to make predictions with incredible accuracy on models that weren’t present in the training data. This is an example of supervised learning.
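The car-price analogy above can be sketched in a few lines. All the numbers here are made up purely for illustration (they match the toy table used later in the article), and scikit-learn’s `LinearRegression` is one of many models that could play this role:

```python
# A hedged sketch of supervised learning: fit on labelled (features, price)
# pairs, then predict the price of a car the model has never seen.
from sklearn.linear_model import LinearRegression

# Features per car: [production cost, tariffs, rating] -- illustrative values
X = [[75000, 2500, 4.8],
     [48000, 1500, 4.3],
     [33000, 1000, 3.9]]
y = [100000, 65000, 40000]       # the labels: each car's final price

model = LinearRegression()
model.fit(X, y)                   # learn the feature-to-price relationship

new_car = [[60000, 2000, 4.5]]    # a car not in the training data
print(model.predict(new_car))     # the model's predicted price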

Make a note that supervised learning runs on labelled data. This means that its applications are slightly limited. It requires large data sets with data that has been individually labelled, something that can be quite scarce in various different situations. For example, each car in our previous example had labelled data points for its individual features — more importantly, its final cost. We were able to use this to create predictions for car prices for different models. In situations where we don’t have this labelled data, we use an alternate method: Unsupervised Learning.

Unsupervised Learning

Unsupervised Learning is a category of machine learning algorithms based upon the idea of taking an unlabelled data set and analyzing naturally occurring patterns in order to separate the data based upon them. Its most common technique is “clustering,” where the algorithm groups similar data points together. Going back to our previous analogy, if our cars had all of the same data points but weren’t labelled with a final cost, we’d have to use unsupervised learning. Our algorithm might separate out cars that had higher production costs, or others that had high tariffs. Our algorithm might start off with a random guess and not know what to make of this data. Eventually, as more data points get added, our machine might be able to use clustering to analyze patterns in our data, something which could be of great value to us. A better analogy for unsupervised learning, and one that’s more commonly used, is separating a group of blocks by colour.

Suppose we have 10 blocks, each with different coloured faces. In the beginning, our algorithm might have no idea what to do with them. Let’s add a new block, an orange block. In our 10 blocks, there may have already been an orange block. If so, our algorithm may start recognizing a pattern. Fast-forward through the addition of 1000 blocks, and our algorithm is now able to analyze the differences in colour between these blocks through patterns and cluster them together. Now, if we were to use this algorithm on a new dataset, it should be able to sort blocks by colour. This is an incredibly simple application of unsupervised learning. However, with a little bit of imagination and an intuitive sense of where we can apply this, I hope you begin to understand just how far its impact spans. Any algorithm modelled on this concept is referred to as an Unsupervised Learning Algorithm.
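The block-sorting analogy maps directly onto k-means clustering, one common unsupervised algorithm. In this sketch (all colours and counts are my own illustrative choices), each block is a point in RGB space, and the algorithm groups similar colours together without ever being told what “orange” or “blue” means:

```python
# A sketch of clustering blocks by colour, assuming scikit-learn and NumPy.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
orange = rng.normal([255, 140, 0], 10, size=(50, 3))  # orange-ish blocks
blue = rng.normal([0, 90, 255], 10, size=(50, 3))     # blue-ish blocks
blocks = np.vstack([orange, blue])                    # 100 unlabelled blocks

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(blocks)

# All orange-ish blocks should land in one cluster, all blue-ish in the other.
print(kmeans.labels_[:50])
print(kmeans.labels_[50:])
```

Note that the algorithm never sees a label; it discovers the two colour groups purely from the structure of the data, which is exactly the idea described above.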

So What’s Really Going On Behind The Scenes…

By now, we should have a bit of an introductory and intuitive sense of what machine learning really is. However, what’s really going on behind the scenes? What allows our machines to analyze these data sets and iterate them to a degree where they’re able to learn from them? Let’s delve into our very first machine learning concept, the cost function.

Let’s backtrack to the very first example that we had with our cars. Suppose, for simplicity, we have 3 primary data points: cost of production, tariffs, and rating. It’s evident that a new car model’s base price will be dependent on significantly more than just this, but let’s use these three values to obtain a fundamental understanding of what’s going on. We can apply weights to each of these data points that essentially add importance to certain values. For example, let’s say we have 3 car models: Car A, Car B, and Car C. Let’s create a table with all the data we need.

Car A: Price: $100 000

  • COP: $75 000
  • Tariffs: $2 500
  • Rating: 4.8 stars

Car B: Price: $65 000

  • COP: $48 000
  • Tariffs: $1 500
  • Rating: 4.3 stars

Car C: Price: $40 000

  • COP: $33 000
  • Tariffs: $1 000
  • Rating: 3.9 stars

Now, let’s assign a weight to each of these features, starting with 1.0 for each one, multiplying each feature by its weight, and adding everything up to find an estimated car cost.

Car A (Weights 1.0 for all):

Estimated Cost = (75000)(1.0) + (2500)(1.0) + (4.8)(1.0) = $77 504.8
Cost Difference = $100 000 - $77 504.8 = $22 495.2

Car B (Weights 1.0 for all):

Estimated Cost = (48000)(1.0) + (1500)(1.0) + (4.3)(1.0) = $49 504.3
Cost Difference = $65 000 - $49 504.3 = $15 495.7

Car C (Weights 1.0 for all):

Estimated Cost = (33000)(1.0) + (1000)(1.0) + (3.9)(1.0) = $34 003.9
Cost Difference = $40 000 - $34 003.9 = $5 996.1
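The three calculations above are just weighted sums, which we can reproduce in a few lines of Python (the data is the same toy table; nothing here goes beyond the arithmetic already shown):

```python
# Reproducing the estimates above: with every weight set to 1.0, the
# estimated price is the weighted sum of the three features, and the cost
# difference is the actual price minus the estimate.
cars = {
    "Car A": ([75000, 2500, 4.8], 100000),
    "Car B": ([48000, 1500, 4.3], 65000),
    "Car C": ([33000, 1000, 3.9], 40000),
}
weights = [1.0, 1.0, 1.0]

for name, (features, price) in cars.items():
    estimate = sum(f * w for f, w in zip(features, weights))
    difference = price - estimate
    print(name, round(estimate, 1), round(difference, 1))
```

Running this prints the same estimates ($77 504.8, $49 504.3, $34 003.9) and differences ($22 495.2, $15 495.7, $5 996.1) as the worked examples, and makes the next question obvious: how do we adjust `weights` to shrink those differences?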

Obviously, these values aren’t accurate. However, we can adjust these weights so that eventually, we fit a model that can accurately predict the prices of cars using these three data points. That being said, this task certainly doesn’t sound easy, especially when we may have 10 or 20 different data points across thousands of data entries in future problems. In fact, it sounds essentially impossible to brute force, and you wouldn’t be wrong to think that. While it certainly isn’t impossible per se, it isn’t efficient and definitely can’t be used in a majority of practical situations. That’s why mathematicians have come up with an incredibly useful function, often referred to as the cost function.

Cost Function in an Applicative Sense

This is the formula for the cost function. It’s one that’s somewhat difficult to understand without a strong basis in mathematics. However, all you need to understand is that this formula states how wrong our guessed values for the weights are. If our function J(θ) = 0, our weights have been fine-tuned to predict our car prices with a 100% degree of accuracy. We want this function to be as close to 0 as possible to hold the highest level of accuracy. Let’s try and approach this from a visual perspective.
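In symbols, the cost function most commonly used for a prediction problem like ours is the mean-squared-error form (this is the conventional choice, e.g. in Andrew Ng’s course recommended later in this article; the exact formula pictured in the original may differ slightly):

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2
```

Here, m is the number of cars in our training data, h_θ(x⁽ⁱ⁾) is the price our current weights θ predict for car i, and y⁽ⁱ⁾ is its actual price. Each term is one squared “cost difference,” exactly like the ones we computed by hand above, so J(θ) = 0 means every prediction is perfect.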

Graphical Depiction of the Cost Function and Gradient Descent with Multiple Variables

This is a graph of our cost function. Our goal is to find the lowest part of this function, where the cost difference displayed in our calculations earlier is the closest to 0. The lower the value of the function, the more accurate our model is. To find these local minima, we use another mathematical technique called gradient descent. This concept may be difficult to understand without fundamentals in calculus (primarily differentiation). However, let’s try and interpret this concept in a more intuitive sense, rather than a technical one.

Let’s say we were to take a tiny ball and place it at any point on our 3D graph. A ball on a curved surface will always try to roll downwards… simple physics, right? Let’s attempt to recreate this concept mathematically. The idea here is that our ball will eventually roll into a tiny “dip” or “valley,” also known as a local minimum.

The image above is a 2-dimensional depiction of what’s actually going on. Let’s shift our attention here for a second. The initial weight is the place on the function where our ball lies. It’s essentially a ‘random’ guess of what the weight could be, the same way we guessed 1.0 for all of our weights in our earlier calculations. If you have any experience with Single-Variable Calculus, you’ll know that the slope of the tangent line on a curve is called the derivative. The “minimum value” of any concave-up parabolic function occurs where the slope is equal to 0 (a horizontal tangent). From this, we can understand that we’re trying to find the value of our weight where its tangent slope, or derivative, is equal to 0.

Right now, let’s say the slope at our initial weight is 5. We want to bring this slope as close to 0 as possible. In this example, when our ball (weight value) moves a little to the left, our slope decreases. Since we want to decrease our slope, we’re going to move our ball a little to the left. We can repeat this process over and over again until our ball reaches the point where the slope is 0. However, there might be a scenario where we move the ball too far to the left. To combat this, the amount we move our ball, also referred to as the step size, is proportional to our slope. The further our slope is from 0, the bigger our step size will be. The closer it is to 0, the smaller our step size will be.

After repeating these steps and iterating our weights through the use of Python, we’ll find multiple different values that each represent a local minimum of our cost function. The smallest of these local minimum values is our global minimum. Once we find that, we’ve found the most accurate weights for our predictive model.

Note: In the actual application of the cost function, we use partial derivatives with respect to each weight. However, that requires an understanding of Multivariable Calculus and is significantly less comprehensible. Instead, understanding a 2-dimensional approach can make comprehension easier.
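The “rolling ball” procedure above can be sketched in a few lines. The cost curve here, J(w) = (w − 3)², and the learning rate are illustrative choices of mine, not anything from the article; the point is only that the step size is proportional to the slope, exactly as described:

```python
# A minimal 1-D gradient-descent sketch: the cost curve is the parabola
# J(w) = (w - 3)^2, whose minimum sits at w = 3.
def slope(w):
    return 2 * (w - 3)          # the derivative of (w - 3)^2

w = 10.0                        # a 'random' initial weight (the ball's start)
learning_rate = 0.1             # scales slope into step size

for _ in range(100):
    w -= learning_rate * slope(w)   # big slope -> big step, small -> small

print(w)                        # converges to ~3.0, the minimum
```

Because the step shrinks as the slope flattens out, the ball settles into the valley instead of overshooting back and forth, which is precisely the overshoot problem the step-size rule is there to solve.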

We’re now able to see how we can utilize the cost function to create a predictive model based on machine learning! Hopefully, you’re starting to see that ML really isn’t magic at all — it’s mathematics, data analytics, and logic. That's all there is to it!

What are the Limitations?

There are various limitations to the cost function. One of the primary limitations is that its applications are quite niche. There are many different methods to figure out these weights, methods that may only work for certain cases. Eventually, we may be working with weights and biases for different machine learning models such as Linear Regression, Logistic Regression, or even Neural Networks. I would love to go in-depth on what these different ML models and algorithms actually do. However, that requires more of a specialized and technical approach, rather than an intuitive baseline. That being said, by now, we should have a strong understanding of how we can use data and predictive models to allow a machine to “learn.”

Neural Networks… What Are They, And How Can We Use Them?

I’m sure you’ve all heard of deep learning. It’s probably one of the reasons you may have clicked this article. It’s a concept that so many people use as a buzzword to a degree where we end up hearing it everywhere. However, for those of us who may not know what deep learning or neural networks are, it may be a bit confusing to understand what’s going on. Let’s recap the definition of neural networks and deep learning.

Neural Networks (NN) — RECAP:

The use of computer systems to mimic Biological Neural Networks (BNN); often utilized through a series of algorithms used to discover relationships between information, analyzed in an approach that mimics the inner workings of the human brain. A multi-layered Neural Network is referred to as a Deep Neural Network, lending itself to Deep Learning (DL).

Types of Neural Networks:

There are 3 main types of neural networks: Convolutional Neural Networks (CNNs), Artificial Neural Networks (ANNs), and Recurrent Neural Networks (RNNs). In this article, we’ll only cover CNNs for computer vision. However, ANNs and RNNs are just as important. The entire idea is for you to develop an understanding of how neural networks work on a fundamental level. I highly recommend doing research on ANNs and RNNs after reading this article, once you’ve built a foundational understanding here.

In the meantime, here’s a list of applications for each type of neural network:

Convolutional Neural Networks (CNNs):

  • Image and Video Recognition
  • Image Classification
  • Computer Vision
  • Natural Language Processing
  • Brain-Computer Interfaces (BCIs)

Artificial Neural Networks (ANNs):

  • Text Classification
  • Information Extraction
  • Semantic Parsing
  • Language Generation

Recurrent Neural Networks (RNNs):

  • Speech Recognition
  • Generating Image Descriptions
  • Text Generation
  • Text Summarization

Earlier, I mentioned a classification problem where we use machine learning to give a machine the ability to differentiate between the handwritten numbers from 0 to 9, a problem under the umbrella of computer vision. We’re going to take a look at how to solve this problem using Convolutional Neural Networks (CNNs).

Biological Neural Network (BNN) Diagram
Neural Network Example (Modeled off of Biological Neural Networks)

Input Layer: Layer in which we feed in our input (handwritten numbers in this case)

Hidden Layers: Layers in which calculations are done to figure out what our output is (in this case, the output would consist of the numbers from 0 to 9)

Output Layer: Layer that displays what our neural network output is

What are the dots? Each individual dot is referred to as a node. It essentially takes an input, transforms it in some way, and outputs it to the next layer (the input layer simply receives data, and the output layer simply displays results). Each node is assigned a weight and a bias. The weight acts as a multiplier and the bias as an additive value, and both are adjusted over time to create an accurate predictive model.

Now that we have a basis on what neural networks look like, as well as how they function at a surface level, let’s use our problem-solving skills to work through this problem and consolidate our understanding.

Problem Introduction And Logic…

If we think about everything we’ve done so far, we’re constantly working with numbers — more so numerical values. Our car data points consist of numerical values. Our weights and biases are numerical values. However, how do we go about processing and analyzing an image? The answer is simple… numbers. We’ve already displayed how well computers are able to work with numbers. We can take any image and split it up into individual pixels. We can then turn each individual pixel into a number between 0 and 255, with each number representing a different shade of gray (here, 0 being true white and 255 being true black). For the purpose of this problem, our training data set is going to consist of handwritten numbers in 16x16-pixel images. We can then create an array with 256 slots, one number per pixel. We can do this for every single handwritten number image. We should now have a set of arrays, each one representing a different handwritten number.
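The image-to-numbers step above is simple to sketch. The “image” here is random noise purely for illustration; the point is the shape of the data:

```python
# A sketch of turning a 16x16 grayscale image into a flat array of 256
# numbers between 0 and 255, one per pixel, as described above.
import numpy as np

image = np.random.randint(0, 256, size=(16, 16))  # a fake 16x16 image
pixels = image.flatten()                          # one number per pixel

print(pixels.shape)                               # (256,) -- 256 slots
print(int(pixels.min()), int(pixels.max()))       # all values in 0..255
```

Each handwritten-number image in our training set would become one such 256-slot array, and those arrays are what we feed into the input layer of the network.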


One of the primary issues presented with this problem is that even if we train a neural network to recognize handwritten numbers, the placement of that number on the 16x16 grid can greatly impact our output. We use convolution and convolutional layers to combat this. Think of a convolutional layer as a type of filter. These filters try and detect “patterns”. Let’s try and understand this using our example of handwritten numbers. Let’s say we have the number 5. This number consists of a straight line on the top, a vertical line on the left, and 3/4ths of a circle on the bottom. The circle on the bottom can be split into even more patterns, consisting of the top-right section, the bottom-right section, and the bottom-left section. If our neural network can detect each individual section, we can put these together and clearly see that our number is a five. This is the intuitive logic for what we’re attempting to achieve.

To go deeper into how these convolutional layers work, we need a little bit of knowledge in Linear Algebra. If you don’t understand this section, don’t fret 😬. For those who understand the fundamentals of linear algebra, think of each convolutional layer as its own matrix. We can choose how big our matrix is; in this situation, we may want to use a 3x3 matrix. Essentially, our matrix “convolves” around our 16x16 image. This means it slides through every 3x3 block in our 16x16 image to search for patterns. The mathematics behind what’s going on here can be incredibly complex to understand, so for now, all you need to know is that these individual filters act like “pattern detectors.”

Note: if this is a little confusing for you, that’s totally okay. It took me hours and hours of blood, sweat, and tears to understand the complexities behind convolution and what’s ACTUALLY going on. For now, focus on how we’re using convolution rather than on how it works.

So what’s our Convolutional Neural Network actually doing?

Think of it this way. Our first layer in our neural network is our input. We feed it our handwritten images. Our second layer is our primary convolutional layer. We can have as many as we want, but for now, we can simplify it to only two. Our first convolutional layer may use pattern detection to detect edges and their positionings. For example, the number 1 has 2 distinct edges, the number 2 has 1 distinct edge, etc. Our second convolutional layer may use the same linear algebra concept to detect parts of curves. For example, the number 8 has 8 distinct parts of curves (4 for the top circle and 4 for the bottom), the number 5 has 3 distinct parts of curves (quarter-circles in the bottom section), the number 9 has 4 (quarter-circles in the top section), etc. We can then have a hidden layer to assemble the parts of our curve, and then use another layer to assemble the curves and edges together. Our end result should be something that resembles a number if done correctly.
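The layer stack just described — input, convolution, then layers that assemble the detected pieces into a final answer — can be sketched end-to-end in plain NumPy. Everything here (the single random filter, the ReLU activation, the 10-score output layer) is an illustrative stand-in, and the weights are random, so this network is untrained; it only shows how data flows through the layers:

```python
# A very small numpy sketch of a CNN forward pass:
# input -> convolution -> activation -> flatten -> 10 output scores.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((16, 16))             # fake 16x16 input image
kernel = rng.standard_normal((3, 3))     # one random 3x3 "pattern detector"

# Convolutional layer: slide the 3x3 filter over the image.
feature_map = np.zeros((14, 14))
for i in range(14):
    for j in range(14):
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

activated = np.maximum(feature_map, 0)   # ReLU: keep only positive responses
flat = activated.flatten()               # 196 values into one vector

# Output layer: 10 scores, one per digit 0-9.
W = rng.standard_normal((10, flat.size))
b = rng.standard_normal(10)
scores = W @ flat + b

print(scores.shape)                      # (10,)
print(int(np.argmax(scores)))            # the digit this untrained net guesses
```

A real CNN stacks several such convolutional layers (edges, then curve segments, then whole shapes, as the article describes) and, crucially, learns the filter and output weights from data rather than leaving them random — which is exactly what the next section is about.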

Where does training come into play?

Our neural network unfortunately isn’t finished yet. 😔 If we were to provide a random handwritten image of a number from 0 to 9, our output most likely wouldn’t be correct. That’s because we haven’t trained our neural network with data yet. This is where weights and biases come into play. Weights and biases are real-numbered values assigned to each individual node, and each node has its own. These numbers essentially determine the importance of each node. For example, certain patterns may have a higher influence in analyzing our handwritten numbers than others, which can create a massive impact on whether our output will be correct or not. Essentially, the way we train our neural network is to begin with random weights and biases, calculate how wrong we were, and automatically adjust our weights and biases in a way that increases our accuracy. This is where the cost function comes into play. Our weights are updated here in the same way as they were in our previous example with machine learning and gradient descent. After thousands of iterations and test images fed to our neural network, we should eventually have a neural network that can analyze new images it hasn’t seen before and classify them as a digit from 0 through 9 with an incredibly high degree of accuracy. We’ve now just learnt exactly how to create a deep convolutional neural network! 🤯
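The start-random, measure-wrongness, adjust-downhill loop above can be shown in miniature on the toy car table from earlier. This is a linear model rather than a full CNN, and the feature scaling and learning rate are my own illustrative choices, but the training loop is the same idea:

```python
# A minimal sketch of training: random initial weights, a squared-error
# cost, and repeated gradient-descent updates that reduce that cost.
import numpy as np

# The toy car table, with features and prices scaled so one learning rate
# works for all of them (an illustrative preprocessing choice).
X = np.array([[75000, 2500, 4.8],
              [48000, 1500, 4.3],
              [33000, 1000, 3.9]]) / 100000.0
y = np.array([100000, 65000, 40000]) / 100000.0

rng = np.random.default_rng(0)
w = rng.standard_normal(3)       # random initial weights
b = 0.0                          # bias
lr = 0.5                         # learning rate

def cost(w, b):
    pred = X @ w + b
    return np.mean((pred - y) ** 2) / 2   # mean squared "cost difference"

start = cost(w, b)
for _ in range(2000):
    error = X @ w + b - y
    w -= lr * (X.T @ error) / len(y)      # gradient step for each weight
    b -= lr * error.mean()                # gradient step for the bias

print(cost(w, b) < start)                 # training reduced the cost
```

A deep network trains the same way, just with far more weights and with the gradients flowing back through every layer (backpropagation), which is why the cost function and gradient descent from earlier carry over directly.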

How Can You Learn ML and DL?

The process of learning artificial intelligence, machine learning, and how to code neural networks is no doubt intimidating. That being said, I genuinely believe that every single one of you who’s gotten this far has the potential to become a master in this field. You don’t need a Ph.D., all you need is a will to learn.

I highly recommend learning the fundamentals of both Single-Variable and Multivariable Calculus, as well as Linear Algebra. While they’re not mandatory for understanding machine learning and deep learning, they significantly improve our comprehension. It’s also important to have a proper understanding of theory and of common machine learning algorithms (Linear and Logistic Regression, Decision Trees, SVMs, etc.). Andrew Ng’s Stanford Machine Learning course on Coursera is a great option with a high level of recognition. Picking up the programming language Python is essentially mandatory, especially with it being the industry standard within this specific sector of computer science. Picking up either PyTorch or TensorFlow (Python ML libraries) can be incredibly beneficial in bringing our theoretical knowledge to fruition. After learning these essential skills, all that’s left to do is work on projects that you care about and start implementing what you’ve learnt.

50% of you are saying “What the h — ” right now. 25% have given up before even starting, 24% are signing up for therapy right now, and only 1% are actually planning on taking action. I was once told by the people I look up to the most to have a “Bias Towards Action.” Words like Calculus, TensorFlow, and Stanford might be super intimidating, and I understand that. But ANYBODY can learn all of this if they really try. I’m not saying it isn’t hard work, I’m saying that we need to stop thinking and start doing. I’m excited to see all of your journeys through machine learning and artificial intelligence, as well as what you’re able to achieve!


Throughout the course of this article, we’ve walked through the rudimentary fundamentals of machine learning, neural networks, and deep learning. We’ve also gained an intuitive sense of the inner workings of these seemingly complex concepts in a way that’s relatively easy to understand. Let’s… drop the formalities for a second. What we’ve done today is learn some really incredible s***, stuff that’s genuinely on track to disrupt every single industry out there! Let’s be excited about that, it’s no small feat! Even just getting through this article shows how much determination you have.

You might ask, “How do machine learning and deep learning actually help us? How can we use this in our current lives?” The answer is that… it’s entirely up to you. We haven’t even scratched the surface of what we can do with these incredible emerging technologies, or with artificial intelligence as a whole. All I’ve shown so far is analyzing basic data and differentiating between handwritten numbers. What I haven’t shown yet is using machine learning to differentiate between benign and malignant tumours, or using predictive analysis to predict patient diagnoses and ultimately save lives. If I were able to do that, maybe I’d finally be on my way to that trip to Hawaii that I long for. We still have so much to progress in terms of healthcare diagnostics, computer vision, speech recognition, data analytics, and predictive analysis.

There’s so much that we can do with these amazing technologies and it's entirely up to you how you want to use your knowledge and capabilities. We’re all learning, so let your ambition run wild and start changing the world. I look forward to seeing what you all come up with! ✌️

If you liked this article, please check out my other articles here, or check out my article on the role of AI and ML in the healthcare industry here.

Feel free to check out my other socials: LinkedIn, Twitter

Consider subscribing to my newsletter here! I talk a lot about my progress, experiences, and my struggles.
