An Introduction to Neural Networks

Sean Gahagan
3 min read · Oct 20, 2022


My last note looked at the problem of overfitting machine learning models and a best practice to avoid it called regularization. This note looks at neural networks, what they are, and how they work conceptually.

What is a neural network?

In its simplest form, a neural network can be just an input, a node (or neuron) that performs a calculation on that input, and the output of that calculation.
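As a minimal sketch of that idea (the function name and numbers here are made up for illustration), a single neuron just multiplies each input by a weight, adds them up with a bias, and returns the result:

```python
# One neuron: a weighted sum of its inputs plus a bias term.
def neuron(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Two inputs with hand-picked weights:
# 2.0 * 0.5 + 3.0 * (-1.0) + 1.0 = -1.0
output = neuron([2.0, 3.0], [0.5, -1.0], bias=1.0)
```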

This by itself isn’t very powerful, but when you combine many of these nodes into a network, they can operate much like the neurons in our own brains. Using linear algebra and calculus to execute a concept called backpropagation of error, these networks can automatically adjust the weights of the connections between their nodes by assessing how prediction errors map back to different connections in the network. This is conceptually similar to neurons in our brain strengthening or weakening their connections to neighboring neurons.

Neural networks are typically designed in layers. The first layer is the input layer: a set of nodes that multiply each of the network’s input features by its respective weight/parameter. The last layer is the output layer, where nodes produce the final calculation of the thing you’re trying to predict. Between the input and output layers sit the hidden layers of nodes.

For calculations in the hidden and output layers, the nodes of a network can use an activation function like the sigmoid function or the rectified linear unit (ReLU).
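Both functions mentioned above are simple to write down. This sketch shows their standard definitions (not code from the note itself):

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Passes positive values through unchanged; clips negatives to zero.
    return max(0.0, z)

sigmoid(0.0)  # 0.5
relu(-2.0)    # 0.0
relu(3.0)     # 3.0
```

Sigmoid is handy when the node’s output should look like a probability; ReLU is cheap to compute and is a common default in hidden layers.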

Neural Networks at Work

To illustrate how this is helpful, let’s go back to our earlier home price prediction example. Let’s say we want to use many more features to help with our prediction, but we’re not sure how they all might be related.

We may know that ZIP code and average income may correlate with the quality of the school districts, and so we might try to combine these two features into a new feature about school district quality, but we’re not sure exactly how those correlate or how many other ways that ZIP code and average income may factor into other features that are important to home price prediction. There may be many features we don’t even know about.

Rather than trying to manually build and assign the weighted relationships between these inputs, we can let a neural network determine them for us. We can do this by connecting each input to each node in a hidden layer and then using backpropagation of error to adjust the weights as we train the model.

With multiple hidden layers, we can let the model determine not only the meaningful relationships between those inputs, but also the meaningful relationships between those relationships, and so on.
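The training loop described above can be sketched end to end. This toy example is not the article’s model: the two features, the synthetic target, the layer sizes, and the learning rate are all made-up assumptions for illustration. Every input connects to every hidden node, and backpropagation uses the chain rule to push the prediction error back onto each weight:

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic data: the target is a hidden combination of two features.
inputs = [[random.random(), random.random()] for _ in range(8)]
data = [(x, 0.6 * x[0] + 0.4 * x[1]) for x in inputs]

# Random starting weights: input -> hidden (3 nodes), hidden -> output.
w1 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w2 = [random.uniform(-0.5, 0.5) for _ in range(3)]

def predict(x):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w1]
    return sum(w * h for w, h in zip(w2, hidden))

lr = 0.3
for _ in range(2000):
    for x, y in data:
        # Forward pass.
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w1]
        pred = sum(w * h for w, h in zip(w2, hidden))
        error = pred - y
        # Backward pass: the chain rule maps the error onto each weight.
        for j in range(3):
            grad_h = error * w2[j] * hidden[j] * (1 - hidden[j])
            w2[j] -= lr * error * hidden[j]
            for i in range(2):
                w1[j][i] -= lr * grad_h * x[i]

mse = sum((predict(x) - y) ** 2 for x, y in data) / len(data)
```

After training, the mean squared error should be far below what you would get by always predicting the average target, even though we never told the model how the two features combine.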

Video example: Predicting stadium ticket sales

The following short video (under 6 minutes) walks through an example of how a very simple neural network can learn to predict what percentage of a sports stadium’s seats will be filled for a given game:

Scaling the power of neural networks

Neural networks have proven extremely powerful at making predictions. But creating larger and more powerful models is limited by the computational resources required to run a large neural network and by the time it takes machine learning engineers to train the model and iterate on it.

Up next

My next note will look at classification models using neural networks.

Past notes in this series:

  1. Towards a High-Level Understanding of Machine Learning
  2. Building Intuition around Supervised Machine Learning with Gradient Descent
  3. Helping Supervised Learning Models Learn Better & Faster
  4. The Sigmoid function as a conceptual introduction to activation and hypothesis functions
  5. An Introduction to Classification Models
  6. Overfitting, and avoiding it with regularization
