Unveiling Multivariate Shallow Neural Network Magic

Insights into Multivariate Neural Network Applications

Mohit Mishra
Nerd For Tech
7 min read · Jan 28, 2024


Hello, everyone. I hope you are doing well. This is the second part of my Shallow Neural Network series, building on the previous blog post about single input and single output. This section will delve into Multivariate Inputs and Outputs, as stated in the title.

In the previous blog, we discussed a neural network with only one input (x) and one output (y).

The Universal Approximation Theorem also applies in the more general case where the network converts multivariate inputs into multivariate output predictions.

Source: Image by the author.

We will begin by examining how to extend the model to predict multivariate outputs, and then turn to incorporating multivariate inputs into the model.

Visual Explorations of Multivariate Output Prediction

In the current context, we are exploring the scenario of a multivariate output with a single input. Let’s consider a situation where we have a neural network with four hidden units (h1, h2, h3, h4) and a single input denoted as x. The objective is to obtain a 2D multivariate output represented as (y1, y2).

The values of the hidden units can be calculated as follows:

Source: Image by the author.
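Since the formulas are embedded as images, here they are in text form as well, using the θ/φ notation of the Understanding Deep Learning book (which this series follows), with a(·) denoting the activation function:

h1 = a(θ10 + θ11·x)
h2 = a(θ20 + θ21·x)
h3 = a(θ30 + θ31·x)
h4 = a(θ40 + θ41·x)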

The values for the multivariate output can be calculated as follows:

Source: Image by the author.
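In the same notation, each output is its own linear combination of the four hidden units:

y1 = φ10 + φ11·h1 + φ12·h2 + φ13·h3 + φ14·h4
y2 = φ20 + φ21·h1 + φ22·h2 + φ23·h3 + φ24·h4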

The two outputs are represented as two distinct linear functions of the hidden units.

Let’s take a closer look at the network architecture:

Source: Image by the author.

The “joints” in the piecewise functions are determined by where the initial linear functions are clipped by the hidden units’ ReLU (Rectified Linear Unit) activations. Because both y1 and y2 are distinct linear functions of the same four hidden units, the four “joints” in each function must be in the same positions. For a better understanding, see the visualization below:

Image by Simon J. D. Prince, from the Understanding Deep Learning book.
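To complement the visualization, here is a minimal numeric sketch (with made-up parameter values, purely illustrative) showing that both outputs share the same joint locations:

```python
import numpy as np

# Illustrative parameters (not from the book): one input x,
# four ReLU hidden units, and two outputs y1 and y2.
theta = np.array([[ 0.3, -1.0],    # rows: [theta_j0, theta_j1] per hidden unit
                  [-1.0,  2.0],
                  [-0.5,  0.65],
                  [ 0.2,  1.5]])
phi = np.array([[-0.3,  2.0, -1.0,  7.0, -2.0],   # rows: [phi_k0, phi_k1, ..., phi_k4]
                [ 0.1, -1.0,  2.0, -3.0,  1.5]])

def forward(x):
    h = np.maximum(0.0, theta[:, 0] + theta[:, 1] * x)   # ReLU hidden units
    return phi[:, 0] + phi[:, 1:] @ h                    # two linear outputs

# Each hidden unit switches on/off where its pre-activation crosses zero,
# i.e. at x = -theta_j0 / theta_j1. These are the joints, and they are the
# same for y1 and y2 because both outputs reuse the same four hidden units.
joints = np.sort(-theta[:, 0] / theta[:, 1])
print("joints shared by y1 and y2:", joints)
print("y at x = 0.5:", forward(0.5))
```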

Adding more hidden units to the network can improve its ability to understand intricate patterns, especially when dealing with high-dimensional input data. This means that the network becomes better at capturing and representing complex relationships within the data.

With the introduction provided above, let’s now proceed to explore the concept of multivariate inputs.

Visual Journey into Multivariate Input Analysis

Source: Image by the author.

To handle multiple input variables x, we expand the linear connections between the input and the hidden units. For instance, a network with two inputs x = [x1, x2] and a single output y (as shown in the above figure) could have three hidden units defined by:

Source: Image by the author.
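In text form (same notation as before), each hidden unit now takes both inputs:

h1 = a(θ10 + θ11·x1 + θ12·x2)
h2 = a(θ20 + θ21·x1 + θ22·x2)
h3 = a(θ30 + θ31·x1 + θ32·x2)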

In this scenario, each input now has its own slope parameter for every hidden unit. The hidden units are then combined to produce the output in the usual way:

Source: Image by the author.
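In text form:

y = φ0 + φ1·h1 + φ2·h2 + φ3·h3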

When the model has more than two inputs, visualizing it becomes difficult. The interpretation is similar nonetheless: the output is a continuous piecewise-linear function of the input, and the linear regions now form convex polytopes in the multidimensional input space.

In layman’s terms, when there are multiple input variables, it becomes difficult to visualize them all at once. However, how the model interprets these inputs remains consistent. The model’s output is still a continuous function, but instead of a single straight line, it is made up of connected flat linear pieces, one per region, and those regions form specific shapes in the multidimensional space containing our input data.

As the number of input dimensions increases, so does the neural network’s ability to define distinct regions. To demonstrate this, consider each hidden unit as a boundary that separates the space where it is active from the space where it is not. If we had the same number of hidden units as input dimensions, we could line up each boundary with one of the coordinate axes. With two input dimensions, the space would be divided into four quadrants; with three dimensions, into eight octants; and in N dimensions, into 2^N orthants. Because shallow neural networks typically have more hidden units than input dimensions, they can create more than 2^N linear regions, as the sketch below demonstrates.
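Here is a small empirical check (random parameters, illustrative only) that counts the distinct on/off activation patterns of a network with two inputs and six hidden units over a dense grid; each distinct pattern corresponds to one linear region:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two input dimensions, but MORE hidden units (6) than inputs.
# Each hidden unit's pre-activation defines a line that splits the plane
# into an "active" half and an "inactive" half.
D_in, D_hidden = 2, 6
theta0 = rng.normal(size=D_hidden)            # biases
theta = rng.normal(size=(D_hidden, D_in))     # slopes

# Dense grid of input points over [-3, 3]^2.
xs = np.linspace(-3, 3, 400)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)

pre = grid @ theta.T + theta0     # pre-activations at every grid point
patterns = pre > 0                # which hidden units are active where
n_regions = len(np.unique(patterns, axis=0))
print("distinct linear regions found on the grid:", n_regions)
# Typically well above the 4 quadrants that 2 axis-aligned units would give.
```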

Now that we’ve covered both multivariate inputs and outputs, let’s try to generalize the case.

Towards Generalization: Shallow Neural Networks and Multivariate Data

So far, we’ve looked at examples of shallow networks to help us understand how they work. We now have enough understanding to define a general equation for a shallow neural network that maps a multidimensional input x to a multidimensional output y through a layer of hidden units h. Each hidden unit can be calculated as follows:

Source: Image by the author.

and these are linearly combined to produce the following output:

Source: Image by the author.
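In text form (same notation, with D_i inputs, D hidden units, and D_o outputs):

h_d = a(θ_d0 + Σ_i θ_di·x_i), for d = 1, …, D
y_j = φ_j0 + Σ_d φ_jd·h_d, for j = 1, …, D_o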

Let’s visualize the neural network with 3 inputs, 5 hidden units, and 4 outputs.

Source: Image by the author.
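Here is a minimal NumPy sketch of that general equation, instantiated with the dimensions from the figure (3 inputs, 5 hidden units, 4 outputs); the parameter values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)

def shallow_network(x, theta0, theta, phi0, phi):
    # h = a(theta0 + theta @ x), with a(.) = ReLU
    h = np.maximum(0.0, theta0 + theta @ x)
    # y = phi0 + phi @ h: each output is a linear combination of the hidden units
    return phi0 + phi @ h

# Dimensions from the figure: 3 inputs, 5 hidden units, 4 outputs.
D_in, D_hidden, D_out = 3, 5, 4
theta0 = rng.normal(size=D_hidden)
theta = rng.normal(size=(D_hidden, D_in))
phi0 = rng.normal(size=D_out)
phi = rng.normal(size=(D_out, D_hidden))

x = rng.normal(size=D_in)
y = shallow_network(x, theta0, theta, phi0, phi)
print("input shape:", x.shape, "-> output shape:", y.shape)   # (3,) -> (4,)
```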

The activation function is critical in enabling the model to capture complex, non-linear relationships between input and output. Without an appropriate activation function, or if a linear activation function is used, the model’s ability to represent non-linear patterns is limited, and the overall mapping from input to output is restricted to linear transformations only. In simpler terms, the activation function enables the neural network to learn and represent more intricate and non-linear patterns in the data, which is essential for solving complex real-world problems.
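To make that concrete, here is a small sketch (random parameters) showing that if the activation is linear, i.e. a(z) = z, the whole network collapses into a single affine map:

```python
import numpy as np

rng = np.random.default_rng(2)

theta0, theta = rng.normal(size=5), rng.normal(size=(5, 3))
phi0, phi = rng.normal(size=4), rng.normal(size=(4, 5))

# With a(z) = z:
#   y = phi0 + phi @ (theta0 + theta @ x)
#     = (phi0 + phi @ theta0) + (phi @ theta) @ x
# which is just one affine transformation with weights W and bias b.
W = phi @ theta
b = phi0 + phi @ theta0

x = rng.normal(size=3)
y_network = phi0 + phi @ (theta0 + theta @ x)   # network with linear activation
y_affine = b + W @ x                            # equivalent single affine map
print(np.allclose(y_network, y_affine))         # True: no extra expressive power
```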

In neural networks, different activation functions are used to process input data. One of the most commonly used activation functions is ReLU (Rectified Linear Unit), which is favored for its simplicity and ease of interpretation.
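Concretely, ReLU simply zeroes out negative pre-activations:

ReLU(z) = max(0, z)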

When ReLU activations are applied, the network divides the input space into distinct geometric shapes known as convex polytopes. These polytopes are formed by the intersections of the hyperplanes determined by the ReLU functions’ “turning points”. Each polytope contains a distinct linear function, and while the polytopes are identical for each output, the linear functions they contain can differ.

A convex polytope is a geometric shape in which, if you choose two points within the shape and draw a straight line between them, the entire line will remain within the shape.

Now, let us discuss hyperplanes. A hyperplane is a flat surface with one less dimension than the space it is in. In a four-dimensional space, a hyperplane is a three-dimensional flat surface.

So, a convex polytope is a shape in which any line drawn between two points remains inside the shape, whereas a hyperplane is a flat surface with one less dimension than the space it is in.

About Me

My name is Mohit Mishra, and I’m a blogger who creates intriguing content that leaves readers wanting more. Anyone interested in machine learning and data science should check out my blog. My writing is designed to keep you engaged and intrigued, with a new piece published every two days. Follow along for in-depth information that will leave you wanting more!

If you liked the article, please clap and follow me, since that pushes me to write more and better content. I have also linked my GitHub account and portfolio at the bottom of the blog.

All images and formulas attached have been created with AlexNail and the CodeCogs site, and I do not claim ownership of them.

Thank you for reading my blog post on Unveiling Multivariate Shallow Neural Network Magic. I hope you find it informative and helpful. If you have any questions or feedback, please feel free to leave a comment below.

I also encourage you to check out my portfolio and GitHub. You can find links to both in the description below.

I am always working on new and exciting projects, so be sure to subscribe to my blog so you don’t miss a thing!

Thanks again for reading, and I hope to see you next time!

[Portfolio Link] [Github Link]
