A Simpler Alternative to Neural Nets

Nick Sohacki
4 min read · May 12, 2024


The fundamental problem neural nets try to solve is approximating a function given only its inputs and outputs, but not the function itself. For example, you may be given the following information:

The goal of neural nets is to approximate what f(x, y) actually is.
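To make this concrete, here is a hypothetical set of samples in Python. Only the (4, 5) → 41 pair appears later in this article; the other rows are invented for illustration, chosen so that all three are consistent with one underlying function.

# Samples of an unknown function f(x, y).
# We only ever see inputs and outputs, never f itself.
samples = [
    (4, 5, 41),  # f(4, 5) = 41; this sample is used later in the article
    (1, 2, 5),   # invented for illustration
    (3, 3, 18),  # invented for illustration
]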

The problem with neural nets is that they are complicated, specifically in the choices one must make when using them: how many hidden layers there should be, how many neurons per layer, which activation function to use, and so on. This involves a lot of “knob-tuning,” which is tedious, especially if you just want to approximate a simple function. In this article, I present a simpler way to approximate a function that removes much of the complexity involved with neural nets.

Parameterizing Polynomials

Let’s assume the function we’re trying to approximate is the following:

Consider that each term has three values associated with it: a base (in this case, x or y), a coefficient (all 1 in this case), and an exponent. Now consider another function in which x and y are constants rather than parameters, and the coefficients and exponents are the parameters instead. This function looks like the following:

This function will be our approximated function. The question now is: what should the values of the parameters be? (Remember, we're pretending we don't know what the actual function is.)

For starters, we can initialize all of them to 1:
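In a minimal Python sketch, assuming a two-term form g(x, y) = a·x^b + c·y^d (the name g and the exact term structure are assumptions for illustration), the definition and all-ones initialization look like this:

def g(x, y, params):
    """Approximated function: a * x**b + c * y**d.
    The coefficients (a, c) and exponents (b, d) are the parameters
    we will learn; x and y come from the samples."""
    a, b, c, d = params
    return a * x**b + c * y**d

# For starters, initialize every parameter to 1.
params = [1.0, 1.0, 1.0, 1.0]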

Now let’s say one of the samples we were given for the function was the following:

If we were to evaluate our approximated function at these x, y values (4, 5), we would get the following:

The output is clearly wrong. The question now is: how should we change the parameters such that the gap between the actual function output (41) and our approximated function output (28) is reduced?
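Continuing the sketch from above (the exact initial output depends on the assumed term structure, so it won't necessarily reproduce the article's 28, but the point is the same: the prediction misses the true output of 41):

x, y, target = samples[0]      # the (4, 5) -> 41 sample
prediction = g(x, y, params)   # all parameters are still 1
gap = target - prediction      # nonzero; this is what we want to shrink
print(prediction, target, gap)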

The Cost Function

Let’s define another function that tells us how far away our approximated function is from the correct output:

The goal now becomes to minimize this function — the closer its output is to zero, the more accurate our approximated function is. Minimizing a function can be achieved through gradient descent.
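One standard choice consistent with this description is the squared error. A minimal sketch, reusing g from above:

def cost(x, y, target, params):
    """How far the approximated output is from the correct output.
    Squaring keeps the cost positive and punishes large errors more."""
    return (g(x, y, params) - target) ** 2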

Gradient Descent

To apply gradient descent, we must first take the partial derivative of the cost function with respect to each parameter. The following are the partial derivatives for some of the parameters:

Plugging our values into the partial derivative with respect to a, we get:

Now we can adjust parameter a in the direction opposite the gradient (-1704). However, in order to avoid overshooting the minimum of the cost function, we scale the adjustment by a “learning rate”; a good starting learning rate is something small, like 0.000001. The following shows a’s adjustment:
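In the Python sketch, the derivatives for the assumed two-term form can be worked out by hand, and the full update step for all four parameters looks like the following; a's adjustment from the article, a_new = 1 - 0.000001 * (-1704) = 1.001704, appears in the step docstring (the -1704 comes from the article's own terms, not this sketch):

import math

def gradients(x, y, target, params):
    """Partial derivatives of the squared-error cost with respect to
    each parameter of a * x**b + c * y**d."""
    a, b, c, d = params
    error = g(x, y, params) - target            # approximated - actual
    return [
        2 * error * x**b,                       # dC/da
        2 * error * a * x**b * math.log(x),     # dC/db (requires x > 0)
        2 * error * y**d,                       # dC/dc
        2 * error * c * y**d * math.log(y),     # dC/dd (requires y > 0)
    ]

LEARNING_RATE = 0.000001  # the small starting value suggested above

def step(x, y, target, params):
    """One gradient-descent update: move each parameter a tiny amount
    in the direction opposite its partial derivative.
    In the article's example: a_new = 1 - 0.000001 * (-1704) = 1.001704."""
    grads = gradients(x, y, target, params)
    return [p - LEARNING_RATE * grad for p, grad in zip(params, grads)]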

Repeat this process for each parameter and, in theory, the output of the approximated function with the updated parameters should be closer to the actual output.

Now, repeatedly compute the partial derivatives and apply the updates to the approximated function’s parameters (and use more samples to avoid overfitting), and the approximated function should converge to the actual function over time.
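Putting the pieces together, a complete (if naive) training loop over the samples might look like the following sketch; the epoch count is arbitrary, and in practice you would watch the cost and stop once it is close to zero:

# Repeatedly nudge the parameters toward lower cost on every sample.
for _ in range(200_000):
    for x, y, target in samples:
        params = step(x, y, target, params)

a, b, c, d = params
print(f"g(x, y) ≈ {a:.3f} * x^{b:.3f} + {c:.3f} * y^{d:.3f}")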

Conclusion

Notice that this method doesn’t include backpropagation; it isn’t necessary, as there are no “layers.” It also doesn’t have “activation functions”: its support for non-linearity depends on the terms within the approximated function, which you have full control over.

Thanks for reading.
