From the Edge: Physically Constrained Neural Networks

Jake Tauscher
4 min read · Dec 14, 2020


This blog is part of a series on recent academic papers in the AI/ML community. By understanding what experts are spending their time researching, we will get a sense of the current limits and the future of the AI/ML world!

In early 2020, a group of scientists from UC Irvine, Columbia University, and the Technical University of Munich developed a framework to “constrain” neural networks with physical laws, ensuring the predictions of the network do not violate known physical laws.

Why is this interesting?

Neural networks are classically “unconstrained”. In training a (supervised) neural network, you provide the network a set of inputs and corresponding outputs (this is called your “training data”). Then, you define a cost function, which is basically a way to put a number on “how far off” your network’s predictions are from the expected outputs. Then, you algorithmically adjust the network to minimize this cost function, moving the predictions as close as possible to the actual outputs. And this approach is super powerful! It has actually been proven that a neural network can approximate any function (this is the Universal Approximation Theorem; if you are interested in reading more about it, this online summary from Michael Nielsen is the best description I have found).
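To make that loop concrete, here is a minimal sketch of supervised training: a single linear model fit by gradient descent on a mean-squared-error cost function. (The data, learning rate, and model are toy assumptions for illustration, not anything from the paper.)

```python
import numpy as np

# Toy supervised training loop: fit y = w*x + b by gradient descent
# on a mean-squared-error cost function.
x = np.array([1.0, 2.0, 3.0, 4.0])   # inputs
y = np.array([3.0, 5.0, 7.0, 9.0])   # outputs (true relation: y = 2x + 1)

w, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    pred = w * x + b
    err = pred - y                    # "how far off" each prediction is
    cost = np.mean(err ** 2)          # the cost function (MSE)
    w -= lr * np.mean(2 * err * x)    # adjust parameters to shrink the cost
    b -= lr * np.mean(2 * err)

print(round(w, 2), round(b, 2))       # approaches w = 2, b = 1
```

The same idea scales up: a real network just has many more parameters to adjust, but the cost-minimization loop is the same.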

So, what is the issue that these researchers are trying to solve? Well, training data is rarely complete. Consider a very simple example. Suppose we have two points of training data: (2,4) and (3,9).

The world’s simplest training data, plotted on a graph

Then, say we are trying to predict the output when our input is 4. So, what function will our neural network approximate?

Well, given our limited training data, it could approximate a few different functions! For example, y = 5x - 6 works very well.

The world’s simplest training data, with one function that approximates the points well

However, we can observe that in this (contrived and unrealistic) example, y = x² would work too!

So, what will our neural network predict? For an input of 4, is the answer 14 or 16? It actually will depend on how our network is initialized and trained; it could end up predicting either. But only one is the real relationship, so our network has a decent chance of being wrong when it predicts an output for 4.
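The ambiguity is easy to check by hand: both candidate functions match the two training points exactly, yet disagree on the unseen input.

```python
# Both candidate functions fit the training points (2,4) and (3,9)
# exactly, yet disagree on the unseen input x = 4.
def linear(x):
    return 5 * x - 6   # y = 5x - 6

def quadratic(x):
    return x ** 2      # y = x^2

for x in (2, 3):
    assert linear(x) == quadratic(x)   # identical on the training data

print(linear(4), quadratic(4))         # 14 vs. 16 -- the data alone can't decide
```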

So, what does this way oversimplified example show? Basically, given the incompleteness of training data, it is possible for a network to fit the training data well but fail to generalize to new data it hasn’t seen. In that case, the network has misunderstood the actual underlying relationship in the data.

So, how can we push the neural network not only to fit the training data, but to be more likely to generalize well to unseen data? There are a lot of techniques for this, but one in the scientific field is to ensure that the network not only fits the training data, but also follows well-established physical laws. After all, if the neural network is making predictions that violate physical laws, we know it has misunderstood the nature of the relationship in the training data!

Tell me the details!

In this paper, the researchers introduce a methodology for constraining neural networks that must meet multiple physical laws (they note that previous work in this area has attempted to constrain a network by a single equation).

To demonstrate this, the researchers compared 3 types of neural networks in modeling the behavior of a climate system. The nets differed in how (or if) they applied “constraints”, which are the known physical laws that the climate system must follow. In this case, the constraints were “conservation laws” — conservation of mass, conservation of energy, and conservation of radiation.

The first type of net was an Unconstrained Net (UC). This is a net that is trained solely on the data, with no other constraints added in.

The second was an Architecture Constrained Net (AC). This net enforces n constraints precisely, by adding n additional “residual” layers to the neural net that essentially generate “plugs” to ensure the predictions satisfy the various constraints.
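As a rough sketch of the idea (simplified and assumed — the paper’s actual residual layers handle several linear conservation laws at once), the network freely predicts all but one component, and a final layer computes the last component as the “plug” that makes the constraint hold exactly:

```python
import numpy as np

# Sketch of a "hard" architecture constraint: the net predicts all but
# one output freely, and a residual layer computes the last output as
# the "plug" that makes a conservation law hold by construction.
# Here the stand-in law is that the outputs must sum to a known total.

def constrained_output(free_preds, total):
    """Append a residual term so the outputs sum to `total` exactly."""
    plug = total - np.sum(free_preds)
    return np.append(free_preds, plug)

free = np.array([0.3, 0.5])         # hypothetical raw network outputs
y = constrained_output(free, 1.0)   # enforce sum(y) == 1.0
print(y, y.sum())                   # the constraint holds exactly
```

Because the constraint is built into the architecture, no amount of training can produce a prediction that violates it — hence “hard” constraint.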

The third was a Loss Constrained Net (LC), in which a “penalty factor” is added to the cost function for violating physical laws (this is considered a “soft constraint”, whereas the AC uses a “hard constraint”). This penalty factor calculates “how far off” the output is from meeting each of the n constraints, and then this penalty is added to the cost that the network is attempting to minimize.
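A minimal sketch of that soft constraint, assuming linear conservation laws of the form C·y = b (the exact penalty form and weighting in the paper may differ):

```python
import numpy as np

# Sketch of a "soft" loss constraint: a weighted penalty measuring how
# far a prediction is from satisfying each linear conservation law
# C @ pred = b is added to the ordinary data-fitting cost.

def constrained_loss(pred, target, C, b, alpha):
    data_cost = np.mean((pred - target) ** 2)   # ordinary MSE on the data
    violation = C @ pred - b                    # one residual per physical law
    penalty = np.mean(violation ** 2)           # "how far off" from the laws
    return data_cost + alpha * penalty          # alpha sets the constraint weight

pred   = np.array([0.3, 0.5, 0.1])
target = np.array([0.25, 0.5, 0.25])
C = np.ones((1, 3))                 # toy law: the components must sum...
b = np.array([1.0])                 # ...to exactly 1
print(constrained_loss(pred, target, C, b, alpha=10.0))
```

Unlike the architecture constraint, nothing forces the violation to zero; the network merely pays a price for violating the laws, traded off against fitting the data via alpha.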

So, what did they (and we) learn?

The researchers found that their constrained networks worked very well! The AC network was basically able to equal the performance of the UC in fitting the data, but to do so in a way that also satisfied physical laws (and is thus more likely to generalize).

The LC net’s performance changed based on how much “weight” was given to fitting the physical laws (vs. fitting the data). As the LC net was constrained more to fit the physical laws, its overall performance in fitting the data decreased modestly (as would be expected).

Interestingly, the more constrained neural networks did not generalize to the test data more successfully; they basically replicated their performance on the training data, which means the UC network generalized just as well as the AC/LC. This surprised me a bit, and I would be interested to see follow-up analysis of this behavior! As discussed in this blog, I would have expected the AC/LC nets to generalize to unseen data more successfully; perhaps this is a function of relatively similar training and test data. But regardless of that follow-up question, this was a very interesting paper.

And you can read the paper yourself: arXiv:1909.00912v4
