Deep Learning applied to Non-Intrusive Reduced Order Modelling and Parameter Reduction

Guglielmo Padula
Published in SISSA mathLab
Nov 14, 2023

Deep learning is a well-studied field with many practical applications and research topics. Examples of applications include image recognition, natural language processing, and music generation. Artificial Neural Networks have achieved good results on these problems thanks to their deep structures and their ability to approximate functions: the universal approximation theorem guarantees that sufficiently large neural networks can approximate any continuous function [1].

But what does big mean?

Artificial Neural Networks are composed of elementary units, called neurons, which are parametrized non-linear functions. The size of a neural network is determined by the number of neurons and by how they are combined.

We will focus on the simplest type of neural network, the Feedforward Neural Network (FNN). In an FNN, a neuron is a parametrized linear function composed with a (non-parametrized) non-linear function. A layer is a set of neurons that are evaluated in parallel, and a sequence of layers (evaluated one after the other) forms the neural network.

The following image shows an FNN with two layers (one with three neurons and one with two neurons).

Fig. 1: A Feedforward Neural Network with two layers.

The number of incoming arrows of a neuron represents its number of inputs, and the number of outgoing arrows its number of outputs. Globally, the graph above therefore represents a nonlinear function of three variables with values in the Cartesian plane, i.e. a map from R³ to R².
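
As a minimal illustration (not part of the original post), the network of Fig. 1 could be written in PyTorch as follows; the choice of Tanh as the non-linearity is an assumption, since the figure does not specify one.

```python
import torch
import torch.nn as nn

# A feedforward network matching Fig. 1: three inputs, a first layer
# with three neurons, and a second layer with two neurons.
model = nn.Sequential(
    nn.Linear(3, 3),  # first layer: parametrized linear map...
    nn.Tanh(),        # ...composed with a non-parametrized non-linearity
    nn.Linear(3, 2),  # second layer: two output neurons
    nn.Tanh(),
)

x = torch.rand(5, 3)  # five input points in R^3
y = model(x)          # corresponding outputs in R^2
print(y.shape)        # torch.Size([5, 2])
```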

Feedforward Neural Networks can be applied to time-independent parametrized partial differential equations. Consider, for example, the following partial differential equation (Eq. 1),

Eq. 1: Poisson equation on a square.

which depends on the parameters a and b, both positive.
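
The equation itself is shown as an image in the original post. As an illustrative guess only (the exact form used by the author is not reproduced here), a parametrized Poisson problem of this kind could read

\[
\begin{cases}
- a\,\Delta u(x, y) = b, & (x, y) \in \Omega = (0, 1)^2,\\
u(x, y) = 0, & (x, y) \in \partial\Omega,
\end{cases}
\qquad a, b > 0.
\]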

These differential equations are usually solved by discretizing the domain and by solving an approximate finite-dimensional system for a specific value of a and b (Fig. 2).

Fig. 2: Solution of the PDE for a=1 and b=1.

This procedure must be repeated for every value of the parameters, and the cost grows further if the parameters also appear in the domain of the equation: depending on the partial differential equation and on the number of degrees of freedom of the discretization, a single simulation can take days or weeks.

For this reason, given the partial differential equation map (Eq. 2),

Eq. 2: General PDE map.

where d is the dimension of the PDE parameter μ and n is the dimension of the finite-dimensional approximation of the PDE solution, a surrogate map (Eq. 3)

Eq. 3: General PDE surrogate map.

with a much faster evaluation time is usually computed. Thanks to the universal approximation theorem, neural networks are a natural choice for building such a surrogate map, under the assumption that u is continuous with respect to μ.
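
In symbols, a plausible rendering of Eqs. 2 and 3 (which appear as images in the original post) is

\[
u : \mu \in \mathbb{R}^d \longmapsto u(\mu) \in \mathbb{R}^n,
\qquad
\hat{u} : \mu \in \mathbb{R}^d \longmapsto \hat{u}(\mu) \approx u(\mu),
\]

with \(\hat{u}\) much cheaper to evaluate than \(u\).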

For the discussion that follows, we assume that we have a sampling of the parameters (Eq. 4)

Eq. 4: Set of PDE parameters.

and some corresponding PDE evaluations (Eq. 5).

Eq. 5: Set of PDE evaluations.
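
A plausible rendering of Eqs. 4 and 5 (also images in the original post) is

\[
\{\mu_1, \dots, \mu_N\} \subset \mathbb{R}^d,
\qquad
\{u(\mu_1), \dots, u(\mu_N)\} \subset \mathbb{R}^n.
\]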

These hypotheses do not require any knowledge of the PDE itself. For this reason, this framework is called Non-Intrusive Reduced Order Modelling. Note that neural networks can also be applied when the PDE is known [2].

There are mainly two ways in which neural networks can be applied to Non-Intrusive Reduced Order Modelling.

Discriminative models

The first way is to use discriminative models, in which the surrogate map is approximated directly by a neural network; theta (θ) denotes the parameters of the neural network in the equations below. This is a simple regression problem, and the network weights can be computed by minimizing a distance function l (Eq. 6).

Eq. 6: Optimization problem of a discriminative model.

Then the surrogate map is simply the following (Eq. 7).

Eq. 7: Computation of the PDE surrogate map from the trained discriminative model.
Fig. 3: Plot of the relative reconstruction error of the simple FANN model.

The figure above (Fig. 3) shows the behaviour of the relative reconstruction error of the feedforward artificial neural network as a function of the network size. As predicted by the universal approximation theorem, the relative error decreases as the size increases.
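
As a concrete sketch of this approach (illustrative only: data, layer sizes, and training settings are placeholders, not the exact setup used to produce the figures), the surrogate of Eqs. 6 and 7 can be trained as a standard regression in PyTorch:

```python
import torch
import torch.nn as nn

# Placeholder data: N parameter samples (d-dimensional) and the
# corresponding discretized PDE solutions (n-dimensional).
# In practice these come from the full-order solver.
N, d, n = 200, 2, 1000
params = torch.rand(N, d)      # the mu_i of Eq. 4
snapshots = torch.rand(N, n)   # the u(mu_i) of Eq. 5

# Discriminative surrogate: a feedforward network mapping mu to u(mu).
surrogate = nn.Sequential(
    nn.Linear(d, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, n),
)

optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()         # plays the role of the distance l in Eq. 6

for epoch in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(surrogate(params), snapshots)
    loss.backward()
    optimizer.step()

# Eq. 7: the trained network is the surrogate map.
u_hat = surrogate(torch.tensor([[0.3, 0.7]]))
```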

A recent example of these models is the Neural Operator [3].

If the dimension of the solutions is very high, it is sometimes convenient to first reduce their dimensionality by finding two maps, an encoder and a decoder (Eq. 8).

Eq. 8: General framework for dimensionality reduction.

Examples of these maps are Proper Orthogonal Decomposition (POD) and Autoencoders.

In this case, if we define the encoded PDE evaluations (Eq. 9)

Eq. 9: Encoded PDE evaluations.

we can learn a surrogate map using a feedforward artificial neural network (Eq. 10)

Eq. 10: Surrogate map for encoded PDE evaluations.

by solving the optimization problem of Eq. 11:

Eq. 11: Optimization problem of an encoded discriminative model.

At this point, we recover a map to the full solution space by composing the trained network with the decoder (Eq. 12).

Eq. 12: Computation of the PDE surrogate map from the trained encoded discriminative model.
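
A minimal sketch of this encode-regress-decode pipeline (again with placeholder data and sizes), using POD via a truncated SVD for the encoder/decoder of Eq. 8 and a small feedforward network for the latent regression of Eqs. 10 and 11:

```python
import torch
import torch.nn as nn

N, d, n, k = 200, 2, 1000, 10   # k is the reduced (latent) dimension
params = torch.rand(N, d)       # placeholder parameters mu_i
snapshots = torch.rand(N, n)    # placeholder solutions u(mu_i)

# Eq. 8: POD encoder/decoder built from a truncated SVD of the snapshots.
_, _, Vh = torch.linalg.svd(snapshots, full_matrices=False)
modes = Vh[:k]                            # k POD modes, shape (k, n)
encode = lambda u: u @ modes.T            # R^n -> R^k
decode = lambda z: z @ modes              # R^k -> R^n

latent = encode(snapshots)                # Eq. 9: encoded PDE evaluations

# Eqs. 10 and 11: regress the latent coefficients from the parameters.
net = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, k))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for epoch in range(1000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(params), latent)
    loss.backward()
    opt.step()

# Eq. 12: compose with the decoder to recover the full surrogate map.
u_hat = decode(net(torch.tensor([[0.3, 0.7]])))
```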

SISSA mathLab currently maintains a package called EZyRB [4,5], which implements these maps (and much more).
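
For reference, a typical EZyRB pipeline looks roughly like the snippet below, adapted from the package README; the exact class names and signatures may differ between versions, so treat this as a sketch and check the current documentation.

```python
import numpy as np
from ezyrb import POD, RBF, Database
from ezyrb import ReducedOrderModel as ROM

params = np.random.rand(200, 2)        # parameters mu_i
snapshots = np.random.rand(200, 1000)  # corresponding solutions u(mu_i)

db = Database(params, snapshots)       # pair parameters with snapshots
rom = ROM(db, POD('svd'), RBF())       # POD for reduction, RBF interpolation
rom.fit()

prediction = rom.predict([[0.3, 0.7]])
```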

Fig. 4: Convergence of a model composed of POD (for dimensionality reduction) and a FANN. Again, the universal approximation theorem guarantees that the relative error (approximately) decreases with increasing size.

Generative models

The second way is to use generative models, in which it is assumed that (Eq. 13)

Eq. 13: Standard assumption of the generative models.

In this case the neural network no longer approximates the PDE map directly; instead, it models a surrogate probability distribution, parametrized by alpha (α), obtained by solving the following (Eq. 14):

Eq. 14: Standard objective of a generative model.

where D is a divergence between probability distributions, such as the Kullback-Leibler divergence or the Fréchet distance. New samples can then be generated from the parametrized distribution.
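
A plausible rendering of Eqs. 13 and 14 (images in the original post) is: the snapshots are treated as samples from an unknown distribution p, and a parametrized distribution is fitted to it,

\[
u(\mu_i) \sim p(u), \qquad
\alpha^{\ast} = \operatorname*{arg\,min}_{\alpha} \; D\big(p_{\alpha},\, p\big).
\]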

In the context of Reduced Order Models, it is usually further assumed that there exist some unknown latent variables (Eq. 15),

Eq. 15: Latent variable generative models.

which can be estimated. These models are called Latent Variable Models. For an example of the estimation process, see the concept of the Evidence Lower Bound (ELBO).
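
For completeness (this formula is not shown in the original post), the Evidence Lower Bound used, for instance, by Variational Autoencoders reads

\[
\log p_{\alpha}(u) \;\ge\; \mathbb{E}_{q_{\phi}(z \mid u)}\big[\log p_{\alpha}(u \mid z)\big]
\;-\; \mathrm{KL}\big(q_{\phi}(z \mid u)\,\|\,p(z)\big),
\]

where q_φ is an approximate posterior over the latent variables z, parametrized by a second (encoder) network.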

But what about the PDE parameters?

The parameters can be integrated into this framework in two ways.

The first way is to insert them in the latent space, while still leaving some latent variables unknown (Eq. 16).

Eq. 16: Standard assumption of latent variable conditional generative models.

Models with this assumption are called Latent Conditional Generative Models. For an applied example in Reduced Order Modelling, see Generative Adversarial Reduced Order Models (GAROM) [6].

We can also easily recover a deterministic ROM from such a model by taking an expectation over the latent variables (Eq. 17).

Eq. 17: Computation of the PDE surrogate map from the trained latent variable conditional generative model.

The integral can be approximated by Monte Carlo sampling, as justified by the Law of Large Numbers.
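
Concretely, assuming Eq. 17 is the expectation of the conditional model over the latent prior (a rendering inferred from the surrounding text, not copied from the original image), the Monte Carlo estimate reads

\[
\hat{u}(\mu) \;=\; \mathbb{E}_{z \sim p(z)}\big[\, \mathbb{E}[u \mid z, \mu] \,\big]
\;\approx\; \frac{1}{M} \sum_{j=1}^{M} \mathbb{E}\big[u \mid z_j, \mu\big],
\qquad z_j \sim p(z).
\]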

Fig. 5: Loss of the GAROM model with increasing size.

The second way is not to include the parameters in the latent space, but instead to estimate a map from the parameters to the latent space (Eq. 18),

Eq. 18: Map from the parameters to the generative model latent space.

using Eq. 19.

Eq. 19: Equation for estimating the map from the parameters to the latent space.

With this map we can compute the ROM equation (Eq. 20).

Eq. 20: Computation of the PDE surrogate map from the trained generative model and a trained map from the parameters to the latent space.
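
A plausible rendering of this second route (Eqs. 18 to 20 appear as images in the original post, so the notation below is an assumption): fit a network from the parameters to the latent codes inferred by the generative model, then decode,

\[
\hat{z}_{\theta} : \mu \mapsto z, \qquad
\theta^{\ast} = \operatorname*{arg\,min}_{\theta} \sum_{i=1}^{N} l\big(\hat{z}_{\theta}(\mu_i),\, \mathbb{E}[z \mid u(\mu_i)]\big), \qquad
\hat{u}(\mu) = \mathbb{E}\big[u \mid z = \hat{z}_{\theta^{\ast}}(\mu)\big].
\]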

For an example see [7], which is based on Variational Autoencoders.

Fig. 6: Convergence of a model composed of a VAE for probabilistic dimensionality reduction and a FANN.

But why should we complicate our lives and use a Generative Model when we still get a deterministic function?

The main advantage of generative models is that you are not forced to settle for a deterministic function. If you remove the expectation from the equation, you get a probability distribution instead of a single value. From this distribution you can, for example, estimate the variance, which quantifies the uncertainty of the prediction. Having probabilities also provides a very useful performance measure for the models: the probability (likelihood) they assign to the data.
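
As a sketch of this idea (the `decoder` below is a placeholder standing in for a trained conditional generative model, and a standard normal latent prior is assumed; neither is specified in the original post), the predictive mean and uncertainty can be estimated by sampling:

```python
import torch

def decoder(z, mu):
    # Placeholder for a trained conditional generative model that maps a
    # latent sample z and a parameter mu to a solution in R^n.
    return torch.randn(z.shape[0], 1000)

M = 256
mu = torch.tensor([0.3, 0.7])
z = torch.randn(M, 10)          # M samples from a standard normal prior p(z)
samples = decoder(z, mu)        # M candidate solutions for the same mu

u_mean = samples.mean(dim=0)    # deterministic ROM prediction (cf. Eq. 17)
u_std = samples.std(dim=0)      # pointwise uncertainty of the prediction
```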

If what you are saying is true, why isn’t everyone using Generative Models?

Generative models have a drawback: their training and inference are typically harder than those of discriminative models. We can see this easily by looking at all the plots together.

Fig. 7: Relative reconstruction error over the size for all the models that we have seen.

Notice that both generative models perform worse than both discriminative models (Fig. 7). This happens not because they are weaker models, but because their estimation is in general partly stochastic (being probability models, they often require sampling during training). This makes their training much harder, and they often need significantly more tuning to reach the same results.

Further Reading

If you want more information on how deep learning can be applied to reduced order modelling (also in an intrusive way), there is a series of other posts on these topics, the Deep Learning 4 Natural Sciences series. They are published by SISSA mathLab and written by Dario Coscia; the first one is “How will deep learning be used to speed up physical simulations?” [8].

Parameter Reduction

One of the main problems of reduced order modelling is the curse of dimensionality: as the dimension of the PDE parameter increases, the performance of the reduced order models degrades unless the amount of data increases exponentially. This motivates the use of techniques that reduce the number of PDE parameters.

For a general overview of parameter reduction, see “Parameter space and model order reduction for industrial optimization”, written by Marco Tezzele and published by SISSA mathLab [9].

Latent Variable Generative Models can be used for Parameter Reduction if the domain of the PDE is parametrized. In fact, if the dimension of the latent variable z is smaller than the dimension of the parameter μ, new PDE domain samples can be generated with the generative model. The associated latent variable z can then be used as the parameter in place of μ when performing reduced order modelling [10]. This improves performance, at the cost of potentially reducing the variability of the generated domains and solutions (Fig. 8).
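
Schematically (this notation is mine, not the original post’s): if the generative model G produces domain samples from a latent variable of dimension k smaller than d, the reduced order model is then built on z instead of μ,

\[
z \in \mathbb{R}^{k}, \quad k < d, \qquad \Omega = G(z), \qquad \hat{u} : z \longmapsto \hat{u}(z).
\]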

Fig. 8: Moving from μ to z increases the performance of all the models.

Conclusion

We have seen how Feedforward Neural Networks can be used to perform non-intrusive reduced order modelling on time-independent problems, and how generative models can be used to perform parameter reduction. We have only scratched the surface of how deep learning can be applied to non-intrusive reduced order modelling: for example, if the partial differential equation is time-dependent, then Recurrent Neural Networks are a better choice.

References

[1] Yarotsky, Dmitry (2021). “Universal Approximations of Invariant Maps by Neural Networks”. Constructive Approximation. 55: 407–474. arXiv preprint arXiv:1804.10306.

[2] Raissi, Maziar; Perdikaris, Paris; Karniadakis, George Em (2017). “Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations”.

[3] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar. “Neural Operator: Learning Maps Between Function Spaces”. arXiv preprint arXiv:2108.08481.

[4] Nicola Demo, Marco Tezzele, and Gianluigi Rozza. EZyRB: Easy Reduced Basis method. https://joss.theoj.org/papers/10.21105/joss.00661

[5] https://github.com/mathLab/EZyRB

[6] Dario Coscia, Nicola Demo, Gianluigi Rozza. “Generative Adversarial Reduced Order Modelling”. arXiv preprint arXiv:2305.15881.

[7] Alberto Solera-Rico, Carlos Sanmiguel Vila, M. A. Gómez, Yuning Wang, Abdulrahman Almashjary, Scott T. M. Dawson, Ricardo Vinuesa. “β-Variational autoencoders and transformers for reduced-order modelling of fluid flows”. arXiv preprint arXiv:2304.03571.

[8] Dario Coscia. “How will Deep Learning be used to speed up Physical Simulations?”. https://medium.com/sissa-mathlab/how-will-deep-learning-be-used-to-speed-up-physical-simulations-8634cda1022a

[9] Marco Tezzele. “Parameter space and model order reduction for industrial optimization”. https://medium.com/sissa-mathlab/parameter-space-and-model-order-reduction-for-industrial-optimization-e905a5429dd8

[10] Guglielmo Padula, Francesco Romor, Giovanni Stabile. “Generative Models for the Deformation of Industrial Shapes with Linear Geometric Constraints: model order and parameter space reductions”. arXiv preprint arXiv:2308.03662.
