How Graph Neural Networks can be used to accelerate and replace physical simulations

Merantix Momentum
Merantix Momentum Insights
14 min read · Feb 22, 2023

Author: Winfried Ripken

Introduction

A boundary value problem consists of one or more partial differential equations (PDEs) together with a set of boundary conditions, which further restrict the space of possible solutions. Boundary value problems occur in many physics disciplines, such as thermodynamics or electromagnetics, and are, therefore, crucial to understanding and controlling physical processes in a variety of domains ranging from the design of electric engines to weather forecasting.

Solving such boundary value problems is both challenging and relevant for many real-world applications. On top of this, the possibility of using numerical simulations makes the generation of ground truth data straightforward. Boundary value problems, therefore, offer great potential to be solved using machine learning techniques that can be faster, more accurate, or in other ways preferable to classical simulations. In this blog post, we first discuss the state of the art at the intersection of these two topics. In the second part, we give an outlook on possible directions for future work and an overview of problems we find particularly interesting to investigate further.

Figure 1: A simple example of a boundary value problem: the electrical potential that results from a distribution of charges can be described as an instance of the Poisson equation. In this case, a circular charge and the resulting electric potential are visualized. Coordinates and physical quantities are plotted relative to normalization constants and without physical units for simplicity.

Before we discuss applications of ML to this domain further, let’s have a look at a simple example of a PDE and corresponding boundary conditions. Let u be a twice differentiable function over an open domain Ω and f(x) a sufficiently smooth function defined on Ω. The Poisson equation in its general form is then given by:

−∇²u(x) = f(x) for x ∈ Ω

A common boundary condition is to fix the value of u at the boundary of Ω to values defined by another function g(x). This type of boundary condition, where the value of the solution is prescribed along the boundary of the domain, is known as a Dirichlet boundary condition:

u(x) = g(x) for x ∈ ∂Ω

Following Langtangen et al. (2017), a practical example of the Poisson equation is found in electrostatics. The electric potential caused by a distribution of charges can be found by solving the Poisson equation for:

f(x) = ρ(x) / ϵ

where ρ is the charge density at each point and ϵ is a material constant called permittivity. For simplicity, we can set g(x)=0, such that the potential is zero everywhere at the boundary. A practical example of a specific circular charge distribution can be seen in Figure 1. We will next discuss different ways to solve boundary value problems in general and their various advantages and disadvantages.
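To make this concrete, the following minimal sketch solves exactly this setup with a simple finite-difference Jacobi iteration (not the method used to produce Figure 1): a circular charge on the unit square, permittivity set to 1, and u = 0 on the boundary. Grid size, charge radius, and iteration count are illustrative choices.

```python
import numpy as np

# Sketch: -∇²u = ρ/ϵ on the unit square with u = 0 on the boundary (all quantities unitless).
n = 128
h = 1.0 / (n - 1)
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
rho = ((x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.1 ** 2).astype(float)  # circular charge distribution
f = rho / 1.0                                                     # permittivity ϵ set to 1

u = np.zeros((n, n))
for _ in range(5000):                                             # Jacobi iterations
    u[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1] +
                            u[1:-1, 2:] + u[1:-1, :-2] +
                            h ** 2 * f[1:-1, 1:-1])
```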

(Non-)Neural Approaches for PDE solving

Classical numerical solvers

Conventional solvers like the finite element method (FEM) discretize the PDE on a mesh to obtain a tractable approximate solution. They can be used as generic solvers for a broad range of PDEs. The accuracy of the produced solutions depends on the resolution of the FEM mesh, imposing a tradeoff between runtime and solution quality.

The discretization of the function space leads to a matrix equation that is solved to obtain an approximate solution of the respective PDE. This can be done using exact or iterative methods of linear algebra. Logg et al. (2012) provide an extensive introduction to the topic and also describe the FEM solver FEniCS, one example of an open-source implementation, in greater detail.
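As a concrete illustration of this workflow, a short script in the style of the classic FEniCS Poisson demo is sketched below: it defines the variational (weak) form of the Poisson equation, applies the Dirichlet boundary condition, and lets FEniCS assemble and solve the resulting matrix equation. The charge expression and mesh resolution are illustrative choices, not taken from a specific example in the cited references.

```python
from fenics import *

mesh = UnitSquareMesh(64, 64)
V = FunctionSpace(mesh, "P", 1)                    # piecewise-linear finite elements

bc = DirichletBC(V, Constant(0.0), "on_boundary")  # Dirichlet condition u = g = 0
rho = Expression("pow(x[0]-0.5, 2) + pow(x[1]-0.5, 2) < 0.01 ? 1.0 : 0.0", degree=0)

u = TrialFunction(V)
v = TestFunction(V)
a = dot(grad(u), grad(v)) * dx                     # weak form of -∇²u = f
L = rho * v * dx                                   # source term with ϵ = 1

u_sol = Function(V)
solve(a == L, u_sol, bc)                           # assembles and solves the linear system
```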

While FEM simulations produce high-quality results, they are typically computationally expensive because the resulting matrix equation has to be solved numerically. On top of that, any slight variation in the parameters of the problem requires a full rerun to obtain the updated solution.

Physics-informed neural networks (PINNs)

Physics-informed neural networks (Raissi et al. 2018) directly use neural networks to parametrize the solution of a single boundary value problem with given constraints. The idea is to use the physical equations (PDEs) to generate a loss for training the network. Similarly, the boundary conditions are enforced via the loss function. When we apply gradient-based optimization to this setting, we can train a neural network that learns to approximate the solution to a single combination of PDEs and boundary conditions over time. Similar to classical numerical solvers, this optimization procedure needs to be repeated for every change in the parameters of the PDEs or the boundary conditions to find the corresponding weights for the PINN. Note, however, that no ground truth physical simulation is required to train a PINN. Figure 2 shows an example of a PINN for the electrostatics Poisson equation discussed above.

Figure 2: Example of a PINN learning the electric potential on a 2d grid. The PINN directly approximates the solution function u; in this case, it takes the spatial coordinates as input and predicts the potential at that point. The boundary conditions and PDEs are enforced as separate loss terms during training. To obtain the electric potential over the whole domain, the PINN needs to be evaluated at many points, and the results can be interpolated.
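The following PyTorch sketch shows what such a PINN could look like for the electrostatics example: a small fully connected network maps (x, y) to u(x, y), the residual of −∇²u = f is computed with automatic differentiation, and the Dirichlet condition u = 0 is enforced as a second loss term. The smoothed charge term, network size, and sampling scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)

def source(xy):
    # smoothed circular charge around (0.5, 0.5), mimicking the setting of Figure 1
    return torch.exp(-50.0 * ((xy[:, 0] - 0.5) ** 2 + (xy[:, 1] - 0.5) ** 2)).unsqueeze(1)

def pde_residual(xy):
    xy = xy.requires_grad_(True)
    u = net(xy)
    du = torch.autograd.grad(u.sum(), xy, create_graph=True)[0]
    u_xx = torch.autograd.grad(du[:, 0].sum(), xy, create_graph=True)[0][:, 0:1]
    u_yy = torch.autograd.grad(du[:, 1].sum(), xy, create_graph=True)[0][:, 1:2]
    return -(u_xx + u_yy) - source(xy)           # residual of -∇²u = f

def sample_boundary(n):
    t, z, o = torch.rand(n, 1), torch.zeros(n, 1), torch.ones(n, 1)
    return torch.cat([torch.cat([t, z], 1), torch.cat([t, o], 1),
                      torch.cat([z, t], 1), torch.cat([o, t], 1)], 0)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    interior = torch.rand(256, 2)                               # collocation points inside Ω
    loss = pde_residual(interior).pow(2).mean() \
         + net(sample_boundary(64)).pow(2).mean()               # Dirichlet condition u = 0
    opt.zero_grad(); loss.backward(); opt.step()
```

Note that the loss only requires sampled collocation points and the PDE itself, not a precomputed ground truth solution.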

Neural Operators

In contrast to the previously discussed methods, Neural Operator networks (Lu et al. 2019) aim at learning the solution operator for a set of PDEs directly. This means the trained network will be able to handle different discretizations of the domain and operate under changing boundary conditions. Once the network is trained, a change in the parameters of the PDEs requires only a single forward pass of the network to produce the updated solution.

The operator network takes the possibly varying boundary conditions and PDE parameters as input and directly outputs the solution function over the whole domain in discretized form. A disadvantage of neural operator networks is that they are usually more difficult to train than PINNs and require ground truth data generated by a physical simulation. The ground truth data is then used to train the network in a supervised manner. An example of an operator network can be seen in Figure 3.

Figure 3: Schematic example for a neural operator network trained on the electrostatics problem. The network takes the electric charge distribution as input and predicts the potential for the whole domain at once. Coordinates and physical quantities are plotted again relative to normalization constants and without physical units.
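In code, supervised operator training boils down to regressing FEM solutions from the PDE parameters. The sketch below uses a plain CNN as a stand-in for the operator and random tensors as placeholder data; in practice, the (charge density, potential) pairs would come from FEM simulations, and architectures such as the ones discussed next would replace the CNN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

operator = nn.Sequential(                      # simple CNN stand-in for a neural operator
    nn.Conv2d(1, 32, 3, padding=1), nn.GELU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.GELU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

# Placeholder data: in practice, (rho, u) pairs come from a ground truth FEM simulation.
rho = torch.rand(32, 1, 64, 64)
u_fem = torch.rand(32, 1, 64, 64)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(rho, u_fem), batch_size=8, shuffle=True)

opt = torch.optim.Adam(operator.parameters(), lr=1e-3)
for rho_batch, u_batch in loader:
    loss = F.mse_loss(operator(rho_batch), u_batch)   # supervised regression over the whole domain
    opt.zero_grad(); loss.backward(); opt.step()
```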

Fourier Neural Operators

A more recent prominent example of a neural operator network is the Fourier Neural Operator introduced by Li et al. (2020.1). A common problem for operator networks that use, for example, convolution layers is their lack of generalization to discretizations with very different resolutions. Li et al. (2020.1) propose to instead apply the neural operator in Fourier space, where it is fully resolution-independent. By combining this learned operator with the Fast Fourier Transform (FFT), their approach is very fast and produces accurate results.

Figure 4: Schematic overview of a single Fourier layer in the Fourier Neural Operator: First, the input is transformed using the FFT, and high frequencies in Fourier space are omitted. A learned linear transformation is applied to the lower Fourier modes. The output of the linear transformation is transformed back to real space. In parallel to the Fourier layer, a pointwise linear transform is learned, and the results are added up. Image source: Li et al. (2020.1)
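A minimal PyTorch sketch of the spectral part of such a Fourier layer is shown below: transform to Fourier space, keep only the lowest modes, apply a learned linear mixing, and transform back. The parallel pointwise transform and the full mode handling of the original implementation are omitted for brevity; channel and mode counts are illustrative.

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    # One Fourier layer (sketch): FFT, truncate to low modes, learned mixing, inverse FFT.
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        self.weights = nn.Parameter(
            torch.randn(channels, channels, modes, modes, dtype=torch.cfloat) / channels)

    def forward(self, x):                        # x: (batch, channels, height, width)
        x_ft = torch.fft.rfft2(x)                # complex spectrum
        out_ft = torch.zeros_like(x_ft)
        m = self.modes
        # learned linear transformation applied only to the lowest Fourier modes
        out_ft[:, :, :m, :m] = torch.einsum("bixy,ioxy->boxy", x_ft[:, :, :m, :m], self.weights)
        return torch.fft.irfft2(out_ft, s=x.shape[-2:])
```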

Though Fourier Neural Operators are robust to variations in grid resolution, they rely on the FFT with complexity O(n log n), where n is the number of grid points. Computing the Fourier transform becomes challenging for an arbitrary grid: as the FFT can no longer be used, the complexity of the Fourier transform grows to O(n²), which becomes infeasible for large grids. As a workaround, Li et al. (2022) propose to learn an additional deformation step to transform irregular grids, which is more accurate than interpolating the solution on a uniform grid. However, they investigate only a relatively small subset of possible transformations and do not investigate generalization performance to differently shaped grids.

In conclusion, existing work most commonly improves performance along a single axis, becoming either very robust with respect to resolution or able to handle irregular grids efficiently. Whether neural operators can generalize well across both very different resolutions and grid shapes remains an active research question.

GNN-based Neural Operators

Graph neural networks (GNNs) are a general class of neural network architectures that describe operations carried out directly on graphs. Predictions can be made at the level of nodes, edges, or the whole graph and are usually computed via a technique called message passing: neighboring nodes send messages to each other, and the aggregated message information is used to form the subsequent node states. GNNs are a natural fit for irregular grid data and can be applied directly to any training data generated by a ground truth FEM simulation. Li et al. (2020) are among the first to propose using a simple message-passing operator, in this case the edge-conditioned graph convolution (Simonovsky et al. 2017, Gilmer et al. 2017), to learn the solution operator for various PDEs. See Figure 5 for a detailed illustration of a GNN operator that operates on a discretized grid. In the following section, we will discuss GNN models in greater depth and lay out further directions of research.

Figure 5: A GNN operator can be straightforwardly applied to any discretized grid. The operator learns to transfer information between nodes via message passing.
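A sketch of such a GNN operator using PyTorch Geometric’s edge-conditioned convolution (NNConv) is given below. Node features hold the local PDE parameters (here, coordinates and charge density), edge features hold relative positions, and repeated message passing produces a per-node prediction of the potential. Feature choices, network sizes, and the number of message-passing steps are illustrative assumptions, not the exact configuration from the cited papers.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import NNConv

class GNNOperator(nn.Module):
    def __init__(self, hidden=64, steps=6, edge_dim=3):
        super().__init__()
        self.encoder = nn.Linear(3, hidden)            # per-node input: (x, y, ρ)
        edge_net = nn.Sequential(nn.Linear(edge_dim, 128), nn.ReLU(),
                                 nn.Linear(128, hidden * hidden))
        self.conv = NNConv(hidden, hidden, edge_net, aggr="mean")  # edge-conditioned convolution
        self.steps = steps
        self.decoder = nn.Linear(hidden, 1)            # per-node output: potential u

    def forward(self, data):                           # data: torch_geometric.data.Data
        h = self.encoder(data.x)
        for _ in range(self.steps):                    # message passing with shared weights
            h = torch.relu(self.conv(h, data.edge_index, data.edge_attr))
        return self.decoder(h)
```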

Research directions for training a good GNN operator on PDEs

First of all, it is interesting to highlight why we should look closer at GNNs for approximating the solution operators of PDEs: Besides the natural ability of GNNs to cope with irregular grids, research has shown great generalization performance of such models, even across very different mesh resolutions in the train and test set (Li et al. 2020, Pfaff et al. 2020). In addition, GNNs make it straightforward to connect different coordinate systems, such as the mesh space and the Euclidean world space, for modeling mesh deformations, or even to adaptively predict the mesh resolution over time (Pfaff et al. 2020).

Data augmentation and generalization

The first interesting direction of research is how to achieve the best possible generalization performance for the trained operator. This is especially important for transferring models in practice, where we ideally want to replace expensive FEM simulations altogether. In this case, the model, which has been trained only on a subset of simulated data, needs to be able to generalize to previously unseen PDE parameters, boundary conditions, and different meshes. A common technique to enhance generalization capabilities is to apply augmentation to the training data in order to increase the difficulty of the training task: in our case, to discover the true solution operator instead of simply memorizing solutions from previous simulations.

At Merantix Momentum, we compared different augmentation techniques for this problem in order to identify those that enable the model to generalize best to differently shaped meshes at test time (Lötzsch et al. 2022). We developed a novel technique, which we call Mesh Augmentation: we change the FEM meshes slightly before running the physical simulation to generate the ground truth data. We find that Mesh Augmentation improves generalization performance much more strongly than other augmentation techniques when the time spent on the physical simulations is held constant. Alternatively, techniques like edge dropout or node dropout also increase generalization performance and can be applied to a fixed dataset; a minimal example of such a dropout augmentation is sketched after Figure 6. See Figure 6 for a comparison of different augmentation techniques.

Figure 6: Augmentation techniques can greatly enhance the model’s ability to generalize to previously unseen mesh topologies: the 5 bars represent the generalization performance measured on a held-out test set, comparing 5 models trained with the same amount of data and different augmentation techniques. The best performance is reached when we randomly modify the mesh topologies slightly before running the ground truth simulation to generate the training data. Note that the figure is in log scale.
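As a simple example of the dropout-style augmentations mentioned above, edges of the training graphs can be removed at random before each training step; a rough sketch (with an illustrative drop probability) could look like this:

```python
import torch

def drop_edges(edge_index, edge_attr, p=0.1):
    # Randomly drop a fraction p of the mesh edges; node dropout can be implemented analogously.
    keep = torch.rand(edge_index.size(1)) > p
    return edge_index[:, keep], edge_attr[keep]
```

Mesh Augmentation, in contrast, modifies the mesh before the ground truth simulation is run, so it cannot be applied to an already fixed dataset after the fact.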

Generalization to different resolutions

Even though GNNs can learn to generalize reasonably well to different FEM meshes and resolutions, as we have seen before, there are limits to this generalization. Because the GNN operator is always localized, it is difficult to pass information across long distances in the graph. This becomes especially problematic for grids with very different resolutions in different areas, as the updates via message passing might get stuck in high-density areas. Li et al. (2020.1) argue that, because of these difficulties, some PDEs cannot be solved at all with a naive GNN operator.

A few methods have been proposed to overcome this limitation, mostly by adding edges to the graph that span larger distances. A good example is the Multipole Graph Neural Operator (Li et al. 2020.2): this operator forms a hierarchical graph by first connecting all nodes that lie within a relatively small radius of each other. Then the nodes in the graph are randomly subsampled, and edges between the randomly selected support nodes are created, now with a larger radius. The process can be repeated until nodes with sufficiently large distances in the graph are connected. In this way, the method ensures that information can be passed throughout the entire graph while keeping the operation tractable. Figure 7 visualizes the iterative subsampling of the graph and the different radii used for connecting nodes.

Figure 7: The same initial graph (a unit square) with subsequently subsampled nodes and edges spanning longer distances. The maximum radius for a connection to be formed varies from 0.25 to 1.0; at first, 400 randomly selected nodes are kept, and at the later stages only 100 or even just 25 nodes. The sampling procedure ensures that the 25 nodes in the coarse stage are included in the 100 nodes selected for the finer stage, and so on. Passing information across such randomly subsampled support nodes is the key principle of the Multipole Graph Neural Operator (Li et al. 2020.2): execution starts with message-passing steps on the finest graph, then traverses the coarser graphs and performs message passing over their longer-range connections, and finally the process is inverted from coarse to fine to generate the final predictions.
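The hierarchical construction described above can be prototyped in a few lines. The sketch below builds nested sets of support nodes on the unit square and connects each level with a growing radius, mirroring the numbers from Figure 7; the point count and the use of SciPy’s k-d tree are illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def radius_edges(points, r):
    # all pairs of points closer than r (indices refer to the passed point array)
    return np.array(list(cKDTree(points).query_pairs(r)))

rng = np.random.default_rng(0)
points = rng.random((1600, 2))                 # fine level: nodes on the unit square
order = rng.permutation(len(points))           # one shared ordering keeps the levels nested

levels = []
for n_keep, radius in zip([400, 100, 25], [0.25, 0.5, 1.0]):
    support = order[:n_keep]                   # the 25 coarse nodes are a subset of the 100, etc.
    levels.append((support, radius_edges(points[support], radius)))
```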

For future work, we assume that existing methods can still be improved considerably when it comes to passing information efficiently through a graph: instead of subsampling the graph randomly, we can borrow from graph theory to find optimal paths between relevant nodes. In this spirit, it will be interesting to look at approaches that organize hierarchical graphs optimally (Malkov et al. 2018). Another approach would be to learn the necessary long-distance connections, for example, using a graph transformer network to predict relevant edges (Yun et al. 2019). Furthermore, there are already transformer architectures that scale to very large input sizes and arbitrary data formats, e.g., Perceiver (Jaegle et al. 2021). We therefore think that neural PDE operators that work without a predefined graph structure at all should be investigated.

Incorporating physical prior knowledge

Especially for problems that require long simulation runs, it becomes particularly interesting to combine the objectives of operator learning and PINNs in a single approach, aiming to learn general-purpose operators with less ground truth training data. In our recent study (Lötzsch et al. 2022), we noticed that in some edge cases the operator model trained on ground truth data outputs predictions that violate known physical principles. We assume that incorporating physical constraints could improve the accuracy of the predictions, speed up training, and foster generalization.

In (Arnold et al. 2022), the authors propose to use a two-step procedure for training. They start by training an operator network on a range of simulated samples obtained from a ground truth FEM simulation. In the second step, they use a physical training objective similar to PINNs for further training. They show that while the training error on the ground truth simulations slightly increases during the second phase, the generalization and test performance are greatly improved. Similarly, Li et al. (2021) combine physical objectives with learning from data to train a more powerful operator network.
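A rough sketch of how such a physics term can be mixed into the training loss is shown below, assuming the operator predicts u on a regular grid so that −∇²u can be checked with a finite-difference stencil. The weighting factor and the stencil-based residual are illustrative simplifications, not the exact formulations used in the cited works.

```python
import torch
import torch.nn.functional as F

# 5-point stencil for the discrete Laplacian on a regular grid
laplace_kernel = torch.tensor([[0., 1., 0.],
                               [1., -4., 1.],
                               [0., 1., 0.]]).view(1, 1, 3, 3)

def combined_loss(u_pred, u_true, f, h, lam=0.1):
    # u_pred, u_true, f: tensors of shape (batch, 1, H, W); h: grid spacing
    data_loss = F.mse_loss(u_pred, u_true)
    lap = F.conv2d(u_pred, laplace_kernel) / h ** 2           # ∇²u on interior points
    physics_loss = (-lap - f[..., 1:-1, 1:-1]).pow(2).mean()  # residual of -∇²u = f
    return data_loss + lam * physics_loss
```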

Generalizing to more PDEs

For practical purposes, it is essential that the training scheme for obtaining the neural operator is applicable to new problems and works across many different PDEs. Li et al. (2021) propose to use transfer learning for data-efficient adaptation to new but similar PDEs. Brandstetter et al. (2022) even propose to learn multiple PDEs within a single operator network by introducing separate parameters that allow interpolation between these different PDEs. Despite this advance, they still resort to different hyperparameters and network architectures for very different PDEs. For the effective transfer of existing approaches to numerous new challenges, it will be important to construct operators that are general enough to work well under different constraints and with a wide range of PDEs and boundary conditions.

Practical challenges

Finally, there are also numerous practical challenges in researching neural PDE solvers. In particular, due to the lack of common benchmark datasets in the domain, results are difficult to compare across papers. An important step towards creating neural PDE solvers that are more applicable to practical problems would be a unified benchmark suite for comparing different approaches, as exists in other domains, for example Natural Language Processing (Wang et al. 2018). In the same spirit, it will be beneficial to collect and compare existing code bases.

Conclusion

In this blog post, we have taken a closer look at machine learning-based solutions to boundary value problems in physics. While there are impactful papers in this area already, we believe that a lot of potential remains unexplored. Graph Neural Networks (GNNs), in particular, make it possible to train very general solution operators that work with arbitrary grids and can be significantly faster than ground truth FEM simulations. We touched upon further directions of research for this type of solution operator, namely: increasing generalization performance through augmentation, dealing with very high graph resolutions, incorporating prior physical knowledge, and generalizing to different PDEs. It will be very interesting to follow the progress in these and related areas and to observe the practical applications of these advancements.

References

  • (Logg et al. 2012) Logg, Anders, Kent-Andre Mardal, and Garth Wells, eds. Automated solution of differential equations by the finite element method: The FEniCS book. Vol. 84. Springer Science & Business Media, 2012.
  • (Raissi et al. 2018) Raissi, Maziar. “Deep hidden physics models: Deep learning of nonlinear partial differential equations.” The Journal of Machine Learning Research 19, no. 1 (2018): 932–955.
  • (Langtangen et al. 2017) Langtangen, Hans Petter, and Anders Logg. Solving PDEs in python: the FEniCS tutorial I. Springer Nature, 2017.
  • (Li et al. 2020) Li, Zongyi, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. “Neural operator: Graph kernel network for partial differential equations.” arXiv preprint arXiv:2003.03485 (2020).
  • (Lu et al. 2019) Lu, Lu, Pengzhan Jin, and George Em Karniadakis. “Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators.” arXiv preprint arXiv:1910.03193 (2019).
  • (Li et al. 2022) Li, Zongyi, Daniel Zhengyu Huang, Burigede Liu, and Anima Anandkumar. “Fourier neural operator with learned deformations for pdes on general geometries.” arXiv preprint arXiv:2207.05209 (2022).
  • (Li et al. 2020.1) Li, Zongyi, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. “Fourier neural operator for parametric partial differential equations.” arXiv preprint arXiv:2010.08895 (2020).
  • (Simonovsky et al. 2017) Simonovsky, Martin, and Nikos Komodakis. “Dynamic edge-conditioned filters in convolutional neural networks on graphs.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3693–3702. 2017.
  • (Gilmer et al. 2017) Gilmer, Justin, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. “Neural message passing for quantum chemistry.” In International conference on machine learning, pp. 1263–1272. PMLR, 2017.
  • (Pfaff et al. 2020) Pfaff, Tobias, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W. Battaglia. “Learning mesh-based simulation with graph networks.” arXiv preprint arXiv:2010.03409 (2020).
  • (Lötzsch et al. 2022) Lötzsch, Winfried, Simon Ohler, and Johannes S. Otterbach. “Learning the solution operator of boundary value problems using graph neural networks.” arXiv preprint arXiv:2206.14092 (2022).
  • (Arnold et al. 2022) Arnold, Florian Robert Eduard. “Beiträge zur aktiven Strömungsbeeinflussung: Systemmodellierung mit Methoden des maschinellen Lernens und ganzzahlig beschränkte Regelung für zyklische Prozesse.” PhD Thesis (2022).
  • (Brandstetter et al. 2022) Brandstetter, Johannes, Daniel Worrall, and Max Welling. “Message passing neural PDE solvers.” arXiv preprint arXiv:2202.03376 (2022).
  • (Li et al. 2020.2) Li, Zongyi, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Andrew Stuart, Kaushik Bhattacharya, and Anima Anandkumar. “Multipole graph neural operator for parametric partial differential equations.” Advances in Neural Information Processing Systems 33 (2020): 6755–6766.
  • (Yun et al. 2019) Yun, Seongjun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J. Kim. “Graph transformer networks.” Advances in neural information processing systems 32 (2019).
  • (Jaegle et al. 2021) Jaegle, Andrew, Felix Gimeno, Andy Brock, Oriol Vinyals, Andrew Zisserman, and Joao Carreira. “Perceiver: General perception with iterative attention.” In International conference on machine learning, pp. 4651–4664. PMLR, 2021.
  • (Malkov et al. 2018) Malkov, Yu A., and Dmitry A. Yashunin. “Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.” IEEE transactions on pattern analysis and machine intelligence 42, no. 4 (2018): 824–836.
  • (Li et al. 2021) Li, Zongyi, Hongkai Zheng, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kamyar Azizzadenesheli, and Anima Anandkumar. “Physics-informed neural operator for learning partial differential equations.” arXiv preprint arXiv:2111.03794 (2021).
  • (Wang et al. 2018) Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. “GLUE: A multi-task benchmark and analysis platform for natural language understanding.” arXiv preprint arXiv:1804.07461 (2018).
