Insight in Inverse Methods — A Simple Example

Alice Adenis
Published in dataswati-garage · 8 min read · May 21, 2019

Building a model consists in using input data to obtain an output value. In machine learning, the model that produces the output prediction is learned from the dataset, and no theory is needed to link the data to the model (i.e. the theory itself is learned).

Inverse problems are quite different: here we have access to measurements and to a theory that explains the relation between the measurements and the model. Using this theory, we aim to rebuild the model that generated the measurements.

Here are some examples of inverse problems:

  • Radar uses measurements of radio waves to infer the range, angle, or velocity of moving objects.
  • Magnetic Resonance Imaging (MRI) builds slice images of the inside of the body using magnetic fields. It is possible to use these numerous slices to reconstruct a 3D model; this technique is called tomography.
  • During my thesis I worked on the tomography of the Earth, i.e. mapping the interior of the Earth’s mantle using seismic recordings (Adenis, 2017).
  • More generally, many of the non-destructive methods that allow us to see inside an object or a person use inverse methods.

The previous examples are used to reconstruct images, but inverse problems also arise in many contexts other than imagery. In the first part of this article, I stay as general as possible to explain the formalism. In the second part, the application, I focus on mapping, as it is my field of expertise.

Formalism

Mathematically, the so-called forward problem is written as d=g(m), where d is a vector containing the observations, m is a vector containing the model we are looking for — called the model parameters — and g is the theory linking them. The inverse problem consists in finding m knowing d and g.

For example, in seismology, we often model the velocity of the seismic waves (model m) using the arrival times at the recorders (data d). The time taken by the wave to go from point A to point B is directly related to the velocity of this wave along the path AB. This relationship is expressed by the function g, which returns the time given the velocity model along AB.

The matrix G contains the partial derivatives of g with respect to the model:

G_ij = ∂g_i/∂m_j

If g is linear, the problem can be written as the matrix product d=G.m (where . denotes matrix multiplication). Now you can see where the inverse problem gets its name: it is possible to solve it by finding the inverse of the matrix G.
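To make this concrete, here is a toy sketch in numpy; the 2x2 matrix G and the model values are invented for illustration, and real problems are rarely square and invertible:

```python
import numpy as np

# Toy linear forward problem d = G.m, with an invented 2x2 theory matrix.
G = np.array([[2.0, 1.0],
              [1.0, 3.0]])
m_true = np.array([1.0, 2.0])

# Forward problem: compute the data predicted by the model.
d = G @ m_true

# Inverse problem: recover the model from the data.
# (Here G is square and invertible, which is rarely the case in practice.)
m_recovered = np.linalg.solve(G, d)
```

In real applications G is usually rectangular and the data are noisy, so a plain matrix inverse is unstable; that is where the regularized formalism of the next paragraphs comes in.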

To solve an inverse problem, linear or non-linear, one can use the Tarantola & Valette (1982) algorithm. This algorithm can be interpreted as Bayesian inference. As in many Bayesian models, the observations and the model parameters are assumed to follow Gaussian distributions.

Data

d0 contains the observed data (the measurements) and Cd0 is the covariance matrix of the data, which indicates how two observations are correlated with one another.

In the simplest case this matrix is diagonal, and the diagonal contains the squared a priori error on the ith datum:

(Cd0)_ii = sigma_d0,i^2

In this case we have, prior to the inversion:

d ~ N(d0, Cd0)

Model

Next, the model side requires two quantities: m0 and Cm0.

m0 is the a priori model, i.e. the prior knowledge we have of the model, which is often chosen to be null or constant. Cm0 is the covariance matrix of the model, i.e. it describes how two points of the model are correlated with one another. Again, in the simplest setup, this matrix is diagonal, and the diagonal contains the squared a priori error on the jth parameter:

(Cm0)_jj = sigma_m0,j^2

And we have, prior to the inversion:

m ~ N(m0, Cm0)

However, it is common to use more complex formulations for Cm0. If the model is a map, it is possible to consider that two points close to each other are more strongly correlated. In this case, the matrix can be defined using a kernel (a covariance function) that gives the spatial dependency between two model points.
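As an illustration, a Gaussian (squared-exponential) kernel can build such a Cm0. The sketch below uses invented values for the standard deviation and the correlation length, on a simple 1-D line of model points:

```python
import numpy as np

# Positions of 5 model points along a line (in km); illustrative values.
coords = np.arange(5, dtype=float)
sigma_m0 = 0.5    # a priori standard deviation of the model
corr_len = 2.0    # correlation length of the kernel

# Gaussian kernel: covariance decays with the distance between two points.
dist = np.abs(coords[:, None] - coords[None, :])
Cm0 = sigma_m0**2 * np.exp(-dist**2 / (2 * corr_len**2))
```

Nearby points then share information during the inversion, which smooths the rebuilt map.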

Cd0, m0 and Cm0 contain the a priori knowledge of the system. The resulting model strongly depends on the choices made when building these matrices.

Linear and non-linear solving

If the problem is linear (i.e. if we can write it as d=G.m), the Tarantola & Valette (1982) estimator for the predicted model is:

m = m0 + Cm0.G^T.(G.Cm0.G^T + Cd0)^-1.(d0 - G.m0)
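Here is a minimal numpy sketch of this estimator, applied to an invented overdetermined toy problem (all values are illustrative, not taken from the notebook):

```python
import numpy as np

def linear_estimate(d0, G, m0, Cd0, Cm0):
    """Tarantola & Valette (1982) linear estimator for the predicted model."""
    S = G @ Cm0 @ G.T + Cd0            # covariance in data space
    K = Cm0 @ G.T @ np.linalg.inv(S)   # gain applied to the residuals
    return m0 + K @ (d0 - G @ m0)

# Invented toy problem: 6 data, 3 model parameters, noise-free data.
rng = np.random.default_rng(0)
G = rng.random((6, 3))
m_true = np.array([1.0, -0.5, 2.0])
d0 = G @ m_true
m0 = np.zeros(3)                       # null a priori model
Cd0 = 1e-6 * np.eye(6)                 # small a priori data errors
Cm0 = np.eye(3)                        # broad diagonal model prior
m_hat = linear_estimate(d0, G, m0, Cd0, Cm0)
```

With noise-free data and small a priori data errors, m_hat lands very close to m_true.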

The updating algorithm can be represented in a simple way:

The linear problem solving: starting from d0 and m0, we compute the derivatives of g with respect to the model m and we update the model to minimize the residuals (the distance between the observed data and the data calculated from the new model).

If the problem is non-linear, the iterative algorithm becomes:

m_(k+1) = m0 + Cm0.Gk^T.(Gk.Cm0.Gk^T + Cd0)^-1.(d0 - g(m_k) + Gk.(m_k - m0))

where Gk contains the partial derivatives of g evaluated at m_k.

This algorithm converges easily if the problem is weakly non-linear. The different steps can be represented as follows:

The non-linear problem solving: starting from d0 and m0, we compute the derivatives of g with respect to the model m and we update the model to minimize the residuals. Consecutive updates of the model are represented by (a), (b), (c) and (d). As g is non-linear, reaching an optimal solution can require many steps.
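In code, the iteration is a small loop around the linear update. The sketch below uses an invented elementwise forward model g(m) = m², whose Jacobian is diagonal; everything here is illustrative:

```python
import numpy as np

def g(m):
    """Invented weakly non-linear forward model."""
    return m**2

def jacobian(m):
    """Partial derivatives of g with respect to m (diagonal here)."""
    return np.diag(2.0 * m)

def solve_nonlinear(d0, m0, Cd0, Cm0, n_iter=20):
    m = m0.copy()
    for _ in range(n_iter):
        Gk = jacobian(m)                   # derivatives at the current model
        S = Gk @ Cm0 @ Gk.T + Cd0
        K = Cm0 @ Gk.T @ np.linalg.inv(S)
        # Tarantola & Valette fixed-point update around m0.
        m = m0 + K @ (d0 - g(m) + Gk @ (m - m0))
    return m

m_true = np.array([1.5, 2.0])
d0 = g(m_true)                             # noise-free synthetic data
m0 = np.array([1.0, 1.0])                  # a priori model, close enough
Cd0 = 1e-6 * np.eye(2)
Cm0 = np.eye(2)
m_hat = solve_nonlinear(d0, m0, Cd0, Cm0)
```

If m0 is too far from the true model, the iteration can stall in a local minimum, which is why weak non-linearity matters.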

Application

This application is a simplified example of mapping using seismic recordings. The process is implemented in Python and is available in this notebook.

The different steps are:

  1. A synthetic model is randomly generated.
  2. From this model are extracted synthetic data, corresponding to measurements.
  3. From these measurements, the initial model is rebuilt using the inverse method.

In the real world, we only have access to the measurements, not to the real model. Consequently, the only way to check whether our final model is correct is to run synthetic tests.

Synthetic model

Below is a randomly generated synthetic model, also called the initial model. This model is a map of the velocity c of a seismic wave, in km/s. We have 5x5 cells, and each cell is a square with a side of 1 km.

Map of the synthetic model representing the velocity c of seismic waves.

In each cell of coordinates s=(latitude, longitude), the velocity c(s) is constant. To simplify the calculations, we set the model to the inverse of the velocity:

m(s) = 1/c(s)

The mean and standard deviation of m give us estimates for m0 and sigma_m0.

In this formalism, m is a vector of size M=5x5=25. Therefore, the covariance matrix Cm0 has size (M, M); in this example it is set to the identity matrix multiplied by sigma_m0^2.
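A sketch of this setup (the velocity range is invented; the notebook may use different values):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5                                    # 5x5 grid of 1 km cells
c = rng.uniform(3.0, 5.0, size=(n, n))   # synthetic velocities, in km/s
m = (1.0 / c).ravel()                    # model = 1/c, flattened to size M=25

m0_value = m.mean()                      # a priori model value
sigma_m0 = m.std()                       # a priori standard deviation
m0 = np.full(n * n, m0_value)
Cm0 = sigma_m0**2 * np.eye(n * n)        # diagonal a priori covariance
```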

Synthetic dataset

Let's consider a seismic wave traveling through the map. We know the coordinates of the starting point A (the epicenter of the earthquake) and of the ending point B (the location of the seismic recorder). Therefore we know the path of the wave, as well as the duration dt it takes the wave to go from A to B. The dataset consists of a set of N time-deltas (in seconds) and their corresponding paths.

Map of 1/c with a random path plotted in white.

The time dt taken by the wave to travel along the path is equal to the sum of the times spent in each cell:

dt = sum_s t(s) = sum_s l(s)/c(s) = sum_s l(s).m(s)

where l(s) is the length of the path in the cell s. The last step follows from the definition of velocity, c = l/t (distance over time), so that t = l.m.

This equation is actually our forward problem d=g(m). From it, we can calculate the synthetic dataset d0. This relation is also used to build the inverse model.

Here, d0 has size N. We also set the covariance matrix Cd0 to the identity matrix multiplied by the variance of our data.
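Computing the exact lengths l(s) of a ray inside each crossed cell takes a bit of geometry. The sketch below sidesteps it by slicing the straight path into many short segments and accumulating their lengths per cell; the grid size, the endpoints and the uniform slowness are invented for illustration:

```python
import numpy as np

def path_row(A, B, n, n_seg=1000):
    """Approximate length of the ray A->B inside each cell of an n x n grid
    of 1 km cells, returned as one row of the matrix G."""
    row = np.zeros(n * n)
    pts = np.linspace(A, B, n_seg + 1)        # points along the straight ray
    seg_len = np.linalg.norm(B - A) / n_seg   # length of one small segment
    mids = 0.5 * (pts[:-1] + pts[1:])         # midpoint of each segment
    ix = np.clip(mids[:, 0].astype(int), 0, n - 1)
    iy = np.clip(mids[:, 1].astype(int), 0, n - 1)
    np.add.at(row, ix * n + iy, seg_len)      # accumulate lengths per cell
    return row

n = 5
m = np.full(n * n, 0.25)          # uniform slowness 1/c, i.e. c = 4 km/s
A = np.array([0.5, 0.5])          # epicenter
B = np.array([4.5, 3.5])          # recorder
row = path_row(A, B, n)
dt = row @ m                      # travel time, our forward problem d = G.m
```

In this uniform toy case the ray is 5 km long, so at 4 km/s the travel time dt comes out at 1.25 s. Stacking one such row per path gives the full matrix G.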

Forward problem

Now we have all that we need. Our model g is given by the equation of the forward problem:

d_i = g_i(m) = sum_j l_i(s_j).m_j, so that G_ij = l_i(s_j)

where l_i(s_j) is the length of the ith path in the jth cell. G has size NxM.

Here we got lucky: the process is linear, so we can extract the matrix G directly from the forward model. But the computation of g and G can be critical for some problems (just think how ridiculously difficult this problem would be if we tried to model the velocity c directly instead of 1/c).

One needs to spend some time rigorously defining the model, the data and the relation between them. Some tricky examples can be found in Debayle & Sambridge (2004) (solving for seismic wave velocity), and the appendix of Adenis et al. (2017) describes the formalism I used to model attenuation during my PhD. This formalism is described more fully in my thesis but, well, it’s in French (Adenis, 2017).

Solving

Let’s apply the linear solution of Tarantola & Valette (1982) to our problem. It gives us the following map of 1/c.

And finally, we compare below our initial map of the velocity to the new map. On the left is our initial map covered by all the paths used to build the model and on the right is the rebuilt map.

You can see that in the cells that are not covered by many paths, the algorithm struggles to recover the initial velocity, and the rebuilt map remains close to the a priori value m0.

Below is what you get with a larger model (a 25x25 grid with about 1000 data points). On the left is the initial map from which I extracted the data, and on the right is the rebuilt map. Underneath is an example of a map obtained when sigma_m0 is increased. Increasing this parameter allows the model to show broader variations. As our dataset does not contain any noise, this model is therefore closer to our synthetic model.

Conclusion

I presented the formalism of inverse methods as developed by Tarantola & Valette (1982). Contrary to machine learning, an inverse problem starts from a theory: a relation that links the data and the model. Without a forward problem d=g(m), there is no inverse problem.

I gave a simplified example of mapping from seismic data. A notebook is available if you want to try it out. The parameters (m0, Cm0 and Cd0) used in the example are very basic.

These parameters have a very strong impact on the quality of the rebuilt model. In the next article, I will present some tests on these parameters and their analysis. I am also going to try out different kinds of initial synthetic models.

Don’t hesitate to ask any questions about this article, I’ll be happy to answer!


Alice Adenis

PhD in Geophysics currently working as Data Scientist at Diabeloop. Also painter & cartoonist.