SimPEG release of 0.14

Lindsey Heagy
Published in simpeg, May 28, 2020

Lindsey Heagy, Joe Capriotti, and the SimPEG team

We merged a big pull request into SimPEG this week (#786): over 30,000 lines of code changed (and now close to 60,000 because we ran black 😉)! These changes have been in progress since summer 2019 and involved input, feedback, contributions, and improvements from many on the team. Normally, we push for smaller pull requests with tightly scoped changes, but this was a big refactor touching much of the codebase, and it also includes major improvements and additions to the documentation, so we consolidated all of the changes and are releasing them to the world 🚀.

In this post, we will summarize some of the major changes and improvements as well as provide avenues for discussion and help if you are looking to migrate your code or want to get started with SimPEG.

TL;DR

SimPEG used to look a bit like this

And now it looks sort of like this

(We are actively having conversations on Slack about the most effective way to communicate the structure of the framework. Please feel free to contribute your sketches and ideas!)

The major changes are:

  • The Problem and Survey classes have been refactored into the Simulation and a much lighter-weight Survey
  • The Data class now plays an explicit role in the framework
  • Many classes and methods have been renamed to be more in line with PEP8 recommendations
  • There are efficiency improvements, including a first implementation using dask to parallelize aspects of the DC, IP and potential fields codes.
  • For help migrating your codes or if you have questions about the refactor, please post on discourse.

To get the latest code

> conda install -c conda-forge "simpeg>=0.14.0"

Motivation

In our post about the team meeting in Montreal we outlined some challenges with the current structure of the SimPEG framework (original issue here: #562), primarily around the way we constructed a forward simulation. The pair method, which combined a problem (containing the physics engine) with a survey (containing the survey geometry), made it tricky to interface to other simulation codes (and was also tough to explain, which is a good test of an architecture). Also, it wasn’t necessarily obvious where to bring your field data into the framework.

Since we started SimPEG in 2013, the Python ecosystem has advanced significantly. Dask did not yet exist, nor did the Pangeo community, which has been an important catalyst for scalable computational technology and a community of practice in the geosciences. Undertaking a major refactor is also a chance to think through how to interoperate and build upon the advancements made by these groups.

Significant changes — the highlights

We provide a detailed overview of the changes in the release notes, so here we will share some of the highlights.

No more Problems

The Problem class has been renamed to Simulation, and the Simulation now contains all of the methods you expect a forward simulation to perform, including computing a solution to a Partial Differential Equation (PDE) (the fields method) and generating predicted data (the dpred method). Previously these were divided between the Problem and the Survey: the Problem had the fields method while the Survey had the dpred method. This is why we used to need the pair method, so that the Survey and the Problem each had access to the methods and properties of the other.

The Survey is now a much lighter-weight class. As before, it contains a source_list, with each source having its own receiver_list (see how nice the PEP8 names look compared to srcList and rxList 😃), and it no longer needs to be paired.

What does this mean for your code? Previously, computing fields and predicted data in a DC-resistivity simulation would have looked like this

from SimPEG.EM.Static import DC  # module path before 0.14

survey = DC.Survey(source_list)
prob = DC.Problem3D_CC(mesh, rhoMap=mapping)
prob.pair(survey)
# compute the fields and predicted data
# (if you don't need the fields, the predicted data can be
# computed in one step with survey.dpred(model))
fields = prob.fields(model)
dpred = survey.dpred(model, f=fields)

Now,

from SimPEG.electromagnetics.static import resistivity as dc

survey = dc.Survey(source_list)
simulation = dc.Simulation3DCellCentered(
    mesh=mesh, rhoMap=mapping, survey=survey
)
# compute the fields and predicted data
fields = simulation.fields(model)
dpred = simulation.dpred(model, f=fields)
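To make the design change concrete, here is a toy sketch in plain Python of how a simulation can hold a reference to a lightweight survey instead of being paired with it. Every class name here (ToyReceiver, ToySource, ToySurvey, ToySimulation) is hypothetical and the "physics" is a made-up decay function; this is not how SimPEG implements things, just an illustration of where the responsibilities sit.

```python
# Toy illustration only: these classes are hypothetical, not SimPEG's implementation.
from dataclasses import dataclass, field


@dataclass
class ToyReceiver:
    location: float  # position where the field is sampled


@dataclass
class ToySource:
    location: float
    receiver_list: list = field(default_factory=list)


@dataclass
class ToySurvey:
    """Lightweight container: geometry only, no physics and no observed data."""

    source_list: list

    @property
    def n_data(self):
        return sum(len(src.receiver_list) for src in self.source_list)


class ToySimulation:
    """Owns the physics and holds a reference to the survey (no pairing step)."""

    def __init__(self, survey):
        self.survey = survey

    def fields(self, model):
        # Stand-in for a PDE solve: one "field" function per source.
        # The made-up field decays with distance from the source.
        return [
            (lambda x, s=src: model / (1.0 + abs(x - s.location)))
            for src in self.survey.source_list
        ]

    def dpred(self, model, f=None):
        # Reuse the fields if they were already computed, otherwise compute them.
        if f is None:
            f = self.fields(model)
        return [
            f_src(rx.location)
            for src, f_src in zip(self.survey.source_list, f)
            for rx in src.receiver_list
        ]


receivers = [ToyReceiver(1.0), ToyReceiver(3.0)]
survey = ToySurvey([ToySource(0.0, receivers), ToySource(2.0, receivers)])
simulation = ToySimulation(survey=survey)
dpred = simulation.dpred(model=2.0)
```

Because the simulation receives the survey when it is constructed, both fields and dpred live on one object and nothing needs to be wired together afterwards.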

Making use of the Data class

The purpose of the Data class in SimPEG is to provide a structure which makes it straightforward to grab data associated with a given source and receiver. Essentially, it is a “smart” dictionary. Although we had this implemented in SimPEG previously, we never really used it. Instead, to tell SimPEG about the observed data you wanted to use in an inversion, you had to set the dobs property on a survey. This is not obvious and, in a sense, places priority on the wrong thing (the survey) rather than on what matters most in an inversion: working with our data. So now, the Data class plays an explicit role in setting up an inversion with SimPEG.

Previously, to construct a data misfit, we would have done something like

# set the observed data and uncertainties
survey.dobs = dobs
survey.std = 0.05 # assign 5% relative error
survey.eps = 1e-6 # assign a noise floor
# create a data misfit
dmis = DataMisfit.l2_DataMisfit(survey)

The survey carried both the observed data as well as the ability to run simulations and predict data (because we had to do the pesky pair). Now, creating a data misfit looks like:

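from SimPEG import data_misfit
from SimPEG.data import Data

# set the observed data and uncertainties
# (the Data object is tied to the survey that describes its geometry)
data = Data(survey, dobs=dobs, relative_error=0.05, noise_floor=1e-6)
# create a data misfit
dmis = data_misfit.L2DataMisfit(data=data, simulation=simulation)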

So we have separated the observed data and associated uncertainties from the simulation, which computes predicted data.
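As a rough picture of what "a smart dictionary plus uncertainties" can mean, here is a toy sketch in plain Python. ToyData and l2_misfit are hypothetical names, and two things are assumptions on our part rather than statements about SimPEG's internals: that the standard deviation is a relative error times the datum magnitude plus a noise floor, and that the misfit is a plain sum of squared, uncertainty-weighted residuals (constant factors vary between conventions).

```python
# Toy illustration only: hypothetical names, not SimPEG's Data or L2DataMisfit.


class ToyData:
    """Observed data plus uncertainties, addressable by (source, receiver)."""

    def __init__(self, keys, dobs, relative_error=0.0, noise_floor=0.0):
        self.keys = list(keys)  # one (source, receiver) label per datum
        self.dobs = list(dobs)
        # Assumed rule: a percentage of each datum plus a constant floor.
        self.standard_deviation = [
            relative_error * abs(d) + noise_floor for d in self.dobs
        ]
        self._index = {key: i for i, key in enumerate(self.keys)}

    def __getitem__(self, key):
        # "Smart dictionary" access: data[source, receiver]
        return self.dobs[self._index[key]]


def l2_misfit(data, dpred):
    """Sum of squared residuals, each scaled by its standard deviation."""
    return sum(
        ((p - d) / s) ** 2
        for p, d, s in zip(dpred, data.dobs, data.standard_deviation)
    )


keys = [("src0", "rx0"), ("src0", "rx1"), ("src1", "rx0")]
data = ToyData(keys, dobs=[10.0, -20.0, 40.0], relative_error=0.05, noise_floor=0.5)
phi_d = l2_misfit(data, dpred=[11.0, -20.0, 37.5])
```

The data object knows nothing about how predictions are produced; the misfit only needs the observed values, their uncertainties, and whatever the simulation predicts.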

No more Python 2.7

The 🌎 has moved on. Need we say more?

We recommend python>=3.6 for SimPEG.

Documentation Improvements

With significant contributions from Devin Cowan, the SimPEG docs contain many more examples and tutorials! For the geophysical problems we support in SimPEG, there are now tutorials that walk through both setting up and running a forward simulation as well as an inversion. Documentation can always be improved, so please take a look, and share your questions and ideas!

Renaming of modules

We have made efforts to rename many modules, classes and functions with PEP8-style names. Most classes and functions should give you deprecation warnings if you try to use them under their old names. The modules, however, have all been renamed outright. For example, SimPEG.EM.Static.DC is now SimPEG.electromagnetics.static.resistivity. We recommend looking at the release notes for a comprehensive overview of the changes, and at the tutorials section of the documentation for examples of the updated namespace for each geophysical problem included in SimPEG.
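For readers curious how an old class name can keep working while nudging you towards the new one, here is a generic sketch of the pattern using Python's standard warnings module. The helper deprecate_class and both class names are hypothetical and are not taken from the SimPEG source; the point is only to show the general idea of a thin alias that warns.

```python
# Toy illustration only: `deprecate_class` and both class names are hypothetical.
import warnings


class Simulation3DToy:
    def __init__(self, mesh=None):
        self.mesh = mesh


def deprecate_class(new_class, old_name):
    """Build a subclass that behaves like new_class but warns under its old name."""

    class Deprecated(new_class):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} has been renamed to {new_class.__name__}; "
                "please update your code.",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)

    Deprecated.__name__ = old_name
    return Deprecated


# Old name kept as a thin alias so existing scripts keep running.
Problem3D_Toy = deprecate_class(Simulation3DToy, "Problem3D_Toy")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = Problem3D_Toy(mesh="my_mesh")
```

Python hides DeprecationWarning in many contexts by default, so when migrating a script it can help to run it with `python -W default` to make sure the messages are shown.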

Bonus items

In addition to the major structural changes that were the initial goal of this release, we have also made some general efficiency improvements, particularly in the DC and IP codes, motivated by the Geoscientists Without Borders project: Improving Water Security in Mon State, Myanmar. As part of this we now have a 1D layered Earth simulation and inversion written for SimPEG. We have also provided a toggle for the 2D and 3D DC simulations that should drastically decrease computation time and memory requirements for DC arrays with many overlapping dipole sources (e.g. a Wenner-type array, common in groundwater applications). Also, the IODC class can now generate a TreeMesh in 2D for you!

Another new area of active development is incorporating dask to parallelize operations. This is in the experimental stage, but you can toggle the dask implementation on by importing SimPEG.dask prior to other SimPEG imports:

import SimPEG.dask
from SimPEG.electromagnetics import resistivity as dc

Finally, of course, we squashed a few bugs 🐛!

A note on backwards compatibility

We have worked to make many of the changes backwards compatible to ease the transition (a big thanks to Joe Capriotti 👏!). Most previously written code should work with only small changes to imports. Warning and error messages will inform you of updates you can make to your code in order to align with the new implementation. Be on the lookout for renamed keyword arguments and class properties. In the next minor release of SimPEG (0.15), we will be introducing breaking changes that reduce some of this backwards compatibility, so for active projects, we recommend you start migrating your code. For archived projects that are complete (e.g. those in the simpeg-research GitHub organization), we recommend that you pin the version of SimPEG you used to produce those results in your requirements.txt and/or environment.yml file.

How to connect

The SimPEG discourse is the go-to place for questions, discussions and help on using SimPEG. We have a dedicated post for the simulation refactor, so feel free to post your questions there as they arise! We also have an open Slack group for chatting with users, developers and folks who follow the project.

We hold meetings weekly at 10:30am PDT. If you would like to join us, please join the #meetings channel on Slack; this is where we circulate the video link each week. All meetings are recorded and available on YouTube.

Looking ahead

We are continuing to work on improving efficiency throughout the codebase as well as parallelizing all SimPEG simulations. A current focus for many on the team is improving our implementation for Magnetotellurics and Natural Source EM methods.

We also have plans to implement a higher-level API for SimPEG in order to streamline the implementation of simulations and inversions. This will include things like utilities for building a simulation mesh based on a given survey geometry, functions for setting up a “vanilla” Tikhonov inverse problem with a beta-cooling schedule, and a command-line interface to SimPEG.
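If the term "beta-cooling schedule" is new to you: in a Tikhonov inversion the objective is typically a data misfit plus beta times a model regularization term, and the trade-off parameter beta is reduced as the inversion proceeds. The function below is a minimal sketch of one such schedule; beta_schedule and its parameter names are hypothetical, and this is not the planned API.

```python
# Toy illustration only: `beta_schedule` is a hypothetical helper, not a SimPEG function.


def beta_schedule(beta0, cooling_factor=2.0, cooling_rate=1, n_iterations=6):
    """Return the trade-off parameter beta used at each iteration."""
    betas = []
    beta = beta0
    for iteration in range(n_iterations):
        # Divide beta by the cooling factor after every `cooling_rate`
        # iterations (but not before the first iteration).
        if iteration > 0 and iteration % cooling_rate == 0:
            beta /= cooling_factor
        betas.append(beta)
    return betas


betas = beta_schedule(beta0=8.0, cooling_factor=2.0, cooling_rate=2, n_iterations=6)
# betas == [8.0, 8.0, 4.0, 4.0, 2.0, 2.0]
```

Starting with a large beta emphasizes the regularization, and cooling it gradually lets the data have more influence as the model improves.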

There is still lots to be done! And we look forward to continued and growing involvement from the community 🎉
