Spotlight: MDAnalysis–Python Library to Analyze Molecular Dynamics Simulations

Tim Bonnemann
Open-Source Science (OSSci)
Sep 21, 2023
Infographic explaining molecular dynamics computer simulation
Copyright © 2021 Fiona Naughton. The apical sodium-dependent bile acid transporter (ASBT) uses sodium to drive the reabsorption of bile acids in the intestine. Experimentally obtained structures show the location of two bound sodium ions (top left). Molecular dynamics simulations were performed to observe the dynamic behavior of these sodium ions in both wild-type and mutant ASBT. Using MDAnalysis, relative ion positions across multiple simulations were gathered and clustered to reveal new ion interaction sites in addition to those seen in the experimental structure (bottom left). To further characterize these sites, MDAnalysis was used to identify the frequency of contacts between protein residues and the ion while in each cluster (right). Clusters show distinct, though overlapping, coordination patterns; the newly-identified sites may serve as “staging sites” for sodium ions when entering or leaving the canonical sites.

Welcome to our new Spotlight Series, where we get to know science-focused open-source software projects in order to better understand the opportunities for Open-Source Science (OSSci) to add value and help accelerate scientific research and discovery through better, stronger open source in science. We’re particularly happy to kick things off with MDAnalysis, a NumFOCUS sponsored project since 2020. This interview was conducted asynchronously via shared doc. Enjoy!

Let’s start with a round of introductions. Who are you? What do you do? And what is your relation with open-source software in science?

As a short introduction, my name is Jenna Swarthout Goddard and I joined the MDAnalysis team as the Project, Community and Outreach Manager in March 2023. Coming from a background in Environmental Health and Engineering, I still consider myself a newcomer to open source. A passion for transparent research and advancing equity, diversity and inclusion in the research community led me to join the MDAnalysis team to help expand our mentoring, teaching and community engagement efforts. I am also currently serving as a member of the NumFOCUS Project Summit Program Committee and a Code of Conduct working group within the Scientific Python ecosystem.

Tell us about MDAnalysis. What does it do? And how did it come about?

MDAnalysis is a free, open source Python library for manipulating and analyzing data from molecular simulations, with a focus on molecular dynamics. Computer simulations at the atomistic and coarse-grained scales have become important tools in the molecular sciences, with use cases ranging from interactions of drugs with proteins in biological systems to the development of novel materials. MDAnalysis makes it easy for users to analyze data from simulations that run on some of the largest supercomputers in the world by providing a toolkit of programming building blocks, which not only lends itself to interactive data exploration and rapid prototyping, but also provides a robust foundational library that can form the basis for new computational tools.
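As a rough illustration of what those building blocks look like in practice, here is a minimal sketch (not taken from the interview) that tracks a protein's radius of gyration over a trajectory. It assumes MDAnalysis is installed. The function names `rgyr_over_time` and `analyze_files` and the file paths are hypothetical; `Universe`, `select_atoms`, `radius_of_gyration` and `ts.time` are part of the MDAnalysis API.

```python
def rgyr_over_time(universe, selection="protein"):
    """Return a list of (time, radius_of_gyration) pairs, one per frame.

    `universe` is expected to behave like an MDAnalysis Universe.
    """
    atoms = universe.select_atoms(selection)
    series = []
    # Iterating over the trajectory updates atom positions frame by frame,
    # so the radius of gyration is recomputed for each frame.
    for ts in universe.trajectory:
        series.append((ts.time, atoms.radius_of_gyration()))
    return series


def analyze_files(topology_path, trajectory_path, selection="protein"):
    """Load a simulation from disk and run the analysis above.

    The paths are placeholders for your own topology and trajectory files.
    """
    # Imported here so rgyr_over_time can be reused without MDAnalysis installed.
    import MDAnalysis as mda

    universe = mda.Universe(topology_path, trajectory_path)
    return rgyr_over_time(universe, selection)
```

Calling `analyze_files` with your own topology and trajectory files would return one `(time, radius)` pair per frame, ready to plot or feed into further analysis. Keeping the analysis separate from file loading is only a design choice for this sketch; it makes the same function usable on any already-loaded simulation.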

MDAnalysis Trailer 2021

With >27,000 downloads per month for new releases and >3,000 total citations of the original two publications (Michaud-Agrawal 2011 and Gowers 2016), MDAnalysis has had an ever-expanding international community of users and developers since it was first developed in 2006 by Naveen Michaud-Agrawal, a graduate student at Johns Hopkins University at the time. Another graduate student, Elizabeth Denning, and a postdoc, Oliver Beckstein, at Johns Hopkins University used MDAnalysis and started to contribute code, before co-founding MDAnalysis in January 2008.

“Naveen eventually left academia, but Elizabeth and I continued to work on MDAnalysis,” said Oliver Beckstein, an associate professor at Arizona State University (ASU) and co-founder/core developer of MDAnalysis, in an ASU interview in 2021. “Over the years, both users and contributors grew. Obviously, we use MDAnalysis everywhere in our own research, so it’s great to have a good scientific multipurpose tool at hand that allows us to implement our own new ideas. Overall, I am most happy about the fact that MDAnalysis is not just a useful piece of software for so many, but that MDAnalysis has become its own thriving community that is known to be very welcoming and inclusive.”

Could you give a concrete example of how MDAnalysis is being used in practice?

MDAnalysis has been recognized by the Chan Zuckerberg Initiative (CZI) as “Essential Open Source Software for Science (EOSS)” through the award of two EOSS grants: EOSS4 and EOSS5. It is used for cutting edge research in biophysics, chemistry, soft-matter physics, and materials research around the world in academia and national research labs. By running a reverse citation lookup for Michaud-Agrawal 2011 and Gowers 2016, it is evident that applications range from medically relevant biophysics and chemistry (e.g., SARS-CoV-2, neurotransmitter binding) to materials science (sustainable batteries). MDAnalysis also strives to be highly interoperable with other software packages. As it can read and write simulation data in over 40 file formats, it enables users to write portable code that is usable in virtually all biomolecular simulation communities. MDAnalysis forms the foundation of many other packages and is currently used by more than 20 data visualization, analysis and molecular modeling tools.

How is your project funded? And how do you manage to sustain it?

Over its lifetime, MDAnalysis has had at least 187 individual contributors. We now have a nine-person core developer team who represent the project publicly and steer the direction of the project.

“However, over all this time, the developers were never really paid to work on this software — it’s all been volunteer work,” said Beckstein. “This situation only changed with the recent CZI grants. They have been funding five core developers (typically part time) to allow them to dedicate some time to some of the less glamorous work of actually maintaining the code base and documentation. Even as it is, it is difficult to keep fixing, updating, and releasing code that’s used by thousands of scientists, but without such support, code quality and release frequency would suffer and important performance improvements can’t be made. Of course, the funds also allowed us to do exciting things, such as making performance improvements and launching the MDAKits ecosystem (see also our SciPy proceedings paper DOI 10.25080/gerudo-f2bc6f59-00a).”

We regularly take part in various mentoring programs (e.g., Google Summer of Code, Outreachy and Station1 Frontiers Fellowship) that sponsor “interns” working on the project, and have worked with the National Science Foundation (NSF) Research Experiences for Undergraduates (REU). As part of an NSF grant, PhD students have worked on high-performance computing aspects, but the core of the development has been through voluntary contributions by PhD students, postdocs, and faculty members on the side. More recently, we have had support from Google Season of Docs and several NumFOCUS small development grants. The CZI EOSS grants mentioned earlier represent the first funding for the project itself.

What are the key challenges that keep you up at night, both in your day-to-day work on MDAnalysis and longer term?

MDAnalysis is an international community made up of people from all over the world. With users and developers based on most continents (we’re not so sure about our presence in Antarctica), it can be difficult to work across different time zones, both in our day-to-day work and in our outreach efforts. We do our best to have regular (online) “face-to-face” core developer meetings, but it’s often difficult to get everyone in the (Zoom) room at once. We therefore rely heavily on asynchronous conversations, such as those on our Discord server, where we chat with other MDAnalysis users and developers. We are also intentional about trying to organize events, such as online training workshops, across different time zones for an international user base.

Not only is MDAnalysis used internationally, but it is used by scientists in academia, national research labs, and industry. While it is anecdotally known that the MDAnalysis package is used, for example, within the pharmaceutical industry, we have not historically leveraged these partnerships to help inform the direction of the project or fund development. This is partially due to perceived restrictions tied to the MDAnalysis core library’s GNU General Public License v2 (or any later versions) (GPLv2+), which you can read a bit more about on our blog. After receiving comments from the community and having back and forth conversations with our lawyers, we have settled on changing to the GNU Lesser General Public License v2.1 (or any later versions) (LGPL v2.1+). While this transition will take some time and require sign-off from all MDAnalysis contributors, we hope this will provide more flexibility for developers who make use of MDAnalysis in their own codes. There is also still generally work to be done in involving industries that rely on open source software (some estimates say open source software is being used in as many as 99% of Fortune 500 companies!) in the contribution to and investment in open source. At MDAnalysis, we are working towards building relationships with industrial partners to coordinate development in a synergistic direction and hopefully open up additional funding streams to sustain the project over the long term.

Looking ahead at the next couple of years or so, where do you see your project is headed? What are your aspirations?

We are extremely grateful for our growing community of users and contributors to the MDAnalysis project. However, greater numbers can also bring some challenges. It can be difficult for a limited number of core developers, who must wear a number of different hats, to stay on top of reviewing pull requests, setting up documentation and tests and maintaining code. This can negatively impact the experience for new contributors and slow improvements to the code base.

Comic “Centralising code to MDAnalysis is a limited solution“, designed by MDAnalysis core developer, Fiona Naughton (@ExplainedByCats)

To address this, as well as to lower barriers to producing code that complies with the FAIR principles (Findability, Accessibility, Interoperability, Reusability), we have introduced the MDAKits ecosystem as part of our CZI EOSS4-supported work. The MDAKit framework is designed to guide developers from the initial stage of package development all the way through long-term maintenance of the code base and eventual publication.

Comic “The MDAKit Framework”, designed by MDAnalysis core developer, Fiona Naughton (@ExplainedByCats)

We have collected initial feedback from the community on effective strategies for increasing the visibility of the most useful packages, reducing redundant metadata generation, clarifying the separation between the MDAKits registry and individual MDAKit source codes (particularly with regard to testing and CI) and solidifying the types of applications (analysis versus utility) that can make valid and useful Kits. We are using this developer input to improve our MDAKits framework so that it is ready for the inaugural MDAnalysis User Group Meeting (UGM) taking place September 27–29, 2023. We envision that the MDAnalysis UGM will provide a platform not only for users and developers to learn to create their own MDAKits, but also for an open discussion of the planned development of the MDAnalysis ecosystem overall. The feedback arising from these discussions will be invaluable to the developer team for improving MDAnalysis-based workflows and creating a community-based development roadmap to help guide the future direction of the project.

In addition to continuing to improve our codebase, we are aspiring to nurture our community by increasing our engagement in outreach, mentoring and teaching activities. We are aware that the diversity of our community is one of our biggest strengths, and we are determined to be a welcoming community for all. However, not all of those in our community have access to the same resources, which may limit participation from people in underrepresented or minoritized communities. We are therefore always looking to improve the ways we can lower the barriers to participation in the MDAnalysis community. For example, we strive to make recordings with subtitles and transcripts of our teaching materials freely available under open source licenses to maximize the accessibility and impact of these materials for self-directed learning. Made possible by the generous support of our CZI EOSS5 grant, we are also dedicated to not only making MDAnalysis events, such as workshops and user group meetings, free to attend, but also to providing financial support to those facing financial barriers to participation (e.g., to cover child care or internet access costs, through travel bursaries, etc.).

Last but not least, who are your ideal contributors? And how can people get involved?

MDAnalysis has always been “written by scientists for scientists”, so people with an interest in using MDAnalysis for their own research have been key to the growth and health of the project. MDAnalysis also places great emphasis on mentorship as an avenue of inclusion. We have mentored many contributors just starting out in open source — through programs such as Google Summer of Code, Outreachy and the Station1 Frontiers Fellowship — and are continuing to identify new mentoring and community engagement schemes we plan to participate in (such as the CompChemURG mentoring program). These efforts have tremendously benefited not only the mentees but also the overall health of the project, with many of the current core developers joining the project through these mentoring schemes.

There are many ways to get involved with MDAnalysis. One of the best ways to get started is to get your hands dirty following the MDAnalysis User Guide. Once you’re a bit familiar with the MDAnalysis package, you can start contributing by forking the GitHub repository and submitting a pull request. Ask questions on the discussion (for usage-based questions) or developer (for questions about contributing) mailing lists, and chat with other users and developers on Discord (join using this invite link). We are also organizing online training workshops and striving for annual user group meetings to connect with the community both online and in-person. Keep an eye on our blog, Twitter and LinkedIn pages for announcements related to MDAnalysis and community events!

Good stuff. Thank you, Jenna!

Thanks for reading! Are you involved in an OSS project in science and would like to share your experience? Let us know!

Tim Bonnemann
Open-Source Science (OSSci)

Intersection of community & participation. Currently @IBMResearch. Wannabe trailrunner.