A new software framework for autonomous materials discovery

Published in Toyota Research Institute, Jul 6, 2021

By Muratahan Aykol & Joseph Montoya

Crafting a material with the right “structure” to get a desired engineering property is the holy grail of materials science. Thanks to quantum mechanics, we know that the crystal structure of a material (the identity and arrangement of its atoms) determines its physical and chemical properties. For example, the cathode material in your cell phone’s lithium-ion battery is made primarily of lithium, cobalt and oxygen atoms arranged in a specific crystal structure. This arrangement packs a lot of chemical energy into a small volume while allowing lithium ions to move around without disrupting the structure, which is what makes the battery rechargeable.

*Reproduced from Montoya et al. Chem. Sci., 2020, 11, 8517 with permission from the Royal Society of Chemistry.*

What makes materials science so exciting is that the number of hypothetical crystal structures is limitless! With so many possibilities, there is always an opportunity to find better materials that push technology to its next frontier. For example, putting less expensive batteries in electric vehicles would help us reduce carbon emissions more quickly, while faster-charging materials would mean less time spent charging our vehicles, eliminating one critical obstacle to widespread adoption.

But if the number of possible materials is limitless, how can we explore them efficiently when we have finite time and resources for running experiments?

This post illustrates how CAMD (Computational Autonomy for Materials Discovery) provides a software framework for designing and running autonomous research campaigns, using the discovery of stable materials as an example.

Traditionally, materials discovery has been a decades-long exploratory practice in which curiosity-driven scientists attempt to make new compounds, develop new theories, and ultimately decide which materials the scientific community should study next.

This process sets the stage for serendipitous and insightful discoveries of breakthrough materials such as lithium cobalt oxide for batteries or yttrium barium cuprate superconductors. But how can we automate and accelerate materials discovery?¹ For instance, can we give a computer some existing information on materials, ask it to intelligently decide which experiment to run next, feed the measurements back, and repeat this process so that the computer gets smarter with each new piece of data?

With CAMD, our recently launched open-source Python library, we aim to provide a software framework that makes it easier for materials scientists to design and run closed-loop research campaigns. Our work builds on popular sequential, or adaptive, decision-making approaches such as active learning and Bayesian optimization. In CAMD, all the nitty-gritty details of the decision-making process are encapsulated in an agent, which stands in for the researcher.
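To make the agent idea concrete, here is a minimal sketch in Python. It is a hypothetical illustration, not CAMD’s actual API: the `Candidate`, `Agent`, and `LowerConfidenceBoundAgent` names are invented here, and we assume some surrogate model has already supplied a predicted energy and an uncertainty for each candidate. The agent ranks candidates by a lower confidence bound (predicted value minus `kappa` times its uncertainty), one common exploration-exploitation rule when the goal is to minimize a quantity such as energy above the hull.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate material with a surrogate-model prediction."""
    name: str
    predicted_energy: float  # e.g. predicted energy above hull in eV/atom (lower is better)
    uncertainty: float       # standard deviation of that prediction


class Agent(ABC):
    """Hypothetical agent interface: decides which candidates to test next."""

    @abstractmethod
    def select(self, candidates: list[Candidate], batch_size: int) -> list[Candidate]:
        ...


class LowerConfidenceBoundAgent(Agent):
    """Picks the candidates with the lowest (mean - kappa * std), balancing
    exploitation (low predicted energy) and exploration (high uncertainty)."""

    def __init__(self, kappa: float = 1.0) -> None:
        self.kappa = kappa

    def score(self, c: Candidate) -> float:
        return c.predicted_energy - self.kappa * c.uncertainty

    def select(self, candidates: list[Candidate], batch_size: int) -> list[Candidate]:
        return sorted(candidates, key=self.score)[:batch_size]


pool = [
    Candidate("A", predicted_energy=0.05, uncertainty=0.01),
    Candidate("B", predicted_energy=0.20, uncertainty=0.30),
    Candidate("C", predicted_energy=0.10, uncertainty=0.02),
]
chosen = LowerConfidenceBoundAgent(kappa=1.0).select(pool, batch_size=2)
print([c.name for c in chosen])  # ['B', 'A']
```

With `kappa=1.0` the agent picks B first despite its higher predicted energy, because its large uncertainty makes it worth probing; with `kappa=0.0` it becomes purely greedy and picks A and C. Keeping the selection rule behind a small interface like this is what lets one agent be swapped for another without touching the rest of the campaign machinery.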

CAMD communicates programmatically with an external experimental facility and orchestrates a ping-pong game between the agent and the experiment: the agent proposes what to measure, the experiment returns results, and the agent uses them to make its next decision. This continues until time or resources run out, or until the agent decides to stop. The agent abstraction brings flexibility to algorithmic decision making, which can combine machine learning, exploration-exploitation strategies, physical constructs, heuristics, logic, and empirical models. With this modular design and the ability to use existing materials data, CAMD provides a playground to design, test, and optimize agents before a single penny is spent on new experiments!
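The ping-pong loop can be sketched in a few lines. Again, this is a hypothetical illustration rather than CAMD’s implementation: `run_campaign`, `lookup_experiment`, `in_order_agent`, the dataset, and the flat cost per experiment are all assumptions. Here the “experiment” simply looks up answers in an existing table, which mirrors the idea of testing agents on existing data before paying for new experiments.

```python
from typing import Callable, Dict, List

# Hypothetical existing data: candidate -> energy above hull (eV/atom).
# In a simulated campaign, the "experiment" just reveals these values.
KNOWN_RESULTS: Dict[str, float] = {
    "X1": 0.00, "X2": 0.35, "X3": 0.02, "X4": 0.50, "X5": 0.01, "X6": 0.80,
}

AgentFn = Callable[[List[str], Dict[str, float]], List[str]]


def lookup_experiment(batch: List[str]) -> Dict[str, float]:
    """Stand-in for an external facility: returns 'measurements' for a batch."""
    return {name: KNOWN_RESULTS[name] for name in batch}


def run_campaign(
    agent: AgentFn,
    candidates: List[str],
    budget: float,
    cost_per_experiment: float,
) -> Dict[str, float]:
    """Alternate between agent and experiment until the budget runs out,
    no candidates remain, or the agent proposes nothing (i.e., it stops).
    cost_per_experiment must be positive."""
    observed: Dict[str, float] = {}
    remaining = list(candidates)
    while remaining and budget >= cost_per_experiment:
        # Agent proposes a batch, trimmed to what the budget can still pay for.
        batch = agent(remaining, observed)[: int(budget // cost_per_experiment)]
        if not batch:
            break  # the agent chose to stop
        results = lookup_experiment(batch)  # the experiment "measures" the batch
        observed.update(results)            # results go back to the agent next round
        budget -= cost_per_experiment * len(batch)
        remaining = [c for c in remaining if c not in results]
    return observed


def in_order_agent(remaining: List[str], observed: Dict[str, float]) -> List[str]:
    """Hypothetical naive agent: propose the next two untested candidates."""
    return remaining[:2]


observed = run_campaign(in_order_agent, list(KNOWN_RESULTS), budget=50.0, cost_per_experiment=10.0)
print(sorted(observed))  # ['X1', 'X2', 'X3', 'X4', 'X5']
```

With a budget for five experiments and an agent that asks for two at a time, the last batch is trimmed to one, so X6 is never tested. Replacing `lookup_experiment` with a call to a real facility (or a DFT job queue) would leave the loop itself unchanged.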

At TRI, our recent application of CAMD focuses on discovering new stable inorganic compounds. While the rules of quantum mechanics (QM) govern the world of atoms and define how crystal structure maps to engineering properties, we can’t simply apply QM directly to materials discovery. Solving the equations of QM exactly is intractable for real materials, even with today’s best supercomputers, but the density functional theory (DFT) formulation offers a reasonable compromise between computational cost and accuracy for predicting many fundamental material properties. DFT is now the workhorse of modern first-principles materials modeling. Over the past decade, DFT has opened a new chapter in materials science by populating large databases of predicted structures and properties and by enabling virtual screening of new candidate materials for specific applications.

But using DFT as our computer-based “experiment” to measure properties of hypothetical materials is still not cheap: evaluating a single material can cost anywhere from $1 to $100 in cloud computing. So, out of billions and billions of possibilities, how do we decide which hypothetical materials to probe with DFT, and how do we spend our resources effectively? This is the type of research problem CAMD helps solve.

So far, our main objective with CAMD has been to find inorganic compounds that are stable. DFT is good at predicting stability, which is a prerequisite if we hope to make a computer-designed hypothetical material in the lab. Stability measures a material’s tendency to decompose into other materials, so it depends not only on the structure of the material itself but also on everything else in its chemical space. Only a small percentage of materials in DFT databases² turn out to be (reasonably) stable, which explains why finding new stable compounds from scratch (i.e., materials discovery) has been a long-standing challenge in materials science.
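A common way to quantify this is the energy above the convex hull. When formation energy per atom is plotted against composition, the lower convex hull of the known phases traces the lowest-energy combination of phases at every composition, and a new material’s vertical distance to that hull measures how strongly it tends to decompose into those phases. The pure-Python sketch below computes this for a hypothetical binary A-B system with invented energies; a negative value means the candidate falls below the current hull, i.e., it would become a new ground state and reshape the phase diagram.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (fraction of B, formation energy in eV/atom)


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of (x, energy) points via Andrew's monotone chain."""
    lowest: Dict[float, float] = {}
    for x, e in points:  # keep only the lowest energy at each composition
        lowest[x] = min(e, lowest.get(x, e))
    hull: List[Point] = []
    for p in sorted(lowest.items()):
        # Drop the last hull point while it lies on or above the segment
        # from hull[-2] to p (the three points fail to turn counter-clockwise).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x: float, energy: float, hull: List[Point]) -> float:
    """Vertical distance (eV/atom) from (x, energy) to the hull at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            hull_energy = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
            return energy - hull_energy
    raise ValueError("composition lies outside the hull")


# Invented data: pure A and B at zero, plus one known compound AB.
known = [(0.0, 0.0), (1.0, 0.0), (0.5, -0.40)]
hull = lower_hull(known)

candidates = {"A3B": (0.25, -0.18), "AB3": (0.75, -0.30), "A3B_alt": (0.25, -0.05)}
print({name: round(energy_above_hull(x, e, hull), 3) for name, (x, e) in candidates.items()})
# {'A3B': 0.02, 'AB3': -0.1, 'A3B_alt': 0.15}
```

Real chemical spaces such as Mg-Ir-B have three or more components, so the hull becomes a higher-dimensional object, but the idea is the same; materials libraries such as pymatgen provide phase-diagram tools for that general case.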

An easy way to illustrate how CAMD helps discover stable materials is to look at its impact on compositional phase diagrams. In simplified terms, one can think of these diagrams as maps of chemical spaces: they tell us which materials would exist (or coexist) if we mixed various elements in certain proportions, and they provide guidance on how to navigate the land of materials³.
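Reading these maps requires converting a composition into a position on the diagram. For a three-component system, each corner of an equilateral triangle is a pure element, and a compound sits at the average of the corner positions weighted by its atomic fractions. The sketch below shows one common convention; the corner placement and the `ternary_xy` helper are assumptions for illustration, and plotting tools may orient the triangle differently.

```python
import math
from typing import Dict, Tuple

# Assumed corner placement: first element at (0, 0), second at (1, 0),
# third at the apex (0.5, sqrt(3)/2) of an equilateral triangle.
CORNERS = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def ternary_xy(composition: Dict[str, float], elements: Tuple[str, str, str]) -> Tuple[float, float]:
    """Map a composition (atom counts per formula) to 2D triangle coordinates."""
    counts = [composition.get(el, 0.0) for el in elements]
    total = sum(counts)
    if total <= 0:
        raise ValueError("composition contains none of the given elements")
    fractions = [c / total for c in counts]
    # The point is the average of the corner positions, weighted by atomic fraction.
    x = sum(f * cx for f, (cx, _) in zip(fractions, CORNERS))
    y = sum(f * cy for f, (_, cy) in zip(fractions, CORNERS))
    return x, y


elements = ("Mg", "Ir", "B")
print(ternary_xy({"Mg": 1, "Ir": 1, "B": 1}, elements))  # MgBIr: equal amounts, at the centroid
print(ternary_xy({"Mg": 1, "B": 2}, elements))           # MgB2: no Ir, on the Mg-B edge
```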

Below you will find two compositional phase diagrams for the three-component magnesium-iridium-boron (Mg-Ir-B) system: one before running CAMD (i.e., containing only previously known materials) and one after. The original diagram tells us that we can make only a single compound, MgBIr, if we mix these three elements in equal amounts. But is this information complete?

Let’s switch over to the new diagram we obtain after the CAMD campaign. First, we get a glimpse of the breadth of compositions CAMD tried in this campaign (~200 crystal structures), ~50 of which turned out to be nearly stable (green circles). More importantly, three new fully stable⁴ compounds were discovered (red circles), and our map of the Mg-Ir-B space has evolved considerably! CAMD spent under $100 on computing and found new, meaningful Mg-Ir-B compounds in a matter of days, a notable acceleration over the traditional process of discovery. The only human intervention was instructing CAMD to look at the Mg-Ir-B system.

This example illustrates how CAMD finds new compounds and changes the phase diagram of the Mg-Ir-B system. Compositions CAMD tried are marked with green circles if they were successful, i.e., (nearly) stable, and with a red “x” otherwise. The left panel shows the original phase diagram overlaid with the first batch of compositions, and the right panel shows all compositions tested during the campaign.
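The color coding in the figure amounts to bucketing each tested structure by its energy above the hull. The sketch below assumes a hypothetical “nearly stable” cutoff of 0.05 eV/atom (the post does not state the threshold CAMD uses) and treats anything on or below the hull as stable; the campaign results are invented.

```python
from collections import Counter
from typing import Dict

NEARLY_STABLE_CUTOFF = 0.05  # eV/atom; hypothetical threshold, not a documented CAMD value


def classify(e_above_hull: float, cutoff: float = NEARLY_STABLE_CUTOFF) -> str:
    """Bucket a tested structure by its energy above the hull (eV/atom)."""
    if e_above_hull <= 0.0:
        return "stable"         # on or below the hull
    if e_above_hull <= cutoff:
        return "nearly stable"  # slightly above the hull
    return "unstable"


def summarize(results: Dict[str, float]) -> Counter:
    """Count outcomes for a campaign's results (structure -> energy above hull)."""
    return Counter(classify(e) for e in results.values())


# Invented results for illustration only.
campaign = {"s1": -0.01, "s2": 0.03, "s3": 0.20, "s4": 0.00, "s5": 0.04}
print(summarize(campaign))
```

Changing `cutoff` shifts structures between the “nearly stable” and “unstable” buckets, so any reported counts depend on the threshold chosen.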

CAMD for inorganics now runs 24/7 as a platform on the cloud, autonomously finding new inorganic compounds given only instructions on which chemical systems to evaluate. Across fewer than a thousand chemical spaces we have looked at, it has discovered more than 25,000 new stable or nearly stable inorganic compounds. CAMD helps rapidly mine chemical spaces to reveal where the gaps in our materials knowledge (i.e., missing compounds) are.

Inorganic compound discovery has been the first application of CAMD for materials research at TRI, but it’s merely the tip of the iceberg for what’s possible with closed-loop research in general. Now that we can discover new, stable compounds, a clear next step is to direct CAMD to find ones that also have useful functional properties. Such autonomous research systems are poised to become increasingly prevalent in materials science, helping researchers design and drive experiments toward materials that meet a given objective in complex, high-dimensional design spaces that are intractable to explore with simple searches.

Stay tuned to hear some more new and exciting applications from us in the near future!

¹ This is a relatively old question. Dendral, an “expert system” focused on organic molecule discovery, was one of the earliest computer-based discovery systems: https://en.wikipedia.org/wiki/Dendral. Herbert Simon also dedicated a good part of his career to studying how the process of scientific discovery can be automated; here is a good example of one of his many contributions in this area: https://doi.org/10.1016/0364-0213(88)90020-1

² Examples of these DFT databases are Materials Project (materialsproject.org), OQMD (oqmd.org) and AFLOW (aflow.org), with many others emerging in the last few years.

³ You can learn how coordinates in these diagrams translate to elemental composition here and here.

⁴ To be a bit more precise, these stable compounds are new “ground states”, i.e. energetically more favorable than any combination of other phases in the system!

