A Local Materials Project Database

Shyue Ping Ong
May 19, 2022

I was part of the team that started the Materials Project, a large open database of computed materials properties, back in 2012. In the ten years since, the Materials Project has become a critical resource that tens of thousands of researchers worldwide rely on daily.

One killer feature of the Materials Project is its well-documented REST API, one of the first of its kind in the field of materials science. With the Materials API and the high-level interface in pymatgen, any user can download large quantities of materials data, for instance, to develop machine learning models or to screen for technological materials with a particular set of properties.

While the Materials API is extremely powerful, there are times when you need a local “copy” of the Materials Project database. For instance, recently I had a project where I needed to create phase diagrams for many different combinations of elements. One of the main bottlenecks in my code was these lines:

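from pymatgen.ext.matproj import MPRester

# elements is a list of element symbols, e.g. ["Li", "Fe", "O"]
mpr = MPRester()
entries = mpr.get_entries_in_chemsys(elements, inc_structure=True)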

This code essentially queries the API for all computed entries within a given set of elements, which can number in the tens to hundreds depending on the chemical system. When looped over thousands of chemical systems, the download over the internet quickly became the bottleneck, despite the best efforts by the Materials Project team to make the API as efficient as possible.
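
To give a rough sense of why a single call can return so many entries: a chemical system query covers the elements themselves plus every combination of them, so Li-Fe-O also pulls in Li, Fe, O, Fe-Li, Fe-O and Li-O. The sketch below only enumerates those sub-systems and makes no network calls. The helper name `sub_chemsys` is my own, chosen for illustration; it is not part of pymatgen or the Materials API.

```python
from itertools import combinations


def sub_chemsys(elements):
    """Return every sub-system of the given elements as sorted, dash-joined strings.

    Hypothetical helper for illustration only; not a pymatgen function.
    """
    unique = sorted(set(elements))
    systems = []
    # Take combinations of every size, from single elements up to the full system.
    for size in range(1, len(unique) + 1):
        for combo in combinations(unique, size):
            systems.append("-".join(combo))
    return systems


systems = sub_chemsys(["Li", "Fe", "O"])
print(len(systems))  # 2**3 - 1 = 7 sub-systems
print(systems)
```

An n-element system has 2**n - 1 sub-systems, each of which may hold many computed entries, so the amount of data per call grows quickly with the number of elements.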
