DeepOpen Catalyst Project: Using Machine Learning to Accelerate the Search for Low-Cost Catalysts for Renewable Energy Storage. Part 2

Shuvam Das
Published in deepkapha notes
7 min read · Feb 26, 2023

Advancements in Computational Chemistry and Machine Learning for Hydrogen Storage Material Discovery

Generating the new Oxygen Evolution Reaction (OER) dataset required tens of millions of compute hours. A commitment was made to offset 100 percent of the carbon emissions from the compute resources used, as part of Meta’s Net Zero program. The Open Catalyst Project aims to advance computational chemistry by providing an open-source dataset that lets researchers overcome the computational limitations of previous methods. The hope is that the project will help the community discover promising new materials at scale, specifically low-cost catalysts for the energy industry. Such catalysts are critical for meeting global energy needs while reducing the impact of climate change, and the project’s efforts can accelerate progress in this area.[1]

The study also notes that there has been a shift in automobile engine development, from large gasoline engines to smaller turbocharged ones, and from conventional gasoline-powered vehicles to hybrid and electric ones. Hydrogen fuel cells have also emerged as an alternative, thanks to their clean fuel source and electrochemical reactions. Adsorptive hydrogen storage using nanoporous materials (NPMs) such as zeolites, carbon-based materials, and metal-organic frameworks (MOFs)[2] has been explored to address the problems of high-pressure storage tanks, including high production costs and safety hazards such as leakage and explosion. High-throughput molecular simulations have been used to screen the adsorption properties of a large number of NPMs. Machine learning algorithms have been used to predict adsorption properties from structural information of NPMs to reduce the total computational cost of the high-throughput screening process.[3]

Generic fuel cell electric vehicle (FCEV) powertrain topology including fuel cell system (FCS), HV battery, electric drive, supercapacitor (SC), transmission, LV auxiliary loads, onboard charger, and related power electronics converters.[4]

Most simulation and machine learning approaches to nanoporous material (NPM) discovery predict the property of interest at the same thermodynamic state for all materials. This is insufficient to accurately rank hydrogen storage among many NPMs because storage performance depends non-monotonically on the thermodynamic state. Overcoming this limitation requires predicting adsorption at multiple temperatures and pressures, which produces a large “meta-dataset” containing many small adsorption datasets. An adsorption isotherm function (AIF) with temperature dependence is fitted individually to each dataset to enable prediction of the adsorption property at any thermodynamic state, leading to the discovery of NPMs with higher hydrogen storage working capacities.

A team of researchers from the Technical University of Denmark and other institutions has developed a machine learning model that uses an encoder-decoder architecture to generate a latent representation of the data for each sorbate-sorbent system, so that only a fingerprint of five coefficients needs to be stored for each material. The hydrogen loading data in the meta-dataset come from Monte Carlo simulations and cover all-silica zeolites, hypothetical all-silica zeolites with high predicted synthesizability, hypothetical metal-organic frameworks, and configurations drawn from nine different hard-templating carbons.
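To make the non-monotonic behavior concrete, here is a minimal Python sketch of one temperature-dependent adsorption isotherm function and the working capacity it implies. The Langmuir form is used only as a familiar stand-in, since the article does not say which isotherm functions the study used. The parameter values (`q_max`, `k0`, `heat`) and the filling pressure of 100 bar are hypothetical, chosen for illustration. Only the depletion pressure of 2.71 bar appears in the article.

```python
import math

R = 8.314  # gas constant in J/(mol K)


def langmuir_loading(p_bar, temp_k, q_max=1.0, k0=4.5e-5, heat=6000.0):
    """Langmuir loading with a temperature-dependent affinity.

    All default parameters are hypothetical illustration values:
    q_max is the saturation loading (normalized), k0 a pre-exponential
    factor in 1/bar, and heat an adsorption heat in J/mol.
    """
    k = k0 * math.exp(heat / (R * temp_k))
    return q_max * k * p_bar / (1.0 + k * p_bar)


def working_capacity(temp_k, p_low=2.71, p_high=100.0):
    """Loading delivered between filling (p_high) and depletion (p_low)."""
    return langmuir_loading(p_high, temp_k) - langmuir_loading(p_low, temp_k)


def optimal_temperature(temps):
    """Temperature in `temps` that maximizes the working capacity."""
    return max(temps, key=working_capacity)


# Scan an assumed temperature grid from 60 K to 300 K in 1 K steps.
t_max = optimal_temperature(range(60, 301))
print(t_max, round(working_capacity(t_max), 3))
```

At low temperature the material is nearly saturated at both pressures, and at high temperature it barely adsorbs at either, so the working capacity peaks somewhere in between. A prediction at a single thermodynamic state cannot reveal where that peak lies for each material.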

Machine Learning Model for Identifying High-Capacity Materials for Hydrogen Storage and Analysis of Adsorption Patterns on Zeolites and MOFs

The meta-learning model was trained on 160 all-silica zeolites and could be applied directly to hypothetical all-silica zeolites and hard-templating carbons, but it required fine-tuning on metal-organic frameworks because their pores are larger than those of the all-silica zeolites. The predictive accuracy of the meta-learning model was compared with that of four commonly used adsorption isotherm functions for all sorbent materials, and the meta-learning model was significantly more accurate than any of the four in the few-shot setting. The study shows that the proposed machine learning model could help identify the most promising materials for hydrogen storage, which is essential for the development of clean energy technologies.[6]
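For a sense of what the isotherm-function baseline looks like in a few-shot setting, the sketch below fits a single-temperature Langmuir isotherm to three data points using the standard linearization p/q = p/q_max + 1/(K q_max). This is an illustrative assumption: the article does not name the four isotherm functions or the fitting procedure used in the study, and the three data points are synthetic.

```python
def fit_langmuir(pressures, loadings):
    """Fit q = q_max*K*p / (1 + K*p) by linear regression of p/q on p.

    Returns (q_max, K). The slope of the fitted line is 1/q_max and the
    intercept is 1/(K*q_max).
    """
    xs = list(pressures)
    ys = [p / q for p, q in zip(pressures, loadings)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, slope / intercept


# Three synthetic, noise-free points generated from q_max = 2.0, K = 0.05 1/bar.
pressures = [5.0, 20.0, 80.0]
loadings = [0.4, 1.0, 1.6]
q_max_fit, k_fit = fit_langmuir(pressures, loadings)
print(q_max_fit, k_fit)
```

A fit like this uses only the handful of points available for one material. The meta-learning model described above instead shares information across all materials in the training set.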

However, further research is needed to consider the effects of framework defects, extra framework species, and gas impurities on hydrogen storage performance. The results also suggest that the use of machine learning models for predicting adsorption properties could be extended to other gas molecules and materials.

(A) A meta-learning problem setup. To solve the problem of jointly predicting the materials space and the state (temperature and pressure) space, meta-learning consolidates the prediction of all materials into a single model and can be generalized to new materials. (B) The meta-learning architecture for gas adsorption prediction. The encoder network produces the fingerprint vector z from the 3D (q, p, T) loading surface. The decoder network reconstructs the adsorption loading as a continuous function of (p, T) given the adsorption fingerprint z.[5]

This article discusses a study that analyzes the adsorption of hydrogen on zeolites and metal-organic frameworks (MOFs) to gain insight into the adsorption patterns and the reasons for high T_max values, where T_max is the temperature at which the working capacity is highest. The study investigates the fingerprint representations and the isosteric heat of adsorption (Q_st) for zeolites and MOFs. The fingerprint representation reflects the hydrogen adsorption behavior and contains information about adsorption patterns and the temperature dependence of the working capacity. A clear correlation is found between the position of an NPM in the fingerprint space and its T_max, with zeolites on the Pareto front of the temperature-capacity distribution located at the boundaries of the fingerprint distribution. The isosteric heat of adsorption is an important characteristic for understanding adsorption behavior, and it is calculated using automatic differentiation and energy/particle fluctuations. Q_st values are assessed at very low loading, at p_low = 2.71 bar, and at the midpoint of p_low and p_high in logarithmic pressure space.
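The isosteric heat can be illustrated with the Clausius-Clapeyron relation, Q_st = -R (∂ ln p / ∂(1/T)) at constant loading. The sketch below evaluates it with a central finite difference on a hypothetical Langmuir model. This is a deliberate simplification: the study is described as using automatic differentiation and energy/particle fluctuations, not finite differences, and every parameter value here is an assumption.

```python
import math

R = 8.314  # gas constant in J/(mol K)


def pressure_at_loading(q, temp_k, q_max=1.0, k0=4.5e-5, heat=6000.0):
    """Pressure (bar) needed to reach loading q at temperature temp_k.

    Inverts a Langmuir isotherm with hypothetical parameters; `heat` is
    the adsorption heat (J/mol) built into the model.
    """
    k = k0 * math.exp(heat / (R * temp_k))
    theta = q / q_max  # fractional coverage, must be below 1
    return theta / (k * (1.0 - theta))


def isosteric_heat(q, temp_k, dt=0.5):
    """Q_st = -R * d ln(p) / d(1/T) at constant loading (central difference)."""
    t_lo, t_hi = temp_k - dt, temp_k + dt
    dlnp = math.log(pressure_at_loading(q, t_hi)) - math.log(pressure_at_loading(q, t_lo))
    dinvt = 1.0 / t_hi - 1.0 / t_lo
    return -R * dlnp / dinvt


q_st = isosteric_heat(q=0.3, temp_k=100.0)
print(round(q_st, 1))
```

For this simple model the result recovers the `heat` parameter at any loading, because a single-site Langmuir isotherm has a loading-independent adsorption heat. In real materials Q_st varies with loading, which is why the study evaluates it at several pressures.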

The study demonstrates the effectiveness of the meta-learning model in generating coherent fingerprint representations that reflect the hydrogen adsorption behavior of each NPM. Beyond adsorption patterns and the temperature dependence of the working capacity, the fingerprints also capture correlations with the structural properties of zeolites.[7]

Meta-Learning Model Enables Accurate Prediction of Hydrogen Loading in Nanoporous Materials, but Limitations Exist for Generalization to Other Adsorbate Molecules

The meta-learning model can accurately predict the hydrogen loading surface of nanoporous materials (NPMs) over a wide range of temperatures and pressures. It can extrapolate to a limited extent beyond the temperature range of the training data, and the fingerprints also allow other dimensionality reduction methods such as PCA to be applied. Unlike the meta-learning model, which supports few-shot prediction, other dimensionality reduction approaches require a fixed number of samples for each material. The model can be trained on a subset of representative materials with data covering the complete condition space and then applied via few-shot prediction to the complete set of materials. This approach makes maximum use of the available data and reduces the cost of high-throughput computation.

However, the study’s task distribution is limited: it covers multiple NPMs but only one adsorbate molecule, hydrogen. It remains unclear whether the same meta-learning model can be generalized to the adsorption of a different adsorbate molecule. Fine-tuning is still necessary when applying a pre-trained model to materials whose structures differ substantially from those in the training set. Extending the meta-learning model to multicomponent adsorption data may require a judicious selection of representative training data, which may markedly increase the problem’s complexity.

Accelerating the Discovery of Catalysts for Renewable Energy Storage through Machine Learning: Insights and Opportunities from the Open Catalyst Project

The study also demonstrates that some sorbent materials attain their maximum hydrogen storage capacity at a relatively high temperature of 140 K and retain 90% of that maximum up to 165 K. The ability to operate at a higher working temperature may make sorbent-based storage systems in vehicles more feasible. The meta-learning predictions indicate that incorporating a modest temperature swing increases the working capacity more than raising the filling pressure or lowering the depletion pressure does. There is also a Pareto trade-off between the maximum working capacity and the optimal temperature across a diverse set of NPMs. The article suggests finding and designing NPMs that extend the Pareto front of hydrogen storage.[8]
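The Pareto front mentioned here can be sketched in a few lines of Python. A material is on the front if no other material is at least as good in both optimal temperature and maximum working capacity while being strictly better in one. The material names and numbers below are invented for illustration and are not results from the study.

```python
def pareto_front(materials):
    """Return the materials not dominated in (optimal temperature, capacity).

    Each entry is (name, t_opt_kelvin, max_working_capacity); higher is
    better for both objectives. The result is sorted by temperature.
    """
    front = []
    for name, t_opt, cap in materials:
        dominated = any(
            (t2 >= t_opt and c2 >= cap) and (t2 > t_opt or c2 > cap)
            for _, t2, c2 in materials
        )
        if not dominated:
            front.append((name, t_opt, cap))
    return sorted(front, key=lambda m: m[1])


# Hypothetical materials: (name, optimal temperature in K, capacity in arbitrary units).
materials = [
    ("A", 80.0, 40.0),
    ("B", 100.0, 35.0),
    ("C", 95.0, 30.0),   # dominated by B
    ("D", 140.0, 20.0),
    ("E", 120.0, 18.0),  # dominated by D
]
front_names = [name for name, _, _ in pareto_front(materials)]
print(front_names)
```

Extending the front, in this picture, means finding a material that sits above and to the right of the current front, combining a higher optimal temperature with a higher working capacity than any existing material offers.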

In conclusion, the Open Catalyst Project aims to accelerate the discovery of low-cost catalysts that can drive the chemical reactions needed for renewable energy storage. The project has open-sourced the world’s largest training dataset of materials for renewable energy storage, called OC20, and has recently announced a new dataset focusing on oxide catalysts for the Oxygen Evolution Reaction (OER). The OER is critical for the production of renewable energy, and the dataset contains approximately eight million data points from 40,000 unique simulations. By using machine learning to accelerate the prediction process, researchers hope to identify promising catalysts in a matter of seconds rather than hours or days. The release of these datasets and machine-learning models will help the global scientific community develop improved catalysts for OER, which in turn will advance renewable energy technologies such as solar and wind fuel production and rechargeable metal-air batteries, which are useful for electric cars.

REFERENCES

[1] https://ai.facebook.com/blog/accelerating-renewable-energy-with-a-new-data-set-for-green-hydrogen-fuel/

[2] https://www.mdpi.com/2076-3417/9/11/2296

[3] https://www.mdpi.com/2076-3417/9/11/2296

[4] https://www.mdpi.com/1996-1073/15/24/9557

[5] https://www.science.org/doi/10.1126/sciadv.abg3983

[6] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294760/

[7] https://www.science.org/doi/10.1126/sciadv.abg3983

[8] https://www.sciencedirect.com/science/article/pii/S0360319919310195
