Crystalformer: Infinitely connected attention for periodic structure encoding (ICLR 2024)

Tatsunori Taniai
OMRON SINIC X
11 min read · Apr 19, 2024

We are pleased to announce that the following work on a Transformer encoder framework for crystal structures has been accepted at the Twelfth International Conference on Learning Representations (ICLR 2024).

Tatsunori Taniai ¹, Ryo Igarashi ¹, Yuta Suzuki ², Naoya Chiba ³, Kotaro Saito ⁴ ⁵, Yoshitaka Ushiku ¹, and Kanta Ono ⁵: “Crystalformer: Infinitely Connected Attention for Periodic Structure Encoding”, In The Twelfth International Conference on Learning Representations (ICLR 2024), 2024.
[OpenReview] [arXiv] [Project]

¹ OMRON SINIC X Corporation, ² Toyota Motor Corporation, ³ Tohoku University, ⁴ Randeft Inc., ⁵ Osaka University

ICLR is recognized as one of the top venues for machine learning research, alongside the Conference on Neural Information Processing Systems (NeurIPS) and the International Conference on Machine Learning (ICML). ICLR 2024 will be held from May 7th through 11th in Vienna, Austria, and we will present our work at a poster session from 16:30 to 18:30 on May 9th (UTC+2). (A virtual poster page is also provided by the conference.)

This work was done collaboratively by research groups at OMRON SINIC X and Osaka University, as part of the JST-Mirai Program, Grant Number JPMJMI19G1.

Contents of this article

  • Materials Science and Crystal Structures
  • Neural Networks for Crystal Structure Encoding
  • Transformers for Molecular and Crystal Structure Encoding
  • Crystalformer using Infinitely Connected Attention
  • Interpretation as Neural Potential Summation
  • Performed as Finite-element Self-attention
  • Overall Architecture
  • Results
  • Conclusions, Future Work, and Take-away Messages
  • Call for Interns
  • Related Projects

Materials Science and Crystal Structures

Machine-learning-based material property prediction.

Our work is an interdisciplinary study in machine learning and materials science. Materials science is a field that explores and develops new materials with useful properties and functionalities, such as superconductors and battery materials. Since current material discovery and development processes are time-consuming and labor-intensive, there is a growing demand for next-generation AI technologies to accelerate them.

Materials are represented by crystal structures: infinitely repeating, periodic arrangements of atoms in 3D space, described by a minimal repeating pattern called the unit cell.

Crystal structure of NaCl.

The unit cell defines the atomic arrangement and lattice information. The atomic arrangement defines the number of atoms, N, in the unit cell and their 3D positions [p_1, p_2, …, p_N] and species [x⁰_1, x⁰_2, …, x⁰_N]. The lattice information defines three lattice vectors, l1, l2, and l3, that represent translations of the unit cell in 3D space.
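
To make this concrete, here is a minimal, hypothetical Python sketch of the unit cell information; the class and field names are our own and not taken from the paper or its code.

```python
# A minimal sketch (not the authors' code) of the unit cell information
# described above: atomic species, positions, and lattice vectors.
from dataclasses import dataclass
import numpy as np

@dataclass
class UnitCell:
    species: np.ndarray    # (N,) atomic species x⁰_1 ... x⁰_N (e.g., atomic numbers)
    positions: np.ndarray  # (N, 3) Cartesian atom positions p_1 ... p_N
    lattice: np.ndarray    # (3, 3) rows are the lattice vectors l1, l2, l3

# Toy two-atom cell with a cubic lattice (illustrative values only).
cell = UnitCell(
    species=np.array([11, 17]),
    positions=np.array([[0.0, 0.0, 0.0],
                        [2.8, 2.8, 2.8]]),
    lattice=5.6 * np.eye(3),
)
```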

Given the unit cell information, the crystal structure can be recovered by repeatedly translating each atom i in the unit cell to the following position:
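
In LaTeX notation, this position reads:

```latex
\mathbf{p}_{i(\mathbf{n})} = \mathbf{p}_i + n_1 \mathbf{l}_1 + n_2 \mathbf{l}_2 + n_3 \mathbf{l}_3,
\qquad \mathbf{n} = (n_1, n_2, n_3)
```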

where n1, n2, and n3 are integers representing unit-cell shifts along l1, l2, and l3. We denote such a translated copy of unit-cell atom i with shift n by i(n), and use j(n) similarly.

Crystal structure and unit cell in 2D space.

Neural Networks for Crystal Structure Encoding

Given a unit cell of a crystal structure, ML-based crystal property prediction is usually performed by defining the state features for the atoms in the unit cell and evolving them through interatomic message-passing layers in a neural network.

Evolution of atom-wise states through interatomic message-passing layers in a neural network.

In our work, the initial state features of atoms only symbolically represent their atomic species via atom embedding (AE), and we evolve them through interatomic interactions induced by the spatial information of the crystal structure (i.e., the atomic positions and lattice vectors).

In existing methods, interatomic message passing is usually performed by Graph Neural Networks (GNNs) that represent crystal structures as graphs. To represent the periodicity of crystal structures, these GNNs employ multi-edge graphs [Xie & Grossman, 2018], which connect each pair of unit-cell atoms, i and j, with multiple edges to incorporate connections between atom i and atom j's translated atoms j(n).

Graph edges in standard GNNs and our Crystalformer.

Transformers for Molecular and Crystal Structure Encoding

While GNNs are the dominant approach in both molecular and crystal structure encoding, the emergence of Transformers [Vaswani et al. 2017] is changing the situation. After several studies attempted to replace key components of GNNs with attention mechanisms, Ying et al. 2021 first succeeded in applying a standard Transformer architecture to molecular graph representation learning. Their method, called Graphormer, employs fully connected self-attention blocks for the state evolution and showed excellent performance in molecular property prediction.

In recent years, Transformers have demonstrated an outstanding ability to model complex interdependencies among input elements (e.g., resolving ambiguous meanings of words in sentence context), as well as great flexibility in handling irregularly structured or unordered data such as point clouds.

We believe that these capabilities of Transformers should benefit the encoding of crystal structures and molecules, because atoms in these structures interact with each other in complex ways to determine their states, and because they are irregularly arranged in 3D space rather than regularly ordered.

However, extending fully connected attention to crystal structures results in a non-trivial formulation, namely infinitely connected attention, which involves an infinite summation over repeated atoms. Our Transformer framework, Crystalformer, enables this infinitely connected attention for crystal structures through an analogy to physics simulation.

Crystalformer using Infinitely Connected Attention

Fully connected attention for finite elements.

To derive our self-attention formulation for crystal structures, let’s start with a simpler case of finite atoms in molecules. Given the state variables x of N atoms in a molecule, we can formulate their state evolution as fully connected self-attention as follows.

Fully connected self-attention for finite elements (e.g., atoms in a molecule).

Here, q, k, and v are query, key, and value features obtained as linear projections of input x, and Z_i is the normalizer of the softmax function, defined as the sum of the exponential weights exp(…).

In Transformers, x as well as q, k, and v are position-agnostic. Therefore, we need to inject position information into them. Here we employ relative position representations, φ and ψ, as scalar and vector biases added to the softmax logits and value vectors, respectively. To make the network invariant to rigid transformations (i.e., SE(3) transformations such as rotation and translation), we formulate φ and ψ based on spatial distances between atom pairs.
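
Concretely, this position-encoded, fully connected self-attention takes roughly the following form, where d_K denotes the key dimension (a sketch consistent with the definitions above; see the paper for the exact equation):

```latex
y_i = \frac{1}{Z_i} \sum_{j=1}^{N}
      \exp\!\left( \frac{q_i^\top k_j}{\sqrt{d_K}} + \phi_{ij} \right) (v_j + \psi_{ij}),
\qquad
Z_i = \sum_{j=1}^{N} \exp\!\left( \frac{q_i^\top k_j}{\sqrt{d_K}} + \phi_{ij} \right)
```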

Infinitely connected attention for periodic elements.

Similarly to molecular encoding, crystal structure encoding is performed with state features of the finite atoms in the unit cell. However, simply applying this self-attention to these atoms only considers interactions between unit-cell atoms i and j, ignoring those with translated atoms j(n) outside the center unit cell. Therefore, to extend the self-attention to crystal structures, we add an infinite summation over all 3D integer vectors n ∈ ℤ³, which adds interactions with translated atoms j(n).

From fully connected attention to infinitely connected attention.

Note that translated atoms j(n) share the same state as atom j in the unit cell. Therefore, we can assume v_j(n) = v_j and k_j(n) = k_j, meaning that the unit-cell shift n only affects the positions of key-value atoms j(n) in the position encodings φ and ψ.
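
With these assumptions, the infinitely connected attention takes roughly the following form (again a sketch; see the paper for the exact equation), where the extra sum over n runs over all translated copies j(n):

```latex
y_i = \frac{1}{Z_i} \sum_{j=1}^{N} \sum_{\mathbf{n} \in \mathbb{Z}^3}
      \exp\!\left( \frac{q_i^\top k_j}{\sqrt{d_K}} + \phi_{ij(\mathbf{n})} \right)
      \left( v_j + \psi_{ij(\mathbf{n})} \right)
```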

Interpretation as Neural Potential Summation

Our infinitely connected attention expresses the state of each atom, y_i, as being possibly influenced by all the atoms in the structure, j(n): each contributes an abstract influence on i, v_j + ψ_ij(n), weighted by a scalar attention weight, exp(…)/Z_i.

Infinitely connected attention.

We interpret this formulation as an abstraction of potential summation, which is usually performed for energy calculations in physics simulation. For example, the electric potential energy between center ion i and other ions j, with charges Q, is calculated as follows.

Calculation of electric potential energy using the Coulomb potential, which is proportional to 1/r.
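
In LaTeX, a schematic form of this potential summation is shown below, where ε₀ is the vacuum permittivity and the sum excludes the self term j(n) = i:

```latex
E_i = \frac{1}{4\pi\varepsilon_0} \sum_{j, \mathbf{n}}
      \frac{Q_i \, Q_j}{\lVert \mathbf{p}_{j(\mathbf{n})} - \mathbf{p}_i \rVert}
```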

In the physical world, potentials between atoms tend to decay as their distances increase. We apply this rule to our self-attention formula to derive a computationally tractable formulation.

Specifically, we use position encoding φ to explicitly decay the attention weights with interatomic distances. For simplicity, we adopt a Gaussian distance-decay function for exp(φ).
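
A sketch of this decay, with σ_i a per-atom scale parameter controlling the decay length (see the paper for how it is set), is:

```latex
\exp(\phi_{ij(\mathbf{n})}) =
\exp\!\left( -\frac{\lVert \mathbf{p}_{j(\mathbf{n})} - \mathbf{p}_i \rVert^2}{2\sigma_i^2} \right)
```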

Performed as Finite-element Self-attention

Although the infinitely connected attention may look complicated, it can be performed similarly to the standard fully connected attention for finite elements, by rewriting the formulation as follows.

Infinitely connected attention can be performed as fully connected attention for finite unit-cell atoms with special forms of position encoding terms.

Here, new position encoding terms, α and β, serve to embed the periodicity information. (Tip: β_ij is itself a softmax attention that computes the weighted average of ψ_ij(n) over n, where exp(−α_ij) acts as the normalizer of the exponential weights exp(φ_ij(n)).)
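
In LaTeX, a sketch of these terms consistent with the description above (see the paper for the exact definitions) is:

```latex
\alpha_{ij} = \log \sum_{\mathbf{n} \in \mathbb{Z}^3} \exp(\phi_{ij(\mathbf{n})}),
\qquad
\beta_{ij} = \frac{\sum_{\mathbf{n} \in \mathbb{Z}^3} \exp(\phi_{ij(\mathbf{n})}) \, \psi_{ij(\mathbf{n})}}
                  {\sum_{\mathbf{n} \in \mathbb{Z}^3} \exp(\phi_{ij(\mathbf{n})})}
```

Thanks to the Gaussian distance decay, these infinite sums can in practice be truncated to a finite range of lattice shifts.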

When α and β are tractably computed, the infinitely connected attention can be performed by simply adding α and β to softmax logits and value vectors in standard self-attention, as follows.

Pseudo-finite periodic self-attention in matrix-tensor diagram.
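
As a rough illustration, here is our own minimal PyTorch sketch (not the authors' implementation) of this pseudo-finite attention, assuming the periodic encodings alpha (N×N) and beta (N×N×d) have already been computed:

```python
# Minimal sketch of pseudo-finite periodic self-attention (not the authors' code).
import math
import torch

def pseudo_finite_attention(q, k, v, alpha, beta):
    """q, k, v: (N, d) features of unit-cell atoms; alpha: (N, N); beta: (N, N, d)."""
    d = q.shape[-1]
    # Scaled dot-product logits plus the periodic bias alpha.
    logits = q @ k.transpose(-1, -2) / math.sqrt(d) + alpha      # (N, N)
    attn = torch.softmax(logits, dim=-1)                         # (N, N)
    # Each key-value atom j contributes v_j + beta_ij to query atom i.
    return attn @ v + torch.einsum("ij,ijd->id", attn, beta)     # (N, d)

# Toy usage with random features for N = 4 unit-cell atoms and d = 8 channels.
N, d = 4, 8
q, k, v = (torch.randn(N, d) for _ in range(3))
alpha, beta = torch.randn(N, N), torch.randn(N, N, d)
y = pseudo_finite_attention(q, k, v, alpha, beta)
print(y.shape)  # torch.Size([4, 8])
```

The computation has the same cost as standard self-attention over the N unit-cell atoms, with the periodicity folded into α and β.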

Overall Architecture

The overall architecture of Crystalformer is simple and closely follows the architecture of the original Transformer encoder [Vaswani et al. 2017].

Network architecture of Crystalformer

As an important difference from the original architecture, our self-attention block entirely removes Layer Normalization. We found that a normalization-free architecture with the improved weight initialization strategy proposed by Huang et al. (2020) is beneficial for stabilizing training.

Results

Property prediction benchmarks.

We evaluated our method for several property prediction tasks using a Materials Project dataset and the JARVIS-DFT 3D 2021 dataset. We show mean absolute errors (MAE) for these prediction tasks below.

Performance comparisons with state-of-the-art neural-network-based methods.

Remarkably, our method consistently outperforms the existing methods other than PotNet in all the prediction tasks, even without stochastic weight averaging (SWA).

Ablation study.

While position encoding φ for attention weights is important to make the infinitely connected attention tractable, position encoding ψ for value vectors is another key to properly encoding periodic structures. (This is because the infinitely connected attention without ψ cannot distinguish between crystal structures of the same single atom in differently sized unit cells. See Appendix D for more details.)

When we evaluate a simplified model that uses no value-based position encoding ψ, the performance drops significantly, as shown below. Note that this simplified model is close to Graphormer's architecture.

Increasing the number of self-attention blocks.

We further evaluated performance variations when changing the number of self-attention blocks in Crystalformer, using the simplified variant.

The above results suggest that the performance improves steadily with more blocks, while reaching a plateau at around four blocks. We further tested a larger model with a total of seven blocks, using the full architecture with ψ.

The above results show that increasing the number of self-attention blocks can lead to further improvements, given sufficient training data.

Efficiency comparison.

We also compared the efficiency of our method with GNN-based and Transformer-based state-of-the-art methods, PotNet and Matformer.

Efficiency comparison with PotNet and Matformer.

Our method achieves the highest parameter efficiency and the fastest inference speed among them.

Conclusions, Future Work, and Take-away Messages

We have presented Crystalformer, a Transformer encoder for crystal structures. It builds on fully connected attention between periodic points, namely infinitely connected attention, with physically motivated distance-decay attention to ensure tractability.

In the present work, we keep our method simple to provide a flexible Transformer framework, hoping that it serves as a baseline Transformer model in this area. It is possible to extend our framework as follows.

  • Incorporate long-range interatomic interactions by performing the infinitely connected attention in Fourier space. We demonstrated this idea in Appendix J in the paper.
  • Incorporate richer information than interatomic distance, such as angular and directional information. This can be done by extending relative position representations φ and ψ.
  • Extend to SE(3) equivariant networks. Since position information is only used by position representations (φ and ψ), extending these terms to be SE(3) equivariant will enable geometric prediction tasks such as force prediction.
  • Incorporate known forms of interatomic potentials. By modifying φ, we will be able to exploit our knowledge about the forms of existing interatomic potential functions.

Architectural recipes of crystal-structure encoders

Our framework makes minimal modifications to the original Transformer architecture. We summarize them as the following architectural recipes.

  • Use relative position representations in the following two forms:
    1) Bias for softmax logits to explicitly decay attention weights by interatomic distances.
    2) Bias for value vectors to properly encode the periodicity of crystal structures. (See Appendix D in the paper.)
  • Use the normalization-free Transformer architecture with improved weight initialization by Huang et al. (2020) to stabilize the training. (We have confirmed that Crystalformer with 20 blocks can be successfully trained by this scheme.)

Call for Interns

OMRON SINIC X is looking for research interns throughout the year to work with our members in challenging research projects on a variety of topics related to robotics, machine learning, computer vision, and HCI. Many students have participated in our internship program, and their achievements have been published as academic papers at international conferences such as CVPR, ICML, IJCAI, ICRA, CoRL, or as OSS libraries. For more information, visit our call for interns.

Related Projects

  • Suzuki et al. “Self-supervised learning of materials concepts from crystal structures via deep neural networks”, In Mach. Learn.: Sci. Technol. 3 045034, 2022. [Paper]
    [TL; DR] Self-supervisedly learn a map of materials space from crystal structures and their XRD patterns.
  • Chiba et al. “Neural structure fields with application to crystal structure autoencoders”, In Communications Materials volume 4, Article number: 106, 2023. [Paper]
    [TL; DR] Represent crystal structures as continuous fields in 3D space for crystal structure decoding.

References

  • Tian Xie and Jeffrey C. Grossman. “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties”, Phys. Rev. Lett., 120:145301, Apr 2018.
  • Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and Tie-Yan Liu. “Do transformers really perform badly for graph representation?” In Advances in Neural Information Processing Systems, volume 34, pp. 28877–28888, 2021.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. “Attention is all you need”, In Advances in Neural Information Processing Systems, volume 30, 2017.
  • Xiao Shi Huang, Felipe Perez, Jimmy Ba, and Maksims Volkovs. “Improving transformer optimization through better initialization”, In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp. 4475–4483. PMLR, 13–18 Jul 2020.
