The Long Road to Reasoning in Chemistry

Tim Bonnemann
Open-Source Science (OSSci)
2 min read · May 28, 2024
This image of a female scientist talking “chemistry” to a computer was generated using ChatGPT by OpenAI.

Large language models (LLMs) have garnered significant interest for their impressive capabilities, particularly in generating and interpreting text. However, enthusiasm about their ability to reason is growing and is sometimes overly optimistic, especially in scientific contexts like chemistry. Recently, several LLMs have been developed specifically for chemical applications, targeting the description of chemical data.

The goal is to track these developments and understand their practical utility in chemistry. While general LLMs can handle broad chemical questions and discussions, specialized LLMs aim to address specific tasks important to chemists. However, these specialized models often use data from larger, well-known LLMs, and their actual usefulness for chemists remains a topic of investigation.

One critical area where LLMs fall short is reasoning about specific molecules and chemical reactions. Chemistry is inherently compositional, and LLMs struggle with this aspect. The models are generally good at conceptual discussions but fail to make precise statements about chemical reactivity because they lack training in compositionality.

The focus is on benchmarking LLMs to identify their strengths and weaknesses in handling chemical reactivity. The aim is to understand where these models need improvement and how they can better serve the chemistry community. This involves creating detailed documents that describe reaction prediction and the capabilities of LLMs in chemistry, in the hope of finding a productive overlap between these areas.
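As one illustration of what such benchmarking could look like, the sketch below scores a model's predicted reaction products against reference answers by exact match. Everything in it is an assumption made for illustration: the `ReactionCase` type, the `normalize` and `exact_match_accuracy` functions, and the two example reactions are hypothetical, and the string comparison is only a crude stand-in for the canonical structure comparison a real benchmark would perform with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionCase:
    """One hypothetical benchmark item: reactants plus the reference product(s)."""
    reactants: str  # SMILES of the reactants, components joined with "."
    expected: str   # SMILES of the reference product(s), joined with "."


def normalize(smiles: str) -> tuple:
    """Split a multi-component SMILES string on "." and sort the components.

    This only makes the comparison independent of component order. It is NOT
    canonicalization: two different SMILES strings for the same molecule
    (for example "OCC" and "CCO") still compare as different.
    """
    return tuple(sorted(part.strip() for part in smiles.split(".") if part.strip()))


def exact_match_accuracy(cases, predictions):
    """Return the fraction of cases whose prediction matches the reference."""
    if len(cases) != len(predictions):
        raise ValueError("need exactly one prediction per case")
    if not cases:
        return 0.0
    hits = sum(
        normalize(case.expected) == normalize(pred)
        for case, pred in zip(cases, predictions)
    )
    return hits / len(cases)


# Two textbook reactions used as toy benchmark items.
cases = [
    # Acetic acid + ethanol -> ethyl acetate + water
    ReactionCase(reactants="CC(=O)O.OCC", expected="CC(=O)OCC.O"),
    # Methyl bromide + hydroxide -> methanol + bromide
    ReactionCase(reactants="CBr.[OH-]", expected="CO.[Br-]"),
]

# Stand-ins for model output: the first is correct (different component
# order), the second is wrong.
predictions = ["O.CC(=O)OCC", "CCl"]

print(exact_match_accuracy(cases, predictions))
```

Exact match is deliberately strict: a prediction that is wrong in a single atom counts as a miss, which fits the kind of precise statement about reactivity that the benchmarking is meant to probe.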

Ultimately, the goal is to push the development of LLMs towards more practical and reliable applications in chemistry, so that they can reason about chemical reactions and predict them accurately.

The OSSci Chemistry IG has started to build a list of LLMs for chemical applications. This is an open document, and we welcome your contributions. Please join the Chemistry IG Google group, and you’ll automatically be granted edit privileges. Thanks!
