“Elementary, My Dear Watson”

Tim Bonnemann
Open-Source Science (OSSci)
2 min read · Apr 30, 2024
Screenshot taken from: PMechDB: A Public Database of Elementary Polar Reaction Steps (licensed under CC BY 4.0)

Thanks to Derek Ahneman (IBM) for sharing this brief introduction to one of the first two community projects in our Chemistry IG. Please see below for ways you can contribute and further grow this list. Thanks!

When it comes to available reaction datasets, the vast majority record information at the level of the overall transformation: “I put starting materials A, B, and C in the flask, and got product D in xx% yield.” There are a variety of open databases containing such information, including the workhorse USPTO dataset and the growing Open Reaction Database. The field continues to benefit from the increasing scope and quality of such overall (sometimes called “global”) reaction datasets.

There are significantly fewer datasets curated at the elementary step level. As every chemist knows, reactions proceed via stepwise mechanisms. Understanding these mechanisms is a key aspect of a chemist’s training, as it allows them to reason about stereo- and chemoselectivity, predict side-product and byproduct formation, anticipate substrate scope restrictions, and so on. The recent release of the RMechDB and PMechDB datasets by the Baldi and Van Vranken groups is an important step toward providing public datasets that can be used to train mechanism-aware ML models.
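To make the distinction concrete, here is a minimal sketch of the same transformation recorded both ways. The `Step` class and `net_transformation` helper are hypothetical names made up for illustration; this is not the schema used by RMechDB, PMechDB, or any other database listed here. The example chemistry is the textbook SN1 hydrolysis of tert-butyl bromide, written as plain SMILES strings.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical record: species consumed and species formed, as SMILES."""
    reactants: tuple[str, ...]
    products: tuple[str, ...]


def net_transformation(steps):
    """Cancel intermediates across steps; return (net reactants, net products)."""
    consumed, formed = Counter(), Counter()
    for step in steps:
        consumed.update(step.reactants)
        formed.update(step.products)
    # Counter subtraction keeps only positive counts, so any species formed
    # in one step and consumed in another drops out of both sides.
    return consumed - formed, formed - consumed


# Overall ("global") record: only what went into the flask and what came out.
overall = Step(reactants=("CC(C)(C)Br", "O"), products=("CC(C)(C)O", "Br"))

# Elementary-step record of the same transformation.
mechanism = [
    Step(("CC(C)(C)Br",), ("C[C+](C)C", "[Br-]")),           # ionization
    Step(("C[C+](C)C", "O"), ("CC(C)(C)[OH2+]",)),           # water attacks the cation
    Step(("CC(C)(C)[OH2+]", "[Br-]"), ("CC(C)(C)O", "Br")),  # proton transfer
]

net_reactants, net_products = net_transformation(mechanism)
assert net_reactants == Counter(overall.reactants)
assert net_products == Counter(overall.products)
```

The overall record never mentions the carbocation or the oxonium ion; those intermediates exist only in the step-level record. Note that this sketch compares SMILES as raw strings, so it assumes every species is written identically each time it appears. A real pipeline would canonicalize the structures first.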

Below is a list of resources relevant to elementary reactions (both input/output chemicals and thermodynamic information). It is followed by a list of global reaction resources.

Elementary Step DBs

  • RMechDB
  • PMechDB

Elementary Step Barrier DBs

  • RGD1
  • Catalysis Hub
  • Green et al. (2020)
  • Green et al. (2022)

Overall Reaction DBs

  • Open Reaction Database (ORD)
  • USPTO Patent Data
  • NIST

Proprietary DBs

Proprietary overall reaction datasets include:

  • Pistachio (NextMove)
  • Beilstein/Reaxys (Elsevier)
  • SPRESI (InfoChem)
  • CAS (ACS)

For a more detailed description of each resource, please view our living doc.

If you would like to add to these lists, please join the Chemistry IG Google group, which will automatically grant you edit privileges.

If you’d like to join a call, the Chemistry IG meets on the last Tuesday of the month at 9am Pacific (12pm Eastern). Upcoming calls: April 30 and May 28.

Tim Bonnemann
Open-Source Science (OSSci)

Intersection of community & participation. Currently @IBMResearch. Wannabe trailrunner.