Self-Driving… Chemistry?

Edward Dixon
Published in The Startup · 10 min read · Jun 5, 2020


“Iterations” (Seema Gaur, Emil “GAN” Barbuta)

“Materials science” doesn’t have quite the pulse-raising associations of, say, “aerospace” or “robotics”. This is a shame! Stone tools — with wooden shafts — were key to the elevation of an ape with rather underdeveloped teeth and claws to the top of the food chain. Metallurgy gave us copper, bronze and iron — and corresponding waves of conquest and settlement (for at least 10,000 years, there has been strong selection pressure in favour of societies that let at least a few nerds tinker with ore and fire).

An extra pinch of carbon and a stubborn commitment to process improvement gave us high-quality steels to spark the industrial revolution. The semiconductors inside the processor at the heart of the laptop on which I’m writing demonstrate our ability to manipulate matter at the nanometer scale… and yet such achievements pale beside the sophistication of the composite materials in a shrimp’s claw (withstanding huge forces without the need for metals), or the efficiency (95%!) of the electron-harvesting elegance concealed within the green loveliness of chlorophyll. Life excels at creating macro-scale objects with nano-scale structure — we don’t.

Human mastery of matter at a certain scale is obvious — you can see our cities from orbit and we have changed even the composition of our atmosphere. However, our ability to manipulate matter at the tiniest scales remains rather limited. As we shall see, changing what is possible in this area may well become one of AI’s greatest contributions.

During the 20th century, physicists and chemists used the new understanding of the atom first to explore the structure of important molecules, and then, slowly, to learn to assemble them from simpler ingredients. Building larger molecules from simpler ones — synthesis — transformed our lives. Plastics are a great example: you probably have polymers in your clothes, but they are also insulating the wiring in your house, preventing food spoilage in your fridge and weatherproofing your exterior (even your lovely wooden furniture often relies on plastic-based glues). Chemical synthesis has been used to build more complicated molecules too; just look in your medicine cupboard — Aspirin, for example, or Salbutamol, the active ingredient in Ventolin inhalers — great medications that are now incredibly cheap to make.

It is here that we start to find some important limitations: we can synthesize simple molecules like these, but for more complicated ones — insulin, antibiotics — we have to ‘hire’ some assistants. Insulin, for example, is brewed in bioreactors using genetically modified yeast or bacteria: because we can’t figure out how to synthesize it directly, we’ve copied and pasted the DNA sequence for the human version into tiny creatures that churn it out in industrial volumes (a sort of high-end brewing process, not unlike beer-making). Although these medications have been an enormous boon, their discovery and manufacture also highlight our weaknesses:

  • New antibiotics must be found — they are not designed. For example, a key step in the mass-production of penicillin was the discovery of an unusually easy-to-grow mould on a cantaloupe in Peoria, Illinois. As much as we admire the diligent global search that found that cantaloupe, it is hardly a reliable, repeatable process for creating new medications; perhaps we could improve on it.
  • As obliging as our tiny bioreactor-dwelling workers are, their virtuosity in crafting large molecules is limited to what can be described by a DNA sequence (which itself must be short enough for us to synthesize successfully). This rules out almost the entire periodic table.
  • Even limiting ourselves to “things that yeast can make”, we still need to devise a DNA sequence or set of sequences that will yield the desired result. DNA is ‘expressed’ or ‘executed’ by being turned into proteins which need to fold themselves into the final, functional shape. Getting the sequence just right for a reliably folding result is not a simple problem.

We have discovered laws that describe the behaviour of matter at the tiniest scales with extraordinary precision — Quantum Mechanics — and so in principle, we should be able to design and build new structures at the tiniest scale at will. However, answering even a comparatively basic question about a new substance (“What is its boiling point?”) remains extremely difficult. Instead of being designed from scratch, new materials are often found by a brute-force search. Beyond finding the right material, actually making it in industrial quantities is an enormous challenge: new materials take between 5 and 15 years to commercialize. Even at Intel Corporation, with 50 years’ experience, where we do make devices that are exquisitely nano-structured, it takes an enormous building with a 10-to-11-digit price tag to get the job done.

One structure that illustrates both the difficulty and the potential is the carbon nanotube, a sheet of carbon atoms linked in a hexagonal pattern and rolled into a tube with walls just one atom thick. Tubes of this form have many interesting thermal and electrical properties — but their mechanical strength shows their potential: a cable of carbon nanotubes with a cross-sectional area of 1 mm² could hold a weight of 6 tonnes (picture a rather coarse thread suspending two pickup trucks!), a tensile strength more than 300 times that of high-carbon steel. Why don’t we see this wonder material everywhere? Two reasons:


1. Cost. As the 4th-most abundant element in the universe, we might naively expect carbon-based materials to be cheaper than steel (which is mostly iron, an element only about ¼ as abundant as carbon). The price of carbon nanotubes has fallen from $1,600 per gram in 2000 (about 40 times dearer than gold) to $1 per gram in 2018 (about 1/40th the price of gold), a marvellous improvement, but one that still leaves us 3 orders of magnitude short of matching the price of steel (about $1 per kilogram). “Cheaper than gold” isn’t a very exacting cost bar!

2. Length. In 2007, the longest-ever tubes were about 1.8 centimetres in length. By 2013, the longest tubes were still just 50 centimetres long.
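Both headline numbers above lend themselves to a quick back-of-envelope check (all inputs are taken from the text; gravity is rounded to 9.81 m/s²):

```python
import math

# 1. Strength: 6 tonnes hanging from a 1 mm^2 cable, expressed as a stress.
g = 9.81                      # m/s^2, standard gravity
mass_kg = 6_000               # 6 tonnes
area_m2 = 1e-6                # 1 mm^2
stress_gpa = mass_kg * g / area_m2 / 1e9
print(f"nanotube cable stress ~ {stress_gpa:.0f} GPa")   # ~ 59 GPa

# 2. Cost: $1 per gram for nanotubes versus roughly $1 per kilogram for steel.
nanotube_usd_per_kg = 1.0 * 1000   # $1/g -> $1,000/kg
steel_usd_per_kg = 1.0
gap = math.log10(nanotube_usd_per_kg / steel_usd_per_kg)
print(f"cost gap ~ {gap:.0f} orders of magnitude")       # ~ 3
```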

The enormous fall in price is due to the heroic efforts of experimenters to improve the synthesis of carbon nanotubes. So why can’t we ‘simply’ apply quantum mechanics to the problem of producing this material and skip all those tedious experiments? Shouldn’t those laws give us the recipe we need?

Unfortunately, simulations using these laws are exceptionally computationally demanding (at least on classical computers). Happily, physicists have found an elegant dodge: one way to think of a machine learning model is as a function approximator — a way of looking at a complex piece of maths and coming up with a cheaper alternative that gets us within a whisker (“within epsilon”) of the right answer. Most of us think of machine learning as requiring a lot of computational power; physicists think of it as an incredible bargain that lets them simulate chemical reactions accurately for the low, low price of 1% of the cost of a “full fat” quantum simulation. We don’t need to know the exact shape of the supply-demand curve for quantum simulations to forecast a correspondingly dramatic increase in their use as the cheaper (but still sufficiently accurate) machine-learning-based simulations become available. Already, simulations of this type have allowed scientists working at Canada’s National Research Council to predict the properties of carbon-based structures at an unprecedented scale, a key step towards broadening the industrial use of carbon nanotubes and related structures.
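To make the function-approximator idea concrete, here is a minimal sketch in Python with NumPy. The “expensive” function below is an invented stand-in for a quantum-mechanical energy calculation (in reality each call might take hours on a supercomputer); we run it a limited number of times, fit a cheap polynomial surrogate to the results, and then answer new queries from the surrogate alone.

```python
import numpy as np

# Invented stand-in for an expensive quantum-mechanical calculation.
def expensive_energy(x):
    return np.sin(3 * x) + 0.5 * x**2

# Step 1: pay the full price for a limited set of training points.
rng = np.random.default_rng(0)
x_train = rng.uniform(-2, 2, 200)
y_train = expensive_energy(x_train)

# Step 2: fit a cheap surrogate (here, a least-squares polynomial;
# in practice, a neural network trained on simulation outputs).
surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=9))

# Step 3: the surrogate now answers new queries at a tiny fraction of the
# cost, landing "within epsilon" of the true answer.
x_test = np.linspace(-2, 2, 100)
error = np.max(np.abs(surrogate(x_test) - expensive_energy(x_test)))
print(f"worst-case surrogate error: {error:.4f}")
```

The trade is the same one the physicists are making: a one-off training cost in exchange for evaluations that are orders of magnitude cheaper than the original simulation.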

Cheaper simulation is important as an enabling step for understanding the behaviour of the large quantum dynamical systems at the heart of materials science, pharmaceuticals, computing and energy production. It also aids in designing or discovering useful molecules and, crucially, in developing the synthesis methods needed to create them. At the time of writing, for example, designing new therapeutic molecules — new medicines — often involves working with rather hefty molecules containing thousands of atoms: amino acids and proteins are the language of molecular biology, hundreds of times larger and more complex than friendly little fellows like alcohol or caffeine.

One approach to discovering new drugs is to create a vast molecular library, then use robots to screen these molecules for therapeutic potential. Think of a drug target — a ‘receptor site’, perhaps serving as a gatekeeper on the surface of a cell — as an elaborate lock, and the molecular library as a vast collection of keys: jars of old relics from yard sales, master keys already known to open many locks, and new creations fabricated in the mere hope that their creator might one day find a corresponding lock (yes, chemists create variation after variation on interesting compounds, hoping one might turn out to be useful). The screening process is one of trying key after key in that tantalising lock, looking for something that turns smoothly and discarding those that don’t fit, jam, or open too many locks we would rather not touch. Imagine trying to create any other new product in the same way! Experts do try to hand-place atoms to craft new medicines, but the very existence of robotic drug screening should warn the reader that this is a difficult endeavour.

If we could build a suitably accurate simulation (one embodying the laws of quantum mechanics) capable of running at a reasonable cost, it would be possible to work in a different way. Instead of specifying a solution in the form of a molecule, a researcher would define desirable properties (perhaps the drug needs to bind to one particular ‘receptor’ site on a cell’s exterior but not to a great many similar receptors, and to be susceptible to eventual breakdown into non-toxic metabolites by the liver’s enzymes) and allow a form of machine learning called reinforcement learning to construct a custom molecule. This form of machine learning is used to train policies to take a series of actions in a certain environment (usually simulated to reduce costs and development time), so as to make progress towards a specific goal. The learning works because the policy is adjusted by an optimization algorithm that attempts to maximize a ‘reward’. For example, changes to a molecule that increase toxicity would result in negative rewards, while changes that enhance specificity relative to the target receptor would attract positive rewards. This form of machine learning has produced policies (models) that defeat all opponents at games like Go and chess, and has also had success with tasks like using robotic hands to manipulate cubes, or training self-driving cars.
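As a loose illustration of the reward-shaping idea (nothing here is real chemistry: the “molecule” is a list of invented fragment choices, and the toxicity and specificity scores are made-up stand-ins for learned predictors), here is a toy optimisation loop in Python. A simple hill climb with occasional random exploration stands in for a full reinforcement-learning policy:

```python
import random

random.seed(0)
N_SITES, N_FRAGMENTS = 8, 4
TARGET = [1, 2] * (N_SITES // 2)   # invented "receptor-matching" pattern

def toxicity(mol):
    return mol.count(3) / len(mol)  # pretend fragment 3 is toxic

def specificity(mol):
    return sum(a == b for a, b in zip(mol, TARGET)) / len(mol)

def reward(mol):
    # changes that increase toxicity are penalised;
    # changes that improve specificity are rewarded
    return specificity(mol) - 2.0 * toxicity(mol)

# Propose single-site edits; keep an edit if the reward does not drop
# (or, occasionally, explore anyway), and track the best molecule seen.
mol = [random.randrange(N_FRAGMENTS) for _ in range(N_SITES)]
best = mol
for _ in range(500):
    candidate = mol.copy()
    candidate[random.randrange(N_SITES)] = random.randrange(N_FRAGMENTS)
    if reward(candidate) >= reward(mol) or random.random() < 0.05:
        mol = candidate
    if reward(mol) > reward(best):
        best = mol

print(best, round(reward(best), 2))
```

A real system would replace the hand-written scoring functions with learned property predictors (or the quantum-accurate simulator discussed above) and the hill climb with a trained policy, but the shape of the loop — propose, score, reinforce — is the same.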

Unfortunately, finding the right molecular structure is only the first step: we then need to design a process to create that molecule, and this is synthesis. We’ve considered a kind of self-driving chemistry for molecular design, but we could also benefit hugely from automating the design of synthesis; figuring out the steps required to produce a particular molecule is a complex business which has so far proved resistant to automation. How important is the ability to design a synthesis process? A key component of the ‘green revolution’ that allowed agriculture to scale up to feed modern populations is the use of artificial fertilizers. Among these, ammonia is especially vital — plants need it as a source of nitrogen. This might seem strange — our atmosphere is mostly nitrogen, so shouldn’t they just pull it out of the air? Unfortunately, the form of nitrogen we breathe (N2) is extremely unreactive, and plants have so far not evolved the necessary chemistry. Such is the difficulty that the humans who figured out a workable series of steps — the Haber-Bosch process — won a Nobel prize. The process is, however, extremely energy-intensive, accounting for more than 1% of global energy usage, and contributing about half of the nitrogen in the tissues of the typical human (so great has been its contribution to agriculture).

That the synthesis of even a very small and simple molecule like ammonia can be extremely difficult (plants still can’t do it, and even human chemical geniuses need very high temperatures and pressures) should convince you of the potential benefits of automating synthesis design, and the impact of the Haber-Bosch process should make clear the vast social value of such innovation. The example of the humble soil bacterium — casually fixing nitrogen at ambient temperatures and pressures, so different from the high-temperature, high-pressure industrial process — shows us how much better we could be.

To connect the potential of machine learning to one of the great problems of our day — feeding an increasing global demand for energy while also reducing net CO2 emissions — better ways of generating, storing and transmitting energy hinge on materials science. In other words, better and cheaper composites for wind turbines, more efficient and cheaper solar panels, improved battery chemistries and high-temperature superconductors. Accelerating the discovery and production of new materials can have an enormously positive impact on how our societies develop and grow.

Returning to the topic of pharmaceuticals, it should now be clear that machine learning offers us a path to advanced drug design and testing in a datacenter. Given that the cost of bringing a new drug to market has reached $2.5 billion, and that the steady 20th-century trend of increasing life expectancies seems to be running out of steam (even in the face of unprecedented spending), the need to replace processes that hinge on luck and brute-force search should be obvious.

In summary, “self-driving chemistry” may allow machine learning to design new industrial and therapeutic molecules to order, and to transform the manufacturing processes which create them: it offers our most obvious path towards understanding the nanoscale world, and producing nanostructured materials in quantities and at prices that can really change our everyday lives. Since the advent of even one new material — copper, bronze, iron, steel — can remake our society, the value of a process that makes routine the design and production of new materials can hardly be overstated.

Follow me for more articles on new applications of Machine Learning and Deep Learning. To learn more about the application of machine learning to materials science, a good place to start is the Matter Lab (University of Toronto) or this video of leading researcher Alan Aspuru-Guzik. Many thanks to Isaac Tamblyn, another leading researcher in this field, for the conversations which inspired this article.




#AI guy, Principal/Founder @ Rigr AI, co-author of ‘Demystifying AI for the Enterprise’.