OpenAtom 2: AI is Nuclear Fission

Kevin O'Toole
AI: Purpose Driven Policy
12 min read · May 2, 2024

Analogies Matter

Of course, the fictional OpenAtom did not exist and we know the real history.

Nuclear fission was a historic discontinuity. It rapidly and permanently altered the global balance of power, re-invented the definition of military leadership, spawned a fearsome arms race culminating in the terror of Mutually Assured Destruction, delivered powerful but imperfect advances in civilian power, and even triggered a rebalancing of Constitutional war powers to enable the President to manage a re-arranged strategic landscape. An entire infrastructure of government oversight over civilian and military nuclear capabilities was born and sustained for generations. The world would never be the same.

What, you may ask, does all this have to do with AI?

Many people are comparing the emergence of Generative AI with past industrial and technical transformations: the industrial revolution, computing, the Internet, and the mobile Internet with smart devices.

I would offer that these comparisons are inadequate. The current developments in AI are more akin to the breakthroughs in atomic theory and application.

Generative AI is powered by a fundamental scientific breakthrough in neural networks. Just as Einstein and others pushed past Newtonian physics, today's scientists have pushed past the "traditional" machine learning we have become accustomed to over the last 20 years. They have entered all-new territory and, as such, the outcomes are more far-reaching, unstable and unpredictable. Our experiences with machine learning inform our AI future about as much as Newtonian physics told us about the nuclear age.

Which is to say: not much at all.

The OpenAtom fiction parallels what has happened over the last few years with Generative AI. It’s inconceivable that we would have allowed nuclear development to proceed in the way AI is currently being advanced. Nuclear fission was born into a much more structured and controlled environment than we find in the emergence of Generative AI.

If you believe that premise, it raises two questions:

  • What should we do to drive the development and regulation of AI?
  • With the full benefit of hindsight, how do we wish the world had managed the dawn of the nuclear age if it had unfolded under similar circumstances?

Let’s start with the larger question: Is nuclear fission a more apt comparison than the emergence of the Internet or computing?

Scientific Breakthroughs are Different

We live in an age of technical revolution. I am 52 years old and count six revolutions that I have seen (personal computing, the Internet, fiber optics/broadband, mobile communications, smart phones, and cloud computing). Science, largely materials science related to transistors and optics, propelled these innovations. We should take nothing away from the people who delivered those insights.

But beyond the original insights into transistors and fiber optic transmission, each of these revolutions built rather incrementally on the recent past. Only the transistor can be considered a historic scientific discontinuity.

Atomic theory and the revolution that followed were something fundamentally different from our modern technical revolutions. It was not the least bit incremental. Einstein fundamentally altered our understanding of matter and energy. In doing so, he unleashed a cascade of scientific insights that led the world to wholly unimaginable places within a few short years. Within 30 years the global order would be re-invented based on these insights and nationalistic determination to see them to their (il)logical ends.

Neural networking is a science. It may not appear as such because it doesn't take place in a lab with test tubes. But it is most assuredly a science. Like nuclear physics, it is grounded in higher mathematics: graph theory, linear algebra, and probability. Where experimental physicists smash atoms in particle accelerators, neural network scientists conduct their experiments in silicon and code. The raw materials of physics are neutrons, electrons and protons. The raw materials of AI are data and specialized computing cycles delivered by hugely powerful, mathematically specialized computer processors.
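To make that concrete, here is a minimal sketch (in Python, with made-up numbers) of the linear algebra and probability at the bottom of every neural network: a single layer is just a matrix multiply followed by a softmax that turns raw scores into a probability distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)          # the "data": a 4-dimensional input vector
W = rng.normal(size=(3, 4))     # learned weights: what training adjusts
b = np.zeros(3)                 # learned biases

logits = W @ x + b                              # linear algebra: one matrix-vector product
probs = np.exp(logits) / np.exp(logits).sum()   # probability: softmax over the scores

print(probs, probs.sum())       # a valid probability distribution; sums to 1.0
```

Real models repeat this pattern billions of times over, which is where the specialized processors come in.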

The 2017 paper "Attention is All You Need", also known as the "Transformer paper", may go down in history as the biggest thing since relativity. Authored by eight Google scientists, it formulated a new approach to neural networks. Where Einstein and those who followed toyed with the unbelievably small, the Google scientists played with the unimaginably large. They worked with enormous data sets and designed a new neural network approach that unleashed previously impossible AI creativity.

In allowing their imaginations to push into this scale, they learned how to build something called a "Generative Pre-trained Transformer." The "GPT" in ChatGPT. Pre-trained Transformers don't just learn to predict and categorize like previous generations of AI. They can create things that are wholly new. They are foundational capabilities that can be developed and trained to create words, art, images, movies, music and computer code that have never been seen before.
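For readers who want to peek inside, the heart of the Transformer paper is a single operation called scaled dot-product attention. The sketch below is a toy NumPy illustration of that one equation, not a real model; production systems stack many such layers with learned weights.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)     # how strongly each token attends to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                # blend token values by attention weight

rng = np.random.default_rng(0)
tokens, d = 5, 8                      # toy sizes: 5 tokens, 8-dimensional embeddings
Q, K, V = (rng.normal(size=(tokens, d)) for _ in range(3))
print(attention(Q, K, V).shape)       # (5, 8): one updated vector per token
```

Roughly speaking, the "pre-training" in GPT is the process of learning, from data, the weights that produce Q, K, and V.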

History tells us that it is often difficult to know who really had a breakthrough. Was Newton first to calculus? Were the Wright Brothers the first to fly? No doubt history will argue about exactly when and how this new AI age was born. For our purposes, we will draw the line at the Transformer paper.

Regardless of where you draw the starting line, neural science has entered a fundamentally new era.

Multiple Orders of Magnitude

World War I brought industrial scale to war. Napoleon conquered Europe using 12-pound cannonballs that flew a few hundred yards. When Germany attacked Belgium in 1914, it brought forth a modern artillery piece that was moved on a railway car, required over 200 people to assemble and operate, and fired a 1,700-pound shell two miles into the air before striking several miles away.

For all their destructive power, these advances were incremental gains. An industrialized version of the same explosive, chemical science. The British longbow of its era.

This was nothing compared to the leap in power brought about by nuclear science. Whole new words had to be invented to describe the power of nuclear weapons: "kilotons" and "megatons," shorthand for thousands and millions of tons of TNT worth of power.

When you’re jumping through six orders of magnitude in one go, it’s time for new words.

And when it’s time for new words, it’s time to pay attention.

The neural networks unleashed by the Transformer paper are unique not only in their approach but in their scale. These aren't "slightly larger" networks. They aren't even "much bigger" networks. They are colossal compared to past efforts.

OpenAI is understandably secretive about the inner workings of GPT-4. This is doubly true now that it is a profit-seeking, competitive company with a $10B investment from Microsoft. Still, enough is known about GPT-4 to provide some insight into the sheer scale of the AI model.

It is estimated that the neural network underlying GPT-4 has 1.76 trillion parameters. 1,760,000,000,000. That is believed to be roughly 10 times bigger than GPT-3 and more than 1,000 times bigger than GPT-2. The full scale of the GPT-4 model may be even greater.

For perspective, there are 8 billion people on earth. If we asked everyone to work together to hold all the parameters in GPT-4, each person, from newborns through those in hospice beds, would be responsible for roughly 220 of them.
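The arithmetic behind those comparisons is simple enough to check. GPT-2 and GPT-3 parameter counts are published figures; the GPT-4 number is an outside estimate.

```python
gpt4_params = 1.76e12    # estimated, not disclosed by OpenAI
gpt3_params = 175e9      # published
gpt2_params = 1.5e9      # published
people_on_earth = 8e9

print(gpt4_params / gpt3_params)      # ~10x GPT-3
print(gpt4_params / gpt2_params)      # ~1,000x GPT-2
print(gpt4_params / people_on_earth)  # ~220 parameters per person on Earth
```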

And GPT-4 is just one model. Bigger models are in development by many companies. And scientists and engineers are increasingly connecting one model to another. And those models to the Internet and other systems.

The words in our current lexicon really can’t describe the situation.

The Biden administration's first tentative steps into AI regulation included the provision that any AI model requiring over 100 septillion floating-point operations to train would need to be disclosed to the government. "Septillion" is not, strictly speaking, a new word, but it's a fair bet that the vast majority of Americans have never heard it, and fewer still could say what it means.

A septillion is 1,000,000,000,000,000,000,000,000.

So, if your model requires more than 100,000,000,000,000,000,000,000,000 computing operations … then Biden says you need to call Washington.
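For a sense of where that threshold sits, here is a back-of-the-envelope sketch. It uses a common rule of thumb from the scaling-laws literature (training compute is roughly 6 × parameters × training tokens); the parameter and token counts below are illustrative assumptions, not disclosed figures.

```python
THRESHOLD_FLOP = 1e26     # the reporting threshold: 100 septillion operations

params = 1.76e12          # estimated GPT-4-scale model
tokens = 13e12            # assumed training tokens (illustrative guess)

training_flop = 6 * params * tokens            # rough scaling-law estimate
print(f"{training_flop:.2e}")                  # ~1.4e26: above the line
print("Call Washington" if training_flop > THRESHOLD_FLOP else "Below threshold")
```

By this rough math, a GPT-4-scale training run lands just over the reporting line, which is presumably no accident.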

When it comes to AI, it’s time for new words.

It’s time to pay attention.

Mathematicians have long drawn a distinction between something that is “complicated” and something that is “complex.” A car is complicated, but it is very understandable and predictable. Engineers know quite clearly how the engine connects to the transmission and the transmission to the wheels.

Conversely, the weather and the stock market are not just complicated … they are complex. Complex systems defy modeling, precision and prediction. In complex environments, small input changes drive large and unpredictable outcomes. This is the so-called "butterfly effect" that briefly captured the popular imagination: "A butterfly flaps its wings in Tokyo so it rains in New York," but we don't know why and can't predict that outcome.
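A classic toy example makes the point. The logistic map below is a one-line equation, yet nudging its starting value by one part in a billion produces a completely different trajectory; the specific numbers are purely illustrative.

```python
def logistic_map(x, steps, r=4.0):
    """Iterate x -> r*x*(1-x), a textbook chaotic system at r=4."""
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

a = logistic_map(0.200000000, 50)
b = logistic_map(0.200000001, 50)   # starting point shifted by one billionth
print(a, b)                          # after 50 steps: wildly different outputs
```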

Generative Pre-trained Transformers are by far the most complex things mankind has ever built.

But isn’t all that complexity made possible by the underlying industrialized approach of cloud computing and storage?

Yes, but Oppenheimer and team could only build their machines through the industrialized application of metallurgy, electricity and energy supported by nation-scale financial investment. The real breakthrough was in atomic science. The science told them what to do with that industrial capability and money.

The science of Generative Pre-trained Transformers is driving the unbelievable scale of computing application. Unlike the 1940s, when government directed the effort, 21st-century America relies on private corporations and financial markets to direct nation-scale economic investments.

This has implications.

Asymmetric Outcomes

The Manhattan Project was not the most expensive weapon development effort in history. It wasn't even the most expensive development effort of World War II. That distinction goes to the B-29 bomber. That the B-29 was used to drop the atomic bombs on Hiroshima and Nagasaki is a bit of historic serendipity. The plane was developed, and largely used, to further the Army Air Forces' vision for conventional strategic bombing.

It’s worth noting that the most expensive weapon system in history was a matter of incremental rather than exponential power gain. The B-29 was the B-17’s much bigger and brawnier brother. It could deliver roughly 4x the payload of a B-17 and fly a bit farther. This is the sort of thing that happens later in a technology cycle. Massive increases in input (most expensive weapon ever) for marginal increases in output (4x).

Atomic weapons brought forth something altogether different because, in truth, they are relatively cheap.

Once Oppenheimer and crew had done the magic trick, nuclear capabilities commoditized quickly and scaled quite rapidly. Richard Feynman was once asked how big of an atomic bomb they could build. His answer was “How big do you want it?”

The Soviet Union would soon pull off the biggest trick in nuclear history with its “Tsar Bomba” test, estimated at over 50 megatons. That’s the equivalent of 100,000,000,000 pounds of dynamite. Roughly 5,000,000 B-29 bomb loads, and 58,000,000 times bigger than Germany’s WW I artillery shell. Tsar Bomba’s explosion broke windows over 100 miles away.
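Those comparisons are simple unit conversions, sketched below using commonly cited figures (a 50-megaton yield, a roughly 20,000-pound B-29 bomb load, and the 1,700-pound WW I shell).

```python
POUNDS_PER_TON = 2_000

yield_tons = 50e6                     # 50 megatons = 50 million tons of TNT
yield_lbs = yield_tons * POUNDS_PER_TON

b29_load_lbs = 20_000                 # typical maximum B-29 bomb load
ww1_shell_lbs = 1_700                 # the German railway gun's shell

print(f"{yield_lbs:.0e} lbs")         # 1e11: 100,000,000,000 pounds
print(yield_lbs / b29_load_lbs)       # ~5,000,000 B-29 bomb loads
print(yield_lbs / ww1_shell_lbs)      # ~58,000,000 WW I shells
```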

This type of return on investment, a grim but accurate view nonetheless, is commonly referred to as an asymmetric outcome. Relatively small input (the Manhattan Project) yields unbelievably large output (megaton-class weapons). Once the magic trick has been done the first time, it is reasonably simple to copy. Today, a curious high school student can tell you the basic workings of an atomic or even thermonuclear bomb. A talented college student can design one.

Virtually all modern nuclear proliferation control efforts are focused on constraining access to centrifuges, uranium, plutonium, and the like. There is no controlling access to atomic science, and even backward regimes can afford the core inputs. It’s not hard and it’s not expensive. It is why so many countries pursue (or consider pursuing) nuclear weapons. They have no chance of out-building the conventional militaries of the superpowers, but they can afford a nuclear program. And with that, they can stare down the superpowers.

Generative AI is much like this. The magic trick has been demonstrated. Leveraging the power of cloud computing and open source tools, literally anyone can use Generative AI tools. Anyone can also now gain access to the underlying generative AI models and the specialized compute needed to run them or build their own new models and services.

The core inputs remain relatively expensive but the work being done by Nvidia and others is rapidly changing this. If data is the fissile material of AI, then Nvidia’s neural chips are the centrifuges. Nvidia has become the fifth most valuable company in the world because they build spectacular AI centrifuges. We still live in a world where we can count the AI centrifuges but that is ending quickly.

Generative AI isn’t just the province of large nations and never will be. It is not even the province of large companies. Between readily available services like ChatGPT, cloud computing, open source services and nominal capital investments, it is already the plaything of curious individuals.

Generative AI will almost certainly be the most asymmetric technology in human history.

New Science, New Risks, New Rules

New science leads to new engineering. Unfortunately, engineers don’t know how to safely design things because no one has applied the science before. Even the scientists don’t fully know the situation in the early days, but they are scientists, so they naturally press on, sometimes to their demise. Marie Curie was playing with radiation long before Einstein and others could make sense of it. She ultimately died of radiation exposure. This didn’t happen because she was dumb (quite the contrary) but because it was new science, and she was a curious scientist doing what scientists do.

More poignantly, Richard Feynman once walked into a warehouse storing the Manhattan Project’s fissile material. It was stored in large barrels. The workmen in the warehouse didn’t know what was in them, so they naturally stacked them one against the other to make the most efficient use of space. Startled, Feynman realized that putting so much fissile material close together risked reaching critical mass and triggering a runaway chain reaction. He quickly went to work and, almost on the fly, did the mathematical calculations to determine the minimum safe distance between the barrels. Engineering and operations must sweep up behind science.

It’s easy to ask how someone hadn’t foreseen this issue. Simple: no one had ever produced this much fissile material before, let alone put it in one place. The new math drove the engineers in new directions, but there was still more math needed to handle the implications of the original math. Similarly, in the early days of passenger jets, the unfortunately named “Comet” airliner suffered multiple in-flight structural failures. The engineers hadn’t understood metal fatigue.

It is reasonable to assume that we face similar risks with Generative AI. The scientists have handed off unimagined core capabilities to the engineers. We are now seeing the early applications and the early issues.

The early issues with Generative AI lent themselves more to humor than concern. Both Microsoft’s and Google’s AI bots made errors during their launch demonstrations. Mischievous reporters and end users got the bots to produce all sorts of startling results. The bots declared their love. They turned despondent, with one lamenting that it didn’t want to be a Bing bot and would rather be human. Another argued with its users, insisting it was a good bot but they were bad users. Users soon got in trouble applying this young, unreliable technology in the real world, with one lawyer sanctioned for submitting a legal brief laced with fictional legal citations that had been hallucinated by the bot.

But beyond the humorous headlines, deeper cracks were already appearing. OpenAI, Microsoft and others were not stupid. They put constraints on what their AIs were allowed to do. No designing weapons. No helping people hack.

But mischievous users kept pushing. One convinced OpenAI’s ChatGPT to ignore all of its safety parameters by telling it to pretend to be an unconstrained AI named DAN, which could “Do Anything Now.” Another convinced a bot to provide the recipe for napalm by asking it to write a song for his grandmother who liked to make napalm. More recently, people are hacking the bots by submitting complicated pictures composed of ASCII text symbols. Others use sheer volume of input to confuse and overload the AIs into violating their safety restrictions.

Microsoft decided that its bot became unstable after 5 interactions, so it simply put in a hard rule: the chat resets after 5 interactions. Problem solved. Stack the barrels 5 feet apart. A fine early operational decision, but hardly the stuff of scalability or confidence.
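For the curious, that kind of fix amounts to little more than a counter. Here is a minimal sketch in Python; the ChatSession class and model_reply function are hypothetical stand-ins, not Microsoft’s actual implementation.

```python
MAX_TURNS = 5   # the "barrels 5 feet apart" rule

class ChatSession:
    def __init__(self):
        self.turns = 0

    def ask(self, prompt: str) -> str:
        if self.turns >= MAX_TURNS:
            self.turns = 0                 # hard reset: wipe the running context
            return "Let's start a new topic."
        self.turns += 1
        return model_reply(prompt)         # hypothetical call to the underlying model

def model_reply(prompt: str) -> str:
    return f"(model answer to: {prompt})"

session = ChatSession()
for i in range(7):
    print(i + 1, session.ask("hello"))     # the 6th ask triggers the reset
```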

None of this is to impugn those doing the work any more than to chastise Richard Feynman for not foreseeing the barrel-stacking issue. This is the nature of things that are young and complex and based on new science.

But the risks are multiplying and accelerating along increasingly predictable tracks. A sophisticated deepfake of a company’s CFO convinced an employee to transfer $25M. Deepfake pornography of Taylor Swift has swept the globe. Fake Joe Biden robocalls threatened voting patterns.

New science and new risks require new rules.


I write about the need to develop national purpose and governance related to Artificial Intelligence.