
A Programming Language For Bacteria

Using biocomputing to harness the power of DNA to solve difficult problems that are impractical for classical computers.

Rachel Lee
Bioeconomy.XYZ


Within a few years, the hard drive in your computer may start to fail, possibly losing much of your data and forcing you to buy a new one.

In four years, your laptop will likely begin to slow down as it fills up with more and more data.

Meanwhile, the amount of digital data in the world keeps growing at an enormous rate, by some estimates doubling every few years.

As of March 2022, it was estimated that computer chips were close to the end of Moore’s Law (the observation that the number of transistors on a chip doubles about every two years), because transistors are approaching the smallest size physically possible. Smaller transistors mean faster, more efficient computers, so once chips reach their limit, we will need a new and better way to compute and store data.

And do you know what that new, better way is?

It’s bacteria! 🦠 (Well, more precisely, it’s DNA, the molecule inside bacteria and every other living thing.)

Overview

Cells in the human body work a lot like computers: they store information and process it in order to function. In a way, our body is a giant computer, maybe the best one in the world. It grew and trained itself, and it is constantly fighting off diseases and recording new information.

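A single copy of the human genome is about 3.1 billion base pairs long, which works out to roughly 775 megabytes of raw data, and most of your cells that have a nucleus carry two copies. Added up across the trillions of cells in your body, that is far more data than the world’s most powerful supercomputers hold; their storage is measured in hundreds of petabytes (a petabyte is a million gigabytes).

The point is that the human body packs information more densely than any computer we have ever built.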
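Here is the arithmetic behind the genome figure as a small Python sketch. It assumes about 3.1 billion base pairs and 2 bits per base (each position holds one of four letters, and log₂4 = 2), and it ignores compression and any error-correction overhead.

```python
# Back-of-the-envelope size of ONE copy of the human genome.
# Assumptions: ~3.1 billion base pairs, 2 bits per base, no compression.
BASE_PAIRS = 3_100_000_000
BITS_PER_BASE = 2  # four possible bases (A, C, G, T) -> 2 bits each

def genome_megabytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Raw size in decimal megabytes (1 MB = 1,000,000 bytes)."""
    total_bits = base_pairs * bits_per_base
    total_bytes = total_bits / 8
    return total_bytes / 1_000_000

print(f"{genome_megabytes(BASE_PAIRS):.0f} MB")  # 775 MB
```

Multiply by two copies per cell and by the number of cells, and the total gets very large very quickly.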

We can learn from the human body, and we can create a system that will last for thousands of years and keep our data safe and secure.

Deoxyribonucleic Acid

Have you heard of this word? (dee-ok-see-ry-boh-noo-klay-ik acid)

Of course you have: it’s DNA 🧬

DNA is a very complex molecule that contains the information needed to build and maintain an organism. All living things (animals, plants, fungi, and even bacteria!) have DNA in their cells. In plants and animals, most cells keep their DNA in the nucleus; bacteria have no nucleus, so their DNA sits loose inside the cell.

DNA stores much of our biological information, such as eye color, hair color, and skin tone, and it even influences personality. In a sense, DNA programs our entire body.

Double helix bond DNA diagram | source

DNA is built from four chemical bases (nucleobases):

  1. Adenine (A)
  2. Cytosine (C)
  3. Guanine (G)
  4. Thymine (T)

These four bases are the alphabet of DNA: all of a living thing’s genetic information is written using just these four letters.

If you put two single strands of DNA with matching (complementary) bases together in a tube, they automatically bond and form a double helix structure like the one in the diagram.

Adenine always pairs with thymine, and cytosine always pairs with guanine; each base pair is held together by hydrogen bonds. The bases on each strand hang off a strong backbone of alternating sugars and phosphates. A group of three bases in a row is called a codon. Codons give our cells instructions for which amino acids to link together to build the proteins essential to our body’s functions.
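Because A always pairs with T and C always pairs with G, one strand fully determines its partner. A minimal Python sketch of that pairing rule (the function name `complement` is just for illustration):

```python
# Base-pairing rule: A pairs with T, and C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the partner base for each position in the strand."""
    return "".join(PAIR[base] for base in strand.upper())

print(complement("ATGCTA"))  # TACGAT
```

In a real double helix the two strands run in opposite directions, so biologists usually write the partner strand reversed; this sketch only shows the pairing rule itself.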

For example, the codon ATG stands for methionine, and the codon CTA stands for leucine. Can you see how codons are kind of like a secret code in our bodies? (Keep this idea in mind for later :)
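To make the "secret code" idea concrete, here is a small Python sketch that reads DNA three letters at a time using a few entries of the standard genetic code. It is simplified: real cells first copy DNA into RNA, and this excerpt lists only a handful of the 4 × 4 × 4 = 64 codons, so anything outside it is shown as `?`.

```python
# A small excerpt of the standard genetic code (DNA codons -> amino acids).
CODON_TABLE = {
    "ATG": "Met",  # methionine (also the usual "start" codon)
    "CTA": "Leu",  # leucine
    "TTT": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "TGG": "Trp",  # tryptophan
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}

def translate(dna: str) -> list[str]:
    """Read codons left to right until a stop codon or the end of the DNA."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "?")  # "?" = not in this excerpt
        if amino == "STOP":
            break
        protein.append(amino)
    return protein

print(translate("ATGCTATTTGGCTAA"))  # ['Met', 'Leu', 'Phe', 'Gly']
```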

The incredible thing is that one gram of DNA can store 215 million gigabytes (215 petabytes) of information! 🤯
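Where do numbers like this come from? A rough theoretical ceiling can be worked out from the mass of a single nucleotide. The Python sketch below assumes single-stranded DNA, an average of about 330 grams per mole of nucleotides, and 2 bits per nucleotide, with nothing set aside for indexing, error correction, or extra copies.

```python
# Theoretical upper bound on DNA storage density (back-of-the-envelope).
# Assumptions: single-stranded DNA, ~330 g/mol per nucleotide on average,
# 2 bits per nucleotide, and zero overhead (no addresses, no error correction).
AVOGADRO = 6.022e23        # molecules per mole
GRAMS_PER_MOL_NT = 330.0   # approximate average mass of one nucleotide
BITS_PER_NT = 2

nucleotides_per_gram = AVOGADRO / GRAMS_PER_MOL_NT
bytes_per_gram = nucleotides_per_gram * BITS_PER_NT / 8
exabytes_per_gram = bytes_per_gram / 1e18  # 1 exabyte = 1,000 petabytes

print(f"~{exabytes_per_gram:.0f} exabytes per gram (theoretical)")
```

This lands around 456 exabytes per gram. A demonstrated 215 petabytes per gram (about 0.2 exabytes) is far below that ceiling, because practical systems spend space on addresses and error-correcting codes and need many physical copies of each strand.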

This means that if we encoded all the information in the world into DNA, it could fit into something the size of a shoebox (!!!)

Let’s take a step back for a sec…how DO we code information into DNA? Before we get into that, we need to understand how our current information is stored.

How is our data stored?

Bits

Computers operate because of the micro-wires and circuits that carry all the information throughout the computer. If you have one wire in a computer with electricity and information flowing through it, the signal can either be on or off.

This on/off state is called a bit (short for “binary digit”), written as 1 for on and 0 for off. A bit is the smallest piece of information a computer can store.

Bits make up all information in computers. From text, to pictures, to sound, everything is made up of complex sequences of 1s and 0s in a computer.

For example, the letter B is this:

01000010

in binary (using the standard ASCII code for text), and the number 9 is this:

00001001
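Python can show these bit patterns directly. In this sketch, `ord` gives a character’s code number and `format(..., "08b")` writes a number as 8 binary digits. Note that the *number* 9 and the *text character* "9" are stored differently.

```python
def to_byte(value: int) -> str:
    """Write an integer from 0 to 255 as 8 binary digits."""
    return format(value, "08b")

print(to_byte(ord("B")))   # 01000010  (the letter B has code 66)
print(to_byte(9))          # 00001001  (the number 9)
print(to_byte(ord("9")))   # 00111001  (the text character "9" has code 57)
print(int("01000010", 2))  # 66        (reading the bits back as a number)
```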

(If you want to learn more about how classical computers work, check out the video I made here 💻)

A’s, C’s, G’s, and T’s to 1s and 0s

Let’s start with some background:

In 1999, scientists in New York created a secret alphabet, where each letter/number/symbol was a specific codon of DNA.

The scientists tested their alphabet by writing a message in DNA. They joined the codons for each letter, symbol, and number of their message into one long chain, with genetic markers on both ends. They dropped the liquid DNA onto a letter, hiding it under a typewritten period so that only a tiny smudge gave its location away. Then they mailed the letter back to themselves (presumably to show that the hidden message could survive a trip through the regular mail, though it still seems like a lot of work to me??). Finally, they located the DNA on the letter, used the genetic markers to find the message, and decoded the DNA in between, converting it back into letters.
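A codon alphabet like this is just a lookup table. The Python sketch below builds a *hypothetical* table of the same kind; it is not the table the 1999 scientists used. It simply assigns the 64 possible codons, in alphabetical order, to letters, digits, and a few punctuation marks, and then encodes and decodes a message.

```python
from itertools import product

# HYPOTHETICAL codon alphabet for illustration (not the 1999 team's actual table).
CHARACTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,!?"   # 41 symbols
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # all 4**3 = 64 codons

ENCODE = dict(zip(CHARACTERS, CODONS))  # the first 41 codons get a symbol
DECODE = {codon: ch for ch, codon in ENCODE.items()}

def encode(message: str) -> str:
    """Replace each character with its codon."""
    return "".join(ENCODE[ch] for ch in message.upper())

def decode(dna: str) -> str:
    """Read three bases at a time and look each codon up."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))

secret = encode("Hi Mom!")
print(secret)          # begins ACTAGA... (H = ACT, I = AGA)
print(decode(secret))  # HI MOM!
```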

This experiment showed that DNA could be used to hide secret messages (DNA cryptography), but, more importantly for us, it showed that we could store information in DNA!

Since a codon-per-character alphabet is a bit tedious (and has only 64 codons to work with), scientists later created codes in which DNA bases stand directly for sequences of binary code: 1s and 0s.
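One simple way to do this, assumed here for illustration, is to let each base stand for two bits: A = 00, C = 01, G = 10, T = 11. A Python sketch:

```python
# Assumed mapping for illustration: each DNA base stands for 2 bits.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bytes_to_dna(data: bytes) -> str:
    """Turn each byte into 8 bits, then every 2 bits into one base."""
    bits = "".join(format(byte, "08b") for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bytes(dna: str) -> bytes:
    """Turn each base back into 2 bits, then every 8 bits into one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(bytes_to_dna(b"I"))    # CAGC  ('I' is 01001001 -> 01 00 10 01)
print(dna_to_bytes("CAGC"))  # b'I'
```

Real systems use more careful codes than this; for example, they avoid long runs of the same base (like AAAAAA), which are harder to synthesize and read accurately.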

In 2012, a genetics lab in the UK encoded 739 KB of data into DNA strands, and 4 years later scientists from Microsoft and the University of Washington broke that record, storing about 200 megabytes of data in DNA.

‘But how do we turn 1s and 0s into DNA?’

Very good question.

There are 6 steps:

  1. Encoding ~ Step one is choosing the information you would like to store in DNA. Let’s say it is the original recording of Martin Luther King’s ‘I Have A Dream’ speech. First, we need the recording as a digital (binary) file, and then we can convert its binary digits into the corresponding DNA bases. For example, the capital letter ‘I’ is ‘01001001’ in binary, which could become the codon ‘CTG’ in a codon alphabet. Today this conversion is done by computer software, but designing codes that are cheap to write and easy to read back without errors is still an active area of research.
  2. Synthesis ~ Next, machines called DNA synthesizers build the encoded sequence chemically, adding one base at a time. Because only fairly short strands can be made reliably, the data is usually split across many short strands. Those strands are then copied (amplified) so there are plenty of backups, using a process with 3 sub-steps: 1) Open the DNA double helix structure. 2) Separate the two DNA strands from each other. 3) Assemble a new matching strand alongside each one.
  3. Storage ~ Now that we have written our data into DNA, we can store it. This usually involves transferring the DNA (which is in liquid form) into a new test tube or container, labeling it with a barcode that matches the contents, and storing it somewhere dark. DNA does slowly break down: a 2012 study of fossil bones estimated that DNA has a half-life of about 520 years, meaning half of the bonds in its backbone break after about 520 years, half of the remaining bonds after another 520, and so on. Even so, the same study estimated it would take about 6.8 million years (at -5 degrees Celsius) for every bond to break. If DNA is stored in very cold freezers (-80 degrees Celsius), in an airtight container with no light, it may last for millions of years.
  4. Retrieval ~ When the time comes to access your information, it’s as easy as calling the lab. Behind the scenes, scientists scan the containers to find the correct barcode and bring that container out into the main lab (carefully, since repeated warming and refreezing can damage DNA).
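  5. Sequencing ~ A machine called a DNA sequencer reads the order of the bases (A, C, G, and T) in each strand and turns it back into digital text. Each strand is usually read many times, so occasional reading errors can be caught and corrected.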
  6. Decoding ~ Finally, software matches each base or codon back to its binary digits, puts the strands back in order, and rebuilds the original file to send back to the user.
Source
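To tie the six steps together, here is a Python sketch that simulates the whole trip on a tiny file. The wet-lab steps (synthesis, storage, retrieval, sequencing) are stand-ins, and the design choices are assumptions for illustration: 2 bits per base, 8 bytes of data per strand, and a 1-byte index at the start of each strand so the pieces can be put back in order after they mix together in the tube.

```python
import random

# Assumed design (illustration only): 2 bits per base, 8 data bytes per strand,
# and a 1-byte index at the front of every strand.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {b: bits for bits, b in BITS_TO_BASE.items()}
CHUNK = 8

def to_dna(data: bytes) -> str:
    bits = "".join(format(x, "08b") for x in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def from_dna(dna: str) -> bytes:
    bits = "".join(BASE_TO_BITS[b] for b in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def encode(data: bytes) -> list[str]:
    """Step 1: split the file into chunks and turn each (index + chunk) into bases."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    assert len(chunks) <= 256, "a 1-byte index can only address 256 strands"
    return [to_dna(bytes([n]) + chunk) for n, chunk in enumerate(chunks)]

def decode(strands: list[str]) -> bytes:
    """Step 6: convert strands back to bytes, sort by index, and stitch together."""
    pieces = [from_dna(s) for s in strands]
    pieces.sort(key=lambda p: p[0])  # the first byte of each piece is its index
    return b"".join(p[1:] for p in pieces)

message = b"I have a dream"
strands = encode(message)   # 1. encoding
tube = strands.copy()       # 2-3. synthesis and storage (simulated)
random.shuffle(tube)        # strands float around the tube in no particular order
reads = list(tube)          # 4-5. retrieval and sequencing (perfect reads assumed)
print(decode(reads))        # 6. decoding -> b'I have a dream'
```

The index is what makes the shuffle harmless. Real pipelines also add error-correcting codes, because both synthesis and sequencing make mistakes; this sketch assumes every read is perfect.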

I know what you must be thinking…‘why would you ever go through that much work to store a file?’ 🤔

Benefits Of DNA Storage

→ Very storage dense (a single gram of DNA can hold an enormous amount of information)

→ A large amount of DNA can fit into a very small space

→ It is very light in weight

→ Has a very long lifespan

→ High level of data security
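To put that long lifespan in numbers: the roughly 520-year figure in the Storage step is a half-life estimate (the 2012 fossil-bone study reported 521 years), so damage piles up gradually rather than all at once. This Python sketch assumes simple exponential decay with that half-life; real decay rates depend heavily on temperature, which is why cold storage helps so much.

```python
# Fraction of DNA backbone bonds still intact after t years, ASSUMING simple
# exponential decay with a ~521-year half-life (decay is much slower when cold).
HALF_LIFE_YEARS = 521

def fraction_intact(years: float, half_life: float = HALF_LIFE_YEARS) -> float:
    return 0.5 ** (years / half_life)

for t in (521, 1042, 5210):
    print(f"after {t} years: {fraction_intact(t):.2%} of bonds intact")
# after 521 years: 50.00% of bonds intact
# after 1042 years: 25.00% of bonds intact
# after 5210 years: 0.10% of bonds intact
```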

But of course, everything in life has challenges…especially DNA storage.

Challenges Of DNA Storage

→ Since DNA storage depends on biology, engineering, chemistry, lots of scientific research, and large labs, it is very expensive.

→ Time-consuming (data needs to be sent to another lab, and the encoding and decoding steps are very involved)

→ To preserve information stored in DNA for a long period of time, cold temperatures and darkness are required (which is both expensive and energy consuming).

→ Advancing this new technology is very expensive.

Biocomputing

Not only can we harness the amazing powers of DNA for information storage, but we can also use DNA as a base for computers!

The Problem

Computer chips now have transistors with features only a few nanometers across (just a small number of silicon atoms wide), and they are close to the point where they cannot be made any smaller. At these sizes, electrons can leak through barriers that should block them (an effect called quantum tunneling), which wastes energy as heat and interferes with other components. A minimum number of atoms is needed to make a working transistor, and we are getting very close to that limit.

Solution

While quantum computing is definitely a possible solution, it is still far off and has its own challenges (check out my in-depth article about quantum computing here). For some problems, biocomputers could be just as powerful as quantum computers (or even more), and they don’t have the decoherence problem or need extremely cold temperatures.

Diagram of a biocomputer | source

In 1994, Leonard Adleman demonstrated that DNA could solve a version of the traveling salesman problem (technically, a Hamiltonian path problem with seven cities):

Imagine a company that needs a salesman to travel to 500 different cities to advertise a product without visiting any city twice. This problem would take a classical computer longer than the lifetime of the universe: no one knows an efficient method for solving it, so the computer essentially has to check a gigantic number of possible routes, one after another, to find one that passes through each city exactly once.
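For a feel of why checking routes one at a time blows up, here is a Python brute-force sketch on a tiny, made-up road map. It tries every ordering of the cities and returns the first one in which each consecutive pair is connected by a road, i.e. a route that visits every city exactly once. With n cities there are n! orderings.

```python
from itertools import permutations
from math import factorial

# Hypothetical two-way road map for a tiny example.
ROADS = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")}

def connected(x: str, y: str) -> bool:
    """True if there is a road between x and y in either direction."""
    return (x, y) in ROADS or (y, x) in ROADS

def find_route(cities: list[str]):
    """Brute force: try every ordering until one uses only existing roads."""
    for order in permutations(cities):  # n! orderings in the worst case
        if all(connected(a, b) for a, b in zip(order, order[1:])):
            return order
    return None

print(find_route(["A", "B", "C", "D"]))  # ('A', 'B', 'C', 'D')
for n in (4, 10, 20):
    print(f"{n} cities: {factorial(n):,} orderings")
```

Cleverer exact methods exist (dynamic programming needs on the order of n² × 2ⁿ steps instead of n!), but no known method stays practical for hundreds of cities.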

Adleman showed that DNA can be organized inside a test tube so that huge numbers of strands work on a problem at the same time (i.e., for the traveling salesman problem, strands representing many possible paths form all at once, unlike a classical computer, which checks paths one at a time). By combining specific strands of DNA together, we can create an extremely powerful, microscopic computer: a biocomputer!
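Adleman’s trick was to let chemistry generate candidate paths in bulk and then filter out the wrong ones. In his experiment, each city and each road was a short DNA strand designed so that road strands would glue matching city strands together. The Python sketch below imitates only the logic of those filtering steps on a made-up, four-city map of one-way roads; a loop stands in for the test tube, so it shows the idea, not the speed.

```python
import random

# Hypothetical one-way road map: each city lists the cities you can drive to next.
ROADS = {"A": ["B", "D"], "B": ["A", "C"], "C": ["D"], "D": []}
CITIES = set(ROADS)
START, END = "A", "D"

def random_walk(start: str, max_stops: int) -> list[str]:
    """Like strands joining at random: keep following a randomly chosen road."""
    path = [start]
    while len(path) < max_stops and ROADS[path[-1]]:
        path.append(random.choice(ROADS[path[-1]]))
    return path

random.seed(0)
# Step 1: generate many random paths (all at once in a tube; one by one here).
tube = [random_walk(random.choice(sorted(CITIES)), max_stops=6) for _ in range(10_000)]
# Step 2: keep only paths that start at START and end at END.
tube = [p for p in tube if p[0] == START and p[-1] == END]
# Step 3: keep only paths with exactly one stop per city.
tube = [p for p in tube if len(p) == len(CITIES)]
# Step 4: keep only paths that include every city (so no city repeats).
tube = [p for p in tube if set(p) == CITIES]
# Step 5: anything left over is an answer.
print(tube[0] if tube else "no route found")  # ['A', 'B', 'C', 'D']
```

Step 4 is what throws out a path like A → B → A → D, which has the right length but visits A twice.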

It sounds crazy, I know, but check out Adleman’s paper if you don’t believe me.

Keep in mind that you probably won’t ever watch YouTube or read articles on a biocomputer. Biocomputers will most likely be used for research, helping to solve big problems like simulating molecules or designing drugs, mainly in the healthcare sector. While this technology is still young, experiments have shown that biocomputers can actually work.

Benefits of Biocomputers

→ Power grows with the amount of DNA (in theory, the more DNA molecules you add to the test tube, the more possibilities it can explore at once)

→ It works in parallel (a biocomputer can explore millions of possibilities at the same time)

→ DNA molecules are cheap, readily available, small, light, and easy to store.

→ Has the potential to tackle problems that are impractical for classical computers.

Challenges of Biocomputers

→ They are achingly slow right now.

→ Creating logic gates out of DNA is still challenging.

→ At this beginning stage, biocomputers are full of errors and incorrect data.

→ More money needs to be put into research for biocomputing and nano-biotechnology in order to advance the technology and make real change.

→ At this stage, a lot of human intervention is needed. Hopefully, once this technology improves, biocomputers will be able to run themselves.

As you can see, biocomputers have a lot of potential, but also many barriers to overcome.

Many companies have recognized the power that DNA storage and biocomputers have, and have started implementing this technology.

Companies That Are Paving the Road

Asimov ~ Asimov is using genetic engineering to try to tackle malnutrition and molecule malfunctions.

Twist ~ Twist is using the world’s oldest data storage technology (DNA) to pave the way to data storage of the future.

BigMind ~ I think their slogan says it all: “Cloud storage just got smarter”. BigMind uses a combination of AI and DNA storage to keep customers’ data and information safe and back up the overloaded cloud.

PureStorage ~ PureStorage is a new startup trying to “simplify how customers use and interact with data” by harnessing the power of DNA storage and VR.

These are just a few of the many companies that are starting to harness the power of DNA.

TL;DR

  • A promising new way to store data is DNA.
  • The letters of DNA (A, T, G, C) can be grouped in threes to create codons; the 64 possible codons can be matched to letters and symbols to make a DNA alphabet, and newer methods map binary code directly onto DNA.
  • There are 6 steps in DNA storage: encoding, synthesis, storage, retrieval, sequencing, and decoding.
  • Biocomputing uses the power of DNA to tackle difficult problems that are impractical for classical computers.

To everyone who’s read this far, congrats on learning more about DNA, DNA storage, bits, and biocomputers!! 🎉

If you’re looking for more resources to dive deeper into this area, check these out:

Enjoy!🔥

Funny gif of synthetic biology | Source

My name is Rachel. I am a grade 10 student who loves being outside and is super interested in science, technology, and making a big change in the world. If you have any suggestions for me, please don’t hesitate to reach out. I’d love to connect with you! You can email me at runnerrachel.lee@gmail.com, or message me on LinkedIn. Thanks so much for reading!

Follow Bioeconomy.XYZ to learn more about all the ways biotech is shaping the world around us.


Building the skills to one day build solutions to some of the biggest problems in the world | rachellee.net