FROM PUNCH CARD TO DNA DATA STORAGE

JAY KISHAN PANJIYAR
Published in The Zerone · Dec 8, 2018

As technology has evolved, computers have allowed for increasingly capacious and efficient data storage, which in turn has enabled increasingly sophisticated ways to use it. Data storage devices have evolved drastically, from trunk-sized machines holding a few kilobytes of data to microchips able to hold a few gigabytes.

Since the invention of computers and other computing devices, data storage capacity has always been a major concern. As computers have grown more advanced, data sizes have increased, creating an ever-growing demand for storage capacity.

The evolution of storage technology, in chronological order, is illustrated below:

[Figure: Storage technology in chronological order]

Most of the storage technologies mentioned above are familiar to everyone, with the likely exception of DNA storage. So let us look at DNA storage in brief:

What is DNA storage?

DNA digital data storage is the process of encoding and decoding binary data to and from synthesized strands of DNA. DNA molecules are the genetic blueprints of living cells and organisms, and DNA has an information-storage density several orders of magnitude higher than any other known storage technology.
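
To make this concrete, here is a minimal sketch of the round trip, assuming the simplest possible scheme: every pair of bits maps to one of the four bases (00→A, 01→C, 10→G, 11→T). Real systems layer constrained codes and error correction on top of this.

```python
# Minimal sketch of binary <-> DNA encoding (assumed 2-bits-per-base mapping;
# real systems avoid error-prone sequences and add error correction).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Convert bytes to a DNA string, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i+2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> bytes:
    """Convert a DNA string back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i+8], 2) for i in range(0, len(bits), 8))

assert decode(encode(b"hello")) == b"hello"
print(encode(b"hi"))  # CGGACGGC
```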

In five years, humans produced 4.4 zettabytes of data; that’s set to explode to 160 zettabytes (each year!) by 2025. Current infrastructure can handle only a fraction of the coming data deluge, which is expected to consume all the world’s microchip-grade silicon by 2040. Previous research has shown that just a few grams of DNA can store an exabyte of data and keep it intact for up to 2,000 years.

Imagine encoding every movie ever made into DNA; the result would fit in a volume smaller than a sugar cube. And it would last for 10,000 years.

Evolution of DNA storage

On August 16, 2012, the journal Science published a research paper by George Church and colleagues at Harvard University in which DNA was encoded with digital information, including an HTML draft of a 53,400-word book written by the lead researcher, eleven JPEG images, and one JavaScript program. Multiple copies were added for redundancy, and 5.5 petabits were stored per cubic millimeter of DNA. This result showed that, besides its biological functions, DNA can serve as a storage medium much like hard drives and magnetic tapes.

An improved system was reported in the journal Nature in January 2013, in an article led by researchers from the European Bioinformatics Institute (EBI) and submitted at around the same time as the paper by Church and colleagues. Over five million bits of data were stored, retrieved, and reproduced, and all of the DNA files reproduced the information with between 99.99% and 100% accuracy.

The long-term stability of data encoded in DNA was reported in February 2015, in an article by researchers from ETH Zurich. The team added redundancy via Reed–Solomon error-correction coding and protected the DNA by encapsulating it within silica glass spheres via sol–gel chemistry.
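
As a rough illustration of the redundancy idea, here is a sketch using the reedsolo Python package (the API shown assumes reedsolo >= 1.0); the parameters are illustrative, not those used in the ETH Zurich work.

```python
# Reed-Solomon redundancy in miniature: parity symbols let the decoder
# repair a limited number of corrupted bytes after synthesis/sequencing.
# Requires: pip install reedsolo
from reedsolo import RSCodec

rsc = RSCodec(10)                    # 10 parity bytes -> corrects up to 5 byte errors
payload = b"data destined for DNA"
protected = rsc.encode(payload)      # payload + parity; this is what gets synthesized

corrupted = bytearray(protected)
corrupted[3] ^= 0xFF                 # simulate a read/write error in one byte
recovered = rsc.decode(bytes(corrupted))[0]  # decode() returns (message, full, errata)
assert bytes(recovered) == payload
```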

In 2016 research by Church and Technicolor Research and Innovation were published in which, 22 MB of an MPEG compressed movie sequence was stored and recovered from DNA.

In March 2017, Yaniv Erlich and Dina Zielinski of Columbia University and the New York Genome Center published a method known as DNA Fountain that stored data at a density of 215 petabytes per gram of DNA. The technique approaches the Shannon capacity of DNA storage, achieving 85% of the theoretical limit.
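
The core idea of a fountain code is to emit a stream of "droplets", each the XOR of a random, seed-determined subset of the data segments; droplets whose DNA representation would violate biochemical constraints (such as long homopolymer runs) can simply be discarded and regenerated, which is how DNA Fountain packs information densely while staying synthesizable. Below is a toy sketch of droplet generation and screening; it uses a uniform random degree for simplicity, where the real method draws degrees from a soliton distribution.

```python
import random

def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """XOR a seed-chosen subset of equal-length segments into one droplet.
    The seed travels with the droplet so the decoder can re-derive the subset."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))             # toy choice; DNA Fountain
    chosen = rng.sample(range(len(segments)), degree)  # uses a soliton distribution
    droplet = bytearray(len(segments[0]))
    for idx in chosen:
        for i, byte in enumerate(segments[idx]):
            droplet[i] ^= byte
    return seed, bytes(droplet)

def valid_dna(dna: str, max_run: int = 3) -> bool:
    """Screening step: reject sequences with homopolymer runs longer than max_run."""
    run, prev = 0, ""
    for base in dna:
        run = run + 1 if base == prev else 1
        if run > max_run:
            return False
        prev = base
    return True
```

A droplet that fails the screen is simply thrown away and a new seed is tried; because a fountain code can generate practically unlimited droplets, discarding some costs almost nothing.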

In March 2018, the University of Washington and Microsoft published results demonstrating the storage and retrieval of approximately 200 MB of data. The research also proposed and evaluated a method for random access to data items stored in DNA.

Storage mechanism

To achieve random access on DNA, a library of 'primers' is attached to the DNA sequences; the primers, together with the polymerase chain reaction (PCR), serve as targets for selecting the desired snippets of DNA. Before synthesizing the DNA containing a file's data, the researchers appended both ends of each DNA sequence with PCR primer targets from this library, and those primers are later used to pull out the desired strands on demand.
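
In software terms, the primer pair acts like an address: every strand belonging to the same file carries the same flanking sequences, and PCR amplifies only strands whose ends match the chosen pair. Here is a minimal sketch of that selection step (the primer and payload sequences are invented for illustration):

```python
# Toy model of primer-based random access: each stored strand is
# forward primer + payload + reverse primer, and "PCR selection"
# keeps only strands flanked by the primer pair assigned to a file.
POOL = [
    ("ACGTACGT", "GATTACAGATTACA", "TTGGCCAA"),  # file 1
    ("ACGTACGT", "CCGGTTAACCGGTT", "TTGGCCAA"),  # file 1 (same primer pair)
    ("GGCCGGCC", "ATATATATGCGCGC", "AATTCCGG"),  # file 2
]

def pcr_select(pool, fwd, rev):
    """Return the payloads of all strands flanked by the given primer pair."""
    return [payload for f, payload, r in pool if f == fwd and r == rev]

print(pcr_select(POOL, "ACGTACGT", "TTGGCCAA"))
# ['GATTACAGATTACA', 'CCGGTTAACCGGTT']
```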

The researchers also developed a new algorithm to decode and restore the data to its original digital state more efficiently. Using this approach, they encoded a record 200 MB of data into synthetic DNA, consisting of 35 files ranging in size from 29 KB to 44 MB that contained high-definition video, audio, images, and text. The researchers believe their random-access approach will scale to physically isolated pools of DNA containing several terabytes each.

Storage Cost

The costs per megabyte were estimated at $12,400 to encode data and $220 for retrieval. However, if the exponential decrease in DNA synthesis and sequencing costs continues, the technology should become cost-effective for long-term data storage within about ten years.
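
For scale, at those estimated rates, encoding and reading back the 200 MB demonstration dataset mentioned above works out to:

```python
encode_cost_per_mb = 12_400    # USD, estimate quoted above
retrieve_cost_per_mb = 220     # USD
size_mb = 200                  # the Microsoft/UW demonstration dataset

print(f"Encoding:  ${encode_cost_per_mb * size_mb:,}")    # Encoding:  $2,480,000
print(f"Retrieval: ${retrieve_cost_per_mb * size_mb:,}")  # Retrieval: $44,000
```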

Drawbacks

The drawback is that writing data to DNA is expensive and extremely slow: it involves converting 0s and 1s into the DNA bases adenine, thymine, cytosine, and guanine, while getting data back out requires sequencing the DNA and decoding the files back into 0s and 1s. Finding and retrieving specific files stored in DNA is also a challenge.
