How DNA can be used to store the entire world’s data, in just a shoebox!
I know this sounds crazy, but this is actually possible, and it's being done right now!
What’s wrong with the way we’re storing data now?
We are very, and I mean very, dependent on the internet, especially now that so much happens remotely, such as attending conferences from the other side of the world, or even class from the comfort of your bed (great for some, not so much for others). But with about 59% of the global population on the internet, and over an additional 1.5 billion people projected to be using it by 2030, one thing is for sure: there is going to be a lot more data.
All of this data has to be stored somewhere, and as it stands, we face a data-storage problem: current technology won’t be able to keep up with that much data in the future. Today’s storage takes up a lot of physical space, uses a lot of energy, costs a lot of money to maintain, and the devices themselves need to be replaced every few decades or so.
Okay, so what’s the solution?
Scientists are currently looking at alternative media for digital storage to address the problem, and one in particular stands out from the rest. I’m sure you’ve seen the meme: ‘Modern problems require modern solutions’. But what if I told you that this ‘modern’ problem actually has a solution that is roughly 4 billion years old? Well, what is this solution? DNA.
And I know what you're probably thinking: “How’s that even possible?” But don’t worry, we’ll get to that later.
But why DNA?
Like Dina Zielinski said in her TEDx talk (which actually inspired me to write this article):
“DNA is nature’s oldest storing device, after all, it contains all the information necessary to build and maintain a human being”
An example she provides the audience with is our own genome. Our genome is the complete set of genetic instructions that contains all the information needed to build and develop an organism.
Our DNA is made up of 4 nucleotide bases: Adenine (A), Thymine (T), Guanine (G), and Cytosine (C). A always pairs with T, and G always pairs with C. The Human Genome Project was an international scientific research project with the goal of determining the order of the base pairs that make up human DNA. (Reading DNA to determine the order of its bases is called sequencing.)
After 13 long years of hard work, scientists were finally able to determine the order of the base pairs that program humans. If we took all of these roughly 3 billion base pairs, printed them out on A4 pages with a standard font and formatting, and stacked the pages up, the pile would be approximately 130 m high! And the fascinating thing is that all of that is stored within just a portion of a single human body cell, and we’ve got an estimated 37.2 trillion cells in our bodies. This really goes to show how much data can be stored within a single cell, and how densely DNA packs that data.
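To put that density into numbers, here is a back-of-the-envelope sketch in Python. It assumes each base carries 2 bits (there are four possible bases) and uses the 3 billion base pairs figure from above. The variable names are just for illustration.

```python
# Rough estimate of how much digital data one copy of the human genome equals.
# Assumption: each base is one of 4 letters (A, C, G, T), so it carries 2 bits.
BASE_PAIRS = 3_000_000_000   # figure quoted in the article
BITS_PER_BASE = 2            # log2(4) = 2

total_bits = BASE_PAIRS * BITS_PER_BASE
total_bytes = total_bits // 8
total_megabytes = total_bytes / 1_000_000   # decimal megabytes

print(f"{total_megabytes:.0f} MB")  # 750 MB
```

So, under that assumption, the sequence packed into a tiny part of one cell is in the same ballpark as a CD's worth of data.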
By taking advantage of this, theoretically, we’d be able to store the entire world’s data that is on the internet, just within a shoebox full of DNA!
Additionally, traditional storage devices only last a few decades at best before degrading and becoming unreliable, meaning they have to be regularly checked and replaced by newer ones. DNA, on the other hand, has a half-life of roughly 500 years, meaning it takes about 500 years for just half of it to break down. And given the right conditions, DNA can last for hundreds of thousands of years!
But how do we know that? Well, it’s because we have been able to recover and sequence DNA from mammoths (20,000 years old), Neanderthals (40,000 years old), and bison (60,000 years old). One of the oldest examples comes from ancient horses: we were able to recover and read their DNA from over 700,000 years ago!
Furthermore, DNA doesn’t require any electricity to store information. That’s a bonus point, because now we’re not only reducing the physical space needed and the maintenance costs, we’re also saving energy, which makes DNA a very promising storage medium for our data.
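To see what a half-life of about 500 years means in numbers, here is a small sketch. It assumes simple exponential decay with a fixed half-life, which is a simplification: the real rate depends heavily on conditions such as temperature, which is how samples like the ones above can survive for so much longer. The function name `fraction_intact` is made up for illustration.

```python
def fraction_intact(years: float, half_life_years: float = 500.0) -> float:
    """Fraction of the DNA still intact after `years`.

    Assumes plain exponential decay: every `half_life_years`,
    half of what is left breaks down.
    """
    return 0.5 ** (years / half_life_years)


for years in (50, 500, 1000):
    print(f"after {years} years: {fraction_intact(years):.1%} intact")
```

Under this simple model, about half is left after 500 years and a quarter after 1,000 years, so keeping the decay rate low (cold, dry, dark storage) matters a lot for long-term archives.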
So how does it work?
So we know that DNA consists of four nucleotide bases: A, T, C, G. And you probably also know that everything that goes on in a computer comes down to binary, a bunch of 1s and 0s. Computers use binary to store and process data.
So all you have to do is translate the binary into A, T, C, and G. For example, each base could stand for two bits: A = 00, C = 01, G = 10, T = 11.
So if you had 1100 0101 0010 0111, that would become TA CC AG CT with the mapping above. Once the binary representation of your data has been translated into nucleotide bases, the sequence is sent to a lab, which prints out the bases and stitches them all together to make synthetic DNA. This process is called DNA synthesis (writing DNA).
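The translation step can be sketched in a few lines of Python. This is a minimal illustration using the example mapping A = 00, C = 01, G = 10, T = 11; the function name `bits_to_dna` is made up for this article, and real DNA storage schemes are more involved (for instance, they add error correction).

```python
# Example mapping from the article: two bits per base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_dna(bits: str) -> str:
    """Translate a string of 0s and 1s into a string of bases."""
    bits = bits.replace(" ", "")  # allow spaces for readability
    if set(bits) - {"0", "1"}:
        raise ValueError("input may only contain 0, 1 and spaces")
    if len(bits) % 2 != 0:
        raise ValueError("need an even number of bits (2 bits per base)")
    # Walk through the bits two at a time and look up each pair.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(bits_to_dna("1100 0101 0010 0111"))  # TACCAGCT
```

Reading the pairs off by hand gives the same result: 11 is T, 00 is A, 01 is C, and so on.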
When you want your data back, you sequence the synthetic DNA that stores it and decode the bases to recover the original binary values, which can then be processed by a computer.
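Decoding is the same lookup in reverse. The sketch below assumes the example mapping A = 00, C = 01, G = 10, T = 11 and a perfectly read sequence with no errors, which real sequencing does not guarantee. The function names `dna_to_bits` and `dna_to_bytes` are hypothetical.

```python
# Reverse of the example mapping: each base becomes two bits.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def dna_to_bits(dna: str) -> str:
    """Translate a string of bases back into a string of 0s and 1s."""
    dna = dna.replace(" ", "").upper()
    unknown = set(dna) - set(BASE_TO_BITS)
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return "".join(BASE_TO_BITS[base] for base in dna)


def dna_to_bytes(dna: str) -> bytes:
    """Decode bases into raw bytes (4 bases per byte)."""
    bits = dna_to_bits(dna)
    if len(bits) % 8 != 0:
        raise ValueError("number of bases must be a multiple of 4")
    # Cut the bit string into groups of 8 and turn each group into one byte.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(dna_to_bits("TACCAGCT"))   # 1100010100100111
print(dna_to_bytes("CAGACGGC"))  # b'Hi'
```

The second example shows the full idea: the eight bases CAGACGGC decode to the two ASCII characters "Hi".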
The Future of DNA Storage
While using DNA as a medium of data storage has great advantages, it’s important to acknowledge the downsides that come with it. The most significant one is that, currently, DNA sequencing and synthesis are extremely slow: it can take a few hours or even days just to sequence DNA. So, as of now, DNA storage may not be the best option where information is needed quickly. Rather, it works best for archival purposes.
Another disadvantage is the cost. Although storing and maintaining DNA is much cheaper than doing the same with the hard drives used today, sequencing and synthesizing DNA can be very expensive. According to CNET, Dina Zielinski’s team was able to put 2MB of data into DNA in total, but that entire process cost them about $7,000.
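To get a feel for those numbers, here is a quick calculation based on the figures quoted above ($7,000 for 2MB). It assumes the cost scales linearly with the amount of data, which is a naive assumption made only for illustration; the names are hypothetical.

```python
# Figures quoted in the article.
COST_USD = 7_000
DATA_MB = 2

cost_per_mb = COST_USD / DATA_MB  # 3500.0 dollars per megabyte


def estimated_cost_usd(size_mb: float) -> float:
    """Naive estimate: assumes cost grows linearly with data size."""
    return size_mb * cost_per_mb


print(f"${cost_per_mb:,.0f} per MB")                 # $3,500 per MB
print(f"${estimated_cost_usd(1000):,.0f} per GB")    # $3,500,000 per GB
```

At that rate, a single gigabyte (taken here as 1,000 MB) would cost millions of dollars.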
But look at it this way: yes, it’s extremely expensive for a relatively small amount of data, but like most technology, this process should get cheaper and faster as we make more advancements and discoveries.
We’ll probably have to wait a few more years to see DNA storage become more feasible and mainstream, but what we know for sure is that DNA has a lot of potential, probably more than we’ve anticipated, and that biological computing may very well be where the future of computing is headed.
Key Takeaways
- Problem: We are generating a lot of data and are running out of storage
- Possible Solution: DNA
- Why DNA?: Dense, cheap to store, and durable
- How will it work?: Convert binary to nucleotide bases. Synthesize to make synthetic DNA containing information. Sequence to read that DNA.
- Cons of DNA: Synthesizing and sequencing DNA is expensive and time-consuming.
- Future: DNA has a lot of potential and is expected to become much cheaper in the future.