A milliliter of liquid can contain up to 1 billion bacteria, which gives a sense of how large the potential capacity of bacteria-based memory could be. The idea of storing data inside bacteria has been around for about a decade. Even very simple bacteria have long strands of DNA with a huge number of bases available for encoding data, and bacteria are far more resilient to damage than traditional electronic storage. Some bacteria are among nature's hardiest survivors, able to withstand conditions that would finish off a regular hard drive. In addition, a bacterium copies its DNA every time it divides, so natural reproduction creates many redundant copies of the data, which helps preserve the integrity of the information and makes retrieval easier.
Preparing conventional data for storage inside bacteria is fairly simple. DNA is built from four bases: adenine (A), cytosine (C), guanine (G), and thymine (T). That means we're working with a base-4 number system, also known as quaternary, in which each base can represent two bits.
In a presentation on their breakthrough, the Hong Kong researchers showed how to turn the word "iGEM" into DNA-ready code. Each letter is first converted to a number using the ASCII table (i=105, G=71, etc.), and each number is then converted from base 10 to base 4 (105=1221, 71=1013, etc.). Finally, the digits 0, 1, 2, and 3 are replaced with the bases A, T, C, and G. Applied exactly as described, this turns iGEM into TCCTTATGTATTTAGT. The string given for iGEM, ATCTATTGATTTATGT, decodes to 25, 23, 21, and 29, which are the ASCII codes of uppercase I, G, E, and M minus 48, so the team appears to have used a shifted character table.
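The character-to-DNA conversion can be sketched in a few lines of Python. This is a minimal sketch, assuming a fixed group of four quaternary digits per character (enough for codes 0 to 255) and the 0=A, 1=T, 2=C, 3=G mapping from the text; the function names and the `offset` parameter are hypothetical illustrations, not part of the researchers' tools.

```python
# Mapping from the text: quaternary digit -> DNA base.
DIGIT_TO_BASE = {0: "A", 1: "T", 2: "C", 3: "G"}
BASE_TO_DIGIT = {base: digit for digit, base in DIGIT_TO_BASE.items()}


def to_base4(value: int, width: int = 4) -> list[int]:
    """Return `value` as a fixed-width list of base-4 digits, most significant first."""
    if not 0 <= value < 4 ** width:
        raise ValueError(f"{value} does not fit in {width} base-4 digits")
    digits = []
    for _ in range(width):
        digits.append(value % 4)
        value //= 4
    return digits[::-1]


def encode(text: str, offset: int = 0) -> str:
    """Character code -> base 4 -> A/T/C/G, four bases per character.

    `offset` is a hypothetical knob subtracted from each character code.
    """
    bases = []
    for ch in text:
        for digit in to_base4(ord(ch) - offset):
            bases.append(DIGIT_TO_BASE[digit])
    return "".join(bases)


def decode(dna: str, offset: int = 0) -> str:
    """Reverse encode(): read four bases at a time back into one character."""
    chars = []
    for i in range(0, len(dna), 4):
        value = 0
        for base in dna[i:i + 4]:
            value = value * 4 + BASE_TO_DIGIT[base]
        chars.append(chr(value + offset))
    return "".join(chars)


print(to_base4(ord("i")), to_base4(ord("G")))  # [1, 2, 2, 1] [1, 0, 1, 3]
print(encode("iGEM"))                           # TCCTTATGTATTTAGT
print(decode(encode("iGEM")))                   # iGEM
```

With the hypothetical offset, `encode("IGEM", offset=48)` reproduces ATCTATTGATTTATGT; treat that offset as a guess about the team's character table rather than a documented detail.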
A single inserted DNA sequence isn't long enough to hold complex information like a photograph or a book, so the best available solution is to split the data into many small pieces and spread them across different cells. For that to work, the fragments must be identifiable and put back in the right order, so the researchers gave every DNA fragment a three-part structure: header, message, and checksum.
The header is an 8-base sequence divided into four levels of identifying information (zone, region, area, and district) that allows each fragment to be put back in the right order. The message carries the actual data, and the checksum repeats the original header, which helps catch minor mutations in the bacteria's DNA: if the header and checksum no longer match, the fragment has been altered.
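The header/message/checksum layout can be sketched as follows. Every layout detail here is a hypothetical assumption: the 8-base header is split evenly into 2-base zone, region, area, and district fields that together spell the fragment's position in base 4, the message is cut into 20-base chunks, and the checksum is an exact copy of the header. A fragment whose header and checksum disagree is treated as mutated and discarded.

```python
BASES = "ATCG"  # position in this string = quaternary digit: A=0, T=1, C=2, G=3
HEADER_LEN = 8
CHUNK = 20  # hypothetical number of message bases per fragment
LEVELS = ("zone", "region", "area", "district")


def index_to_header(index: int) -> str:
    """Write a fragment's position as 8 base-4 digits, most significant first."""
    if not 0 <= index < 4 ** HEADER_LEN:
        raise ValueError("too many fragments for an 8-base header")
    digits = []
    for _ in range(HEADER_LEN):
        digits.append(BASES[index % 4])
        index //= 4
    return "".join(reversed(digits))


def header_to_index(header: str) -> int:
    """Read an 8-base header back into a fragment position."""
    value = 0
    for base in header:
        value = value * 4 + BASES.index(base)
    return value


def header_levels(header: str) -> dict[str, str]:
    """Split a header into its four hypothetical 2-base levels."""
    return {name: header[i:i + 2] for name, i in zip(LEVELS, range(0, HEADER_LEN, 2))}


def fragment(dna: str) -> list[str]:
    """Cut DNA into header + message + checksum pieces (checksum = copy of header)."""
    pieces = []
    for n, start in enumerate(range(0, len(dna), CHUNK)):
        header = index_to_header(n)
        pieces.append(header + dna[start:start + CHUNK] + header)
    return pieces


def reassemble(pieces: list[str]) -> str:
    """Drop pieces whose checksum no longer matches the header, then sort by position."""
    by_position = {}
    for piece in pieces:
        header = piece[:HEADER_LEN]
        message = piece[HEADER_LEN:-HEADER_LEN]
        checksum = piece[-HEADER_LEN:]
        if header == checksum:
            # Redundant copies of the same fragment simply overwrite each other.
            by_position[header_to_index(header)] = message
    return "".join(by_position[i] for i in sorted(by_position))
```

Encoding the position directly in the header is what lets `reassemble` sort pieces that come back out of order; with 8 bases there are 4^8 = 65,536 possible headers. Note that this check only catches mutations in the header or checksum, not in the message itself.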
So let's say the information has been encoded and placed in many different bacterial cells. How does someone retrieve the data on the other end? The reader would extract the DNA and run it through next-generation high-throughput sequencing, or NGS. Because this kind of sequencing reads many copies of the same sequence, the copies can be compared, and a majority vote decides which base is correct at any position where some copies have mutated. Then any compression applied before encoding could be reversed to restore the raw data to its original form.
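The majority-voting idea can be illustrated with a per-position vote across copies of one fragment. This minimal sketch assumes the reads have already been aligned and trimmed to the same length; real NGS pipelines also deal with alignment, insertions, deletions, and per-base quality scores, all of which are ignored here.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Pick the most common base at each position across aligned reads."""
    if not reads:
        raise ValueError("need at least one read")
    if any(len(read) != len(reads[0]) for read in reads):
        raise ValueError("reads must be aligned to the same length")
    result = []
    for column in zip(*reads):
        # most_common(1) breaks ties in favour of the base seen first
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


reads = [
    "ATCGATCG",
    "ATCGTTCG",  # mutation at position 4
    "ATGGATCG",  # mutation at position 2
]
print(consensus(reads))  # ATCGATCG
```

With only two copies a single mutation produces a tie, so the vote becomes more reliable the more copies the bacteria have made.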
The last step is snapping the fragments back together in the correct order so the DNA sequences can be translated back into useful data. This is where data storage becomes data encryption: the person reading the data needs a formula that reveals the correct order of the headers and checksums, and without that formula the data remains meaningless.
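One way to picture the ordering "formula" is as a secret permutation that maps each chunk of data to a header slot. This is a toy illustration only: the key-seeded shuffle and the function names below are hypothetical, not the researchers' scheme, and Python's `random` module is not cryptographically secure, so the sketch shows the idea rather than a usable cipher.

```python
import random


def secret_order(n: int, key: int) -> list[int]:
    """Hypothetical 'formula': a permutation of 0..n-1 derived from a shared key."""
    order = list(range(n))
    random.Random(key).shuffle(order)  # same key -> same permutation (per Python version)
    return order


def scramble(chunks: list[str], key: int) -> list[str]:
    """Place chunk i in header slot order[i], hiding the true reading order."""
    order = secret_order(len(chunks), key)
    slots = [""] * len(chunks)
    for i, chunk in enumerate(chunks):
        slots[order[i]] = chunk
    return slots


def unscramble(slots: list[str], key: int) -> list[str]:
    """Invert scramble(): with the right key, chunk i is read back from slot order[i]."""
    order = secret_order(len(slots), key)
    return [slots[order[i]] for i in range(len(slots))]


chunks = ["ATCG", "GGTA", "CCAT", "TTAG", "AGCT"]
stored = scramble(chunks, key=12345)
print(unscramble(stored, key=12345) == chunks)  # True
```

Someone without the key can still sort fragments by header, but reading the slots in header order yields the chunks in a shuffled sequence.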
Now, there does seem to be one potential concern with using E. coli to store data: isn't E. coli dangerous? There appears to be little to worry about. The researchers used non-virulent strains of the bacterium, which can't do much more than store the data and reproduce, and the DNA sequences that represent the data are gibberish as far as coding for proteins goes, so they are not expected to produce anything dangerous.