Harvard cracks DNA storage, crams 700 terabytes of data into a single gram

The work, carried out by George Church and Sri Kosuri, basically treats DNA as just another digital storage device. Instead of binary data being encoded as magnetic regions on a hard drive platter, strands of DNA that store 96 bits are synthesized, with each of the bases (TGAC) representing a binary value (T and G = 1, A and C = 0).
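As a rough illustration of that mapping, here is a minimal Python sketch. The decoding table (T and G = 1, A and C = 0) comes straight from the description above. The rule for choosing between the two possible bases when encoding is an assumption made for this example, since the article does not say how that choice is made. The function names are likewise hypothetical.

```python
# Mapping stated in the article: T and G read as 1, A and C read as 0.
BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}

# Each bit value has two candidate bases. Which one to pick is NOT specified
# in the article; the alternation rule below is an illustrative assumption.
BIT_TO_BASES = {"0": ("A", "C"), "1": ("T", "G")}


def bytes_to_bits(data: bytes) -> str:
    """Turn raw bytes into a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in data)


def encode_bits(bits: str) -> str:
    """Encode a bit string as DNA bases, one base per bit."""
    bases = []
    for bit in bits:
        first, second = BIT_TO_BASES[bit]
        # Assumed rule: never repeat the previous base, so the strand
        # contains no runs of identical letters.
        if bases and bases[-1] == first:
            bases.append(second)
        else:
            bases.append(first)
    return "".join(bases)


def decode_bases(bases: str) -> str:
    """Convert sequenced bases back into a bit string."""
    return "".join(BASE_TO_BIT[base] for base in bases.upper())


if __name__ == "__main__":
    bits = bytes_to_bits(b"DNA")
    strand = encode_bits(bits)
    print(strand)
    print(decode_bases(strand) == bits)
```

Because two bases map to each bit value, decoding is unambiguous whichever base the encoder picks, which is what leaves room for a rule like the one assumed here.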

To read the data stored in DNA, you simply sequence it — just as if you were sequencing the human genome — and convert each of the TGAC bases back into binary. To aid with sequencing, each strand of DNA has a 19-bit address block at the start (the red bits in the image below) — so a whole vat of DNA can be sequenced out of order, and then sorted into usable data using the addresses.
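The address-then-sort idea can be sketched in a few lines of Python. This example works on plain bit strings rather than bases, and uses the two figures given in the article: a 19-bit address followed by 96 bits of data per strand. The zero-padding of the final block, the sequential numbering of addresses, and the function names are assumptions made for illustration, not details from the researchers' method.

```python
import random

ADDRESS_BITS = 19   # address block size given in the article
DATA_BITS = 96      # data bits per strand given in the article


def make_strands(bits: str) -> list[str]:
    """Split a bit string into addressed blocks of 19 + 96 bits each."""
    # Assumption: pad the last block with zeros to a full 96 bits.
    remainder = len(bits) % DATA_BITS
    if remainder:
        bits += "0" * (DATA_BITS - remainder)
    block_count = len(bits) // DATA_BITS
    if block_count > 2 ** ADDRESS_BITS:
        raise ValueError("too much data for a 19-bit address space")
    strands = []
    for index in range(block_count):
        address = f"{index:0{ADDRESS_BITS}b}"          # assumed: sequential
        payload = bits[index * DATA_BITS:(index + 1) * DATA_BITS]
        strands.append(address + payload)
    return strands


def reassemble(strands: list[str]) -> str:
    """Sort strands by their address block and join the payloads."""
    ordered = sorted(strands, key=lambda s: int(s[:ADDRESS_BITS], 2))
    return "".join(s[ADDRESS_BITS:] for s in ordered)


if __name__ == "__main__":
    original = "10" * 100                      # 200 bits of sample data
    strands = make_strands(original)
    random.shuffle(strands)                    # simulate out-of-order reads
    recovered = reassemble(strands)
    print(recovered[:len(original)] == original)
    # Upper bound on data addressable with these two figures alone:
    print(2 ** ADDRESS_BITS * DATA_BITS // 8, "bytes")
```

With 19 address bits there are 524,288 distinct addresses, so this layout on its own can index at most 524,288 × 96 bits, or 6,291,456 bytes, of payload.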

Scientists have been eyeing up DNA as a potential storage medium for a long time, for three very good reasons: it’s incredibly dense (you can store one bit per base, and a base is only a few atoms large); it’s volumetric (beaker) rather than planar (hard disk); and it’s incredibly stable. Where other bleeding-edge storage media need to be kept in sub-zero vacuums, DNA can survive for hundreds of thousands of years in a box in your garage.

It is only with recent advances in microfluidics and labs-on-a-chip that synthesizing and sequencing DNA have become everyday tasks, though. While it took years for the original Human Genome Project to analyze a single human genome (some 3 billion DNA base pairs), modern lab equipment with microfluidic chips can do it in hours. Now this isn’t to say that Church and Kosuri’s DNA storage is fast, but it’s fast enough for very-long-term archival.