A team of scientists in the U.K. has encoded text, sound files and a photograph onto strings of DNA, and successfully retrieved them without errors.
Nick Goldman and Ewan Birney of the European Bioinformatics Institute at the European Molecular Biology Laboratory used DNA to store a set of Shakespeare's sonnets, a recording of Martin Luther King's "I Have a Dream" speech, a photograph of the laboratory and an early research paper by James Watson and Francis Crick, who discovered the structure of DNA. Goldman and Birney then read back the data with 100 percent accuracy.
Although researchers have previously put data on DNA, this is the first time anyone has included error-checking routines that enable reliable retrieval.
Storing data on DNA has many advantages. DNA storage is far more compact than conventional memory. In fact, all of the data the Internet transmits in a month could fit into about two pounds of DNA. Storing data on DNA also eliminates the problem of re-saving files in new electronic formats every few years. The data is read by sequencing the DNA strand.
Lastly, DNA doesn't degrade nearly as fast as plastic magnetic tapes or even the average solid state hard drive. It's possible to sequence DNA that is tens of thousands of years old. "Keep DNA cold, dry and dark and it lasts a long time," Birney said.
Goldman said he and Birney, who published their work in this week's Nature, got the idea while mulling over alternatives to electronic storage. "We were at the pub," he said at a press conference. "DNA is a very compact way to store information, and we realized we could do this."
Storing digital data by conventional methods doesn't exactly take up a lot of space these days. One can get a pocket-sized hard drive that stores a terabyte of information, enough to hold about 2,000 hours of music. But storing information on DNA means cramming 2,000 times as much data onto a sugar cube-sized device.
To get the information onto the DNA, the scientists first converted the data to the familiar ones and zeros of binary code. A computer program then mapped those numbers onto the four building blocks of DNA: adenine, cytosine, guanine and thymine. Each of those chemicals, called a base, is marked as an A, C, G, or T.
To help reduce the error rates in the data, the two scientists came up with a strategy to use strings of bases that did not repeat. For example, A could only be followed by a C, G, or T. As a result, each base was dependent on the one preceding it. They also came up with a way to encode the information in the DNA in both directions, forwards and backwards, to add redundancy.
They ran all of these instructions plus the binary code through a computer program and came up with a kind of genetic blueprint on paper.
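The no-repeat rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not necessarily the scheme Goldman and Birney used: it assumes each byte is first written as six base-3 digits, and that each digit then picks one of the three bases that differ from the previous base. The function names and the imaginary starting base "A" are hypothetical choices made for the example.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 values, enough for one byte (256)


def bytes_to_trits(data):
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_dna(trits, start="A"):
    """Each digit selects one of the three bases that differ from the last one."""
    prev = start  # hypothetical "previous base" before the strand begins
    strand = []
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        strand.append(prev)
    return "".join(strand)


def dna_to_trits(strand, start="A"):
    """Invert trits_to_dna by tracking the previous base the same way."""
    prev = start
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def trits_to_bytes(trits):
    """Regroup base-3 digits into bytes, six digits at a time."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def encode(data):
    return trits_to_dna(bytes_to_trits(data))


def decode(strand):
    return trits_to_bytes(dna_to_trits(strand))
```

Because every digit chooses among bases that differ from the one before it, the resulting strand can never contain the same base twice in a row, and decoding simply replays the same choices in reverse.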
Nick Goldman of the European Molecular Biology Laboratory examines synthesized DNA in a vial. (Photo: European Molecular Biology Laboratory)
Next, they sent the genetic instructions to Agilent Technologies, a biological lab in California. Agilent pieced together DNA strands made of the bases, according to Goldman and Birney's instructions. Then the lab shipped the scientists a tiny vial.
The vial contained a long string of DNA encoded with the sonnets, the speech, the photo and the research paper. To read the information, the scientists used a machine designed to analyze, or sequence, DNA molecules. One of the methods it used was similar to how ordinary computers piece together digital data that comes from disparate locations on a hard drive. Basically, it looks for tiny pieces of information at the end of each string, which flag where the piece fits in the larger string.
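The idea of a position flag at the end of each piece can be sketched as follows. This is a simplified illustration, not the researchers' actual format: the four-base tag length, the base-4 numbering and the function names are all assumptions made for the example, and the tag here ignores the no-repeat rule for brevity.

```python
BASES = "ACGT"
INDEX_LEN = 4  # assumption: 4 bases give 4**4 = 256 distinct positions


def index_to_bases(i):
    """Write a fragment number as a fixed-width base-4 tag, e.g. 5 -> 'AACC'."""
    digits = []
    for _ in range(INDEX_LEN):
        digits.append(BASES[i % 4])
        i //= 4
    return "".join(reversed(digits))


def bases_to_index(tag):
    """Read a base-4 tag back into a fragment number."""
    value = 0
    for base in tag:
        value = value * 4 + BASES.index(base)
    return value


def split_into_fragments(sequence, size):
    """Cut the sequence into pieces and append a position tag to each one."""
    fragments = []
    for n, start in enumerate(range(0, len(sequence), size)):
        fragments.append(sequence[start:start + size] + index_to_bases(n))
    return fragments


def reassemble(fragments):
    """Sort the pieces by their trailing tag, then strip the tags and join."""
    ordered = sorted(fragments, key=lambda f: bases_to_index(f[-INDEX_LEN:]))
    return "".join(f[:-INDEX_LEN] for f in ordered)
```

With tags like these, the fragments can be read in any order and still be put back together, as long as there are no more pieces than the tag can number (256 in this sketch).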
While this work is proof-of-concept, it doesn't mean that people will be seeing DNA-based backup drives on desktops soon. The biggest obstacle is encoding the information into the DNA itself. Making strands of DNA is expensive and time-consuming, and it's difficult to make longer strings of it. Reading the information, on the other hand, is considerably easier, and sequencing technology has become cheaper over the last decade.
But it is still expensive: Goldman noted that commercial rates for synthesizing the DNA are between $10,000 and $20,000. Sequencing it is still in the thousands as well. Goldman noted that the cost of making and sequencing DNA would have to fall to one percent of what it is now for the technique to become practical in less than 50 years.
George Church, a professor of genetics at Harvard who demonstrated a similar idea in August, told Discovery News that DNA could one day replace ordinary hard drives. "It's a million or a billion times denser and requires much less energy to run," he said. "You could have information storage as paint or wallpaper."
Goldman noted that both his and Church's labs were working on the idea at the same time, without knowledge of each other.