A team of scientists in the U.K. has encoded text, sound files and a photograph onto strings of DNA, and successfully retrieved the data without errors.
Nick Goldman and Ewan Birney of the European Bioinformatics Institute at the European Molecular Biology Laboratory used DNA to store a set of Shakespeare's sonnets, a recording of Martin Luther King's "I Have a Dream" speech, a photograph of the laboratory and an early research paper by James Watson and Francis Crick, who discovered the structure of DNA. Goldman and Birney then read back the data with 100 percent accuracy.
Although researchers have previously put data on DNA, this is the first time anyone has included error-checking routines that enable reliable retrieval.
Storing data on DNA has many advantages. DNA storage is far more compact than conventional memory. In fact, all of the data the Internet transmits in a month could fit into about two pounds of DNA. Storing data on DNA also eliminates the problem of re-saving files in new electronic formats every few years. The data is read by sequencing the DNA strand.
Lastly, DNA doesn't degrade nearly as fast as plastic magnetic tapes or even the average solid state hard drive. It's possible to sequence DNA that is tens of thousands of years old. "Keep DNA cold, dry and dark and it lasts a long time," Birney said.
Goldman said he and Birney, who published their work in this week's Nature, got the idea while mulling over alternatives to electronic storage. "We were at the pub," he said at a press conference. "DNA is a very compact way to store information, and we realized we could do this."
Storing digital data by conventional methods doesn't exactly take up a lot of space these days. One can get a pocket-sized hard drive that stores a terabyte of information, enough to hold about 2,000 hours of music. But storing information on DNA means cramming 2,000 times as much data onto a sugar cube-sized device.
To get the information onto the DNA, the scientists first converted the data to the familiar ones and zeros of binary code. A computer program then matched those numbers to the four building blocks of DNA: adenine, cytosine, guanine and thymine. Each of those chemicals, called a base, is marked as an A, C, G or T.
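As a rough illustration of matching binary digits to bases, the sketch below assigns each pair of bits to one letter. The table `BITS_TO_BASE` and both function names are assumptions made for this example; the article does not give the mapping the researchers actually used.

```python
# Hypothetical lookup table: two bits per base. The real assignment used by
# the researchers is not described in the article.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_bases(data: bytes) -> str:
    """Turn each byte into 8 bits, then each pair of bits into one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bytes(bases: str) -> bytes:
    """Reverse the mapping: bases back to bits, then bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in bases)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = bytes_to_bases(b"Hi")
    print(strand)                  # CAGACGGC
    print(bases_to_bytes(strand))  # b'Hi'
```

Note that this naive mapping can place the same base twice in a row (the "GG" in the output), which is the kind of run that makes sequencing more error-prone.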
To help reduce the error rates in the data, the two scientists came up with a strategy to use strings of bases in which no base appears twice in a row. For example, A could only be followed by a C, G or T. At the same time, each base was dependent on the one preceding it. They also came up with a way to encode the information in the DNA in both directions, forwards and backwards, to add redundancy.
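One way to guarantee that no base repeats is to let each digit of the data choose among the three bases that differ from the previous one. The sketch below does this with base-3 digits. The fixed six-digits-per-byte conversion, the function names and the notional starting base are all assumptions for illustration, not a description of the researchers' own program.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 is enough to cover 256 byte values


def to_trits(data: bytes) -> list:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def encode_no_repeats(data: bytes, start: str = "A") -> str:
    """Each digit picks one of the three bases that differ from the last one."""
    prev = start  # notional base "before" the strand; not written out
    strand = []
    for trit in to_trits(data):
        choices = [base for base in BASES if base != prev]
        prev = choices[trit]
        strand.append(prev)
    return "".join(strand)


def decode_no_repeats(strand: str, start: str = "A") -> bytes:
    """Recover each digit from the previous base, then rebuild the bytes."""
    prev = start
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    strand = encode_no_repeats(b"sonnet")
    print(strand)
    print(decode_no_repeats(strand))
```

Because every base is chosen relative to the one before it, decoding has to walk the strand in order, which matches the article's point that each base depends on its predecessor.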
They ran all of these instructions plus the binary code through a computer program and came up with a kind of genetic blueprint on paper.