DNA

DNA (deoxyribonucleic acid) is the primary biological medium for storing genetic information in organisms. All known cellular life forms carry their genes in DNA; the exceptions are certain viruses, whose genomes consist of RNA. In eukaryotes most DNA is found in the cell nucleus, while bacteria and archaea, which have no nucleus, keep it in the cytoplasm.

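The double-helix structure of DNA was published by James D. Watson and Francis Crick in 1953, drawing on X-ray diffraction work by Rosalind Franklin and Maurice Wilkins. The Human Genome Project declared its reference sequence of the human genome essentially complete in 2003, and the first genome of an individual person was published in 2007. DNA is now used in a wide variety of applications, ranging from genetic engineering to forensic science, where DNA samples found in evidence are matched to suspects. It can even be used to determine which dog pooped on the lawn.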

DNA has a double-helix structure made of two strands, each carrying a sequence of four bases: guanine, adenine, thymine, and cytosine, written in standard sequence notation as G, A, T, and C respectively. Each base bonds chemically with only one other base on the opposite strand: A pairs with T, and C pairs with G. As a result, when the two strands are separated and each "grows" a new complementary strand from a pool of loose bases, with only the fitting ones "sticking", the result is two identical DNA double helices. This is how cells copy their DNA before they divide. On the rare occasions that an incorrect base manages to stick and is not repaired, the result is a replication error known as a mutation, and mutations are a source of the genetic variation on which evolution acts.
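The pairing rule can be sketched in a few lines of Python. This is only an illustration of the A-T / C-G logic, not a model of the real biochemistry; the function names and the representation of a double helix as a pair of position-aligned strings are assumptions made for this example.

```python
# Pairing rule from the text: A <-> T and C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Return the partner strand, written position by position."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand):
    """Return the partner strand read in its own direction.

    The two strands of a real double helix run in opposite directions,
    so sequence files usually quote the partner strand reversed.
    """
    return complement(strand)[::-1]


def replicate(top, bottom):
    """Separate a duplex and give each old strand a newly built partner.

    A duplex is modelled here as (top, bottom), where bottom[i] is the
    base paired with top[i]. Returns the two daughter duplexes.
    """
    daughter_1 = (top, complement(top))        # old top strand + new partner
    daughter_2 = (complement(bottom), bottom)  # new partner + old bottom strand
    return daughter_1, daughter_2


if __name__ == "__main__":
    top = "GATTACA"
    bottom = complement(top)
    print(top, bottom, reverse_complement(top))
    print(replicate(top, bottom))
```

If the starting duplex is correctly paired, both daughters returned by `replicate` are equal to the original, which is the point made in the paragraph above.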

The genetic code maps DNA codons (groups of three bases) to the amino acids from which proteins are built, plus "stop" signals that mark where a protein ends. Other DNA sequences act as control signals that tell the cell when and how to use its genes. In a sense, DNA is a biological data storage mechanism and the genetic code a kind of programming language. Biological processes involving the replication and interpretation of DNA also make use of RNA, a closely related molecule that is usually single-stranded, uses ribose instead of deoxyribose, and has uracil (U) in place of thymine.
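As a rough illustration of the codon lookup, the following Python sketch builds the standard genetic code table (one-letter amino acid codes, with `*` for stop) and translates a DNA string three bases at a time. It makes several simplifying assumptions: it reads the coding strand directly instead of going through an RNA copy, it starts at the first base rather than searching for a start codon, and the names `CODON_TABLE` and `translate` are invented for this example.

```python
from itertools import product

# Standard genetic code, listed with bases in the conventional T, C, A, G
# order (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA string into one-letter amino acids.

    Reads from the first base in steps of three, stops at the first stop
    codon, and ignores any incomplete codon at the end.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGTTTGGGTAA"))
```

Because 64 codons map onto only 20 amino acids plus stop, several codons share a meaning (for example, all four codons beginning with GG give glycine, `G`).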

As of 2013, researchers have managed to store and retrieve information encoded in synthetic DNA (including Shakespeare's sonnets and graphic and audio files), and this may become a viable means of data archiving in the future. Unlike normal binary computer storage, the encoding scheme uses ternary digits (trits): at each position there are three bases that differ from the previous one, so each trit can be written as one of them and the same base never appears twice in a row. A form of Huffman coding is used to convert a sequence of bytes (which could in turn be part of any electronic file format) into a series of ternary numbers.
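The no-repeat trick is easy to demonstrate. The Python sketch below is not the published encoding: it replaces the Huffman step with a plain fixed-width conversion (six trits per byte, since 3^6 = 729 covers all 256 byte values), and the particular trit-to-base assignment, the virtual starting base, and all function names are assumptions chosen for illustration.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: fixed width instead of Huffman coding


def bytes_to_trits(data):
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits):
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def trits_to_dna(trits, start="A"):
    """Pick, for each trit, one of the three bases unlike the previous one.

    `start` is a virtual base preceding the sequence (an assumption of
    this sketch); it is not part of the output.
    """
    previous = start
    dna = []
    for trit in trits:
        choices = [base for base in BASES if base != previous]
        previous = choices[trit]
        dna.append(previous)
    return "".join(dna)


def dna_to_trits(dna, start="A"):
    """Inverse of trits_to_dna; raises ValueError on a repeated base."""
    previous = start
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))
        previous = base
    return trits


if __name__ == "__main__":
    message = b"DNA"
    dna = trits_to_dna(bytes_to_trits(message))
    print(dna)
    print(trits_to_bytes(dna_to_trits(dna)))
```

Avoiding runs of the same base matters because such runs are harder to synthesize and sequence reliably; the cost is that each base carries one trit (about 1.58 bits) rather than the 2 bits a free choice among four bases would allow.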

This suggests the possibility that alien visitors or lost ancient civilizations may have already implanted messages in the DNA of living creatures, just waiting for scientists to sequence the DNA of the right species to discover them... if the messages haven't mutated too much in the meantime. The DNA of living creatures is known to contain many sequences of apparently useless genetic material, presumably the debris of past evolution, but could there be artificial data hidden there?

Info on DNA as data storage

 * Abstract of paper in Nature about experimental DNA data storage
 * Data encoding spec for above experiment
 * Boing Boing article and discussion
 * Comic making fun of DNA data storage
 * MIT can now use E. coli DNA tape recorders for living and replicating data storage
 * Yet Another DNA Storage Technique

Legal issues

 * Banning of 'file-sharing' of seeds
 * It's time to free our genes (gene patent controversy)
 * Cartoon video illustrating gene-patent issue
 * Public Domain Human Genome Project Generated More Research And More Commercial Activity Than Proprietary Competitor
 * U.S. Supreme Court ruling (2013) that naturally occurring genes are not patentable

Other links and references

 * DNA (Wikipedia)
 * Base pair (Wikipedia)
 * Genetic code (Wikipedia)
 * Single-nucleotide polymorphism (Wikipedia)
 * DNA photographed for the first time
 * Four-stranded DNA discovered
 * Google wants to index your DNA too
 * Art Emerges from DNA Left Behind
 * Should we bring back extinct species?
 * The technology that links taxonomy and Star Trek
 * Ancient horse is oldest creature to reveal DNA sequence
 * DNA nanostructures as "tiny Lego bricks"
 * Find and replace across an entire genome
 * SNPedia: wiki investigating human genetics
 * Scientists discover double meaning in genetic code
 * Open source genomics
 * Seeing X chromosomes in a new light
 * DNA bread
 * Exogen Bio - How damaged is your DNA?
 * Richard Dawkins on how information in DNA can increase by evolution
 * The Lyon hypothesis, nicely illustrated
 * New Letters Added to the Genetic Alphabet
 * Easy DNA Editing Will Remake the World. Buckle Up.
 * Chromosomal DNA Replication: The DNA Replication Fork (video)