From Just Solve the File Format Problem
File Format
Name DNA
Released ~3.7 billion BC


DNA (deoxyribonucleic acid) is the primary molecule in which living organisms store their genetic information. With the exception of some viruses whose genomes consist only of RNA, all known life forms carry their genes in DNA. In plants, animals, fungi, and other eukaryotes it is found mainly in the cell nucleus; bacteria and archaea have no nucleus.

The double-helix structure of DNA was discovered by James D. Watson and Francis Crick in 1953. The Human Genome Project announced an essentially complete sequence of the human genome in 2003, and the first fully gap-free sequence followed in 2022. DNA is now used for a wide variety of applications, ranging from genetic engineering to forensic science, where DNA samples found in evidence are matched to suspects. It can even be used to determine which dog pooped on the lawn.

DNA consists of two strands twisted into a double helix. Each strand is a sequence of four kinds of bases: guanine, adenine, thymine, and cytosine, conventionally written as G, A, T, and C. Each base can pair with only one other base on the opposite strand: A pairs with T, and C pairs with G. This means that when the two strands are separated and each one serves as a template for a new complementary strand (built by enzymes from a pool of free nucleotides, with only the fitting ones "sticking"), you end up with two identical DNA double helices. This is how cells copy their DNA when they divide. On the rare occasions that an incorrect base is incorporated and not repaired, the result is a replication error, one kind of mutation. Mutations supply the variation on which evolution acts.
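The pairing rule can be illustrated with a short Python sketch that treats each strand as a string of base letters. It is a simplification: real strands are antiparallel and are read in opposite directions (hence `reverse_complement`), and all function names here are illustrative, not taken from any library.

```python
# Illustrative sketch: strands modelled as plain strings of A, C, G, T.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base partner strand (A<->T, C<->G)."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Partner strand read in its own 5'->3' direction (strands are antiparallel)."""
    return complement(strand)[::-1]

def replicate(top: str) -> tuple[tuple[str, str], tuple[str, str]]:
    """Separate the two strands and let each one template a new partner."""
    bottom = complement(top)
    helix_a = (top, complement(top))        # old top strand + new partner
    helix_b = (complement(bottom), bottom)  # new partner + old bottom strand
    return helix_a, helix_b

a, b = replicate("GATTACA")
print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
print(a == b)                         # True: two identical double helices
```

Because complementing twice gives back the original strand, each separated strand regenerates exactly the same pair.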

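The genetic code maps three-base units of DNA called codons to amino acids, the building blocks of proteins, so the sequence of codons in a gene spells out the protein it produces. A few codons act as signals rather than ordinary amino acids: ATG marks where a protein starts (and also codes for methionine), while TAA, TAG, and TGA mark where it stops. Other DNA sequences act as control signals that tell the developing organism when and how to use its genes. In a sense, the genetic code is both a biological data storage mechanism and a programming language. The biological processes that copy and interpret DNA also make use of RNA, a closely related molecule that is usually single-stranded and uses uracil (U) in place of thymine.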
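To make the codon idea concrete, here is a minimal Python sketch that translates a DNA coding sequence into a one-letter amino-acid string using the standard genetic code, with `*` marking a stop codon. It assumes the sequence is read in frame from the first base and skips the intermediate RNA step that happens in real cells; `translate` and `CODON_TABLE` are hypothetical names for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order (TTT, TTC, TTA, ..., GGG); "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str) -> str:
    """Translate codons from the first base until a stop codon (or the end of the sequence)."""
    protein = []
    for i in range(0, len(dna) - 2, 3):  # whole codons only; a trailing partial codon is ignored
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCTTGGAAATAA"))  # MAWK (Met, Ala, Trp, Lys, then stop)
```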

As of 2013, researchers had managed to store and retrieve information encoded in synthetic DNA (including Shakespeare's sonnets and graphic and audio files), and this may become a viable means of data archiving in the future. The encoding scheme uses ternary digits (trits) rather than the binary digits of normal computer storage, because a base-3 digit can be written with the four DNA bases in a way that never repeats the same base twice in a row: each trit selects one of the three bases that differ from the preceding one. Avoiding such repeats matters because runs of a single base are prone to errors during sequencing. A form of Huffman coding is used to convert a sequence of bytes (which could in turn be part of any electronic file format) into a series of trits.
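The "no repeated base" idea can be sketched in Python as follows. Each trit picks one of the three bases that differ from the previous base, taken in a fixed cyclic order A, C, G, T; that order, the assumed starting base `A`, and all function names are illustrative assumptions, and the published scheme's exact details may differ. For simplicity, a fixed-length code (every byte becomes six trits, since 3^6 = 729 ≥ 256) stands in for the Huffman step, so this sketch is less compact than the real scheme.

```python
# Hypothetical sketch of a ternary code that never repeats a base.
ORDER = "ACGT"  # assumed cyclic order of the bases

def byte_to_trits(value: int) -> list[int]:
    """Fixed-length stand-in for the Huffman step: one byte -> 6 trits."""
    trits = []
    for _ in range(6):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]  # most significant trit first

def trits_to_byte(trits: list[int]) -> int:
    value = 0
    for t in trits:
        value = value * 3 + t
    return value

def encode(trits: list[int], start: str = "A") -> str:
    """Each trit (0, 1, 2) selects one of the three bases different from the previous base."""
    prev, out = start, []
    for t in trits:
        base = ORDER[(ORDER.index(prev) + 1 + t) % 4]
        out.append(base)
        prev = base
    return "".join(out)

def decode(dna: str, start: str = "A") -> list[int]:
    prev, trits = start, []
    for base in dna:
        if base == prev:
            raise ValueError("repeated base: not a valid encoding")
        trits.append((ORDER.index(base) - ORDER.index(prev) - 1) % 4)
        prev = base
    return trits

def encode_bytes(data: bytes, start: str = "A") -> str:
    return encode([t for b in data for t in byte_to_trits(b)], start)

def decode_bytes(dna: str, start: str = "A") -> bytes:
    trits = decode(dna, start)
    return bytes(trits_to_byte(trits[i:i + 6]) for i in range(0, len(trits), 6))

print(encode([0, 1, 2]))  # CTG
```

Round-tripping works because the decoder only needs the previous base to recover each trit, for example `decode_bytes(encode_bytes(b"Shall I compare thee"))` gives back the original bytes.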

This raises the possibility that alien visitors or lost ancient civilizations have already implanted messages in the DNA of living creatures, just waiting for scientists to sequence the DNA of the right species to discover them... if they haven't mutated too much in the meantime. The genomes of living creatures are known to contain long stretches of non-coding DNA with no known function, presumably in part debris from past evolution, but could there be artificial data hidden there?


