FASTA and FASTQ

FASTA and FASTQ are text-based formats for representing nucleotide (DNA or RNA) or peptide sequences, used in biology. FASTA represents each element of a sequence with a single letter: the standard A, C, G, and T for DNA nucleotides (with U in place of T for RNA), other letters for special uses, and a separate set of one-letter codes for the amino acids in peptides. FASTQ stores the same kind of sequence data and additionally encodes a quality score for each element.
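
As a rough illustration of the difference between the two formats, the following Python sketch reads both from any iterable of text lines. It assumes the common conventions rather than any formal specification: FASTA headers start with ">", FASTQ records occupy exactly four lines ("@" identifier, sequence, "+" separator, quality string), and quality characters use an ASCII offset of 33. Older FASTQ variants use a different offset, and multi-line FASTQ records are not handled. The function names read_fasta and read_fastq are hypothetical, chosen only for this example.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle, offset=33):
    """Yield (identifier, sequence, qualities) from four-line FASTQ records.

    Assumes one record per four lines and an ASCII quality offset of 33.
    """
    lines = [line.rstrip("\r\n") for line in handle if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        ident, seq, plus, qual = lines[i:i + 4]
        if not ident.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ for " + ident)
        # Each quality character decodes to an integer score.
        yield ident[1:], seq, [ord(char) - offset for char in qual]


fasta_text = ">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n"
fastq_text = "@read1\nACGT\n+\nII5!\n"

fasta_records = list(read_fasta(StringIO(fasta_text)))
fastq_records = list(read_fastq(StringIO(fastq_text)))
```

For real work, an established library such as those listed under Links is usually a better choice than a hand-written parser.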

File extensions
Several extensions are in use, and their meanings are not fully standardized.


 * .fasta, .fas, .fa, .seq, .fsa: Generic FASTA
 * .fna: FASTA nucleic acids
 * .ffn: FASTA nucleotide coding regions for a genome
 * .faa: FASTA amino acids
 * .mpfa: FASTA amino acids in multiple proteins
 * .frn: FASTA non-coding RNA
 * .fastq: FASTQ

Files may also be distributed in compressed form, in which case a second extension is appended, as in .fastq.gz.
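
Because the extensions are only loosely standardized, any extension-based detection is a heuristic. The Python sketch below maps the extensions listed above to a format name and treats a trailing .gz as gzip compression. The names guess_format and open_text are hypothetical, and only gzip is handled; other compression schemes would need their own cases.

```python
import gzip
from pathlib import Path

# Extensions taken from the list above; this mapping is a heuristic only.
FASTA_EXTENSIONS = {".fasta", ".fas", ".fa", ".seq", ".fsa",
                    ".fna", ".ffn", ".faa", ".mpfa", ".frn"}
FASTQ_EXTENSIONS = {".fastq"}


def guess_format(filename):
    """Return (format_name, is_gzipped) based only on the file name.

    format_name is "fasta", "fastq", or None when the extension is unknown.
    """
    suffixes = [suffix.lower() for suffix in Path(filename).suffixes]
    compressed = bool(suffixes) and suffixes[-1] == ".gz"
    if compressed:
        suffixes = suffixes[:-1]  # look at the extension before .gz
    extension = suffixes[-1] if suffixes else ""
    if extension in FASTA_EXTENSIONS:
        return "fasta", compressed
    if extension in FASTQ_EXTENSIONS:
        return "fastq", compressed
    return None, compressed


def open_text(filename):
    """Open a plain or gzip-compressed sequence file in text mode."""
    _, compressed = guess_format(filename)
    if compressed:
        return gzip.open(filename, "rt")
    return open(filename, "r")
```

For example, guess_format("reads.fastq.gz") reports a gzip-compressed FASTQ file, while an unrecognized name such as "notes.txt" yields None for the format. Inspecting the first character of the content (">" or "@") is a common complementary check when the extension is ambiguous.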

Links

 * Wikipedia: FASTA
 * Wikipedia: FASTQ
 * What is FASTA format?
 * Single letter codes for nucleotides
 * Processing Illumina data: FASTQ format
 * Format FASTQ sequences and barcode data
 * FASTA format converter
 * Bio::SeqIO (Perl) (more info) (connected with Bio::Seq)
 * SeqIO (Python) (connected with Seq) (more info)