FASTA and FASTQ
From Just Solve the File Format Problem
Latest revision as of 00:17, 11 June 2024
FASTA and FASTQ are text-based formats used in biology to represent nucleotide (DNA or RNA) or peptide sequences. FASTA represents each element of a sequence with a letter: the standard C, G, T, and A for DNA nucleotides, other letters for special uses, and a separate set of letters for the amino acids in peptides. FASTQ additionally encodes quality scores for the data.
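To make the difference between the two formats concrete, here is a minimal Python sketch of reading each one. It rests on conventions this page does not spell out, so treat them as assumptions: FASTA records begin with a ">" header line followed by one or more sequence lines; FASTQ records are four lines each (an "@" name line, the sequence, a "+" separator, and a quality string); and quality characters use the common Phred+33 offset. The function names parse_fasta and parse_fastq are illustrative only, not part of any library.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {header: sequence}; a '>' line starts each record (assumed convention)."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may be wrapped over several lines; join them.
            records[header] += line
    return records


def parse_fastq(text: str) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, scores) from four-line records (assumed layout)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        name, seq, plus, qual = lines[i:i + 4]
        if not name.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {name}")
        # Assumption: Phred+33 encoding, so '!' means score 0.
        yield name[1:], seq, [ord(c) - 33 for c in qual]


if __name__ == "__main__":
    print(parse_fasta(">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n"))
    print(list(parse_fastq("@read1\nACGT\n+\nII5!\n")))
```

The sketch reads whole strings for simplicity; real sequence files are often large, so production tools usually stream them line by line instead.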
File extensions
Several extensions are in use, and their meanings are not fully standardized.
- .fasta, .fas, .fa, .seq, .fsa: Generic FASTA
- .fna: FASTA nucleic acids
- .ffn: FASTA nucleotide coding regions for a genome
- .faa: FASTA amino acids
- .mpfa: FASTA amino acids in multiple proteins
- .frn: FASTA non-coding RNA
- .fastq, .fq: FASTQ
Files may also be distributed in compressed form, which adds a second extension, as in .fastq.gz.
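The extension list above can be turned into a small lookup, sketched here in Python. The mapping copies the list as given; the names EXTENSION_KINDS, COMPRESSION_SUFFIXES, and classify are hypothetical, and only .gz is treated as a compression suffix because it is the only one mentioned here. Since the extensions are not fully standardized, the result is best read as a hint, not a guarantee of the file's contents.

```python
from pathlib import Path
from typing import Optional, Tuple

# Mapping taken from the extension list on this page.
EXTENSION_KINDS = {
    ".fasta": "Generic FASTA",
    ".fas": "Generic FASTA",
    ".fa": "Generic FASTA",
    ".seq": "Generic FASTA",
    ".fsa": "Generic FASTA",
    ".fna": "FASTA nucleic acids",
    ".ffn": "FASTA nucleotide coding regions for a genome",
    ".faa": "FASTA amino acids",
    ".mpfa": "FASTA amino acids in multiple proteins",
    ".frn": "FASTA non-coding RNA",
    ".fastq": "FASTQ",
    ".fq": "FASTQ",
}

# Assumption: only .gz is handled; other compression suffixes could be added.
COMPRESSION_SUFFIXES = {".gz"}


def classify(filename: str) -> Tuple[Optional[str], bool]:
    """Return (kind, compressed) guessed from the file name alone."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    compressed = bool(suffixes) and suffixes[-1] in COMPRESSION_SUFFIXES
    if compressed:
        suffixes = suffixes[:-1]  # look at the extension under the compression one
    kind = EXTENSION_KINDS.get(suffixes[-1]) if suffixes else None
    return kind, compressed


if __name__ == "__main__":
    for name in ("reads.fastq.gz", "genome.fna", "notes.txt"):
        print(name, classify(name))
```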
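Samples
- Human Genome in FASTA and other formats: https://www.ncbi.nlm.nih.gov/datasets/taxonomy/9606/
- FASTQ File Samples: https://github.com/hartwigmedical/testdata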
Links