Scientific Data formats
From Just Solve the File Format Problem
				
								
Revision as of 22:45, 27 October 2014
See also Health and Medicine for medical/biomedical data formats.
General
- CDF (Common Data Format)
 - EAS3 (binary file format for structured data)
 - HDF (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
 - NRRD (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
 - NetCDF (Network Common Data Form)
 - SDXF (Structured Data Exchange Format)
 - Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
 - Simple Data format (SDF) By George H. Fisher, Space Sciences Lab, UC Berkeley (A platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
 - Standard Delay Format (SDF) A standard data structure for timing data
 - XDF (eXtensible Data Format)
 - XSIL (Extensible Scientific Interchange Language)
 
Astronomical and Space
- Flexible Image Transport System (FITS)
   - PSRFITS (Pulsar data storage standard)
 - ICER
 - PDS (Planetary Data System)
 - PDS4
 - VOTable (IVOA standard table format)
 - SDF (Starlink Data Format) and NDF (Starlink's Extensible N-Dimensional Data Format).
 - VICAR
 
Biological
- 23andMe
 - AB1 (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
 - ABCD (Access to Biological Collection Data)
 - ABCDDNA (Access to Biological Collection Data DNA extension)
 - ABCDEFG (Access to Biological Collection Data Extension For Geosciences)
 - ACE (Sequence assembly format)
 - Affymetrix Raw Intensity Format
 - ARLEQUIN Project Format
 - Axt Alignment Format
 - BAM (Binary compressed SAM format)
 - BED (Browser Extensible Data format describing genes and other features of DNA sequences)
 - BEDgraph
 - Big Browser Extensible Data Format
 - Big Wiggle Format
 - Binary Alignment Map Format
 - Binary Probe Map Format
 - Binary sequence information Format
 - Biological Pathway eXchange
 - BLAT alignment Format
 - BRIX generated O Format
 - CAF (Common Assembly Format for sequence assembly)
 - CellML
 - CHADO XML interchange Format
 - Chain Format for pairwise alignment
 - CHARMM Card File Format
 - CLUSTAL-W Alignment Format
 - CLUSTAL-W Dendrogram Guide File Format
 - Clustered Data Table Format
 - Complete Genomics
 - DELTA (DEscription Language for TAxonomy)
 - DAS (Distributed Sequence Annotation System)
 - DBN (Dot Bracket Notation, Vienna Format)
 - EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
 - EML (Environmental Markup Language) not to be confused with EML (Ecological Metadata Language)
 - ENCODE (Peak information Format)
 - FASTA and FASTQ (File format for sequence data, FASTQ with quality)
 - FuGEFlow
 - FuGE-ML (Functional Genomics Experiment Markup Language)
 - Gating-ML
 - GCDML (Genomic Contextual Data Markup Language)
 - GelML (Gel electrophoresis Markup Language)
 - GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
 - Gene Feature File (Versions 1 and 3)
 - GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
 - Gene Prediction File Format
 - GenePattern GeneSet Table Format
 - Genome Annotation File (version 1 and 2)
 - GTF (Gene transfer format holds information about gene structure)
 - HMMER
 - ICB (ICM binary file Format)
 - imzML (imaging mz Markup Language)
 - ISA-Tab (Investigation Study Assay Tabular)
 - ISND sequence record XML
 - KGML (KEGG Mark-up Language)
 - MAGE-Tab (MicroArray Gene Expression Tabular)
 - MCL (Microbiological Common Language)
 - MIARE-TAB (Minimum Information About a RNAi Experiment Tabular)
 - microarray track data Browser Extensible Data Format
 - MINiML (MIAME Notation in Markup Language)
 - mini Protein Data Bank Format
 - MIQAS-TAB (Minimal Information for QTLs and Association Studies Tabular)
 - MITAB
 - mmCIF (macromolecular Crystallographic Information File)
 - Multiple Alignment Format
 - mzData (deprecated)
 - mzIdentML
 - mzML
 - mzQuantML
 - mzXML (deprecated)
 - NCD (Natural Collections Descriptions)
 - NDTF (Neurophysiology Data Translation Format)
 - net alignment annotation Format
 - NeuroML (Neuroscience eXtensible Markup Language)
 - New Hampshire eXtended Format
 - Newick tree Format
 - NEXUS (Encodes mixed information about genetic sequence data in a block structured format)
 - Nimblegen Design File Format
 - Nimblegen Gene Data Format
 - NMR-STAR (NMR Self-defining Text Archive and Retrieval format)
 - nucleotide information binary Format
 - ODM (Operational Data Model)
 - Open Biomedical Ontology Flat File Format
 - Personal Genome SNP Format
 - PHD (Output from the basecalling software Phred)
 - phyloXML (XML for evolutionary biology and comparative genomics)
 - Pre-Clustering File Format
 - Protein Data Bank (PDB; Structures of biomolecules deposited in Protein Data Bank)
 - Protein Information Resource Format
 - PRM (Protocol Representation Model (Medical Research))
 - PSI-MI XML
 - PSI-PAR
 - RDML (Real-time PCR Data Markup Language)
 - SAM (Sequence Alignment/Map format)
 - SCF (Staden chromatogram files used to store data from DNA sequencing)
 - SBML (Systems Biology Markup Language used to store biochemical network computational models)
 - SDD (Structured Descriptive Data)
 - SED-ML (Simulation Experiment Description Markup Language)
 - Sequence Alignment Map Format
 - SOFT (Simple Omnibus Format in Text)
 - spML (Separation Markup Language)
 - SRA-XML (Short Read Archive eXtensible Markup Language)
 - Standard Flowgram Format
 - Stockholm Multiple Alignment Format (Representing multiple sequence alignments)
 - SBML (Systems Biology Markup Language)
 - SBGN (Systems Biology Graphical Notation)
 - SBRML (Systems Biology Results Markup Language)
 - Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
 - TAIR annotation data Format
 - TAPIR (TDWG Access Protocol for Information Retrieval)
 - TCS (Taxonomic Concept transfer Schema)
 - TraML (Transition Markup Language)
 - UniProtKB XML Format
 - VCF (Variant Call Format)
 - Wiggle Format
 
Chemical
- CCP4 (X-ray crystallography voxels (electron density))
 - CDX (ChemDraw file format)
 - CDXML (ChemDraw file format)
 - CHM (ChemDraw file format)
 - CIF (Crystallographic Information File, standardised by IUCr)
 - CML (Chemical markup language)
 - CTab (Chemical table file .mol, .sd, .sdf)
 - HITRAN (spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit))
 - JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
 - MOL (MDL Molfile)
 - MOP (MOPAC format)
 - MRC (voxels in cryo-electron microscopy)
 - MST (ACD/ChemSketch v1 file format)
 - Protein Data Bank (PDB)
 - RPT (ACD/ChemSketch v1 file format)
 - RXN (Reaction file format)
 - SK2 (ACD/ChemSketch v2 file format)
 - SKC (ISIS/Draw file format)
 - SMILES (Simplified molecular input line entry specification, .smi)
 - SPC (spectroscopic data)
 - Structure Data File (SDF)
 - TGF (ISIS/Draw reaction file format)
 
Chemical data may be distinguished in various ways, including Chemical MIME types.
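As a minimal sketch of distinguishing chemical files by MIME type, the lookup below maps a few file extensions from the list above to the unofficial `chemical/x-*` types used by the Chemical MIME project. The table is an illustrative subset, and the function name `guess_chemical_mime` is a hypothetical helper, not part of any library.

```python
from pathlib import Path

# Unofficial "chemical/x-*" types from the Chemical MIME project.
# This table is a small illustrative subset, not a complete registry.
CHEMICAL_MIME_TYPES = {
    ".cdx": "chemical/x-cdx",
    ".cif": "chemical/x-cif",
    ".cml": "chemical/x-cml",
    ".jdx": "chemical/x-jcamp-dx",
    ".mol": "chemical/x-mdl-molfile",
    ".pdb": "chemical/x-pdb",
    ".rxn": "chemical/x-mdl-rxnfile",
    ".sdf": "chemical/x-mdl-sdfile",
    ".smi": "chemical/x-daylight-smiles",
}


def guess_chemical_mime(filename):
    """Return the chemical MIME type for a file name, or None if unknown."""
    suffix = Path(filename).suffix.lower()  # compare extensions case-insensitively
    return CHEMICAL_MIME_TYPES.get(suffix)


print(guess_chemical_mime("caffeine.MOL"))
print(guess_chemical_mime("notes.txt"))
```

An extension alone can be ambiguous: `.sdf` is mapped here to the MDL structure-data file, but this page lists several unrelated formats that share the SDF name, so a real identifier would likely also need to inspect the file contents.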
Ecological
- Darwin Core (Standard for sharing information about biological diversity)
 - EML (Ecological Metadata Language), not to be confused with EML (Environmental Markup Language)
 
Geographic and Geospatial
See also Geospatial
- DEM (Digital Elevation Model)
 - DOQ (Digital Orthophotos)
 - e00 (ESRI ArcInfo Interchange File)
 - FGDC (Content Standard for Digital Geospatial Metadata??)
 - GeoTIFF (Geospatial extensions to TIFF)
 - GML (Geography Markup Language)
 - HDFEOS, HD2, HD4 (Hierarchical Data Format-Earth Observing System)
 - KML (formerly Keyhole Markup Language, Version 2.2)
 - NDF (National Landsat Archive Production System (NLAPS) Data Format)
 - SAIF (Spatial Archive and Interchange Format, Canadian)
 - SDTS (Spatial Data Transfer Standard)
 - shp and shx (must-have components of an ESRI Shapefile; other optional components exist as well, see entry)
 - MrSID (Multi-resolution Seamless Image Database)
 - TAB (MapInfo dataset format, must-have component)
 
Mathematical
- DOT (graph description language)
 - graph6, sparse6 (ASCII encoding of Adjacency matrices (.g6, .s6))
 - graphML (Graph Markup Language)
 - Mathematica
    - Mathematica notebook (.nb, .nbp)
    - Mathematica package file (.m)
 - MathML
 - MATLAB
    - MAT (MATLAB data format)
    - MATLAB figure
    - MATLAB script file (.m)
 - OPJ (Origin data format)
 - Statistica
 - WP2 (WinPlot)
 
Oceanographic, Atmospheric and Meteorological
- GRIB (Gridded Binary)
 - BUFR (Binary Universal Form for the Representation of meteorological data)
 - IOAPI (netCDF augmented with metadata from the I/O API)
 - PP (UK Met Office format for weather model data)
 
Physics
See subcategory Physics data
- CGNS (Computational Fluid Dynamics General Notation System)
 - NeXuS (Common data format for neutron, x-ray and muon science)
 - QCDml (Lattice QCD gauge configuration markup language)
 
Scientific Signal data
- ACQ (AcqKnowledge File Format for Windows)
 - BioSemi (BDF) data format
 - BKR (EEG data format)
 - CFWB (Chart Data File Format)
 - EDF (European Data Format)
 - FEF (File Exchange Format for Vital signs)
 - GDF (General Data Format for biomedical signals)
 - GMS (Gesture And Motion Signal format)
 - IROCK (intelliRock Sensor Data File Format)
 - MFER (Medical waveform Format Encoding Rules)
 - REC (ATI Vision recorder file)
 - SCP-ECG (Standard Communication Protocol for Computer assisted electrocardiography)
 - SEG Y (Reflection seismology data format)
 - SIGIF (SIGnal Interchange Format)
 
Social Sciences
- Atlas.ti (Computer-assisted qualitative data analysis package)
 - DDI (Data Documentation Initiative)
 - DO ("DO file" command script for the Stata Statistical package)
 - DTA (Binary data file for the Stata Statistical package)
 - M2k (MAXQDA)
 - NVivo (Computer-assisted qualitative data analysis package)
 - R (Statistical package)
 - SAS (Statistical package)
 - SAV (Binary "SPSS data format" for the SPSS Statistical package)
 - SPO (Output file for the SPSS Statistical package - version 14)
 - SPS ("Syntax file" (plain text command script) for the SPSS Statistical package)
 - SPV (Output file for the SPSS Statistical package - version 17 and later)
 - Transana (Computer-assisted qualitative data analysis package)