Scientific Data formats

See also Health and Medicine for medical/biomedical data formats, and Engineering.

General

 * Common Data Format (CDF)
 * EAS3 (binary file format for structured data)
 * HDF (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
 * HDF4
 * HDF5
 * NRRD (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
 * NetCDF (Network Common Data Form)
 * ROOT (CERN data-analysis package and related formats, used in their Open Data initiative)
 * SDXF (Structured Data Exchange Format)
 * Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
 * Simple Data Format (SDF; by George H. Fisher, Space Sciences Lab, UC Berkeley; a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
 * Standard Delay Format (SDF; a standard data structure for timing data)
 * XDF (Extensible Data Format)
 * XSIL (Extensible Scientific Interchange Language)

Astronomical and Space

 * Advanced Scientific Data Format
 * CPA (PRISM)
 * Flexible Image Transport System (FITS)
 * PSRFITS (Pulsar data storage standard)
 * ICER
 * NASA Raster Metafile
 * ODL (NASA Object Description Language)
 * PDS (Planetary Data System)
 * PDS4
 * VOTable (IVOA standard table format)
 * SBIG CCDOPS image
 * Standard Archive Format (used for USAF missile data)
 * SDF (Starlink Data Format) and NDF (Starlink's Extensible N-Dimensional Data Format)
 * VICAR
 * WinMiPS

Biological

 * 23andMe
 * AB1 (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
 * ABCD (Access to Biological Collection Data)
 * ABCDDNA (Access to Biological Collection Data DNA extension)
 * ABCDEFG (Access to Biological Collection Data Extension For Geosciences)
 * ACE (Sequence assembly format)
 * Affymetrix Raw Intensity Format
 * ARF (Axon Raw Format)
 * ARLEQUIN Project Format
 * Axt Alignment Format
 * BAM (Binary compressed SAM format)
 * BED (Browser Extensible Data format describing genes and other features of DNA sequences)
 * BEDgraph
 * Big Browser Extensible Data Format
 * Big Wiggle Format
 * Binary Alignment Map Format
 * Binary Probe Map Format
 * Binary sequence information Format
 * Biological Pathway eXchange
 * BLAT alignment Format
 * BRIX generated O Format
 * CAF (Common Assembly Format for sequence assembly)
 * CASTEP
 * CellML
 * CHADO XML interchange Format
 * Chain Format for pairwise alignment
 * CHARMM Card File Format
 * CLUSTAL-W Alignment Format
 * CLUSTAL-W Dendrogram Guide File Format
 * Clustered Data Table Format
 * Complete Genomics
 * CRAM
 * DELTA (DEscription Language for TAxonomy)
 * DAS (Distributed Sequence Annotation System)
 * DBN (Dot Bracket Notation, Vienna Format)
 * EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
 * EML (Environmental Markup Language), not to be confused with EML (Ecological Metadata Language)
 * ENCODE (Peak information Format)
 * FASTA and FASTQ (File format for sequence data, FASTQ with quality)
 * FuGEFlow
 * FuGE-ML (Functional Genomics Experiment Markup Language)
 * Gating-ML
 * GCDML (Genomic Contextual Data Markup Language)
 * GelML (Gel electrophoresis Markup Language)
 * GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
 * Gene Feature File (Versions 1 and 3)
 * GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
 * Gene Prediction File Format
 * GenePattern GeneSet Table Format
 * Genome Annotation File (versions 1 and 2)
 * GTF (Gene transfer format holds information about gene structure)
 * HMMER
 * ICB (ICM binary file Format)
 * Image Cytometry Experiment (ICE)
 * Image Cytometry Standard (ICS)
 * imzML (imaging mz Markup Language)
 * ISA-Tab (Investigation Study Assay Tabular)
 * INSD sequence record XML
 * KGML (KEGG Mark-up Language)
 * MAGE-Tab (MicroArray Gene Expression Tabular)
 * MCL (Microbiological Common Language)
 * MIARE-TAB (Minimum Information About a RNAi Experiment Tabular)
 * microarray track data Browser Extensible Data Format
 * MINiML (MIAME Notation in Markup Language)
 * mini Protein Data Bank Format
 * MIQAS-TAB (Minimal Information for QTLs and Association Studies Tabular)
 * MITAB
 * mmCIF (macromolecular Crystallographic Information File)
 * Multiple Alignment Format
 * mzData (deprecated)
 * mzIdentML
 * mzML
 * mzQuantML
 * mzXML (deprecated)
 * NCD (Natural Collections Descriptions)
 * NDTF (Neurophysiology Data Translation Format)
 * net alignment annotation Format
 * NeuroML (Neuroscience eXtensible Markup Language)
 * New Hampshire eXtended Format
 * Newick tree Format
 * NEXUS (Encodes mixed information about genetic sequence data in a block structured format)
 * Nimblegen Design File Format
 * Nimblegen Gene Data Format
 * NMR-STAR (NMR Self-defining Text Archive and Retrieval format)
 * nucleotide information binary Format
 * ODM (Operational Data Model)
 * Open Biomedical Ontology Flat File Format
 * Personal Genome SNP Format
 * PHD (Output from the basecalling software Phred)
 * phyloXML (XML for evolutionary biology and comparative genomics)
 * Pre-Clustering File Format
 * Protein Data Bank (PDB; Structures of biomolecules deposited in Protein Data Bank)
 * Protein Information Resource Format
 * PRM (Protocol Representation Model; medical research)
 * PSI-MI XML
 * PSI-PAR
 * RDML (Real-time PCR Data Markup Language)
 * SAM (Sequence Alignment/Map format)
 * SCF (Staden chromatogram files used to store data from DNA sequencing)
 * SBML (Systems Biology Markup Language used to store biochemical network computational models)
 * SDD (Structured Descriptive Data)
 * SED-ML (Simulation Experiment Description Markup Language)
 * SOFT (Simple Omnibus Format in Text)
 * spML (Separation Markup Language)
 * SRA-XML (Short Read Archive eXtensible Markup Language)
 * Standard Flowgram Format
 * Stockholm Multiple Alignment Format (Representing multiple sequence alignments)
 * SBML (Systems Biology Markup Language)
 * SBGN (Systems Biology Graphical Notation)
 * SBRML (Systems Biology Results Markup Language)
 * Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
 * TAIR annotation data Format
 * TAPIR (TDWG Access Protocol for Information Retrieval)
 * TCS (Taxonomic Concept transfer Schema)
 * TraML (Transition Markup Language)
 * UniProtKB XML Format
 * VCF (Variant Call Format)
 * Wiggle Format

Chemical

 * CCP4 (X-ray crystallography voxels; electron density)
 * CDX (ChemDraw file format)
 * CDXML (ChemDraw file format)
 * CHM (ChemDraw file format)
 * CIF (Crystallographic Information File, standardised by IUCr)
 * CML (Chemical Markup Language)
 * CTab (Chemical table file .mol, .sd, .sdf)
 * HITRAN (spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit))
 * JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
 * MOL (MDL Molfile)
 * MOP (MOPAC format)
 * MRC (voxels in cryo-electron microscopy)
 * MST (ACD/ChemSketch v1 file format)
 * Protein Data Bank (PDB)
 * RPT (Waters OpenLynx reports)
 * RXN (Reaction file format)
 * SK2 (ACD/ChemSketch v2 file format)
 * SKC (ISIS/Draw file format)
 * SMILES (Simplified molecular input line entry specification, .smi)
 * SPC (Spectroscopic Data)
 * Structure Data File (SDF)
 * TGF (ISIS/Draw reaction file format)
 * XYZ Chem Wiki

Chemical data may be distinguished in various ways, including Chemical MIME types.

Earth Sciences

 * Adaptable Seismic Data Format
 * Network-Day Tape
 * QuakeML
 * SEED
 * SEG-D (formats, mostly tape based, for seismic data)
 * SEG Y (Reflection seismology data format)
 * SEIS-PROV
 * StationXML

Ecological

 * Darwin Core (Standard for sharing information about biological diversity)
 * Electronic Data Deliverable (EDD; EPA Superfund)
 * EML (Ecological Metadata Language), not to be confused with EML (Environmental Markup Language)

Environmental

 * HYT (AquiferTest)

Geographic and Geospatial
See also Geospatial


 * DEM (Digital Elevation Model)
 * DOQ (Digital Orthophotos)
 * e00 (ESRI ArcInfo Interchange File)
 * FGDC (Content Standard for Digital Geospatial Metadata)
 * GeoTIFF (Geospatial extensions to TIFF)
 * GML (Geography Markup Language)
 * HDFEOS, HD2, HD4 (Hierarchical Data Format-Earth Observing System)
 * KML (formerly Keyhole Markup Language; Version 2.2)
 * NDF (National Landsat Archive Production System (NLAPS) Data Format)
 * SAIF (Spatial Archive and Interchange Format, Canadian)
 * SDTS (Spatial Data Transfer Standard)
 * Shapefile (ESRI, shp/shx)
 * MrSID (Multi-resolution Seamless Image Database)
 * TAB (MapInfo dataset format, must have component)

Mathematical

 * AsciiMath
 * DOT (graph description language)
 * GEXF (Graph Exchange XML Format)
 * graph6, sparse6 (ASCII encodings of adjacency matrices; .g6, .s6)
 * GraphML (Graph Markup Language)
 * JMP (.jmp)
 * KaleidaGraph (.qda, .qdc)
 * Life 1.05
 * Life 1.06
 * MacWavelets
 * Mathematica
 * Computable Document Format (.cdf)
 * Mathematica notebook (.nb, .nbp)
 * Mathematica package file (.m)
 * Wolfram Language
 * Macrocell
 * MCell
 * MathML
 * MATLAB
 * MAT (MATLAB data format)
 * MATLAB figure
 * MATLAB script file (.m)
 * MATLAB Model (.mdl, .slx)
 * Minitab (.mtw, .mpj)
 * OPJ (Origin data format)
 * PDL (Perl Data Language)
 * Plaintext (cellular automata)
 * RLE (cellular automata)
 * Rule (Golly)
 * Small Object Format
 * Statistica
 * CSS Software (Complete Statistical System)
 * CSS STATISTICA
 * WP2 WinPlot

Microscopy

 * Amber ARR Bitmap Image
 * Aperio SVS
 * Bio
 * BioRad confocal image
 * DeltaVision
 * DM2 (Gatan Digital Micrograph 2)
 * DM3 (Gatan Digital Micrograph 3)
 * DM4 (Gatan Digital Micrograph 4)
 * GATAN
 * HMSA (.msa)
 * Image Cytometry Experiment (ICE)
 * Image Cytometry Standard (ICS)
 * KONTRON
 * LIFF (Openlab Layered Image File Format)
 * LSM (Zeiss Laser Scanning Microscope)
 * MetaMorph Stack (.stk)
 * MRC (Medical Research Council)
 * OME-TIFF (Open Microscopy Environment format)
 * OME-XML (Open Microscopy Environment format)
 * SMV
 * VGS-8
 * Zeiss BIVAS

Neutron and X-ray Scattering

 * canSAS (tools for small-angle scattering)
 * CIF (Crystallographic Information File, standardised by IUCr)
 * NeXus (a common data format for neutron, X-ray, and muon science)

Oceanographic, Atmospheric and Meteorological

 * GRIB (Gridded Binary)
 * BUFR (Binary Universal Form for the Representation of meteorological data)
 * IOAPI (netCDF augmented with metadata from the I/O API)
 * Meteosat data
 * PP (UK Met Office format for weather model data)

Physics
See subcategory Physics data

Scientific Signal data

 * ACQ (AcqKnowledge File Format for Windows)
 * BioSemi (BDF) data format
 * BKR (EEG data format)
 * CFWB (Chart Data File Format)
 * EDF (European Data Format)
 * FEF (File Exchange Format for Vital signs)
 * General Data Format for Biosignals (GDF)
 * GMS (Gesture And Motion Signal format)
 * IROCK (intelliRock Sensor Data File Format)
 * MFER (Medical waveform Format Encoding Rules)
 * REC (ATI Vision recorder file)
 * SCP-ECG (Standard Communication Protocol for Computer assisted electrocardiography)
 * SIGIF (SIGnal Interchange Format)

Social Sciences

 * Atlas.ti (Computer-assisted qualitative data analysis package)
 * DDI (Data Documentation Initiative)
 * DO ("DO file" command script for the Stata Statistical package)
 * DTA (Binary data file for the Stata Statistical package)
 * Linguistic Annotation Framework (LAF; used by computational linguists to annotate language samples)
 * M2k (MAXQDA)
 * NVivo (Computer-assisted qualitative data analysis package)
 * R (Statistical package)
 * SAS (Statistical package)
 * SAS Transport File (.xpt)
 * SAV (Binary "SPSS data format" for the SPSS Statistical package)
 * SPO (Output file for the SPSS Statistical package - version 14)
 * SPS ("Syntax file" (plain text command script) for the SPSS Statistical package)
 * SPV (Output file for the SPSS Statistical package - version 17 and later)
 * Transana (Computer-assisted qualitative data analysis package)

Spectra

 * Bruker (XRF software, .pdz)
 * Niton (XRF software, .ndt)
 * EDAX Spectrum (.spc)
 * Thermo Scientific SPC (.spc)
 * EMSA/MAS
 * HMSA Hyper-Dimensional Data

Miscellaneous

 * AIML (Artificial Intelligence Markup Language)
 * EMD-DF64 (used for high frequency energy monitoring)
 * IES (IESNA LM-63 Photometric Data File)
 * Jupyter Notebook (.ipynb)

Links

 * Improving on “Access to Research”
 * Software Tools For Molecular Microscopy