Scientific Data formats
From Just Solve the File Format Problem
				
								
Latest revision as of 12:14, 22 October 2025
See also Health and Medicine for medical/biomedical data formats, and also see Engineering.
General
- Common Data Format (CDF)
 - EAS3 (binary file format for structured data)
 - HDF (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
   - HDF4
   - HDF5
 - IGOR (.ibw)
 - NRRD (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
 - NetCDF (Network Common Data Format)
 - ROOT (CERN data-analysis package and related formats, used in their Open Data initiative)
 - SDXF (Structured Data Exchange Format)
 - Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
 - Simple Data format (SDF), by George H. Fisher, Space Sciences Lab, UC Berkeley (a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
 - Standard Delay Format (SDF) (a standard data structure for timing data)
 - XDF (Extensible Data Format) [1]
 - XSIL (Extensible Scientific Interchange Language)
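Several of the general containers above can be recognised from their first few bytes. The Python sketch below shows one way to do that; the signature table is an illustrative assumption covering only HDF5, HDF4, NetCDF classic/64-bit-offset and NRRD, and it only checks offset 0.

```python
# Leading "magic" bytes for a few general-purpose scientific containers.
# Illustrative table only: signatures are checked at offset 0, although HDF5
# also allows its signature at 512, 1024, 2048, ... after a user block.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "HDF5"),  # NetCDF-4 files are HDF5 underneath
    (b"\x0e\x03\x13\x01", "HDF4"),
    (b"CDF\x01", "NetCDF classic"),
    (b"CDF\x02", "NetCDF 64-bit offset"),
    (b"NRRD000", "NRRD"),            # followed by a version digit, e.g. NRRD0004
]


def sniff(header: bytes) -> str:
    """Return a format name for the first bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough bytes from `path` to compare against SIGNATURES."""
    with open(path, "rb") as f:
        return sniff(f.read(16))
```

Because NetCDF-4 files are HDF5 files underneath, they are reported as HDF5 here; telling the two apart requires looking inside the HDF5 structure.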
 
Astronomical and Space
- Advanced Scientific Data Format
 - ARN (Astronomical Research Network)
 - CPA (PRISM)
 - Flexible Image Transport System (FITS)
   - PSRFITS (Pulsar data storage standard)
 - ICER
 - NASA Raster Metafile
 - ODL (NASA Object Description Language)
 - PDS (Planetary Data System)
 - PDS4
 - VOTable (IVOA standard table format)
 - SBIG CCDOPS image
 - Standard Archive Format (used for USAF missile data)
 - SDF (Starlink Data Format) and NDF (Starlink's Extensible N-Dimensional Data Format).
 - VICAR
 - WinMiPS
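FITS headers are made of 80-character ASCII "cards", grouped in 2880-byte blocks and terminated by an END card. The following simplified Python sketch reads keyword/value pairs from a primary header; it keeps values as strings and skips commentary cards, so for real work a full FITS library (for example Astropy's astropy.io.fits) is the safer choice.

```python
CARD_LENGTH = 80  # every FITS header card is 80 ASCII characters


def parse_fits_header(data: bytes) -> dict[str, str]:
    """Collect keyword/value pairs from a FITS header up to its END card.

    Simplified: values are kept as strings, string values lose their quotes
    and trailing blanks, and cards without the "= " value indicator
    (COMMENT, HISTORY, CONTINUE, blank) are skipped.
    """
    header: dict[str, str] = {}
    for offset in range(0, len(data), CARD_LENGTH):
        card = data[offset:offset + CARD_LENGTH].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            return header
        if card[8:10] != "= ":
            continue
        value = card[10:].strip()
        if value.startswith("'"):
            # A string runs to the next single quote; '' is an escaped quote.
            end = 1
            while True:
                end = value.index("'", end)
                if value[end + 1:end + 2] != "'":
                    break
                end += 2
            header[keyword] = value[1:end].replace("''", "'").rstrip()
        else:
            # Anything after "/" is a comment.
            header[keyword] = value.split("/", 1)[0].strip()
    raise ValueError("no END card found")
```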
 
Biological
- 23andMe
 - AB1 (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
 - ABCD (Access to Biological Collection Data)
 - ABCDDNA (Access to Biological Collection Data DNA extension)
 - ABCDEFG (Access to Biological Collection Data Extension For Geosciences)
 - ACE (Sequence assembly format)
 - Affymetrix Raw Intensity Format
 - AnnData Object (.h5ad)
 - ARF (Axon Raw Format)
 - ARLEQUIN Project Format
 - Axt Alignment Format
 - BAM (Binary compressed SAM format)
 - BED (Browser extensible display format describing genes and other features of DNA sequences)
 - BEDgraph
 - Big Browser Extensible Data Format
 - Big Wiggle Format
 - Binary Alignment Map Format
 - Binary Probe Map Format
 - Binary sequence information Format
 - Biological Pathway eXchange
 - BLAT alignment Format
 - BRIX generated O Format
 - CAF (Common Assembly Format for sequence assembly)
 - CASTEP
 - CellML
 - CHADO XML interchange Format
 - Chain Format for pairwise alignment
 - CHARMM Card File Format
 - CLUSTAL-W Alignment Format
 - CLUSTAL-W Dendrogram Guide File Format
 - Clustered Data Table Format
 - Complete Genomics
 - CRAM
 - DELTA (DEscription Language for TAxonomy)
 - DAS (Distributed Sequence Annotation System)
 - DBN (Dot Bracket Notation, Vienna format)
 - EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
 - EML (Environmental Markup Language), not to be confused with EML (Ecological Metadata Language)
 - ENCODE (Peak information Format)
 - FASTA and FASTQ (File format for sequence data, FASTQ with quality)
 - FAST5 (.fast5)
 - FuGEFlow
 - FuGE-ML (Functional Genomics Experiment Markup Language)
 - Gating-ML
 - GCDML (Genomic Contextual Data Markup Language)
 - GelML (Gel electrophoresis Markup Language)
 - GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
 - Gene Feature File (Versions 1 and 3)
 - Gene Prediction File Format
 - GenePattern GeneSet Table Format
 - Genome Annotation File (version 1 and 2)
 - Genozip
 - GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
 - GTF (Gene transfer format holds information about gene structure)
 - HMMER
 - ICB (ICM binary file Format)
 - Image Cytometry Experiment (ICE)
 - Image Cytometry Standard (ICS)
 - imzML (imaging mz Markup Language)
 - ISA-Tab (Investigation Study Assay Tabular)
 - INSD sequence record XML
 - KGML (KEGG Mark-up Language)
 - MAGE-Tab (MicroArray Gene Expression Tabular)
 - MCL (Microbiological Common Language)
 - MIARE-TAB (Minimum Information About an RNAi Experiment Tabular)
 - microarray track data Browser Extensible Data Format
 - MINiML (MIAME Notation in Markup Language)
 - mini Protein Data Bank Format
 - MIQAS-TAB (Minimal Information for QTLs and Association Studies Tabular)
 - MITAB
 - mmCIF (macromolecular Crystallographic Information File)
 - Multiple Alignment Format
 - mzData (deprecated)
 - mzIdentML
 - mzML
 - mzQuantML
 - mzXML (deprecated)
 - NCD (Natural Collections Descriptions)
 - NDTF (Neurophysiology Data Translation Format)
 - net alignment annotation Format
 - NeuroML (Neuroscience eXtensible Markup Language)
 - New Hampshire eXtended Format
 - Newick tree Format
 - NEXUS (Encodes mixed information about genetic sequence data in a block structured format)
 - Nimblegen Design File Format
 - Nimblegen Gene Data Format
 - NMR-STAR (NMR Self-defining Text Archive and Retrieval format)
 - nucleotide information binary Format
 - ODM (Operational Data Model)
 - Open Biomedical Ontology Flat File Format
 - Personal Genome SNP Format
 - PHD (Output from the basecalling software Phred)
 - phyloXML (XML for evolutionary biology and comparative genomics)
 - Pre-Clustering File Format
 - Protein Data Bank (PDB; structures of biomolecules deposited in the Protein Data Bank)
 - Protein Information Resource Format
 - PRM (Protocol Representation Model (Medical Research))
 - PSI-MI XML
 - PSI-PAR
 - RDML (Real-time PCR Data Markup Language)
 - SAM (Sequence Alignment/Map format)
 - SCF (Staden chromatogram files used to store data from DNA sequencing)
 - SBML (Systems Biology Markup Language used to store biochemical network computational models)
 - SDD (Structured Descriptive Data)
 - SED-ML (Simulation Experiment Description Markup Language)
 - SOFT (Simple Omnibus Format in Text)
 - spML (Separation Markup Language)
 - SRA-XML (Short Read Archive eXtensible Markup Language)
 - Standard Flowgram Format
 - Stockholm Multiple Alignment Format (Representing multiple sequence alignments)
 - SBML (Systems Biology Markup Language)
 - SBGN (Systems Biology Graphical Notation)
 - SBRML (Systems Biology Results Markup Language)
 - Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
 - TAIR annotation data Format
 - TAPIR (TDWG Access Protocol for Information Retrieval)
 - TCS (Taxonomic Concept transfer Schema)
 - TraML (Transition Markup Language)
 - UniProtKB XML Format
 - VCF (Variant Call Format)
 - Wiggle Format
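Most FASTQ files store each read as four lines: an @-prefixed name, the bases, a +-prefixed separator, and one quality character per base. A minimal Python reader, assuming unwrapped lines and Phred+33 quality encoding (the `offset` parameter is this sketch's own way of allowing older Phred+64 files), might look like this:

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record.

    Assumes unwrapped sequence/quality lines and Phred+33 quality encoding
    (pass offset=64 for old Phred+64 files).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(
            name=header[1:].rstrip("\r\n"),
            sequence=sequence,
            qualities=[ord(ch) - offset for ch in quality],
        )


if __name__ == "__main__":
    sample = io.StringIO("@read1\nACGT\n+\nII#!\n")
    for record in read_fastq(sample):
        print(record.name, record.sequence, record.qualities)
```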
 
Chemical
- CCP4 (X-ray crystallography voxels (electron density))
 - CDX (ChemDraw file format)
 - CDXML (ChemDraw file format)
 - CHM (ChemDraw file format)
 - CIF (Crystallographic Information File, standardised by IUCr)
 - CML (Chemical markup language)
 - CTab (Chemical table file .mol, .sd, .sdf)
 - HITRAN (spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit))
 - JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
 - MOL (MDL Molfile)
 - MOP (MOPAC format)
 - MRC (voxels in cryo-electron microscopy)
 - MST (ACD/ChemSketch v1 file format)
 - Protein Data Bank (PDB)
 - RPT (OpenLynx) Waters OpenLynx reports
 - RXN (Reaction file format)
 - SK2 (ACD/ChemSketch v2 file format)
 - SKC (ISIS/Draw file format)
 - SMILES (Simplified molecular input line entry specification, .smi)
 - SPC (Spectroscopic Data)
 - Structure Data File (SDF)
 - TGF (ISIS/Draw reaction file format)
 - XYZ Chem (see Wikipedia's XYZ file format article)
 
Chemical data may be distinguished in various ways, including Chemical MIME types.
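The XYZ format listed above is about as simple as chemical formats get: an atom count, a comment line, then one element symbol and three Cartesian coordinates per line. A small Python parser for a single frame, assuming whitespace-separated fields:

```python
from typing import NamedTuple


class Atom(NamedTuple):
    element: str
    x: float
    y: float
    z: float


def parse_xyz(text: str) -> tuple[str, list[Atom]]:
    """Parse one frame of an XYZ file.

    Layout: line 1 = number of atoms, line 2 = free-form comment,
    then one "element x y z" line per atom (coordinates are
    conventionally in angstroms, but the format does not record units).
    """
    lines = text.splitlines()
    count = int(lines[0])
    comment = lines[1] if len(lines) > 1 else ""
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()[:4]
        atoms.append(Atom(element, float(x), float(y), float(z)))
    if len(atoms) != count:
        raise ValueError(f"expected {count} atoms, found {len(atoms)}")
    return comment, atoms
```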
Earth Sciences
- Adaptable Seismic Data Format
 - Network-Day Tape
 - QuakeML
 - SEED
 - SEG-D (formats, mostly tape based, for seismic data)
 - SEG Y (Reflection seismology data format)
 - SEIS-PROV
 - StationXML
 
Ecological
- Darwin Core (Standard for sharing information about biological diversity)
 - Electronic Data Deliverable (EDD; EPA Superfund)
 - EML (Ecological Metadata Language), not to be confused with EML (Environmental Markup Language)
 
Environmental
- HYT (AquiferTest)
 
Geographic and Geospatial
See also Geospatial
- DEM (Digital Elevation Model)
 - DOQ (Digital Orthophotos)
 - e00 (ESRI ArcInfo Interchange File)
 - FGDC (Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata)
 - GeoTIFF (Geospatial extensions to TIFF)
 - GML (Geography Markup Language)
 - HDF-EOS (Hierarchical Data Format - Earth Observing System; see https://hdfeos.org/) (HD2, HD4, HD5)
 - KML (formerly Keyhole Markup Language; version 2.2)
 - NDF (National Landsat Archive Production System (NLAPS) Data Format)
 - SAIF (Spatial Archive and Interchange Format, Canadian)
 - SDTS (Spatial Data Transfer Standard)
 - Shapefile (ESRI, shp/shx)
 - MrSID (Multi-resolution Seamless Image Database)
 - TAB (MapInfo dataset format; must be accompanied by its component files)
 - Bathymetric Attributed Grid (.bag)
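A Shapefile's .shp (and .shx) file starts with a fixed 100-byte header that mixes byte orders: the file code and file length are big-endian, while the version, shape type and bounding box are little-endian. The Python sketch below decodes it with the standard struct module; the SHAPE_TYPES table is an illustrative subset listing only a few common codes.

```python
import struct

# Partial table of shape type codes (illustrative subset).
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(header: bytes) -> dict:
    """Decode the 100-byte main header of an ESRI .shp or .shx file."""
    if len(header) < 100:
        raise ValueError("shapefile main header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])         # big-endian
    (length_words,) = struct.unpack(">i", header[24:28])    # big-endian, 16-bit words
    version, shape_type = struct.unpack("<ii", header[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    if file_code != 9994:
        raise ValueError("not a shapefile (file code is not 9994)")
    return {
        "file_length_bytes": length_words * 2,
        "version": version,  # expected to be 1000
        "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
        "bbox": (xmin, ymin, xmax, ymax),
    }
```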
 
Mathematical
- AsciiMath
 - DOT (graph description language)
 - GEXF (Graph Exchange XML Format)
 - graph6, sparse6 (ASCII encoding of adjacency matrices (.g6, .s6))
 - GraphML (Graph Markup Language)
 - GraphPad Prism
   - PZM
   - PZF
   - PZFX
   - PRISM
 - JMP (.jmp)
 - KaleidaGraph (.qda, .qdc)
 - Life 1.05
 - Life 1.06
 - MacWavelets
 - Mathematica
   - Computable Document Format (.cdf)
   - Mathematica notebook (.nb, .nbp)
   - Mathematica package file (.m)
   - Wolfram Language
 - Macrocell
 - MCell
 - MathML
 - MATLAB
   - MAT (MATLAB data format)
   - Matlab figure
   - MATLAB script file (.m)
   - Matlab Model (.mdl, .slx)
 - Minitab (.mtw, .mpj)
 - NPY and NPZ (NumPy)
 - OPJ (Origin data format)
 - PDL (Perl Data Language)
 - Plaintext (cellular automata)
 - RLE (cellular automata)
 - Rule (Golly)
 - Small Object Format
 - Statistica
   - CSS Software (Complete Statistical System)
   - CSS STATISTICA
 - WP2 WinPlot
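graph6 packs the upper triangle of an adjacency matrix into printable ASCII: one character for the vertex count (n + 63 when n is at most 62), then the matrix bits taken column by column, six at a time, each 6-bit group offset by 63. A Python encoder for the small-graph case, as a sketch that ignores the longer size prefixes used for bigger graphs:

```python
def graph6_encode(n: int, edges: set[tuple[int, int]]) -> str:
    """Encode an undirected simple graph on vertices 0..n-1 in graph6 format.

    Only the short size form (0 <= n <= 62) is handled here.
    """
    if not 0 <= n <= 62:
        raise ValueError("this sketch only supports 0 <= n <= 62")
    adjacent = {frozenset(e) for e in edges}
    # Upper triangle of the adjacency matrix, column by column:
    # x(0,1), x(0,2), x(1,2), x(0,3), x(1,3), x(2,3), ...
    bits = [1 if frozenset((i, j)) in adjacent else 0
            for j in range(1, n) for i in range(j)]
    bits += [0] * (-len(bits) % 6)  # pad with zeros to a multiple of 6
    chars = [chr(n + 63)]
    for k in range(0, len(bits), 6):
        value = int("".join(map(str, bits[k:k + 6])), 2)  # big-endian 6-bit group
        chars.append(chr(value + 63))
    return "".join(chars)
```

For example, the triangle K3 encodes as Bw and the complete graph K4 as C~.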
 
Microscopy
- Amber ARR Bitmap Image
 - Aperio SVS
 - Bio
 - BioRad confocal image
 - CZI (Zeiss) [2]
 - DeltaVision
 - DM2 (Gatan Digital Micrograph 2)
 - DM3 (Gatan Digital Micrograph 3)
 - DM4 (Gatan Digital Micrograph 4)
 - GATAN
 - HMSA (.msa)
 - Image Cytometry Experiment (ICE)
 - Image Cytometry Standard (ICS)
 - KONTRON
 - LIFF (Openlab Layered Image File Format)
 - LSM (Zeiss Laser Scanning Microscope)
 - MetaMorph Stack (.stk)
 - MRC (Medical Research Council)
 - OME-TIFF (Open Microscopy Environment format)
 - OME-XML (Open Microscopy Environment format)
 - SMV
 - VGS-8
 - Zeiss BIVAS
 
Neutron and X-ray Scattering
- canSAS (tools for small-angle scattering)
 - CIF (Crystallographic Information File, standardised by IUCr)
 - NeXus (a common data format for neutron, X-ray, and muon science)
 
Oceanographic, Atmospheric and Meteorological
- GRIB (Gridded Binary)
 - BUFR (Binary Universal Form for the Representation of meteorological data)
 - IOAPI (netCDF augmented with metadata from the I/O API)
 - Meteosat data
 - PP (UK Met Office format for weather model data)
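Each GRIB message starts with the four bytes GRIB and an indicator section giving the edition number and total message length, which is enough to walk a file message by message without decoding any data. A Python sketch under those assumptions (it handles editions 1 and 2 only and does not validate the trailing 7777 marker):

```python
def scan_grib(data: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, edition, total_length) for each GRIB message in `data`.

    Only Section 0 (the indicator section) is read:
      edition 1: octets 5-7 = total length (3 bytes), octet 8 = edition;
      edition 2: octet 8 = edition, octets 9-16 = total length (8 bytes).
    Big-endian throughout. Special length conventions some producers use
    for very large GRIB1 messages are not handled.
    """
    messages = []
    pos = data.find(b"GRIB")
    while pos != -1 and pos + 8 <= len(data):
        edition = data[pos + 7]
        if edition == 1:
            length = int.from_bytes(data[pos + 4:pos + 7], "big")
        elif edition == 2 and pos + 16 <= len(data):
            length = int.from_bytes(data[pos + 8:pos + 16], "big")
        else:
            raise ValueError(f"unrecognised GRIB indicator at offset {pos}")
        messages.append((pos, edition, length))
        # Jump past this message (each one ends with the bytes "7777").
        pos = data.find(b"GRIB", pos + max(length, 4))
    return messages
```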
 
Physics
See subcategory Physics data
Scientific Signal data
- ACQ (AcqKnowledge File Format for Windows)
 - BioSemi (BDF) data format
 - BKR (EEG data format)
 - CFWB (Chart Data File Format)
 - EDF (European data format)
 - FEF (File Exchange Format for Vital signs)
 - General Data Format for Biosignals (GDF)
 - GMS (Gesture And Motion Signal format)
 - IROCK (intelliRock Sensor Data File Format)
 - MFER (Medical waveform Format Encoding Rules)
 - REC (ATI Vision recorder file)
 - SCP-ECG (Standard Communication Protocol for Computer-Assisted Electrocardiography)
 - SIGIF (SIGnal Interchange Format)
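An EDF file begins with a 256-byte fixed header of space-padded ASCII fields, followed by 256 more bytes per signal. The Python sketch below splits only the fixed part, using the field widths from the EDF specification; the field names are this sketch's own. BioSemi's BDF uses the same layout but starts with a 0xFF byte, which is why the sketch decodes as Latin-1 rather than strict ASCII.

```python
# Field names (our own) and widths in bytes of the 256-byte fixed EDF header.
EDF_FIELDS = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),       # dd.mm.yy
    ("start_time", 8),       # hh.mm.ss
    ("header_bytes", 8),     # 256 + 256 * number of signals
    ("reserved", 44),
    ("num_records", 8),      # -1 if unknown
    ("record_duration", 8),  # seconds
    ("num_signals", 4),
]


def read_edf_header(raw: bytes) -> dict:
    """Split the fixed 256-byte EDF/BDF header into its fields."""
    if len(raw) < 256:
        raise ValueError("EDF fixed header is 256 bytes long")
    fields, pos = {}, 0
    for name, width in EDF_FIELDS:
        fields[name] = raw[pos:pos + width].decode("latin-1").strip()
        pos += width
    for name in ("header_bytes", "num_records", "num_signals"):
        fields[name] = int(fields[name])
    fields["record_duration"] = float(fields["record_duration"])
    return fields
```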
 
Social Sciences
- Atlas.ti (Computer-assisted qualitative data analysis package)
 - DDI (Data Documentation Initiative)
 - DO ("DO file" command script for the Stata Statistical package)
 - DTA (Binary data file for the Stata Statistical package)
 - Linguistic Annotation Framework (LAF; used by computational linguists to annotate language samples)
 - M2k (MAXQDA)
 - NVivo (Computer-assisted qualitative data analysis package)
 - R (Statistical package)
 - SAS (Statistical package)
   - SAS Transport File (.xpt)
 - SAV (Binary "SPSS data format" for the SPSS Statistical package)
 - SPO (Output file for the SPSS Statistical package - version 14)
 - SPS ("Syntax file" (plain text command script) for the SPSS Statistical package)
 - SPV (Output file for the SPSS Statistical package - version 17 and later)
 - Statistix (.sx)
 - Transana (Computer-assisted qualitative data analysis package)
 
Spectra
- Bruker (XRF software, .pdz)
 - Niton (XRF software, .ndt)
 - EDAX Spectrum (.spc)
 - Thermo Scientific SPC (.spc)
 - EMSA/MAS
 - HMSA Hyper-Dimensional Data
 
Miscellaneous
- AIML (Artificial Intelligence Markup Language)
 - EMD-DF64 (used for high-frequency energy monitoring)
 - IES (IESNA LM-63 Photometric Data File)
 - Jupyter Notebook (.ipynb)