Scientific Data formats

From Just Solve the File Format Problem
(Difference between revisions)
m (Oceanographic, Atmospheric and Meteorological)
(Miscellaneous: Added QuakeML)
(37 intermediate revisions by 5 users not shown)
Line 9: Line 9:
  
 
== General ==
 
== General ==
* [[CDF]] (Common Data Format)
+
* [[Common Data Format]] (CDF)
 
* [[EAS3]] (binary file format for structured data)
 
* [[EAS3]] (binary file format for structured data)
 
* [[HDF]] (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
 
* [[HDF]] (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
Line 16: Line 16:
 
* [[NRRD]] (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
 
* [[NRRD]] (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
 
* [[NetCDF]] (Network Common Data Format)
 
* [[NetCDF]] (Network Common Data Format)
* There are several formats abbreviated as [[SDF]], including:
+
* [[ROOT]] (CERN data-analysis package and related formats, used in their Open Data initiative)
** [[Simple Data format]] (SDF) By George H. Fisher, Space Sciences Lab, UC Berkeley (A platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
+
** [[Simple Data format-DPT]] A new format from the Data Protocols Team for publishing and sharing data
+
** [[Standard Delay Format]] A standard data structure for timing data
+
** [[Structure Data File]]  A file format for a chemical table file
+
 
* [[SDXF]] (Structured Data Exchange Format)
 
* [[SDXF]] (Structured Data Exchange Format)
 
* [[Silo]] (a storage format for visualization developed at Lawrence Livermore National Laboratory)
 
* [[Silo]] (a storage format for visualization developed at Lawrence Livermore National Laboratory)
 +
* [[Simple Data format]] (SDF) By George H. Fisher, Space Sciences Lab, UC Berkeley (A platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
 +
* [[Standard Delay Format]] (SDF) A standard data structure for timing data
 
* [[XDF]] (eXtensible Data Format)
 
* [[XDF]] (eXtensible Data Format)
 
* [[XSIL]] (Extensible Scientific Interchange Language)
 
* [[XSIL]] (Extensible Scientific Interchange Language)
  
 
== Astronomical and Space ==
 
== Astronomical and Space ==
 +
* [[Advanced Scientific Data Format]]
 
* [[Flexible Image Transport System]] (FITS)
 
* [[Flexible Image Transport System]] (FITS)
 
** [[PSRFITS]] (Pulsar data storage standard)
 
** [[PSRFITS]] (Pulsar data storage standard)
 
* [[ICER]]
 
* [[ICER]]
* [[PDS]]/[[ODL (NASA Object Description Language)|ODL]] (Planetary Data System)
+
* [[NASA Raster Metafile]]
 +
* [[ODL (NASA Object Description Language)]]
 +
* [[PDS]] (Planetary Data System)
 
* [[PDS4]]
 
* [[PDS4]]
 
* [[VOTable]] (IVOA standard table format)
 
* [[VOTable]] (IVOA standard table format)
 +
* [[SBIG CCDOPS image]]
 +
* [[Standard Archive Format]] (used for USAF missile data)
 
* [[Starlink_Data_Format|SDF]] (Starlink Data Format) and [[N-Dimensional_Data_Format|NDF]] (Starlink's Extensible N-Dimensional Data Format).
 
* [[Starlink_Data_Format|SDF]] (Starlink Data Format) and [[N-Dimensional_Data_Format|NDF]] (Starlink's Extensible N-Dimensional Data Format).
 
* [[VICAR]]
 
* [[VICAR]]
 +
* [[WinMiPS]]
  
 
== Biological ==
 
== Biological ==
  
 +
* [[23andMe]]
 
* [[AB1]] (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
 
* [[AB1]] (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
 
* [[ABCD]] (Access to Biological Collection Data)
 
* [[ABCD]] (Access to Biological Collection Data)
Line 44: Line 49:
 
* [[ACE (Sequence assembly)|ACE]] (Sequence assembly format)
 
* [[ACE (Sequence assembly)|ACE]] (Sequence assembly format)
 
* [[Affymetrix Raw Intensity Format]]
 
* [[Affymetrix Raw Intensity Format]]
 +
* [[ARF (Axon Raw Format)]]
 
* [[ARLEQUIN Project Format]]
 
* [[ARLEQUIN Project Format]]
 
* [[Axt Alignment Format]]
 
* [[Axt Alignment Format]]
Line 65: Line 71:
 
* [[CLUSTAL-W Dendrogram Guide File Format]]
 
* [[CLUSTAL-W Dendrogram Guide File Format]]
 
* [[Clustered Data Table Format]]
 
* [[Clustered Data Table Format]]
 +
* [[Complete Genomics]]
 
* [[DELTA]] (DEscription Language for TAxonomy)
 
* [[DELTA]] (DEscription Language for TAxonomy)
 
* [[DAS]] (Distributed Sequence Annotation System)
 
* [[DAS]] (Distributed Sequence Annotation System)
Line 118: Line 125:
 
* [[ODM]] (Operational Data Model)
 
* [[ODM]] (Operational Data Model)
 
* [[Open Biomedical Ontology Flat File Format]]
 
* [[Open Biomedical Ontology Flat File Format]]
* [[PDB]] (Structures of biomolecules deposited in Protein Data Bank)
 
 
* [[Personal Genome SNP Format]]
 
* [[Personal Genome SNP Format]]
 
* [[PHD]] (Output from the basecalling software Phred)
 
* [[PHD]] (Output from the basecalling software Phred)
 
* [[phyloXML]] (XML for evolutionary biology and comparative genomics)
 
* [[phyloXML]] (XML for evolutionary biology and comparative genomics)
 
* [[Pre-Clustering File Format]]
 
* [[Pre-Clustering File Format]]
 +
* [[Protein Data Bank]] (PDB; Structures of biomolecules deposited in Protein Data Bank)
 
* [[Protein InFormation Resource Format]]
 
* [[Protein InFormation Resource Format]]
 
* [[PRM]] (Protocol Representation Model (Medical Research))
 
* [[PRM]] (Protocol Representation Model (Medical Research))
Line 165: Line 172:
 
* [[MRC]] (voxels in cryo-electron microscopy)
 
* [[MRC]] (voxels in cryo-electron microscopy)
 
* [[MST]] ACD/ChemSketch v1 file format
 
* [[MST]] ACD/ChemSketch v1 file format
* [[PDB]] (Protein Data Bank)
+
* [[Protein Data Bank]] (PDB)
 
* [[RPT]] ACD/ChemSketch v1 file format
 
* [[RPT]] ACD/ChemSketch v1 file format
 
* [[RXN]] (Reaction file format)
 
* [[RXN]] (Reaction file format)
Line 171: Line 178:
 
* [[SKC]] (ISIS/Draw file format)
 
* [[SKC]] (ISIS/Draw file format)
 
* [[SMILES]] (Simplified molecular input line entry specification, .smi)
 
* [[SMILES]] (Simplified molecular input line entry specification, .smi)
* [[SPC]] (spectroscopic data)
+
* [[SPC (Spectroscopic Data)]]
 
* [[Structure Data File]] (SDF)
 
* [[Structure Data File]] (SDF)
 
* [[TGF]] (ISIS/Draw reaction file format)
 
* [[TGF]] (ISIS/Draw reaction file format)
Line 179: Line 186:
 
== Ecological ==
 
== Ecological ==
 
* [[Darwin Core]] (Standard for sharing information about biological diversity)
 
* [[Darwin Core]] (Standard for sharing information about biological diversity)
 +
* [[Electronic Data Deliverable]] (EDD; EPA Superfund)
 
* [[EML (Ecological Metadata Language)]], not to be confused with [[EML (Environmental Markup Language)]]
 
* [[EML (Ecological Metadata Language)]], not to be confused with [[EML (Environmental Markup Language)]]
  
Line 196: Line 204:
 
* [[SAIF]] (Spatial Archive and Interchange Format, Canadian)
 
* [[SAIF]] (Spatial Archive and Interchange Format, Canadian)
 
* [[SDTS]] (Spatial Data Transfer Standard)
 
* [[SDTS]] (Spatial Data Transfer Standard)
* [[shp and shx]] (ESRI [[Shapefile]] must have components; other optional components as well, see entry)
+
* [[Shapefile]] (ESRI, shp/shx)
 
* [[MrSID]] (MrSID- Multi-resolution Seamless Image Database)
 
* [[MrSID]] (MrSID- Multi-resolution Seamless Image Database)
 
* [[TAB]] (MapInfo dataset format, must have component)
 
* [[TAB]] (MapInfo dataset format, must have component)
  
 
== Mathematical ==
 
== Mathematical ==
 +
* [[AsciiMath]]
 +
* [[DOT (graph description language)]]
 +
* [[GEXF]] (Graph Exchange XML Format)
 
* [[graph6, sparse6]] (ASCII encoding of Adjacency matrices (.g6, .s6))
 
* [[graph6, sparse6]] (ASCII encoding of Adjacency matrices (.g6, .s6))
 
* [[graphML]] (Graph Markup Language)
 
* [[graphML]] (Graph Markup Language)
* [[m]] (MATLAB script file)
+
* [[MacWavelets]]
* [[M]] (Mathematica package file)
+
* Mathematica
* [[MAT]] (MATLAB matrix data format)
+
** [[Computable Document Format]] (.cdf)
 +
** [[Mathematica notebook]] (.nb, .nbp)
 +
** [[Mathematica package file]] (M)
 +
** [[Wolfram Language]]
 
* [[MathML]]
 
* [[MathML]]
 +
* MATLAB
 +
** [[MAT]] (MATLAB data format)
 +
** [[Matlab figure]]
 +
** [[MATLAB script file]] (m)
 
* [[OPJ]] (Origin data format)
 
* [[OPJ]] (Origin data format)
 
* [[Statistica]]
 
* [[Statistica]]
 
* [[WP2]] WinPlot
 
* [[WP2]] WinPlot
 +
 +
== Microscopy ==
 +
 +
* [[Amber ARR Bitmap Image]]
 +
* [[Aperio SVS]]
 +
* [[Bio]]
 +
* [[BioRad confocal image]]
 +
* [[DeltaVision]]
 +
* [[dm2]] (Gatan Digital Micrograph 2)
 +
* [[dm3]] (Gatan Digital Micrograph 3) ({{PRONOM|fmt/1131}})
 +
* [[GATAN]]
 +
* [[Image Cytometry Standard]] (ICS)
 +
* [[KONTRON]]
 +
* [[LIFF]] (Openlab Layered Image File Format)
 +
* [[LSM]] (Zeiss Laser Scanning Microscope)
 +
* [[MetaMorph Stack]] (.stk)
 +
* [[MRC]] (Medical Research Council)
 +
* [[OME-TIFF]] (Open Microscopy Environment TIFF format)
 +
* [[OME-XML]] (Open Microscopy Environment XML format)
 +
* [[SMV]]
 +
* [[VGS-8]]
 +
* [[Zeiss BIVAS]]
  
 
== Oceanographic, Atmospheric and Meteorological ==
 
== Oceanographic, Atmospheric and Meteorological ==
Line 216: Line 256:
 
* [[BUFR]] (Binary Universal Format Representation)
 
* [[BUFR]] (Binary Universal Format Representation)
 
* [[IOAPI]] (netCDF augmented with metadata from the I/O API)
 
* [[IOAPI]] (netCDF augmented with metadata from the I/O API)
 +
* [[Meteosat data]]
 
* [[PP]] (UK Met Office format for weather model data)
 
* [[PP]] (UK Met Office format for weather model data)
  
 
== Physics ==
 
== Physics ==
  
* [[CGNS]] (Computational Fluid Dynamics General Notation System)
+
See subcategory [[Physics data]]
* [[NeXuS]] (Common data format for neutron, x-ray and muon science)
+
* [[QCDml]] (Lattice QCD gauge configuration markup language)
+
  
 
== Scientific Signal data ==
 
== Scientific Signal data ==
Line 232: Line 271:
 
* [[EDF]] (European data format)
 
* [[EDF]] (European data format)
 
* [[FEF]] (File Exchange Format for Vital signs)
 
* [[FEF]] (File Exchange Format for Vital signs)
* [[GDF]] (General data formats for biomedical signals)
+
* [[General Data Format for Biosignals]] (GDF)
 
* [[GMS]] (Gesture And Motion Signal format)
 
* [[GMS]] (Gesture And Motion Signal format)
 
* [[IROCK]] (intelliRock Sensor Data File Format)
 
* [[IROCK]] (intelliRock Sensor Data File Format)
Line 256: Line 295:
 
* [[SPV]] (Output file for the [[SPSS]] Statistical package - version 17 and later)
 
* [[SPV]] (Output file for the [[SPSS]] Statistical package - version 17 and later)
 
* [[Transana]] ([[Computer-assisted qualitative data analysis]] package)
 
* [[Transana]] ([[Computer-assisted qualitative data analysis]] package)
 +
 +
== Miscellaneous ==
 +
 +
* [[AIML]] (Artificial Intelligence Markup Language)
 +
* [[Jupyter Notebook]] (.ipynb)
 +
* [[QuakeML]]
  
 
== Links ==
 
== Links ==
 
* [http://cameronneylon.net/blog/improving-on-access-to-research/ Improving on “Access to Research”]
 
* [http://cameronneylon.net/blog/improving-on-access-to-research/ Improving on “Access to Research”]

Revision as of 05:35, 15 May 2019

File Format
Name Scientific Data formats
Ontology

Mad scientist from 1940 movie


See also Health and Medicine for medical/biomedical data formats.


General

  • Common Data Format (CDF)
  • EAS3 (binary file format for structured data)
  • HDF (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
  • NRRD (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
  • NetCDF (Network Common Data Form)
  • ROOT (CERN data-analysis package and related formats, used in their Open Data initiative)
  • SDXF (Structured Data Exchange Format)
  • Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
  • Simple Data format (SDF; by George H. Fisher, Space Sciences Lab, UC Berkeley; a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
  • Standard Delay Format (SDF; a standard data structure for timing data)
  • XDF (eXtensible Data Format)
  • XSIL (Extensible Scientific Interchange Language)

Astronomical and Space

Biological

Chemical

  • CCP4 (X-ray crystallography voxels (electron density))
  • CDX (ChemDraw file format)
  • CDXML (ChemDraw file format)
  • CHM (ChemDraw file format)
  • CIF (Crystallographic Information File, standardised by IUCr)
  • CML (Chemical markup language)
  • CTab (Chemical table file .mol, .sd, .sdf)
  • HITRAN (spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit))
  • JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
  • MOL (MDL Molfile)
  • MOP (MOPAC format)
  • MRC (voxels in cryo-electron microscopy)
  • MST (ACD/ChemSketch v1 file format)
  • Protein Data Bank (PDB)
  • RPT (ACD/ChemSketch v1 file format)
  • RXN (Reaction file format)
  • SK2 (ACD/ChemSketch v2 file format)
  • SKC (ISIS/Draw file format)
  • SMILES (Simplified molecular input line entry specification, .smi)
  • SPC (Spectroscopic Data)
  • Structure Data File (SDF)
  • TGF (ISIS/Draw reaction file format)

Chemical data may be distinguished in various ways, including Chemical MIME types.
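
As an illustration of distinguishing chemical files by MIME type, the following is a minimal sketch that maps a few of the extensions listed above to the unofficial `chemical/x-*` types commonly used for them. The table is deliberately incomplete, the function name `guess_chemical_mime` is hypothetical, and these types are a community convention rather than IANA-registered media types, so treat the mapping as an assumption to verify against the software you use.

```python
from pathlib import Path
from typing import Optional

# Extension -> chemical MIME type, following the unofficial "chemical/x-*"
# convention. This is an illustrative subset, not an authoritative registry.
CHEMICAL_MIME_TYPES = {
    ".cdx": "chemical/x-cdx",              # ChemDraw
    ".cif": "chemical/x-cif",              # Crystallographic Information File
    ".cml": "chemical/x-cml",              # Chemical Markup Language
    ".dx": "chemical/x-jcamp-dx",          # JCAMP-DX
    ".jdx": "chemical/x-jcamp-dx",         # JCAMP-DX
    ".mol": "chemical/x-mdl-molfile",      # MDL Molfile
    ".rxn": "chemical/x-mdl-rxnfile",      # MDL reaction file
    ".sd": "chemical/x-mdl-sdfile",        # Structure Data File
    ".sdf": "chemical/x-mdl-sdfile",       # Structure Data File
    ".pdb": "chemical/x-pdb",              # Protein Data Bank
    ".smi": "chemical/x-daylight-smiles",  # SMILES
}


def guess_chemical_mime(filename: str) -> Optional[str]:
    """Return the chemical MIME type for a filename, or None if unknown.

    The lookup uses only the final extension, compared case-insensitively.
    """
    suffix = Path(filename).suffix.lower()
    return CHEMICAL_MIME_TYPES.get(suffix)


if __name__ == "__main__":
    for name in ("aspirin.MOL", "lysozyme.pdb", "notes.txt"):
        print(name, "->", guess_chemical_mime(name))
```

An extension lookup is only a first guess: as the General section notes, extensions such as .sdf are shared by several unrelated formats, so inspecting the file contents is more reliable when the origin of a file is unknown.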

Ecological

Geographic and Geospatial

See also Geospatial

  • DEM (Digital Elevation Model)
  • DOQ (Digital Orthophotos)
  • e00 (ESRI ArcInfo Interchange File)
  • FGDC (Federal Geographic Data Committee; Content Standard for Digital Geospatial Metadata)
  • GeoTIFF (Geospatial extensions to TIFF)
  • GML (Geography Markup Language)
  • HDFEOS, HD2, HD4 (Hierarchical Data Format-Earth Observing System)
  • KML (formerly Keyhole Markup Language; version 2.2)
  • NDF (National Landsat Archive Production System (NLAPS) Data Format)
  • SAIF (Spatial Archive and Interchange Format, Canadian)
  • SDTS (Spatial Data Transfer Standard)
  • Shapefile (ESRI, shp/shx)
  • MrSID (Multi-resolution Seamless Image Database)
  • TAB (MapInfo dataset format; required component of a MapInfo dataset)

Mathematical

Microscopy

Oceanographic, Atmospheric and Meteorological

  • GRIB (Gridded Binary)
  • BUFR (Binary Universal Form for the Representation of meteorological data)
  • IOAPI (netCDF augmented with metadata from the I/O API)
  • Meteosat data
  • PP (UK Met Office format for weather model data)

Physics

See subcategory Physics data

Scientific Signal data

  • ACQ (AcqKnowledge File Format for Windows)
  • BioSemi (BDF) data format
  • BKR (EEG data format)
  • CFWB (Chart Data File Format)
  • EDF (European Data Format)
  • FEF (File Exchange Format for Vital signs)
  • General Data Format for Biosignals (GDF)
  • GMS (Gesture And Motion Signal format)
  • IROCK (intelliRock Sensor Data File Format)
  • MFER (Medical waveform Format Encoding Rules)
  • REC (ATI Vision recorder file)
  • SCP-ECG (Standard Communication Protocol for Computer assisted electrocardiography)
  • SEG Y (Reflection seismology data format)
  • SIGIF (SIGnal Interchange Format)

Social Sciences

Miscellaneous

Links
