Scientific Data formats
From Just Solve the File Format Problem
Latest revision as of 14:20, 18 December 2024
See also Health and Medicine for medical/biomedical data formats, and also see Engineering.
General
- Common Data Format (CDF)
- EAS3 (binary file format for structured data)
- HDF (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
  - HDF4
  - HDF5
- IGOR (.ibw)
- NRRD (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
- NetCDF (Network Common Data Form)
- ROOT (CERN data-analysis package and related formats, used in their Open Data initiative)
- SDXF (Structured Data Exchange Format)
- Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
- Simple Data format (SDF), by George H. Fisher, Space Sciences Lab, UC Berkeley (a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
- Standard Delay Format (SDF) A standard data structure for timing data
- XDF (Extensible Data Format)
- XSIL (Extensible Scientific Interchange Language)
Astronomical and Space
- Advanced Scientific Data Format
- CPA (PRISM)
- Flexible Image Transport System (FITS)
  - PSRFITS (Pulsar data storage standard)
- ICER
- NASA Raster Metafile
- ODL (NASA Object Description Language)
- PDS (Planetary Data System)
- PDS4
- VOTable (IVOA standard table format)
- SBIG CCDOPS image
- Standard Archive Format (used for USAF missile data)
- SDF (Starlink Data Format) and NDF (Starlink's Extensible N-Dimensional Data Format).
- VICAR
- WinMiPS
Biological
- 23andMe
- AB1 (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
- ABCD (Access to Biological Collection Data)
- ABCDDNA (Access to Biological Collection Data DNA extension)
- ABCDEFG (Access to Biological Collection Data Extension For Geosciences)
- ACE (Sequence assembly format)
- Affymetrix Raw Intensity Format
- AnnData Object (.h5ad)
- ARF (Axon Raw Format)
- ARLEQUIN Project Format
- Axt Alignment Format
- BAM (Binary compressed SAM format)
- BED (Browser Extensible Data format describing genes and other features of DNA sequences)
- BEDgraph
- Big Browser Extensible Data Format
- Big Wiggle Format
- Binary Alignment Map Format
- Binary Probe Map Format
- Binary sequence information Format
- Biological Pathway eXchange
- BLAT alignment Format
- BRIX generated O Format
- CAF (Common Assembly Format for sequence assembly)
- CASTEP
- CellML
- CHADO XML interchange Format
- Chain Format for pairwise alignment
- CHARMM Card File Format
- CLUSTAL-W Alignment Format
- CLUSTAL-W Dendrogram Guide File Format
- Clustered Data Table Format
- Complete Genomics
- CRAM
- DELTA (DEscription Language for TAxonomy)
- DAS (Distributed Sequence Annotation System)
- DBN (Dot Bracket Notation (DBN) - Vienna Format)
- EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
- EML (Environmental Markup Language) not to be confused with EML (Ecological Metadata Language)
- ENCODE (Peak information Format)
- FASTA and FASTQ (File format for sequence data, FASTQ with quality)
- FAST5 (.fast5)
- FuGEFlow
- FuGE-ML (Functional Genomics Experiment Markup Language)
- Gating-ML
- GCDML (Genomic Contextual Data Markup Language)
- GelML (Gel electrophoresis Markup Language)
- GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
- Gene Feature File (Versions 1 and 3)
- Gene Prediction File Format
- GenePattern GeneSet Table Format
- Genome Annotation File (version 1 and 2)
- Genozip
- GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
- GTF (Gene transfer format holds information about gene structure)
- HMMER
- ICB (ICM binary file Format)
- Image Cytometry Experiment (ICE)
- Image Cytometry Standard (ICS)
- imzML (imaging mz Markup Language)
- ISA-Tab (Investigation Study Assay Tabular)
- ISND sequence record XML
- KGML (KEGG Mark-up Language)
- MAGE-Tab (MicroArray Gene Expression Tabular)
- MCL (Microbiological Common Language)
- MIARE-TAB (Minimum Information About a RNAi Experiment Tabular)
- microarray track data Browser Extensible Data Format
- MINiML (MIAME Notation in Markup Language)
- mini Protein Data Bank Format
- MIQAS-TAB (Minimal Information for QTLs and Association Studies Tabular)
- MITAB
- mmCIF (macromolecular Crystallographic Information File)
- Multiple Alignment Format
- mzData (deprecated)
- mzIdentML
- mzML
- mzQuantML
- mzXML (deprecated)
- NCD (Natural Collections Descriptions)
- NDTF (Neurophysiology Data Translation Format)
- net alignment annotation Format
- NeuroML (Neuroscience eXtensible Markup Language)
- New Hampshire eXtended Format
- Newick tree Format
- NEXUS (Encodes mixed information about genetic sequence data in a block structured format)
- Nimblegen Design File Format
- Nimblegen Gene Data Format
- NMR-STAR (NMR Self-defining Text Archive and Retrieval format)
- nucleotide information binary Format
- ODM (Operational Data Model)
- Open Biomedical Ontology Flat File Format
- Personal Genome SNP Format
- PHD (Output from the basecalling software Phred)
- phyloXML (XML for evolutionary biology and comparative genomics)
- Pre-Clustering File Format
- Protein Data Bank (PDB; Structures of biomolecules deposited in Protein Data Bank)
- Protein Information Resource Format
- PRM (Protocol Representation Model (Medical Research))
- PSI-MI XML
- PSI-PAR
- RDML (Real-time PCR Data Markup Language)
- SAM (Sequence Alignment/Map format)
- SCF (Staden chromatogram files used to store data from DNA sequencing)
- SBML (Systems Biology Markup Language used to store biochemical network computational models)
- SDD (Structured Descriptive Data)
- SED-ML (Simulation Experiment Description Markup Language)
- SOFT (Simple Omnibus Format in Text)
- spML (Separation Markup Language)
- SRA-XML (Short Read Archive eXtensible Markup Language)
- Standard Flowgram Format
- Stockholm Multiple Alignment Format (Representing multiple sequence alignments)
- SBML (Systems Biology Markup Language)
- SBGN (Systems Biology Graphical Notation)
- SBRML (Systems Biology Results Markup Language)
- Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
- TAIR annotation data Format
- TAPIR (TDWG Access Protocol for Information Retrieval)
- TCS (Taxonomic Concept transfer Schema)
- TraML (Transition Markup Language)
- UniProtKB XML Format
- VCF (Variant Call Format)
- Wiggle Format
Chemical
- CCP4 (X-ray crystallography voxels (electron density))
- CDX (ChemDraw file format)
- CDXML (ChemDraw file format)
- CHM (ChemDraw file format)
- CIF (Crystallographic Information File, standardised by IUCr)
- CML (Chemical markup language)
- CTab (Chemical table file .mol, .sd, .sdf)
- HITRAN (spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit))
- JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
- MOL (MDL Molfile)
- MOP (MOPAC format)
- MRC (voxels in cryo-electron microscopy)
- MST (ACD/ChemSketch v1 file format)
- Protein Data Bank (PDB)
- RPT (OpenLynx) Waters OpenLynx reports
- RXN (Reaction file format)
- SK2 (ACD/ChemSketch v2 file format)
- SKC (ISIS/Draw file format)
- SMILES (Simplified molecular input line entry specification, .smi)
- SPC (Spectroscopic Data)
- Structure Data File (SDF)
- TGF (ISIS/Draw reaction file format)
- XYZ Chem
Chemical data may be distinguished in various ways, including Chemical MIME types.
Earth Sciences
- Adaptable Seismic Data Format
- Network-Day Tape
- QuakeML
- SEED
- SEG-D (formats, mostly tape based, for seismic data)
- SEG Y (Reflection seismology data format)
- SEIS-PROV
- StationXML
Ecological
- Darwin Core (Standard for sharing information about biological diversity)
- Electronic Data Deliverable (EDD; EPA Superfund)
- EML (Ecological Metadata Language), not to be confused with EML (Environmental Markup Language)
Environmental
- HYT (AquiferTest)
Geographic and Geospatial
See also Geospatial
- DEM (Digital Elevation Model)
- DOQ (Digital Orthophotos)
- e00 (ESRI ArcInfo Interchange File)
- FGDC (Content Standard for Digital Geospatial Metadata??)
- GeoTIFF (Geospatial extensions to TIFF)
- GML (Geography Markup Language)
- HDF-EOS (Hierarchical Data Format-Earth Observing System) (HD2, HD4, HD5)
- KML (formerly Keyhole Markup Language; Version 2.2)
- NDF (National Landsat Archive Production System (NLAPS) Data Format)
- SAIF (Spatial Archive and Interchange Format, Canadian)
- SDTS (Spatial Data Transfer Standard)
- Shapefile (ESRI, shp/shx)
- MrSID (Multi-resolution Seamless Image Database)
- TAB (MapInfo dataset format, must have component)
- Bathymetric Attributed Grid (.bag)
Mathematical
- AsciiMath
- DOT (graph description language)
- GEXF (Graph Exchange XML Format)
- graph6, sparse6 (ASCII encoding of Adjacency matrices (.g6, .s6))
- graphML (Graph Markup Language)
- GraphPad Prism
  - PZM
  - PZF
  - PZFX
  - PRISM
- JMP (.jmp)
- KaleidaGraph (.qda, .qdc)
- Life 1.05
- Life 1.06
- MacWavelets
- Mathematica
  - Computable Document Format (.cdf)
  - Mathematica notebook (.nb, .nbp)
  - Mathematica package file (M)
  - Wolfram Language
- Macrocell
- MCell
- MathML
- MATLAB
  - MAT (MATLAB data format)
  - Matlab figure
  - MATLAB script file (m)
  - Matlab Model (.mdl, .slx)
- Minitab (.mtw, .mpj)
- NPY and NPZ (NumPy)
- OPJ (Origin data format)
- PDL (Perl Data Language)
- Plaintext (cellular automata)
- RLE (cellular automata)
- Rule (Golly)
- Small Object Format
- Statistica
  - CSS Software (Complete Statistical System)
  - CSS STATISTICA
- WP2 (WinPlot)
Microscopy
- Amber ARR Bitmap Image
- Aperio SVS
- Bio
- BioRad confocal image
- DeltaVision
- DM2 (Gatan Digital Micrograph 2)
- DM3 (Gatan Digital Micrograph 3)
- DM4 (Gatan Digital Micrograph 4)
- GATAN
- HMSA (.msa)
- Image Cytometry Experiment (ICE)
- Image Cytometry Standard (ICS)
- KONTRON
- LIFF (Openlab Layered Image File Format)
- LSM (Zeiss Laser Scanning Microscope)
- MetaMorph Stack (.stk)
- MRC (Medical Research Council)
- OME-TIFF (Open Microscopy Imaging format)
- OME-XML (Open Microscopy Imaging format)
- SMV
- VGS-8
- Zeiss BIVAS
Neutron and X-ray Scattering
- canSAS (tools for small-angle scattering)
- CIF (Crystallographic Information File, standardised by IUCr)
- NeXus (NeXus is a common data format for neutron, x-ray, and muon science)
Oceanographic, Atmospheric and Meteorological
- GRIB (Gridded Binary)
- BUFR (Binary Universal Form for the Representation of meteorological data)
- IOAPI (netCDF augmented with metadata from the I/O API)
- Meteosat data
- PP (UK Met Office format for weather model data)
Physics
See subcategory Physics data
Scientific Signal data
- ACQ (AcqKnowledge File Format for Windows)
- BioSemi (BDF) data format
- BKR (EEG data format)
- CFWB (Chart Data File Format)
- EDF (European data format)
- FEF (File Exchange Format for Vital signs)
- General Data Format for Biosignals (GDF)
- GMS (Gesture And Motion Signal format)
- IROCK (intelliRock Sensor Data File Format)
- MFER (Medical waveform Format Encoding Rules)
- REC (ATI Vision recorder file)
- SCP-ECG (Standard Communication Protocol for Computer assisted electrocardiography)
- SIGIF (SIGnal Interchange Format)
Social Sciences
- Atlas.ti (Computer-assisted qualitative data analysis package)
- DDI (Data Documentation Initiative)
- DO ("DO file" command script for the Stata Statistical package)
- DTA (Binary data file for the Stata Statistical package)
- Linguistic Annotation Framework (LAF; used by computational linguists to annotate language samples)
- M2k (MAXQDA)
- NVivo (Computer-assisted qualitative data analysis package)
- R (Statistical package)
- SAS (Statistical package)
  - SAS Transport File (.xpt)
- SAV (Binary "SPSS data format" for the SPSS Statistical package)
- SPO (Output file for the SPSS Statistical package - version 14)
- SPS ("Syntax file" (plain text command script) for the SPSS Statistical package)
- SPV (Output file for the SPSS Statistical package - version 17 and later)
- Statistix (.sx)
- Transana (Computer-assisted qualitative data analysis package)
Spectra
- Bruker (XRF software, .pdz)
- Niton (XRF software, .ndt)
- EDAX Spectrum (.spc)
- Thermo Scientific SPC (.spc)
- EMSA/MAS
- HMSA Hyper-Dimensional Data
Miscellaneous
- AIML (Artificial Intelligence Markup Language)
- EMD-DF64 (used for high frequency energy monitoring)
- IES (IESNA LM-63 Photometric Data File)
- Jupyter Notebook (.ipynb)