DICOM

DICOM (Digital Imaging and Communications in Medicine) is far and away the most widely used (and probably the oldest) electronic file format in medical imaging. Nearly every device that acquires medical images – ultrasound, CT, PET, and MRI – produces DICOM images in normal operation. There's a 20-part specification detailing the file format and its ecosystem. The IANA has assigned TCP and UDP port 104 to DICOM-related traffic.

It's kind of a big deal.

However, as with any sufficiently adopted standard, there are splinter factions. The most common format is a set of 2-dimensional images or "slices" that can be assembled into a 3-dimensional image; however, some manufacturers have extended the standard to save 3- or even 4-dimensional images in a "mosaic" format.

The earliest versions of the standard were known as ACR/NEMA, after the American College of Radiology and National Electrical Manufacturers Association.

Format
While there are many complications involved in decoding a DICOM file, fundamentally it is simply a sequence of data blocks called attributes or elements. Each attribute contains a 16-bit group number and a 16-bit element number, conventionally written in hexadecimal and separated with a comma, e.g. "(0028,0011)".
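As a minimal sketch of the tag notation described above, the following reads the two 16-bit numbers from four raw bytes and prints them in the conventional form. It assumes little-endian byte order, and the function names are made up for illustration.

```python
import struct

def read_tag(buf, offset=0):
    """Return (group, element) from the 4 tag bytes at `offset`.

    Assumes little-endian byte order, which is the common case.
    """
    return struct.unpack_from("<HH", buf, offset)

def format_tag(group, element):
    """Format a tag in the conventional "(gggg,eeee)" hexadecimal form."""
    return "({:04X},{:04X})".format(group, element)

raw = bytes([0x28, 0x00, 0x11, 0x00])
print(format_tag(*read_tag(raw)))  # (0028,0011)
```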

Standard attributes
If an attribute's group number is even, then it is a standard attribute defined in the DICOM specification, and the group and element number together uniquely identify the meaning of the attribute.

Private attributes
If the group number is odd, then it is a private attribute, and it will have been preceded by a special attribute supplying a "private creator" identification string. A private attribute is uniquely identified by the combination of its creator identifier, group number, and the low byte of its element number.
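The even/odd rule and the private-attribute lookup key can be sketched roughly as follows. The helper names and the creator string "ACME_EXAMPLE_01" are hypothetical. The `creator_tag_for` helper relies on the standard's block-reservation convention as I understand it: the creator string stored at element 00xx reserves elements xx00 through xxFF of the same group.

```python
def is_private(group):
    """Odd group numbers mark private attributes; even ones are standard."""
    return group % 2 == 1

def creator_tag_for(group, element):
    """Tag of the "private creator" attribute that reserves this element's block.

    Assumption: element 0xXXYY is covered by the creator stored at (group, 0x00XX).
    """
    return (group, element >> 8)

def private_key(creator, group, element):
    """Key identifying a private attribute: creator, group, low byte of element."""
    if not is_private(group):
        raise ValueError("group {:04X} is not private".format(group))
    return (creator, group, element & 0xFF)

# "ACME_EXAMPLE_01" is an invented creator identifier, not a real vendor string.
key = private_key("ACME_EXAMPLE_01", 0x0009, 0x1027)
```

The key deliberately drops the high byte of the element number, since that byte depends only on which block a given file happened to reserve.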

A creator identifier is usually specific to a manufacturer of medical equipment, not to a particular medical device. Unfortunately, instead of having one specification per manufacturer, private attributes are usually only documented in device-specific "DICOM Conformance Statements", which list only the attributes used by that one device.

Examples of DICOM Conformance Statements (search the documents for "private creator"):
 * GE Healthcare CT DICOM Conformance Statements
 * Philips MRI DICOM Conformance Statements

Compilations:
 * http://svn.sourceforge.jp/svnroot/pgctn/pgctn/trunk/main_tree/dicomviewer/univiewer/dcmdict.txt

Little-endian vs. big-endian
A DICOM file may use either little-endian or big-endian byte order for certain representations of numbers. Little-endian is more common.
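To illustrate the difference, here is a small sketch that decodes the same two bytes as a 16-bit unsigned integer under each byte order. The sample bytes are arbitrary.

```python
import struct

value_bytes = bytes([0x00, 0x02])

# Little-endian: least significant byte first, so this is 0x0200.
little = struct.unpack("<H", value_bytes)[0]

# Big-endian: most significant byte first, so this is 0x0002.
big = struct.unpack(">H", value_bytes)[0]

print(little, big)  # 512 2
```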

Explicit VR vs. Implicit VR
A DICOM file may use either Explicit VR or Implicit VR format. VR stands for Value Representation.

Explicit VR means that each attribute has its data type stored in the file.

Implicit VR means that the attribute types are not stored in the file. The decoder will have to use a data dictionary of its own to figure them out.
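The two layouts can be sketched with a single-element reader like the one below. It is a simplified illustration rather than a complete decoder: it does not handle undefined lengths (0xFFFFFFFF) or nested sequences, and the set of VRs that use the long length form reflects my reading of the standard. The function name is made up.

```python
import struct

# In Explicit VR, these VRs are followed by 2 reserved bytes and a 4-byte
# length; all other VRs are followed directly by a 2-byte length.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}

def read_element(buf, offset, explicit_vr, little_endian=True):
    """Return (group, element, vr, value, next_offset).

    `vr` is None for Implicit VR, where the type must come from a dictionary.
    """
    e = "<" if little_endian else ">"
    group, element = struct.unpack_from(e + "HH", buf, offset)
    offset += 4
    if explicit_vr:
        vr = bytes(buf[offset:offset + 2])
        if vr in LONG_VRS:
            (length,) = struct.unpack_from(e + "I", buf, offset + 4)
            offset += 8
        else:
            (length,) = struct.unpack_from(e + "H", buf, offset + 2)
            offset += 4
    else:
        vr = None
        (length,) = struct.unpack_from(e + "I", buf, offset)
        offset += 4
    value = bytes(buf[offset:offset + length])
    return group, element, vr, value, offset + length

# Modality (0008,0060) = "MR", encoded both ways.
explicit = bytes.fromhex("08006000") + b"CS" + bytes.fromhex("0200") + b"MR"
implicit = bytes.fromhex("0800600002000000") + b"MR"
print(read_element(explicit, 0, explicit_vr=True))
print(read_element(implicit, 0, explicit_vr=False))
```

In the explicit case the reader learns that the value is a "CS" (code string) from the file itself; in the implicit case it gets only the raw bytes and would need to look up (0008,0060) in a data dictionary.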

With header vs. Without header
When stored on disk, DICOM files are supposed to begin with a header, though not all of them do. Files with a header are sometimes called Part 10 files.

When a header is present, the file begins with a 128-byte preamble that is usually set to all zero bytes, but which may be used for application-specific purposes. The next 4 bytes are the ASCII signature "DICM". Following the signature is a set of "Group 2" attributes, in little-endian, explicit-VR format. After the Group 2 attributes is the main part of the file, using the format given by the Transfer Syntax UID (0002,0010) attribute. ("Transfer Syntax" is the DICOM term for "file format".)

Files without a header usually use Implicit VR, little-endian format.
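The header layout described above can be walked with a short routine such as this one, which skips the preamble, checks the signature, and scans the Group 2 attributes for the Transfer Syntax UID. It is a sketch under simplifying assumptions: the function name is invented, truncated input may raise `struct.error`, and only the VRs likely to appear in Group 2 are treated specially.

```python
import struct

# Explicit VR types that use 2 reserved bytes plus a 4-byte length.
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}

def read_transfer_syntax(data):
    """Return the Transfer Syntax UID from Part 10 bytes, or None if absent."""
    if len(data) < 132 or data[128:132] != b"DICM":
        return None  # no preamble + signature, so no header
    offset = 132
    while offset + 8 <= len(data):
        # Group 2 is always little-endian, Explicit VR.
        group, element = struct.unpack_from("<HH", data, offset)
        if group != 0x0002:
            break  # reached the main part of the file
        vr = data[offset + 4:offset + 6]
        if vr in LONG_VRS:
            (length,) = struct.unpack_from("<I", data, offset + 8)
            start = offset + 12
        else:
            (length,) = struct.unpack_from("<H", data, offset + 6)
            start = offset + 8
        if element == 0x0010:
            # UIDs are padded to an even length with a trailing NUL byte.
            return data[start:start + length].rstrip(b"\x00 ").decode("ascii")
        offset = start + length
    return None

# A tiny hand-built header: preamble, signature, (0002,0001), (0002,0010).
sample = (bytes(128) + b"DICM"
          + bytes.fromhex("02000100") + b"OB" + bytes.fromhex("000002000000") + b"\x00\x01"
          + bytes.fromhex("02001000") + b"UI" + bytes.fromhex("1400")
          + b"1.2.840.10008.1.2.1\x00")
print(read_transfer_syntax(sample))  # 1.2.840.10008.1.2.1
```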

Modality
One of the most important attributes in a DICOM file is Modality (0008,0060). It indicates the type of data stored in the file, and often corresponds to the type of machine that created the file. For example, a modality of "MR" means MRI, and "US" means ultrasound. Different modalities have different required attributes, and may have different conventions for how to display images contained in the file, etc.

Identifiers
The most common filename extension is .dcm. Not all DICOM files have a filename extension.

Identification
DICOM files with a header have the ASCII signature "DICM" at byte offset 128.

Files without a header cannot be readily identified, though many begin with the bytes 08 00 (the little-endian encoding of group number 0008).
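A rough identification routine might look like the sketch below. The signature check follows the header layout directly. The headerless check is only a heuristic, resting on the assumption that such files often start with a group 0008 attribute stored little-endian (bytes 08 00), so it can produce false positives. The function name and return labels are invented for illustration.

```python
def sniff_dicom(data):
    """Classify the leading bytes of a file as DICOM or not (best effort)."""
    if len(data) >= 132 and data[128:132] == b"DICM":
        return "part10"            # 128-byte preamble followed by the signature
    if data[:2] == b"\x08\x00":
        return "maybe-headerless"  # heuristic only: first tag is in group 0008
    return "unknown"

print(sniff_dicom(bytes(128) + b"DICM"))  # part10
```

Reading the first 132 bytes of a file is enough input for this check.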

Image formats
If a DICOM file contains image data, it contains either a single image, or a video clip (usually composed of multiple still images all having the same size and color format). There is an extension called Papyrus that can store multiple different images in a single file.

The image format is determined by attribute (0002,0010): Transfer Syntax UID. If there is no such attribute, the image is uncompressed. Defined formats include:
 * Run-length encoding: UID 1.2.840.10008.1.2.5
 * DEFLATE: UID 1.2.840.10008.1.2.1.99
 * JPEG (lossy): UID 1.2.840.10008.1.2.4.50, etc.
 * Lossless JPEG: UID 1.2.840.10008.1.2.4.57, etc.
 * JPEG-LS: UID 1.2.840.10008.1.2.4.80 and .81
 * JPEG 2000: UID 1.2.840.10008.1.2.4.90, etc.
 * MPEG-2: UID 1.2.840.10008.1.2.4.100, etc.
 * MPEG-4 AVC/H.264: UID 1.2.840.10008.1.2.4.102, etc.
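A decoder typically turns this list into a lookup table. The sketch below includes the UIDs listed above, plus the three uncompressed transfer syntaxes (implicit little-endian, explicit little-endian, explicit big-endian), which are my addition from the standard rather than from the list. The "etc." families are represented only by the one UID given for each, and the names `TRANSFER_SYNTAXES` and `describe` are made up.

```python
TRANSFER_SYNTAXES = {
    # Uncompressed syntaxes (added here; not part of the list above).
    "1.2.840.10008.1.2": "Uncompressed, Implicit VR little-endian",
    "1.2.840.10008.1.2.1": "Uncompressed, Explicit VR little-endian",
    "1.2.840.10008.1.2.2": "Uncompressed, Explicit VR big-endian",
    # Compressed syntaxes from the list above.
    "1.2.840.10008.1.2.5": "Run-length encoding",
    "1.2.840.10008.1.2.1.99": "DEFLATE",
    "1.2.840.10008.1.2.4.50": "JPEG (lossy)",
    "1.2.840.10008.1.2.4.57": "Lossless JPEG",
    "1.2.840.10008.1.2.4.80": "JPEG-LS",
    "1.2.840.10008.1.2.4.81": "JPEG-LS",
    "1.2.840.10008.1.2.4.90": "JPEG 2000",
    "1.2.840.10008.1.2.4.100": "MPEG-2",
    "1.2.840.10008.1.2.4.102": "MPEG-4 AVC/H.264",
}

def describe(uid):
    """Map a Transfer Syntax UID (or None if the attribute is absent) to a label."""
    if uid is None:
        return "Uncompressed (no Transfer Syntax UID)"
    # Stored UIDs may carry a trailing NUL or space as padding.
    return TRANSFER_SYNTAXES.get(uid.rstrip("\x00 "), "Unknown transfer syntax")

print(describe("1.2.840.10008.1.2.5"))  # Run-length encoding
```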

Specifications

 * The DICOM Standard

Software
Software that reads DICOM files is pretty much everywhere. Most neuroimaging analysis packages have some way of importing DICOMs and turning them into a higher-dimensional file; open-source stand-alone libraries abound as well.


 * Pydicom
 * dcm2nii
 * Grassroots DICOM
 * Philips DICOM Viewer For Microsoft Windows. Linked to in the sidebar of pages such as this one.
 * ImageMagick (read-only)
 * XnView
 * abydos

Sample files

 * Examples of DICOM images
 * https://telparia.com/fileFormatSamples/image/dicom/

Links

 * DICOM home page
 * Wikipedia article
 * A little bit of discussion
 * Understanding DICOM with Orthanc, "a gentle, informal, high-level introduction to DICOM"
 * DICOM File Format Basics