Microsoft Compound File

From Just Solve the File Format Problem
Revision as of 15:15, 31 July 2024

File Format
Name Microsoft Compound File
Ontology
LoCFDD fdd000380, fdd000392
PRONOM fmt/111
"OLE" redirects here. See also OLE 1.0 object.

Microsoft Compound File is a complex container format used by some versions of Microsoft Office and by other Windows-centric applications. It has features similar to those of a filesystem format.

Its name has many variations, including:

  • Compound File Binary File Format (CFBF or CFB)
  • Microsoft Compound Document File Format
  • OLE Compound Document Format
  • OLE2 Compound Document Format
  • Composite Document File
  • DocFile

The format was not publicly documented by Microsoft until 2008.

It is (or was?) unofficially known as LAOLA File Format.

Identification

Files begin with signature bytes D0 CF 11 E0 A1 B1 1A E1.
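
A minimal sketch of the signature check in Python. The function name `has_cfb_signature` is made up for illustration, and a match only shows that the file is a compound file, not which document type it holds.

```python
# The 8 signature bytes given above.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def has_cfb_signature(data: bytes) -> bool:
    """Return True if the data starts with the compound file signature."""
    return data[:8] == CFB_SIGNATURE
```

In practice it would be called on the first 8 bytes read from a file opened in binary mode.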

Identifying the specific document type can be difficult. Some, but not all, document types can be identified by the CLSID field in the "root storage" directory entry. This field is usually located at file offset {SectorSize}×(1 + {the 32-bit integer at offset 48}) + 80, where {SectorSize} is 512 if {the 16-bit integer at offset 26} is 3, and 4096 if it is 4. Both integers are stored in little-endian byte order.
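
The offset calculation can be sketched as follows. This is an illustration only: the function name `root_clsid_offset` is hypothetical, the value at offset 26 is read as a 16-bit little-endian integer, and the code assumes the root storage entry is the first entry of the first directory sector, which is the usual case rather than a guarantee.

```python
import struct


def root_clsid_offset(header: bytes) -> int:
    """Compute the usual file offset of the root storage CLSID.

    `header` is the start of the file (at least the first 52 bytes).
    """
    # 16-bit major version at offset 26, 32-bit sector number at offset 48.
    (major_version,) = struct.unpack_from("<H", header, 26)
    (first_dir_sector,) = struct.unpack_from("<I", header, 48)

    if major_version == 3:
        sector_size = 512
    elif major_version == 4:
        sector_size = 4096
    else:
        raise ValueError(f"unexpected major version: {major_version}")

    # Skip the header sector, then the sectors before the directory,
    # then 80 bytes into the first (root) directory entry.
    return sector_size * (1 + first_dir_sector) + 80
```

For a version 3 file whose integer at offset 48 is 1, this gives 512×(1 + 1) + 80 = 1104.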

Some files have a stream named "<U+0005>SummaryInformation" containing metadata, which may include information about the creating application.

Root storage object CLSIDs

The table below lists some of the root storage object CLSIDs that have been observed in this type of file. Use this information at your own risk, as these identifiers can be unreliable.

Microsoft's documentation says this about the CLSID field:

This field contains an object class GUID. [...] If not [all zeroes], the object class GUID can be used as a parameter to start applications.

Although every storage object (think subdirectory) can have a CLSID, this table is only concerned with the file's root storage object.

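Note that the CLSIDs are stored as GUIDs in little-endian binary format, so the byte order in the file differs from the textual form: the first three groups (32, 16, and 16 bits) are each stored byte-reversed, while the remaining 8 bytes are stored in the order in which they are written.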
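
As a sketch of the conversion, Python's standard `uuid` module accepts this mixed byte order through its `bytes_le` argument. The helper name `format_clsid` is made up for illustration, and it produces the lowercase, braced form used by most rows of the table below.

```python
import uuid


def format_clsid(raw: bytes) -> str:
    """Convert the 16 raw CLSID bytes from the file into braced text form."""
    if len(raw) != 16:
        raise ValueError("a CLSID is exactly 16 bytes")
    # bytes_le treats the first three fields as little-endian integers.
    return "{" + str(uuid.UUID(bytes_le=raw)) + "}"
```

For example, the raw bytes `06 09 02 00 00 00 00 00 C0 00 00 00 00 00 00 46` convert to `{00020906-0000-0000-c000-000000000046}`.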

Root storage object CLSID Format
{00000000-0000-0000-0000-000000000000} Unspecified (could be Thumbs.db, SUO, PageMaker, Microsoft Access wizard template, Easy CD Creator 2 ...)
{00000257-0000-0000-0000-000000000000} Family Tree Maker FTW
{00020810-0000-0000-c000-000000000046} Excel 5-95 XLS
{00020820-0000-0000-c000-000000000046} Excel 97-2003 XLS
{00020900-0000-0000-c000-000000000046} Word 6-95 DOC
{00020906-0000-0000-c000-000000000046} Word 97-2003 DOC
{00020d0b-0000-0000-c000-000000000046} Outlook 97-2003 Item File
{00021200-0000-0000-00C0-000000000046} Microsoft Publisher 95 (2.0)
{00021201-0000-0000-00c0-000000000046} Microsoft Publisher 97-2013 (3.0-11.0)
{00021302-0000-0000-c000-000000000046} Microsoft Works 3-4 WordProcessor
{00021303-0000-0000-c000-000000000046} Microsoft Works 3-4 database
{00021a13-0000-0000-c000-000000000046} Visio 2000-2002
{00021a14-0000-0000-c000-000000000046} Visio 2003-2010
{00022c44-0000-0000-c000-000000000046} GST DTP formats
{00022C60-0000-0000-C000-000000000046} GST Art drawing
{00044851-0000-0000-c000-000000000046} PowerPoint 4.0 PPT
{0006f046-0000-0000-c000-000000000046} Outlook 97-2003 Item template
{000C1082-0000-0000-C000-000000000046} Windows Installer transform script MST
{000c1084-0000-0000-c000-000000000046} MSI
{000c1086-0000-0000-c000-000000000046} Windows Installer Patch MSP
{012d3cc0-4216-11d0-89cb-008029e4b0b1} StarImpress 4.0
{02b01c80-e03d-101a-b294-00dd010f2bf9} Microsoft fax At Work Document
{02b3b7e0-4225-11d0-89ca-008029e4b0b1} StarChart 4.0
{02b3b7e1-4225-11d0-89ca-008029e4b0b1} StarMath 4.0
{0ea45ab2-9e0a-11d1-a407-00c04fb932ba} Microsoft Works 5-6 WordProcessor
{18b8d021-b4fd-11d0-a97e-00a0c905410d} MIX (PhotoDraw)
{1cdd8c7b-81c0-45a0-9fed-04143144cc1e} MAX (3ds Max)
{28cddbc2-0ae2-11ce-a29a-00aa004a1a72} Microsoft Works 4 WordProcessor
{28cddbc3-0ae2-11ce-a29a-00aa004a1a72} Microsoft Works 4 database
{2e8905a0-85bd-11d1-89d0-008029e4b0b1} StarDraw 5.0
{31851f84-afe6-11d2-a3c9-00c04f72f340} Microsoft MapPoint
{340ac970-e30d-11d0-a53f-00a0249d57b1} Master 4.0
{3f543fa0-b6a6-101b-9961-04021c007002} StarCalc 3.0
{402efe60-1999-101b-99ae-04021c007002} WordPerfect 9 Graphic
{402efe62-1999-101b-99ae-04021c007002} Corel 7-X3 presentation
{4D29B490-49B2-11D0-93C3-7E0706000000} Autodesk Inventor Part
{519873FF-2DAD-0220-1937-0000929679CD} WordPerfect document
{565c7221-85bc-11d1-89d0-008029e4b0b1} StarImpress 5.0
{56616700-c154-11ce-8553-00aa00a1f95b} FlashPix
{56616800-c154-11ce-8553-00aa00a1f95b} MIX (PhotoDraw) or MIX (Picture It!)
{59850400-6664-101B-B21C-00AA004BA90B} Microsoft Office Binder
{6361d441-4235-11d0-89cb-008029e4b0b1} StarCalc 4.0
{64818d10-4f9b-11cf-86ea-00aa00b929e8} PowerPoint 97-2003 PPT
{6E26C7C0-8CB9-11D3-A1C8-00C04F612452} Microsoft Works portfolio
{74b78f3a-c8c8-11d1-be11-00c04fb6faf1} Microsoft Project
{817246F0-720A-11CF-8718-00AA0060263B} Microsoft PowerPoint Addin or Wizard
{4F4D4E49-464F-524D-AFDC-0020AF286206} OmniForm
{8b04e9b0-420e-11d0-a45e-00a0249d57b1} StarWriter 4.0
{A9C39302-770A-11D1-893F-00802964B632} Easy CD Creator 4
{af10aae0-b36d-101b-9961-04021c007002} StarDraw 3.0
{bf884321-85dd-11d1-89d0-008029e4b0b1} StarChart 5.0
{c20cf9d1-85ae-11d1-aab4-006097da561a} StarWriter 5.0
{c20cf9d3-85ae-11d1-aab4-006097da561a} Master 5.0
{c65e63e1-6c0e-11cf-842e-00aa006130ba} Softimage SCN
{c6a5b861-85d6-11d1-89cb-008029e4b0b1} StarCalc 5.0
{d4590460-35fd-101c-b12a-04021c007002} StarMath 3.0
{dc5c7e40-b35c-101b-9961-04021c007002} StarWriter 3.0
{de14f420-ac1c-11ce-be26-db67235e2689} CorelCAD Drawing or Template
{ea7bae70-fb3b-11cd-a903-00aa00510ea3} PowerPoint 95 PPT
{fb9c99e0-2c6d-101c-8e2c-00001b4cc711} StarChart 3.0
{ffb5e640-85de-11d1-89d0-008029e4b0b1} StarMath 5.0
{597CAA70-72AA-11CF-831E-524153480000} Adobe Flash
{48D026F3-C031-11D1-8FAB-00A0C96E3856} Melco Project File
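
A lookup against this table might be sketched as below. The dictionary holds only a handful of rows copied from the table, the names `KNOWN_ROOT_CLSIDS` and `guess_format` are hypothetical, and the result should be treated as a hint, since these identifiers can be unreliable. Because the table mixes upper- and lower-case hexadecimal digits, the sketch normalizes case before comparing.

```python
from typing import Optional

# A small subset of the table above, keyed by lowercase braced CLSID.
KNOWN_ROOT_CLSIDS = {
    "{00000000-0000-0000-0000-000000000000}": "Unspecified",
    "{00020820-0000-0000-c000-000000000046}": "Excel 97-2003 XLS",
    "{00020906-0000-0000-c000-000000000046}": "Word 97-2003 DOC",
    "{000c1084-0000-0000-c000-000000000046}": "MSI",
    "{64818d10-4f9b-11cf-86ea-00aa00b929e8}": "PowerPoint 97-2003 PPT",
}


def guess_format(clsid: str) -> Optional[str]:
    """Return the table's format label for a CLSID, or None if not listed."""
    key = clsid.strip().lower()
    # Accept CLSIDs written without the surrounding braces.
    if not key.startswith("{"):
        key = "{" + key + "}"
    return KNOWN_ROOT_CLSIDS.get(key)
```

A return value of `None` means only that the CLSID is absent from this subset, not that the file is invalid.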

Related formats

For formats based on this format, see Category:Microsoft Compound File.

Specifications

Programs, libraries, and utilities

Sample files

Links
