Microsoft Compound File

From Just Solve the File Format Problem
(Difference between revisions)
(Programs, libraries, and utilities)
(Added sample file)
(10 intermediate revisions by 3 users not shown)
Line 6: Line 6:
 
|pronom={{PRONOM|fmt/111}}
 
|pronom={{PRONOM|fmt/111}}
 
}}
 
}}
'''Microsoft Compound File''' is a complex container format used by some versions of [[Microsoft Office]], and other Microsoft applications. It has features similar to those of a [[filesystem]] format.
+
:''"OLE" redirects here. See also [[OLE 1.0 object]].''
  
It is also known as '''Compound File Binary File Format''' ('''CFBF''' or '''CFB'''), '''Microsoft Compound Document File Format''', '''OLE Compound Document Format''', '''OLE2 Compound Document Format''', '''Composite Document File''', etc.
+
'''Microsoft Compound File''' is a complex container format used by some versions of [[Microsoft Office]], and other Windows-centric applications. It has features similar to those of a [[filesystem]] format.
 +
 
 +
Its name has many variations, including:
 +
* '''Compound File Binary File Format''' ('''CFBF''' or '''CFB''')
 +
* '''Microsoft Compound Document File Format'''
 +
* '''OLE Compound Document Format'''
 +
* '''OLE2 Compound Document Format'''
 +
* '''Composite Document File'''
 +
* '''DocFile'''
  
 
The format was not publicly documented by Microsoft until 2008.
 
The format was not publicly documented by Microsoft until 2008.
Line 17: Line 25:
 
Files begin with signature bytes {{magic|D0 CF 11 E0 A1 B1 1A E1}}.
 
Files begin with signature bytes {{magic|D0 CF 11 E0 A1 B1 1A E1}}.
  
Identifying the specific document type can be difficult. This is one of the few formats for which the [[file command]] resorts to a hard-coded identification algorithm (see [https://github.com/file/file/blob/master/src/readcdf.c readcdf.c]).
+
Identifying the specific document type can be difficult. Some, but not all, document types can be identified by the [[CLSID]] field in the "root storage" directory entry. This field is usually located at file offset 512×(1 + {the 32-bit integer at offset 48}) + 80.
 
+
Some, but not all, document types can be identified by the [[CLSID]] field in the "root storage" directory entry. This field is usually located at file offset 512×(1 + {the 32-bit integer at offset 48}) + 80.
+
  
 
Some files have a stream named "<code>&lt;U+0005&gt;SummaryInformation</code>" containing metadata, which may include information about the creating application.
 
Some files have a stream named "<code>&lt;U+0005&gt;SummaryInformation</code>" containing metadata, which may include information about the creating application.
Line 30: Line 36:
 
<blockquote>This field contains an object class GUID. [...] If not [all zeroes], the object class GUID can be used as a parameter to start applications.</blockquote>
 
<blockquote>This field contains an object class GUID. [...] If not [all zeroes], the object class GUID can be used as a parameter to start applications.</blockquote>
  
Although every ''storage object'' (think ''subdirectory'') can have a CLSID, this table is only concerned with ''root'' storage objects.
+
Although every ''storage object'' (think ''subdirectory'') can have a CLSID, this table is only concerned with the file's ''root'' storage object.
  
 
Note that the CLSIDs are stored as [[GUID]]s in little-endian binary format, so they have a strange byte order.
 
Note that the CLSIDs are stored as [[GUID]]s in little-endian binary format, so they have a strange byte order.
Line 37: Line 43:
 
! Root storage object CLSID !! Format
 
! Root storage object CLSID !! Format
 
|-
 
|-
|<code>{00000000-0000-0000-0000-000000000000}</code> || Unspecified (could be [[AVI]], [[Windows thumbnail cache|Thumbs.db]], ...)
+
|<code>{00000000-0000-0000-0000-000000000000}</code> || Unspecified (could be [[Windows thumbnail cache|Thumbs.db]], [[Visual Studio Solution Options file|SUO]], ...)
 
|-
 
|-
|<code>{00020810-0000-0000-c000-000000000046}</code> || [[XLS]]
+
|<code>{00020810-0000-0000-c000-000000000046}</code> || [[XLS|Excel 5-95 XLS]] <!-- Excel 5-95 worksheet, addin or template -->
 
|-
 
|-
|<code>{00020820-0000-0000-c000-000000000046}</code> || [[XLS]]
+
|<code>{00020820-0000-0000-c000-000000000046}</code> || [[XLS|Excel 97-2003 XLS]] <!-- Excel 97-2003 worksheet, addin or template -->
 
|-
 
|-
|<code>{00020906-0000-0000-c000-000000000046}</code> || [[DOC]]
+
|<code>{00021302-0000-0000-c000-000000000046}</code> || [[Microsoft Works Word Processor|Microsoft Works 3-4 WordProcessor]] <!-- Microsoft Works 4? document wps/ps/bps -->
 
|-
 
|-
|<code>{00020d0b-0000-0000-c000-000000000046}</code> || [[Outlook Item File]]
+
|<code>{00021303-0000-0000-c000-000000000046}</code> || [[Microsoft Works Database|Microsoft Works 3-4 database]]
 
|-
 
|-
|<code>{0006f046-0000-0000-c000-000000000046}</code> || [[Outlook Item File]]
+
|<code>{00020900-0000-0000-c000-000000000046}</code> || [[DOC|Word 6-95 DOC]]
 +
|-
 +
|<code>{00020906-0000-0000-c000-000000000046}</code> || [[DOC]] <!-- Word 97-2003 document or template -->
 +
|-
 +
|<code>{00020d0b-0000-0000-c000-000000000046}</code> || [[Outlook Item File|Outlook 97-2003 Item File]] <!-- Outlook 97-2003 item -->
 +
|-
 +
|<code>{00021201-0000-0000-00c0-000000000046}</code> || [[Microsoft Publisher]]
 +
|-
 +
|<code>{00021a13-0000-0000-c000-000000000046}</code> || [[Visio]] 2000-2002
 +
|-
 +
|<code>{00021a14-0000-0000-c000-000000000046}</code> || [[Visio]] 2003-2010
 +
|-
 +
|<code>{00044851-0000-0000-c000-000000000046}</code> || [[PPT|PowerPoint 4.0 PPT]] <!-- PowerPoint 4.0 presentation -->
 +
|-
 +
|<code>{0006f046-0000-0000-c000-000000000046}</code> || [[Outlook Item File|Outlook 97-2003 Item template]] <!-- Outlook 97-2003 item template -->
 
|-
 
|-
 
|<code>{000c1084-0000-0000-c000-000000000046}</code> || [[Windows Installer|MSI]]
 
|<code>{000c1084-0000-0000-c000-000000000046}</code> || [[Windows Installer|MSI]]
 +
|-
 +
|<code>{000c1086-0000-0000-c000-000000000046}</code> || [[Windows Installer|Windows Installer Patch MSP]]
 +
|-
 +
|<code>{012d3cc0-4216-11d0-89cb-008029e4b0b1}</code> || [[StarOffice binary formats|StarImpress 4.0]] <!-- StarOffice StarImpress 4.0 presentation or template sdd/vor -->
 +
|-
 +
|<code>{02b3b7e0-4225-11d0-89ca-008029e4b0b1}</code> || [[StarOffice binary formats|StarChart 4.0]] <!-- StarOffice StarChart 4.0 sds -->
 +
|-
 +
|<code>{02b3b7e1-4225-11d0-89ca-008029e4b0b1}</code> || [[StarOffice binary formats|StarMath 4.0]] <!-- StarOffice StarMath 4.0 smf -->
 +
|-
 +
|<code>{0ea45ab2-9e0a-11d1-a407-00c04fb932ba}</code> || [[Microsoft Works Word Processor|Microsoft Works 5-6 WordProcessor]] <!-- Microsoft Works 5-6 document wps -->
 
|-
 
|-
 
|<code>{1cdd8c7b-81c0-45a0-9fed-04143144cc1e}</code> || [[MAX (3ds Max)]]
 
|<code>{1cdd8c7b-81c0-45a0-9fed-04143144cc1e}</code> || [[MAX (3ds Max)]]
 
|-
 
|-
 
|<code>{18b8d021-b4fd-11d0-a97e-00a0c905410d}</code> || [[MIX (PhotoDraw)]]
 
|<code>{18b8d021-b4fd-11d0-a97e-00a0c905410d}</code> || [[MIX (PhotoDraw)]]
 +
|-
 +
|<code>{28cddbc2-0ae2-11ce-a29a-00aa004a1a72}</code> || [[Microsoft Works Word Processor|Microsoft Works 4 WordProcessor]] <!-- Microsoft Works 4 document wps -->
 +
|-
 +
|<code>{28cddbc3-0ae2-11ce-a29a-00aa004a1a72}</code> || [[Microsoft Works Database|Microsoft Works 4 database]] <!-- Microsoft Works 4 database wdb/bdb -->
 +
|-
 +
|<code>{2e8905a0-85bd-11d1-89d0-008029e4b0b1}</code> || [[StarOffice binary formats|StarDraw 5.0]] <!-- StarOffice StarDraw 5.0 drawing or template sda/vor -->
 +
|-
 +
|<code>{340ac970-e30d-11d0-a53f-00a0249d57b1}</code> || [[StarOffice binary formats|Master 4.0]] <!-- StarOffice Master 4.0 document sgl -->
 +
|-
 +
|<code>{3f543fa0-b6a6-101b-9961-04021c007002}</code> || [[StarOffice binary formats|StarCalc 3.0]] <!-- StarOffice StarCalc 3.0 spreadsheet or template sdc/vor -->
 +
|-
 +
|<code>{402efe60-1999-101b-99ae-04021c007002}</code> || [[WordPerfect_Graphics|WordPerfect 9 Graphic ]] <!-- WordPerfect 9 Graphic wpg -->
 +
|-
 +
|<code>{402efe62-1999-101b-99ae-04021c007002}</code> || [[SHW_(Corel)|Corel 7-X3 presentation]] <!-- WordPerfect 7-X3 presentation shw -->
 +
|-
 +
|<code>{565c7221-85bc-11d1-89d0-008029e4b0b1}</code> || [[StarOffice binary formats|StarImpress 5.0]] <!-- StarOffice StarImpress 5.0 presentation or template sdd/vor -->
 
|-
 
|-
 
|<code>{56616700-c154-11ce-8553-00aa00a1f95b}</code> || [[FlashPix]]
 
|<code>{56616700-c154-11ce-8553-00aa00a1f95b}</code> || [[FlashPix]]
Line 59: Line 105:
 
|<code>{56616800-c154-11ce-8553-00aa00a1f95b}</code> || [[MIX (PhotoDraw)]] or [[MIX (Picture It!)]]
 
|<code>{56616800-c154-11ce-8553-00aa00a1f95b}</code> || [[MIX (PhotoDraw)]] or [[MIX (Picture It!)]]
 
|-
 
|-
|<code>{64818d10-4f9b-11cf-86ea-00aa00b929e8}</code> || [[PPT]]
+
|<code>{6361d441-4235-11d0-89cb-008029e4b0b1}</code> || [[StarOffice binary formats|StarCalc 4.0]] <!-- StarOffice StarCalc 4.0 spreadsheet or template sdc/vor -->
 +
|-
 +
|<code>{64818d10-4f9b-11cf-86ea-00aa00b929e8}</code> || [[PPT]] <!-- PowerPoint 97-2003 presentation or template ppt/pps/pot -->
 +
|-
 +
|<code>{74b78f3a-c8c8-11d1-be11-00c04fb6faf1}</code> || Microsoft Project <!-- Microsoft Project mpp -->
 +
|-
 +
|<code>{8b04e9b0-420e-11d0-a45e-00a0249d57b1}</code> || [[StarOffice binary formats|StarWriter 4.0]] <!-- StarOffice StarWriter 4.0 document or template sdw/vor -->
 +
|-
 +
|<code>{af10aae0-b36d-101b-9961-04021c007002}</code> || [[StarOffice binary formats|StarDraw 3.0]] <!-- StarOffice StarDraw 3.0 drawing or template sdd/sda/vor -->
 +
|-
 +
|<code>{bf884321-85dd-11d1-89d0-008029e4b0b1}</code> || [[StarOffice binary formats|StarChart 5.0]] <!-- StarOffice StarChart 5.0 sds -->
 +
|-
 +
|<code>{c20cf9d1-85ae-11d1-aab4-006097da561a}</code> || [[StarOffice binary formats|StarWriter 5.0]] <!-- StarOffice StarWriter 5.0 document or template sdw/vor -->
 +
|-
 +
|<code>{c20cf9d3-85ae-11d1-aab4-006097da561a}</code> || [[StarOffice binary formats|Master 5.0]] <!-- StarOffice Master 5.0 document sgl -->
 
|-
 
|-
 
|<code>{c65e63e1-6c0e-11cf-842e-00aa006130ba}</code> || [[Softimage SCN]]
 
|<code>{c65e63e1-6c0e-11cf-842e-00aa006130ba}</code> || [[Softimage SCN]]
 +
|-
 +
|<code>{c6a5b861-85d6-11d1-89cb-008029e4b0b1}</code> || [[StarOffice binary formats|StarCalc 5.0]] <!-- StarOffice StarCalc 5.0 spreadsheet or template sdc/vor -->
 +
|-
 +
|<code>{d4590460-35fd-101c-b12a-04021c007002}</code> || [[StarOffice binary formats|StarMath 3.0]] <!-- StarOffice StarMath 3.0 smf -->
 +
|-
 +
|<code>{dc5c7e40-b35c-101b-9961-04021c007002}</code> || [[StarOffice binary formats|StarWriter 3.0]] <!-- StarOffice StarWriter 3.0 document or template sdw/vor -->
 +
|-
 +
|<code>{ea7bae70-fb3b-11cd-a903-00aa00510ea3}</code> || [[PPT|PowerPoint 95 PPT]] <!-- PowerPoint 95 presentation ppt/pot -->
 +
|-
 +
|<code>{fb9c99e0-2c6d-101c-8e2c-00001b4cc711}</code> || [[StarOffice binary formats|StarChart 3.0]] <!-- StarOffice StarChart 3.0 sds -->
 +
|-
 +
|<code>{ffb5e640-85de-11d1-89d0-008029e4b0b1}</code> || [[StarOffice binary formats|StarMath 5.0]] <!-- StarOffice StarMath 5.0 smf -->
 +
|-
 
|}
 
|}
  
 
== Related formats ==
 
== Related formats ==
See [[:Category:Microsoft Compound File]].
+
* [[OLE Property Set]]
 +
 
 +
For formats based on this format, see [[:Category:Microsoft Compound File]].
  
 
== Specifications ==
 
== Specifications ==
* [http://msdn.microsoft.com/en-us/library/dd942138.aspx MSDN: Compound File Binary File Format] → [MS-CFB] PDF
+
* [https://msdn.microsoft.com/en-us/library/dd942138.aspx MSDN: Compound File Binary File Format] → [MS-CFB] PDF
* [http://www.openoffice.org/sc/compdocfileformat.pdf OpenOffice.org's documentation]
+
* [https://www.openoffice.org/sc/compdocfileformat.pdf OpenOffice.org's documentation]
  
 
== Programs, libraries, and utilities ==
 
== Programs, libraries, and utilities ==
Line 79: Line 154:
 
* [http://decalage.info/python/oletools python-oletools - python tools to analyze OLE files]
 
* [http://decalage.info/python/oletools python-oletools - python tools to analyze OLE files]
 
* [https://sourceforge.net/projects/openmcdf/ OpenMCDF]
 
* [https://sourceforge.net/projects/openmcdf/ OpenMCDF]
 +
* [https://poi.apache.org/ Apache POI] - Java API for Microsoft documents
 +
* [https://github.com/renyxa/re-lab Re-lab / OLE Toy]
 +
* [[7-Zip]]
 +
 +
== Sample files ==
 +
* https://telparia.com/fileFormatSamples/archive/msCompound/travel.gal
  
 
== Links ==
 
== Links ==
Line 88: Line 169:
 
* [http://decalage.info/file_formats_security/office MS Office 97-2003 legacy/binary formats security] - article with lots of resources on MS Office formats, including analysis techniques, tools and parsing libraries
 
* [http://decalage.info/file_formats_security/office MS Office 97-2003 legacy/binary formats security] - article with lots of resources on MS Office formats, including analysis techniques, tools and parsing libraries
 
* [https://msdn.microsoft.com/en-us/library/aa295067(v=vs.60).aspx MSDN: Providing Summary Information]
 
* [https://msdn.microsoft.com/en-us/library/aa295067(v=vs.60).aspx MSDN: Providing Summary Information]
 
== Editors' notes ==
 
TODO: Explain the relationship between Compound File format and the format/technology called '''COM Structured Storage''' (or '''OLE Structured Storage''').
 
  
 
[[Category:Document]]
 
[[Category:Document]]
 
[[Category:Microsoft]]
 
[[Category:Microsoft]]

Revision as of 20:07, 26 July 2020

File Format
Name Microsoft Compound File
Ontology
LoCFDD fdd000380, fdd000392
PRONOM fmt/111
"OLE" redirects here. See also OLE 1.0 object.

Microsoft Compound File is a complex container format used by some versions of Microsoft Office, and other Windows-centric applications. It has features similar to those of a filesystem format.

Its name has many variations, including:

  • Compound File Binary File Format (CFBF or CFB)
  • Microsoft Compound Document File Format
  • OLE Compound Document Format
  • OLE2 Compound Document Format
  • Composite Document File
  • DocFile

The format was not publicly documented by Microsoft until 2008.

It has also been known unofficially as the LAOLA File Format.

Identification

Files begin with signature bytes D0 CF 11 E0 A1 B1 1A E1.
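
As a minimal sketch of the signature check described above, the following Python compares the first eight bytes of a buffer against the signature. The names `CFB_SIGNATURE` and `has_cfb_signature` are illustrative, not part of any library.

```python
# The eight signature bytes listed above.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def has_cfb_signature(data: bytes) -> bool:
    """Return True if the buffer starts with the Compound File signature."""
    return data[:8] == CFB_SIGNATURE
```

In practice it is enough to read the first eight bytes of a file opened in binary mode and pass them to `has_cfb_signature`. A match only shows that the file is a Compound File container; it says nothing about the document type inside.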

Identifying the specific document type can be difficult. Some, but not all, document types can be identified by the CLSID field in the "root storage" directory entry. This field is usually located at file offset 512×(1 + N) + 80, where N is the little-endian 32-bit integer at offset 48. The formula assumes 512-byte sectors; files that use 4096-byte sectors need 4096 in place of 512.
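
The offset formula above can be sketched in Python as follows. This is an illustration under stated assumptions, not a full parser: it assumes 512-byte sectors by default, reads the integer at offset 48 as little-endian, and assumes the root storage entry is the first entry of the first directory sector. The function names and the `sector_size` parameter are hypothetical.

```python
import struct


def root_clsid_offset(header: bytes, sector_size: int = 512) -> int:
    """Compute the file offset of the root storage CLSID.

    Implements sector_size * (1 + N) + 80, where N is the 32-bit
    integer at header offset 48 (assumed little-endian).
    """
    (first_dir_sector,) = struct.unpack_from("<I", header, 48)
    return sector_size * (1 + first_dir_sector) + 80


def read_root_clsid_bytes(data: bytes, sector_size: int = 512) -> bytes:
    """Return the 16 raw CLSID bytes, or raise if the file is too short."""
    offset = root_clsid_offset(data, sector_size)
    raw = data[offset:offset + 16]
    if len(raw) != 16:
        raise ValueError("file too short to contain a root storage CLSID")
    return raw
```

The 16 bytes returned here are still in their on-disk order, so they need converting before they can be compared with a CLSID written in text form.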

Some files have a stream named "<U+0005>SummaryInformation" containing metadata, which may include information about the creating application.
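
To make the "<U+0005>" notation concrete: the stream name begins with the control character U+0005, followed by the visible text. The sketch below shows the name as a Python string and encodes it as null-terminated UTF-16LE, which is assumed here to be how directory entries store names. `SUMMARY_INFO_NAME` and `encode_entry_name` are illustrative names.

```python
# U+0005 is a non-printing control character, written here as "\x05".
SUMMARY_INFO_NAME = "\x05SummaryInformation"


def encode_entry_name(name: str) -> bytes:
    """Encode a stream name as null-terminated UTF-16LE bytes."""
    return (name + "\x00").encode("utf-16-le")
```

A scanner that searches raw bytes for this stream name would look for the encoded form rather than the ASCII text.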

Root storage object CLSIDs

The table below lists some of the root storage object CLSIDs that have been observed in this type of file. Use this information at your own risk, as these identifiers can be unreliable.

Microsoft's documentation says this about the CLSID field:

This field contains an object class GUID. [...] If not [all zeroes], the object class GUID can be used as a parameter to start applications.

Although every storage object (think subdirectory) can have a CLSID, this table is only concerned with the file's root storage object.

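Note that the CLSIDs are stored as GUIDs in little-endian binary format: the first three fields (4, 2, and 2 bytes) are byte-swapped relative to the text form, while the final 8 bytes are stored in the order they are written.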
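
The byte-order note above can be illustrated with Python's standard `uuid` module, whose `bytes_le` constructor argument expects this mixed layout. The helper name `clsid_to_string` is illustrative; the example bytes are the on-disk form of the Excel 97-2003 CLSID from the table below.

```python
import uuid


def clsid_to_string(raw: bytes) -> str:
    """Convert 16 on-disk CLSID bytes to the braced, lowercase text form."""
    # bytes_le swaps the first three fields and leaves the last 8 bytes as-is.
    return "{" + str(uuid.UUID(bytes_le=raw)) + "}"


# On-disk bytes for {00020820-0000-0000-c000-000000000046}.
excel_raw = bytes.fromhex("2008020000000000c000000000000046")
excel_clsid = clsid_to_string(excel_raw)
```

Note how the first four bytes on disk, `20 08 02 00`, become `00020820` in the text form, while the trailing `c0 00 ... 46` keeps its order.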

Root storage object CLSID              Format
{00000000-0000-0000-0000-000000000000} Unspecified (could be Thumbs.db, SUO, ...)
{00020810-0000-0000-c000-000000000046} Excel 5-95 XLS
{00020820-0000-0000-c000-000000000046} Excel 97-2003 XLS
{00021302-0000-0000-c000-000000000046} Microsoft Works 3-4 WordProcessor
{00021303-0000-0000-c000-000000000046} Microsoft Works 3-4 database
{00020900-0000-0000-c000-000000000046} Word 6-95 DOC
{00020906-0000-0000-c000-000000000046} DOC
{00020d0b-0000-0000-c000-000000000046} Outlook 97-2003 Item File
{00021201-0000-0000-00c0-000000000046} Microsoft Publisher
{00021a13-0000-0000-c000-000000000046} Visio 2000-2002
{00021a14-0000-0000-c000-000000000046} Visio 2003-2010
{00044851-0000-0000-c000-000000000046} PowerPoint 4.0 PPT
{0006f046-0000-0000-c000-000000000046} Outlook 97-2003 Item template
{000c1084-0000-0000-c000-000000000046} MSI
{000c1086-0000-0000-c000-000000000046} Windows Installer Patch MSP
{012d3cc0-4216-11d0-89cb-008029e4b0b1} StarImpress 4.0
{02b3b7e0-4225-11d0-89ca-008029e4b0b1} StarChart 4.0
{02b3b7e1-4225-11d0-89ca-008029e4b0b1} StarMath 4.0
{0ea45ab2-9e0a-11d1-a407-00c04fb932ba} Microsoft Works 5-6 WordProcessor
{1cdd8c7b-81c0-45a0-9fed-04143144cc1e} MAX (3ds Max)
{18b8d021-b4fd-11d0-a97e-00a0c905410d} MIX (PhotoDraw)
{28cddbc2-0ae2-11ce-a29a-00aa004a1a72} Microsoft Works 4 WordProcessor
{28cddbc3-0ae2-11ce-a29a-00aa004a1a72} Microsoft Works 4 database
{2e8905a0-85bd-11d1-89d0-008029e4b0b1} StarDraw 5.0
{340ac970-e30d-11d0-a53f-00a0249d57b1} Master 4.0
{3f543fa0-b6a6-101b-9961-04021c007002} StarCalc 3.0
{402efe60-1999-101b-99ae-04021c007002} WordPerfect 9 Graphic
{402efe62-1999-101b-99ae-04021c007002} Corel 7-X3 presentation
{565c7221-85bc-11d1-89d0-008029e4b0b1} StarImpress 5.0
{56616700-c154-11ce-8553-00aa00a1f95b} FlashPix
{56616800-c154-11ce-8553-00aa00a1f95b} MIX (PhotoDraw) or MIX (Picture It!)
{6361d441-4235-11d0-89cb-008029e4b0b1} StarCalc 4.0
{64818d10-4f9b-11cf-86ea-00aa00b929e8} PPT
{74b78f3a-c8c8-11d1-be11-00c04fb6faf1} Microsoft Project
{8b04e9b0-420e-11d0-a45e-00a0249d57b1} StarWriter 4.0
{af10aae0-b36d-101b-9961-04021c007002} StarDraw 3.0
{bf884321-85dd-11d1-89d0-008029e4b0b1} StarChart 5.0
{c20cf9d1-85ae-11d1-aab4-006097da561a} StarWriter 5.0
{c20cf9d3-85ae-11d1-aab4-006097da561a} Master 5.0
{c65e63e1-6c0e-11cf-842e-00aa006130ba} Softimage SCN
{c6a5b861-85d6-11d1-89cb-008029e4b0b1} StarCalc 5.0
{d4590460-35fd-101c-b12a-04021c007002} StarMath 3.0
{dc5c7e40-b35c-101b-9961-04021c007002} StarWriter 3.0
{ea7bae70-fb3b-11cd-a903-00aa00510ea3} PowerPoint 95 PPT
{fb9c99e0-2c6d-101c-8e2c-00001b4cc711} StarChart 3.0
{ffb5e640-85de-11d1-89d0-008029e4b0b1} StarMath 5.0
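
As a rough sketch of how the table might be used, the following Python maps a CLSID in text form to a format label. Only a handful of rows from the table are included, and the names `KNOWN_ROOT_CLSIDS`, `NULL_CLSID`, and `describe_root_clsid` are hypothetical. As noted above, these identifiers can be unreliable, so a result should be treated as a hint rather than a definitive identification.

```python
# A small excerpt of the table above; extend as needed.
KNOWN_ROOT_CLSIDS = {
    "{00020810-0000-0000-c000-000000000046}": "Excel 5-95 XLS",
    "{00020820-0000-0000-c000-000000000046}": "Excel 97-2003 XLS",
    "{00020900-0000-0000-c000-000000000046}": "Word 6-95 DOC",
    "{00020906-0000-0000-c000-000000000046}": "DOC",
    "{000c1084-0000-0000-c000-000000000046}": "MSI",
    "{64818d10-4f9b-11cf-86ea-00aa00b929e8}": "PPT",
}

NULL_CLSID = "{00000000-0000-0000-0000-000000000000}"


def describe_root_clsid(clsid: str) -> str:
    """Return a format hint for a root storage CLSID in text form."""
    key = clsid.lower()
    if key == NULL_CLSID:
        # An all-zero CLSID does not identify the format.
        return "Unspecified"
    return KNOWN_ROOT_CLSIDS.get(key, "Unknown")
```

The lookup lowercases its input so that CLSIDs written in uppercase still match the table's lowercase form.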

Related formats

For formats based on this format, see Category:Microsoft Compound File.

Specifications

Programs, libraries, and utilities

Sample files

Links
