Microsoft Compound File

From Just Solve the File Format Problem
(Difference between revisions)
(Identification)
Line 20: Line 20:
Some, but not all, document types can be identified by the "CLSID" field (a 16-byte [[GUID]]) in the "root storage" directory entry. This field is usually located at file offset 512×(1 + {the 32-bit integer at offset 48}) + 80.
+
+ Some files have a stream named "<code>&lt;U+0005&gt;SummaryInformation</code>" containing metadata, which may include information about the creating application.
== Related formats ==
Line 43: Line 45:
* [http://blog.avira.com/malicious-office-macros-dead/ Malicious Office macros are not dead]
* [http://decalage.info/file_formats_security/office MS Office 97-2003 legacy/binary formats security] - article with lots of resources on MS Office formats, including analysis techniques, tools and parsing libraries
+ * [https://msdn.microsoft.com/en-us/library/aa295067(v=vs.60).aspx MSDN: Providing Summary Information]
== Editors' notes ==

Revision as of 16:38, 23 October 2016

File Format
Name Microsoft Compound File
Ontology
LoCFDD fdd000380, fdd000392
PRONOM fmt/111

Microsoft Compound File is a complex container format used by some versions of Microsoft Office and by other Microsoft applications. Its features resemble those of a filesystem format.

It is also known as Compound File Binary File Format (CFBF or CFB), Microsoft Compound Document File Format, OLE Compound Document Format, OLE2 Compound Document Format, Composite Document File, etc.

The format was not publicly documented by Microsoft until 2008.

It is (or was?) unofficially known as LAOLA File Format.

Identification

Files begin with signature bytes D0 CF 11 E0 A1 B1 1A E1.
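
As a minimal sketch, the signature check could look like this in Python. The names `CFB_SIGNATURE` and `has_cfb_signature` are hypothetical, chosen only for illustration.

```python
# The 8 signature bytes listed above.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def has_cfb_signature(data: bytes) -> bool:
    """Return True if data starts with the Compound File signature."""
    return data[:8] == CFB_SIGNATURE
```

A match only shows that the file is a Compound File container. It says nothing about which document type is stored inside.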

Identifying the specific document type can be difficult. This is one of the few formats for which the file command resorts to a hard-coded identification algorithm (see readcdf.c).

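Some, but not all, document types can be identified by the "CLSID" field (a 16-byte GUID) in the "root storage" directory entry. This field is usually located at file offset 512×(1 + N) + 80, where N is the 32-bit little-endian integer at offset 48 (the number of the first directory sector). The formula assumes 512-byte sectors, with the root storage entry at the start of that sector.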
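
The offset formula above can be sketched in Python as follows. This is an illustration under stated assumptions, not a full parser: it assumes 512-byte sectors, that the integer at offset 48 is little-endian, and that the GUID uses the usual Windows byte layout. The function name `root_clsid` is hypothetical.

```python
import struct
import uuid


def root_clsid(data: bytes, sector_size: int = 512) -> uuid.UUID:
    """Read the root storage CLSID using the offset formula from the text.

    Assumption: 512-byte sectors by default; files with larger sectors
    would need a different sector_size.
    """
    # 32-bit integer at offset 48, read as little-endian.
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    # sector_size * (1 + N) + 80, as given in the text for 512-byte sectors.
    offset = sector_size * (1 + first_dir_sector) + 80
    raw = data[offset:offset + 16]
    if len(raw) != 16:
        raise ValueError("file too short to contain a root CLSID")
    # bytes_le interprets the first three GUID fields as little-endian.
    return uuid.UUID(bytes_le=raw)
```

The returned value can then be compared against a table of known CLSIDs. Building such a table is outside the scope of this sketch.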

Some files have a stream named "<U+0005>SummaryInformation" containing metadata, which may include information about the creating application.
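
A rough way to test for this stream without walking the directory is to search the raw bytes for its name. The sketch below is a heuristic only. It assumes directory entry names are stored as UTF-16LE, and it can give false positives if the same bytes appear elsewhere in the file. The name `may_have_summary_information` is hypothetical.

```python
# Stream name with its leading U+0005 character, encoded as UTF-16LE
# (assumed encoding of directory entry names).
SUMMARY_NAME = "\x05SummaryInformation".encode("utf-16-le")


def may_have_summary_information(data: bytes) -> bool:
    """Heuristic: True if the encoded stream name occurs anywhere in data."""
    return SUMMARY_NAME in data
```

A reliable check would parse the directory entries and compare names there. This shortcut is useful mainly for quick triage.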

Related formats

See Category:Microsoft Compound File.

Specifications

Programs, libraries, and utilities

Links

Editors' notes

TODO: Explain the relationship between Compound File format and the format/technology called COM Structured Storage (or OLE Structured Storage).
