From Just Solve the File Format Problem
(Difference between revisions)
Dan Tobias
(→Sample files: Added link to PDF test suites on Adobe Acrobat Engineering site)
Revision as of 16:24, 4 February 2013
PDF (Portable Document Format), based on PostScript and originally developed by Adobe, has many subsets.
As well as the 'full function' ISO 32000-1:2008 (or PDF 1.7), there are also PDF/X, PDF/A, PDF/E, PDF/VT and PDF/UA, all of which are ISO specifications.
PDF profiles (formalized subsets) include the following:
- PDF/A (optimized for preservation)
  - PDF/A-1 (ISO 19005-1:2005)
  - PDF/A-2 (ISO 19005-2:2011)
  - PDF/A-3 (ISO 19005-3:2012) (extends PDF/A-2 by allowing embedded files of any type)
- PDF/E (ISO 24517-1:2008) (for engineering workflows)
- PDF/UA (ISO 14289-1) (making documents accessible through assistive technologies)
- PDF/VT (ISO 16612-2) (support for variable document printing)
- PDF/X (support for prepress graphics exchange)
  - PDF/X-1 (ISO 15930-1:2001)
  - PDF/X-1a (ISO 15930-4:2003)
  - PDF/X-2 (ISO 15930-5:2003)
  - PDF/X-3 (ISO 15930-6:2003)
- Tagged PDF
Also see: extension PDF
Identification
Most PDF files can be identified by a fixed header, e.g. "%PDF-1.4"; however, older documents show a number of variations.
- Some can start with "%!PS-Adobe-N.n PDF-M.m" instead, as described here.
- Since PDF 1.7, the major and minor version numbers have been fixed; i.e., the public version from Adobe after 1.7 was "1.7 Adobe Extension Level 3".
- For the PDF/A family of formats, conformance is declared via an embedded XMP metadata fragment.
- Some older files from Mac OS may be wrapped up in the AppleSingle/AppleDouble formats. This is a general issue, so should perhaps be documented elsewhere. For more information, see:
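The header checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete identifier: the function names (sniff_pdf_version, sniff_pdfa) and the 1024-byte search window are assumptions made for this example. The XMP property names pdfaid:part and pdfaid:conformance come from the PDF/A standard rather than from this page, and the sketch assumes the XMP packet is stored uncompressed so that a plain byte search can find it.

```python
import re
from typing import Optional, Tuple

# Standard header, e.g. b"%PDF-1.4".
PDF_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")
# Older variant, e.g. b"%!PS-Adobe-3.0 PDF-1.2".
PS_HEADER = re.compile(rb"%!PS-Adobe-\d\.\d PDF-(\d)\.(\d)")

# XMP may write a property as an element (<pdfaid:part>1</pdfaid:part>)
# or as an attribute (pdfaid:part="1"); both forms are accepted here.
PDFA_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
PDFA_CONF = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def sniff_pdf_version(data: bytes, window: int = 1024) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the header, or None if no header is found.

    Only the first `window` bytes are searched. The 1024-byte default is an
    assumption of this sketch, chosen to tolerate some leading junk.
    """
    head = data[:window]
    for pattern in (PDF_HEADER, PS_HEADER):
        match = pattern.search(head)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def sniff_pdfa(data: bytes) -> Optional[str]:
    """Return a label such as 'PDF/A-1b' if PDF/A identification is found.

    Assumes the XMP metadata is uncompressed; returns None otherwise.
    This reports what the file declares, not whether it actually conforms.
    """
    part = PDFA_PART.search(data)
    if not part:
        return None
    conf = PDFA_CONF.search(data)
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return "PDF/A-" + part.group(1).decode("ascii") + level
```

To try it on a real file, read the bytes with open(path, "rb").read() and pass them to both functions. A file wrapped in AppleSingle/AppleDouble would need unwrapping first, which this sketch does not attempt.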
Sample files
- PDF Cabinet of Horrors - sample PDF files in corrupted or otherwise problematic formats
- Adobe PDF Test Suites - various PDF test suites on Adobe Acrobat Engineering site
References
- PDF Reference and Adobe Extensions to the PDF Specification - Adobe page linking to the specification for PDF 1.7 (equivalent to ISO 32000-1:2008) and two Adobe extensions that are expected to be incorporated into ISO 32000-2. These extensions include support for geospatial features and for 3-D content using the U3D and PRC formats.
- Adobe PDF Reference Archives - archive of specifications for earlier Adobe versions of PDF, starting with Version 1.3.
- Portable Document Format (Wikipedia)
- PDF/A Competence Center
- The Network is the Format: PDF and the Long-term Use of Digital Content - article by Sheila Morrissey of ITHAKA on the challenges of preserving PDF files, based on experience. She illustrates the challenge of defining a "sufficient sub-graph of the network of information about a digital object, for effective future use."
- PDF (Portable Document Format), from the Library of Congress resource on Sustainability of Digital Formats - links to individual pages for Adobe chronological versions 1.3 through 1.7 and for several versions approved as ISO standards.