PDFXML
From Just Solve the File Format Problem
Adobe Labs developed a plugin for Acrobat to create an "XML-friendly representation of PDF".[1] Originally called the MARS project,[2] it was later renamed PDFXML. The project started in 2006; it was shut down in 2011 and removed from public access.[3]
File Information
The PDFXML file format uses ZIP as its container format, with SVG for each page and JPEG2000 for each image.
The file format specification and schema were published at the time, but the archive.org captures of the website contain only truncated PDFs.[4] If anyone has copies of the original specification or schema, please post a link here.
A basic container has the following structure:
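Because the container is an ordinary ZIP archive, it can be inspected with any general-purpose ZIP library. The following is a minimal Python sketch, not a reference implementation: the function name `summarize_container` is hypothetical, and the only thing it assumes about PDFXML is the ZIP packaging described above. It counts the archive entries by file extension, which gives a quick overview of how many SVG, XML and image parts a file holds.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_container(source):
    """Count the entries of a ZIP-based container by file extension.

    `source` may be a filesystem path or a binary file-like object.
    Hypothetical helper: it relies only on the ZIP packaging, not on
    any detail of the (unavailable) PDFXML specification.
    """
    if not zipfile.is_zipfile(source):
        raise ValueError("not a ZIP container")
    with zipfile.ZipFile(source) as archive:
        # Skip explicit directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    # Entries without an extension (such as "mimetype") are grouped
    # under the label "(none)".
    return Counter(PurePosixPath(n).suffix.lower() or "(none)" for n in names)
```

Calling `summarize_container("example.mars")` on a sample file (the file name here is only a placeholder) returns a `Counter` keyed by extension, so `result[".svg"]` is the number of SVG parts in the archive.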
├── META-INF
│   ├── compatibility.pdf
│   ├── container.xml
│   └── metadata.xml
├── backbone.xml
├── bookmarks.xml
├── color
│   └── cs-0.icc
├── form
│   └── form_data.xfdf
├── mimetype
├── page
│   └── 0
│       ├── form_0.svg
│       ├── form_1.svg
│       ├── form_2.svg
│       ├── info.xml
│       ├── pg.can
│       └── pg.svg
└── script
    ├── javascripts.xml
    └── js_0
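The listing above suggests that each page's resources sit in a numbered directory under page/. As a hedged illustration, the Python sketch below groups archive entry names by that page number. The pattern `page/<index>/<file>` is inferred from this one example listing, not from the specification, and the names `PAGE_ENTRY` and `group_by_page` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: page resources are stored as "page/<index>/<file>",
# as seen in the example listing; the specification may allow more.
PAGE_ENTRY = re.compile(r"^page/(\d+)/(.+)$")


def group_by_page(entry_names):
    """Map each page index to the sorted list of files stored for that page."""
    pages = defaultdict(list)
    for name in entry_names:
        match = PAGE_ENTRY.match(name)
        if match:
            pages[int(match.group(1))].append(match.group(2))
    # Return a plain dict ordered by page index.
    return {index: sorted(files) for index, files in sorted(pages.items())}
```

The entry names can come from `zipfile.ZipFile(path).namelist()`. Entries outside page/, such as backbone.xml or META-INF/container.xml, are ignored by this helper.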
Further information
Additional sources not covered by the references below.
- MARS FAQ
- MARS via PDF Junkie
- Martin Kováč on PDF XML
- Eliot Kimber - Adobe Mars: Looks Interesting
- Microsoft.Fandom: Adobe Mars
References
- [1] https://web.archive.org/web/20080919181131/https://blogs.adobe.com/mars/2008/09/pdfxml_plugin_prerelease.html
- [2] http://csis.pace.edu/~marchese/CS835/Student_Readings/p161-hardy.pdf
- [3] https://web.archive.org/web/20110902063203/http://labs.adobe.com/technologies/mars
- [4] https://web.archive.org/web/20061222203316/http://labs.adobe.com/technologies/mars/