PDFXML

Adobe Labs developed a plugin for Acrobat to create an "XML-friendly representation of PDF". Originally called the MARS project, it was later renamed PDFXML. The project started in 2006, was shut down in 2011, and was then removed from public access.

File Information
The PDFXML file format uses ZIP as its container format, with each page stored as SVG and each image stored as JPEG2000.

File format specifications and schemas were published at the time, but unfortunately the archive.org captures of the website contain only truncated PDFs. If anyone has copies of the original specification or schema, please post a link here.

The basic container has the following structure:

├── META-INF
│   ├── compatibility.pdf
│   ├── container.xml
│   └── metadata.xml
├── backbone.xml
├── bookmarks.xml
├── color
│   └── cs-0.icc
├── form
│   └── form_data.xfdf
├── mimetype
├── page
│   └── 0
│       ├── form_0.svg
│       ├── form_1.svg
│       ├── form_2.svg
│       ├── info.xml
│       ├── pg.can
│       └── pg.svg
└── script
    ├── javascripts.xml
    └── js_0
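
Because the container is a ZIP archive, the layout listed above can be inspected with any ZIP library. The following is a minimal sketch in Python, not a reference implementation: it assumes the "mimetype" member is plain text and that page content lives under "page/<index>/", as the listing suggests. The function name summarize_pdfxml is hypothetical, and nothing here is taken from the original specification, which is not available.

```python
import zipfile
from collections import defaultdict


def summarize_pdfxml(source):
    """Return (mimetype, pages) for a PDFXML-style ZIP container.

    `source` is a path or a file-like object. `pages` maps each page
    directory name (e.g. "0") to a sorted list of the files inside it.
    Assumption: the layout matches the directory listing above.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()

        # Assumption: "mimetype" is a short plain-text member.
        mimetype = None
        if "mimetype" in names:
            raw = zf.read("mimetype")
            mimetype = raw.decode("ascii", errors="replace").strip()

        pages = defaultdict(list)
        for name in names:
            parts = name.split("/")
            # Keep only "page/<index>/<file>" members; skip directory entries,
            # whose last path component is empty.
            if len(parts) == 3 and parts[0] == "page" and parts[2]:
                pages[parts[1]].append(parts[2])

        return mimetype, {index: sorted(files) for index, files in pages.items()}
```

Calling summarize_pdfxml("example.zip") (a hypothetical file name) on a container like the one listed above would return the contents of "mimetype" together with a mapping from each page directory to its files, such as pg.svg and info.xml.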