WordPerfect
From Just Solve the File Format Problem
Revision as of 11:41, 22 October 2013 by Dan Tobias (Talk | contribs)
Introduction
WordPerfect is the name of both a word processing application and its native file format.
Printer definitions
WordPerfect uses so-called "printer definitions" for "pretty printing".
Detecting WordPerfect files
The "signature bytes" at the beginning of a WordPerfect file are (hex) FF 57 50 43, which spells "WPC" after a flag character #255.
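The signature check can be sketched in Python as follows. This is a minimal illustration, not a complete format detector: the names `WP_MAGIC`, `looks_like_wordperfect`, and `is_wordperfect_file` are made up for this example, and the check looks only at the four signature bytes described above.

```python
# Hypothetical helper names; only the four signature bytes come from the text above.
WP_MAGIC = b"\xffWPC"  # hex FF 57 50 43


def looks_like_wordperfect(data: bytes) -> bool:
    """Return True if the buffer starts with the WordPerfect signature bytes."""
    return data[:4] == WP_MAGIC


def is_wordperfect_file(path: str) -> bool:
    """Read the first four bytes of a file and compare them to the signature."""
    with open(path, "rb") as handle:
        return looks_like_wordperfect(handle.read(4))
```

A buffer shorter than four bytes simply fails the comparison, so no separate length check is needed.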
Extracting plain-text content
If you are writing a program to extract the plain text from a WordPerfect document, and are not interested in the formatting and other features, the process is fairly simple: have the program skip the parts that are not text. The following pseudocode handles the characters of the file in order, using the decimal values of the characters/bytes:
For each character c, if its value is:
    #128, #160: treat as space ' '
    #169..#171, #173, #174: treat as dash '-'
    #192..#236: skip ahead and ignore all characters until another occurrence of character c is found; resume at the following character
    #0..#31, #129..#159, #161..#168, #172, #175..#255: ignore (control characters)
    else treat as regular text character
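The pseudocode above could be written in Python roughly as shown here. This is a sketch under a few assumptions that the pseudocode does not spell out: the rules are tested in the order listed (so #192..#236 takes precedence over the overlapping #175..#255 "ignore" range), extraction stops if the closing byte of a skipped section is never found, and the remaining bytes (#32..#127) are emitted as ASCII characters. The function name `extract_text` is invented for this example.

```python
def extract_text(data: bytes) -> str:
    """Pull plain text out of WordPerfect bytes using the rules listed above."""
    out = []
    i = 0
    while i < len(data):
        c = data[i]
        if c in (128, 160):
            out.append(" ")                    # space codes
        elif 169 <= c <= 171 or c in (173, 174):
            out.append("-")                    # dash codes
        elif 192 <= c <= 236:
            # Skip everything up to the next occurrence of the same byte.
            closing = data.find(bytes([c]), i + 1)
            if closing == -1:
                break                          # assumption: unterminated section ends extraction
            i = closing                        # resume after the closing byte
        elif c < 32 or c > 128:
            pass                               # control characters: ignore
        else:
            out.append(chr(c))                 # regular text character (#32..#127)
        i += 1
    return "".join(out)
```

Because the control range #0..#31 is ignored, line breaks stored as ordinary newline bytes are dropped too; a caller who wants them preserved would need to special-case those values, which goes beyond the pseudocode.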
Developer utilities
- WordPerfect file format SDK (archived version at Internet Archive, original pages have been taken offline)
- libwpd - programmer library for dealing with WordPerfect files