WordStar

From Just Solve the File Format Problem
Revision as of 06:54, 14 November 2012

File Format
Name: WordStar
Ontology: Electronic File Formats > Document > WordStar
Extension(s): .ws, .ws3, .wsd, others

WordStar was a word processor originally released in 1978 which was extremely popular in the early 1980s before losing ground to other word processors (particularly WordPerfect). Many professional writers used it in that era, and given their notorious conservatism regarding tools used for their writing, some are still using it to this day. This means that many original manuscripts are stored in this format.

The original version was for the CP/M operating system, but it was later ported to a number of other systems; the PC/MS-DOS version became the most popular one. The particular set of control keys used for accessing various functions (often requiring multiple keypresses) was widely imitated in other programs at the time, making a "de facto standard" for editing keys that got even wider use than WordStar itself.

As with many early word processors, its files were basically plain text, with optional special functions causing control characters to be inserted. Files could be created or edited with any extension, but .ws (sometimes with an appended number to mark versions, like .ws3) was commonly used.

One quirk present in versions prior to 5.0 was the use of the high bit of each byte to mark the last letter of a word. This limited the character set to 7-bit ASCII: characters that were not the last letter of a word had a clear high bit (values 00-7F hex, matching their ASCII values), while last letters had the high bit set (values 80-FF hex, representing the corresponding characters from 00-7F). This interfered with internationalization, since it prevented the use of extended character sets beyond ASCII, and it made the ends of words look like gibberish in other programs that interpreted the bytes via some 8-bit encoding. Eventually this "feature" was dropped, but even in late versions extended characters were marked in saved files by control characters both preceding and following them, so an 8-bit character took three bytes to store. This was necessary to preserve file compatibility: old WordStar files with high bits set at the end of words still needed to load correctly, so the program couldn't interpret high-bit characters as members of an extended character set without a special marker.
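
To make the high-bit convention concrete, here is a minimal Python sketch. The function name `clear_high_bits` and the sample bytes are illustrative assumptions, not taken from any WordStar tool. It shows how a two-word sample looks when another program decodes it with an 8-bit encoding (code page 437 is assumed here), and how masking each byte with 7F hex recovers the ASCII text.

```python
def clear_high_bits(data: bytes) -> bytes:
    """Mask every byte with 0x7F, dropping the end-of-word marker bit."""
    return bytes(b & 0x7F for b in data)

# Hypothetical sample: "word star" with the last letter of each word flagged.
# 'd' is 0x64, so it is stored as 0xE4; 'r' is 0x72, so it is stored as 0xF2.
raw = b"wor\xe4 sta\xf2"

# Decoded as an 8-bit code page, the last letters appear as unrelated symbols.
print(raw.decode("cp437"))

# With the high bit cleared, the plain ASCII text comes back.
print(clear_high_bits(raw).decode("ascii"))  # word star
```

The same mask also turns the soft carriage return (8D hex) into an ordinary carriage return (0D) and the soft space (A0) into an ordinary space (20).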

Extended characters (when they appeared in the special escaped sequence, consisting of character 1B hex, followed by the special character, followed by character 1C hex) were generally of the MS-DOS encodings, at least if the file was created in a DOS version of WordStar.
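
The three-byte escape can be handled with a small scanner. The following Python sketch is only an illustration: the function name `decode_wordstar` is hypothetical, and code page 437 is an assumed choice among the MS-DOS encodings, so a file written under a different code page would need a different `codepage` argument.

```python
ESC, FS = 0x1B, 0x1C  # delimiters around an extended character

def decode_wordstar(data: bytes, codepage: str = "cp437") -> str:
    """Decode text, honoring 1B <byte> 1C extended-character sequences."""
    out = []
    i = 0
    while i < len(data):
        # An extended character is exactly three bytes: 1B, the character, 1C.
        if data[i] == ESC and i + 2 < len(data) and data[i + 2] == FS:
            out.append(bytes([data[i + 1]]).decode(codepage))
            i += 3
        else:
            # Any other byte: drop the high bit, which older files may set.
            out.append(chr(data[i] & 0x7F))
            i += 1
    return "".join(out)

sample = b"caf\x1b\x82\x1c"  # 0x82 is e-acute in code page 437
print(decode_wordstar(sample))  # café
```

Bytes outside an escape sequence are masked to 7 bits, so the sketch also copes with older files that flag the last letter of each word.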

There was also a WordStar 2000 program, with its own file format that was not compatible with other WordStar versions. Despite its name, it was released in the 1980s, nowhere near the year 2000. It was intended to be a new-generation word processor to compete with the newer programs that were starting to catch on at the time, but it didn't succeed and actually went out of use earlier than the original WordStar, which continued to be updated through the 1990s.


Converting WordStar files with high bits set

Some other programs have special "WordStar import" features which handle high-bit characters, but if you need to deal with such files without a conversion utility, it's helpful to change high-bit characters to their corresponding 7-bit characters in order to have standard ASCII. This can be done simply in most programming or scripting languages; here's a Perl example, for instance.

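use strict;
use warnings;

# Read and write raw bytes so that no encoding layer alters them.
open my $in,  '<:raw', 'in.ws'   or die "Cannot open in.ws: $!";
open my $out, '>:raw', 'out.txt' or die "Cannot open out.txt: $!";
while (my $line = <$in>)
{
  # Clear the high bit: map bytes 0x80-0xFF onto 0x00-0x7F.
  $line =~ tr/\200-\377/\000-\177/;
  print {$out} $line;
}
close $in;
close $out or die "Cannot close out.txt: $!";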

Control characters

These are the control characters as stored in WordStar documents, and their meanings. Most of them are program-specific, not corresponding to the standard ASCII control meanings, though some of these are preserved. The toggle options were used at the start and end of blocks of text intended to be formatted in a particular way (e.g., bold).

Hex  Dec  ASCII Char  Ctrl Key  WordStar Key  WordStar meaning
---  ---  ----------  --------  ------------  ----------------
00   0    NUL         ^@        Control-P@    In some versions right-aligns text; in others fixes print head to absolute position of character in line
01   1    SOH         ^A        Control-PA    Toggles alternate font
02   2    STX         ^B        Control-PB    Toggles bold mode
03   3    ETX         ^C        Control-PC    Pause print for user response
04   4    EOT         ^D        Control-PD    Toggles double-strike mode
05   5    ENQ         ^E        Control-PE    Custom print control
06   6    ACK         ^F        Control-PF    Phantom space
07   7    BEL         ^G        Control-PG    Phantom rubout
08   8    BS          ^H        Control-PH    Overprint previous character (backspace)
09   9    HT          ^I        Control-PI    Tab
0A   10   LF          ^J        Control-PJ    Linefeed: follows Carriage Return for line break. (Enter/Return inserts two-character sequence ^M^J)
0B   11   VT          ^K        Control-PK    In some versions, centers text; in others marks text to be indexed (placed both before and after the text sequence)
0C   12   FF          ^L        Control-PL    Form feed (page break)
0D   13   CR          ^M        Control-PM    Carriage Return: precedes Linefeed for line break. (Enter/Return inserts two-character sequence ^M^J)
0E   14   SO          ^N        Control-PN    Return to normal character width
0F   15   SI          ^O        Control-PO    Non-breaking space
10   16   DLE         ^P        Control-PP    Unused
11   17   DC1         ^Q        Control-PQ    Custom print control
12   18   DC2         ^R        Control-PR    Custom print control
13   19   DC3         ^S        Control-PS    Toggles underline mode
14   20   DC4         ^T        Control-PT    Toggles superscript mode
15   21   NAK         ^U        Control-PU    Unused
16   22   SYN         ^V        Control-PV    Toggles subscript mode
17   23   ETB         ^W        Control-PW    Custom print control
18   24   CAN         ^X        Control-PX    Toggles overstrike mode
19   25   EM          ^Y        Control-PY    Toggles italic mode
1A   26   SUB         ^Z        -             End-of-file character
1B   27   ESC         ^[        -             Marks that following character is extended character
1C   28   FS          ^\        -             Marks that previous character is extended character (you need both 1B and 1C to delimit extended characters)
1D   29   GS          ^]        -             Symmetrical sequence start/stop character
1E   30   RS          ^^        -             Inactive soft hyphen
1F   31   US          ^_        -             Active soft hyphen
8D   141  -           -         -             Soft Carriage Return (inserted, followed by normal linefeed 0A, to mark soft line break at word-wrap)
A0   160  -           -         -             Soft space
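
Because the toggle characters bracket a run of text, a converter can track which modes are currently on. The Python sketch below turns a few toggles from the table above into HTML-style tags. The tag names, the `TOGGLES` mapping, and the function name `toggles_to_html` are assumptions made for illustration; the sketch does not escape HTML-special characters and does not repair improperly nested toggles.

```python
# Toggle control characters from the table, mapped to illustrative tags.
TOGGLES = {
    0x02: "b",    # ^B bold
    0x13: "u",    # ^S underline
    0x14: "sup",  # ^T superscript
    0x16: "sub",  # ^V subscript
    0x19: "i",    # ^Y italic
}

def toggles_to_html(data: bytes) -> str:
    """Replace each toggle byte with an opening or closing tag."""
    out = []
    active = set()  # tags that are currently switched on
    for byte in data:
        tag = TOGGLES.get(byte)
        if tag is None:
            # Ordinary byte: clear the high bit and keep the character.
            out.append(chr(byte & 0x7F))
        elif tag in active:
            active.remove(tag)
            out.append(f"</{tag}>")
        else:
            active.add(tag)
            out.append(f"<{tag}>")
    return "".join(out)

print(toggles_to_html(b"a \x02bold\x02 word"))  # a <b>bold</b> word
```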

Dot commands

These commands were placed on a line by themselves and started with a dot (.). This meant that regular text lines couldn't start with dots. Many other early word processors emulated WordStar in their use of "dot lines" for commands, though some of them required a control character to precede the dot in order to allow dots at the start of normal text lines. The specific commands varied a lot between programs, however.

.. Comment line (followed by comment text; not printed)
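
When extracting plain text, dot lines can be separated from the body with a simple line filter. The Python sketch below is a rough illustration: the function name `split_dot_lines` is hypothetical, and `.xx` in the sample is a made-up placeholder rather than a real WordStar command.

```python
def split_dot_lines(text: str):
    """Separate dot-command lines from ordinary text lines."""
    commands, body = [], []
    for line in text.splitlines():
        if line.startswith(".."):
            continue  # comment line: not printed, so it is dropped here
        if line.startswith("."):
            commands.append(line)
        else:
            body.append(line)
    return commands, body

sample = "..draft notes\n.xx\nFirst line of text.\n"
commands, body = split_dot_lines(sample)
print(commands)  # ['.xx']
print(body)      # ['First line of text.']
```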
