WordStar

From Just Solve the File Format Problem
Revision as of 14:11, 26 February 2014 by AndyJackson (Talk | contribs)

File Format
Name WordStar
Ontology
Extension(s) .ws, .ws3, .wsd, others
Released 1978

WordStar was a word processor first released in 1978. It was extremely popular in the early 1980s before losing ground to other word processors, particularly WordPerfect. Many professional writers used it in that era and, given their notorious conservatism about writing tools, some still use it to this day. As a result, many original manuscripts are stored in this format.

The original version was for the CP/M operating system, but it was later ported to a number of other systems; the PC/MS-DOS version became the most popular one. Its particular set of control keys for accessing various functions (often requiring multiple keypresses) was widely imitated in other programs at the time, creating a "de facto standard" for editing keys that saw even wider use than WordStar itself.

As with many early word processors, its files were basically plain text, with optional special functions causing control characters to be inserted. Files could be created or edited with any extension, but .ws (sometimes with an appended number to mark versions, like .ws3) was commonly used.

One quirk of versions prior to 5.0 was the use of the high bit of each byte to mark the last letter of a word. Characters that were not the last letter of a word had the high bit clear, giving values from 00-7F hex that match their ASCII values; last letters had the high bit set, giving values from 80-FF hex that actually represent the corresponding characters from 00-7F. This limited the character set to 7-bit ASCII, which interfered with internationalization by preventing the use of extended character sets. It also meant that other programs, interpreting the bytes via some 8-bit encoding, showed what looked like gibberish at the ends of words. Eventually this "feature" was dropped, but even late versions marked extended characters in saved files with control characters both before and after them, so that an 8-bit character took three bytes to store. This was necessary to preserve file compatibility: old WordStar files with high bits set at the ends of words still had to load correctly, so the program could not interpret high-bit bytes as characters from an extended character set without a special marker.
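
As a rough illustration of the high-bit scheme described above, the following Python sketch splits each byte into its 7-bit character value and its high-bit flag. The helper name `split_high_bit` and the sample bytes are invented for this example; they are not taken from WordStar documentation.

```python
def split_high_bit(data: bytes):
    """Return (7-bit value, high-bit flag) pairs, one per input byte.

    Illustrative helper only; the name is hypothetical.
    """
    return [(b & 0x7F, bool(b & 0x80)) for b in data]


# Hypothetical sample: "cat dog" with the last letter of each word flagged
# ('t' 0x74 becomes 0xF4, 'g' 0x67 becomes 0xE7).
sample = bytes([0x63, 0x61, 0xF4, 0x20, 0x64, 0x6F, 0xE7])

pairs = split_high_bit(sample)
text = "".join(chr(value) for value, _ in pairs)
flags = [flag for _, flag in pairs]
```

Here `text` holds the plain 7-bit text and `flags` records which positions carried the marker, which may be useful if the word-boundary information should be kept rather than discarded.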

Extended characters (stored in a special escaped sequence consisting of character 1B hex, then the special character, then character 1C hex) generally used the MS-DOS encodings, at least if the file was created in a DOS version of WordStar.
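
The escaped sequence described above could be handled along the following lines. This is a minimal Python sketch, not a complete reader: the function name `decode_wordstar` is hypothetical, and code page 437 is only an assumed default, since the actual MS-DOS code page depends on the system that wrote the file. Bytes outside an escape sequence simply have their high bit cleared.

```python
ESC, FS = 0x1B, 0x1C  # markers before and after an extended character


def decode_wordstar(data: bytes, codepage: str = "cp437") -> str:
    """Decode bytes to text, honouring 1B <char> 1C escapes (sketch only)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and i + 2 < len(data) and data[i + 2] == FS:
            # Extended character: decode the middle byte with the
            # assumed MS-DOS code page.
            out.append(bytes([data[i + 1]]).decode(codepage))
            i += 3
        else:
            # Ordinary byte: drop the high bit to recover 7-bit ASCII.
            out.append(chr(b & 0x7F))
            i += 1
    return "".join(out)


# Hypothetical sample: "café bar" with an escaped 0x82 (e-acute in cp437)
# and the final 'r' stored with its high bit set.
decoded = decode_wordstar(b"caf\x1b\x82\x1c ba\xf2")
```

Checking for the escape before clearing high bits matters: clearing the high bit of every byte first would turn the escaped character into an unrelated 7-bit one.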

There was also a WordStar 2000 program, with its own file format that was not compatible with other WordStar versions. Despite its name, it was released in the 1980s, nowhere near the year 2000. It was intended as a new-generation word processor to compete with the newer programs that were starting to catch on at the time, but it did not succeed and actually went out of use earlier than the original WordStar, which continued to be updated through the 1990s.

Converting WordStar files with high bits set

Some other programs have special "WordStar import" features which handle high-bit characters, but if you need to deal with such files without a conversion utility, it's helpful to change high-bit characters to their corresponding 7-bit characters in order to have standard ASCII. This can be done simply in most programming or scripting languages; here's a Perl example, for instance.

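use strict;
use warnings;

# Open both files in raw mode so no newline or encoding translation occurs.
open my $in,  '<:raw', 'in.ws'   or die "Cannot open in.ws: $!";
open my $out, '>:raw', 'out.txt' or die "Cannot open out.txt: $!";
while (my $line = <$in>)
{
  # Clear the high bit: map bytes 0x80-0xFF onto 0x00-0x7F.
  $line =~ tr/\200-\377/\000-\177/;
  print {$out} $line;
}
close $in;
close $out or die "Cannot close out.txt: $!";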

Control characters

These are the control characters as stored in WordStar documents, and their meanings. Most of them are program-specific and do not correspond to the standard ASCII control meanings, though some standard meanings (such as tab, carriage return, and linefeed) are preserved. The toggle options were used at the start and end of blocks of text intended to be formatted in a particular way (e.g., bold).

Hex  Dec  ASCII char  Ctrl key  WordStar key  WordStar meaning
00   0    NUL         ^@        Control-PZ    In some versions right-aligns text; in others fixes print head to absolute position of character in line
01   1    SOH         ^A        Control-PA    Toggles alternate font
02   2    STX         ^B        Control-PB    Toggles bold mode
03   3    ETX         ^C        Control-PC    Pause print for user response
04   4    EOT         ^D        Control-PD    Toggles double-strike mode
05   5    ENQ         ^E        Control-PE    Custom print control
06   6    ACK         ^F        Control-PF    Phantom space
07   7    BEL         ^G        Control-PG    Phantom rubout
08   8    BS          ^H        Control-PH    Overprint previous character (backspace)
09   9    HT          ^I        Control-PI    Tab
0A   10   LF          ^J        Control-PJ    Linefeed: follows Carriage Return for line break. (Enter/Return inserts two-character sequence ^M^J)
0B   11   VT          ^K        Control-PK    In some versions, centers text; in others marks text to be indexed (placed both before and after the text sequence)
0C   12   FF          ^L        Control-PL    Form feed (page break)
0D   13   CR          ^M        Control-PM    Carriage Return: precedes Linefeed for line break. (Enter/Return inserts two-character sequence ^M^J)
0E   14   SO          ^N        Control-PN    Return to normal character width
0F   15   SI          ^O        Control-PO    Non-breaking space
10   16   DLE         ^P        Control-PP    Unused
11   17   DC1         ^Q        Control-PQ    Custom print control
12   18   DC2         ^R        Control-PR    Custom print control
13   19   DC3         ^S        Control-PS    Toggles underline mode
14   20   DC4         ^T        Control-PT    Toggles superscript mode
15   21   NAK         ^U        Control-PU    Unused
16   22   SYN         ^V        Control-PV    Toggles subscript mode
17   23   ETB         ^W        Control-PW    Custom print control
18   24   CAN         ^X        Control-PX    Toggles overstrike mode
19   25   EM          ^Y        Control-PY    Toggles italic mode
1A   26   SUB         ^Z        -             End-of-file character
1B   27   ESC         ^[        -             Marks that following character is extended character
1C   28   FS          ^\        -             Marks that previous character is extended character (you need both 1B and 1C to delimit extended characters)
1D   29   GS          ^]        -             Symmetrical sequence start/stop character
1E   30   RS          ^^        -             Inactive Soft Hyphen
1F   31   US          ^_        -             Active Soft Hyphen
8D   141  -           -         -             Soft Carriage Return (inserted, followed by normal linefeed 0A, to mark soft line break at word-wrap)
A0   160  -           -         -             Soft Space
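
To show how the toggle characters in the table bracket a run of formatted text, here is a small Python sketch that turns a few of them into paired tags. The function name `toggles_to_tags` and the HTML-like tag names are arbitrary choices for illustration. The sketch assumes the input has already had any high bits cleared, and it does nothing to repair toggles that are closed in a different order than they were opened.

```python
# Toggle bytes taken from the table above; tag names are illustrative only.
TOGGLES = {
    0x02: "b",    # ^B bold
    0x13: "u",    # ^S underline
    0x14: "sup",  # ^T superscript
    0x16: "sub",  # ^V subscript
    0x19: "i",    # ^Y italic
}
EOF_MARK = 0x1A   # ^Z end-of-file character


def toggles_to_tags(data: bytes) -> str:
    """Replace toggle characters with opening/closing tags (sketch only)."""
    active = set()
    out = []
    for b in data:
        if b == EOF_MARK:
            break  # ignore anything after the end-of-file character
        tag = TOGGLES.get(b)
        if tag is None:
            out.append(chr(b))
        elif tag in active:
            # Second occurrence of a toggle switches the mode off.
            active.remove(tag)
            out.append(f"</{tag}>")
        else:
            # First occurrence switches the mode on.
            active.add(tag)
            out.append(f"<{tag}>")
    return "".join(out)


marked_up = toggles_to_tags(b"A \x02bold\x02 and \x19italic\x19 word")
```

Because each control character is a toggle rather than a distinct start or end marker, the converter has to track which modes are currently active.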

Dot commands

These commands were intended to be on a line by themselves, starting with a dot (.). This meant that regular text lines couldn't start with a dot. Many other early word processors emulated WordStar's use of "dot lines" for commands, though some of them required a control character before the dot so that normal text lines could start with dots. The specific commands varied a lot between programs, however.

.. Comment line (followed by comment text; not printed)
.av Pause to ask user for value of variable
.aw Turn aligning/word-wrap on or off
.bn Select sheet feeder bin
.cc n Conditional column break if n lines won't fit on page
.co Specifies number of columns and optionally gutter width
.cp n Conditional page break if n lines won't fit on page
.cs string Clear screen and display message
.cv Convert note type (convert first type specified to second type)
.cw n Set character width to given number of 1/120 inch increments
.df filename Insert data file (CSV, dBase, etc.)
.dm string Display message
.e# Set new value for endnote numbering
.ei End .if block (must be paired up with .if command)
.el Begins optional else clause after .if command
.f# Set new value for footnote numbering, and optionally set whether it restarts each page or is consecutive
.fi filename Insert text file (ASCII, WordStar, Lotus 1-2-3, etc.)
.fm n Set footer margin (number of lines left blank between main text and first footer line; default 2; footer margin and footer lines must fit within bottom margin)
.fo If followed by text string, sets footer line; without a string resets footer. Can optionally specify odd or even pages.
.f1 First footer line (if using multiple-line footer)
.f2 Second footer line
.f3 Third footer line
.f4 Fourth footer line
.f5 Fifth footer line
.go Go to top or bottom of document
.he If followed by text string, sets header line; without a string resets header. Can optionally specify odd or even pages.
.h1 First header line (if using multiple-line header)
.h2 Second header line
.h3 Third header line
.h4 Fourth header line
.h5 Fifth header line
.hm n Set header margin (number of lines left blank between last header line and top of main text; default 2; header margin and header lines must fit within top margin)
.hy Turn auto-hyphenating on or off
.if condition Conditional clause: can test string variables with = < > <> and number variables with #= #<> #< #> and print/execute following lines up to .ei if true, and optional else clause starting with .el if false. May be nested up to 255 levels.
.ig string Ignore text on remainder of line (same as .. for comments).
.ix Puts text on remainder of line in index. Main entries and subentries can be separated with comma. If text starts with - it's used as cross-reference, and if it starts with + the page number is boldfaced.
.kr Adjust kerning
.l# Turns line numbering on/off or specifies attributes of line numbering
.lh n Set line height to n increments of 1/48 inch. An argument of "a" sets auto-leading.
.lm n Set left margin to n characters
.lq Letter quality on/off
.ls n Set line spacing to n (1-9), where 1 is single-spaced, 2 double-spaced, etc.
.ma Math: Store result of a calculation in a variable
.mb n Set bottom margin (must be big enough to include all footers)
.mt n Set top margin (must be big enough to include all headers)
.oc Turn centering on or off
.oj Turn output justification on or off
.op Omit page numbers
.p# Set paragraph number or formatting of paragraph numbers.
.pa Page break
.pc n Put page numbers at column n, or centered if 0 is used.
.pe Print endnotes at this point in document
.pf Turn paragraph realignment on or off
.pg Turn on page numbering (reverses .op)
.pl n Set page length to n lines (usually 66 for normal letter paper)
.pm n Set paragraph margin (indenting)
.pn n Set current page number
.po n Set page offset (added to left margin), optionally separately for even or odd pages
.pr Set print orientation (.pr or=l for landscape, .pr or=p for portrait)
.ps Turn proportional spacing on or off
.rm n Set right margin
.rp Repeat-print multiple copies (may not work with .df command)
.rr Embed ruler line (may add number from 0 to 9 to specify preformatted ruler from user area)
.rv Read variable
.sb Suppress blank lines (on/off)
.sr n Set subscript/superscript roll (in 1/48ths of an inch), default 3
.sv Set variable
.tb Set tab stops
.tc string The string is set as a table of contents entry; # is used to indicate where page number is inserted
.t1 string Table of contents entry 1
.t2 string Table of contents entry 2
.t3 string Table of contents entry 3
.t4 string Table of contents entry 4
.t5 string Table of contents entry 5
.t6 string Table of contents entry 6
.t7 string Table of contents entry 7
.t8 string Table of contents entry 8
.t9 string Table of contents entry 9
.uj Turn on or off micro-justify (spreads right-justify space in very fine increments)
.ul Turn on or off continuous underlining (of blanks between words)
.xe Up to 5 bytes following command define custom print control for Ctrl-E
.xq Up to 5 bytes following command define custom print control for Ctrl-Q
.xr Up to 5 bytes following command define custom print control for Ctrl-R
.xw Up to 5 bytes following command define custom print control for Ctrl-W
.xl Form feed with controls defined by hex pairs following command
.xx c Set strikeout character
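
When reading a file, dot command lines can be separated from ordinary text lines with a small classifier such as the Python sketch below. The function names are hypothetical. The sketch assumes that a command code is the two characters following the dot (as in every entry of the list above) and that matching is case-insensitive, which is an assumption rather than something stated here. Two helpers also show the unit arithmetic implied by the .cw and .lh entries.

```python
def parse_line(line: str):
    """Classify a line as ("command", code, argument) or ("text", line)."""
    line = line.rstrip("\r\n")
    if line.startswith(".."):
        # Comment line: everything after the two dots is the comment text.
        return ("command", "..", line[2:].strip())
    if line.startswith(".") and len(line) >= 3:
        # Assumption: the code is the two characters after the dot,
        # compared case-insensitively.
        return ("command", "." + line[1:3].lower(), line[3:].strip())
    return ("text", line)


def chars_per_inch(cw: int) -> float:
    """.cw n sets the character width to n/120 inch."""
    return 120 / cw


def lines_per_inch(lh: int) -> float:
    """.lh n sets the line height to n/48 inch."""
    return 48 / lh


# Hypothetical sample document fragment.
sample = [
    ".. draft of chapter one",
    ".lm 8",
    ".he Chapter One",
    "It was a dark and stormy night.",
    ".pa",
]
parsed = [parse_line(line) for line in sample]
```

For example, under these definitions a .cw value of 12 corresponds to 10 characters per inch, and an .lh value of 8 corresponds to 6 lines per inch.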

Format documentation

Software

Manuals

Other links
