Plain text

From Just Solve the File Format Problem

Revision as of 16:46, 15 November 2012

File Format
Name Plain text
Ontology
Extension(s) .txt
MIME Type(s) text/plain

Plain text in no particular format. See Text-based data for some structured formats that are stored in plain text (and hence can be opened in a plain text editor if no more specific program is available).

Text files may use any character encoding. Traditionally, ASCII was used much of the time for maximum interoperability, but non-English text needs an encoding with a broader character repertoire; nowadays that is often UTF-8.
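Because a plain text file does not record its own encoding, a reader usually has to guess. The following is a minimal sketch of one possible guessing order, not a definitive detector: the function name `decode_text` is hypothetical, and the Latin-1 fallback is an assumption made only so that decoding never fails.

```python
def decode_text(data):
    """Return (text, encoding_name) for raw bytes of unknown encoding."""
    # Pure ASCII bytes are also valid UTF-8, so report the narrower answer first.
    try:
        return data.decode("ascii"), "ascii"
    except UnicodeDecodeError:
        pass
    # "utf-8-sig" decodes UTF-8 and also drops a leading byte-order mark if present.
    try:
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Assumed fallback: Latin-1 maps every byte to a character, so it cannot
    # fail, but it may well be the wrong guess for a given file.
    return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    for raw in (b"plain", "na\u00efve".encode("utf-8"), b"na\xefve"):
        print(decode_text(raw))
```

Strict UTF-8 decoding rejects most text written in older 8-bit encodings, which is why it is tried before the fallback. Bytes that reach the last step should be treated as "unknown 8-bit encoding" rather than as confirmed Latin-1.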

Another source of incompatibility among text files is the convention for line and paragraph breaks. Depending on the system the file was created on or intended to be viewed on, a line break may be stored as Carriage Return (ASCII 0D hex) followed by Linefeed (ASCII 0A hex), in rare cases as the same two characters in the opposite order, or as just one of those characters alone. Text viewers and editors that are not cross-platform-friendly can badly mishandle files whose line break convention differs from the one they expect: lines may overwrite one another instead of advancing to the next line, stray control characters may show up within the text, or other strangeness may occur.
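The line break conventions described above can be counted and converted with a single regular expression. This is a rough sketch: the names `count_line_breaks` and `normalize_line_breaks` are hypothetical, and in a file that mixes conventions a sequence such as LF, CR, LF is inherently ambiguous, so the counts are only a best guess.

```python
import re
from collections import Counter

# Two-character breaks are listed first so they match before a lone CR or LF.
_LINE_BREAK = re.compile(r"\r\n|\n\r|\r|\n")
_NAMES = {"\r\n": "CR+LF", "\n\r": "LF+CR", "\r": "CR", "\n": "LF"}


def count_line_breaks(text):
    """Count how often each line break convention occurs in text."""
    return Counter(_NAMES[m.group()] for m in _LINE_BREAK.finditer(text))


def normalize_line_breaks(text, newline="\n"):
    """Replace every recognized line break with the chosen newline string."""
    return _LINE_BREAK.sub(lambda m: newline, text)


if __name__ == "__main__":
    sample = "first\r\nsecond\rthird\nfourth"
    print(count_line_breaks(sample))
    print(repr(normalize_line_breaks(sample)))
```

When reading a file for this purpose in Python, open it in binary mode or pass `newline=""` to `open()`; otherwise the line breaks are translated to LF before the code ever sees them.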

Files may also use hard line breaks at a fixed number of columns (usually 80, though other values such as 40 or 65 are sometimes used), or have line breaks only at the ends of paragraphs and rely on the viewing program to word-wrap long lines. Opening a file that follows a different convention than you expect may produce lines that run far off the right edge of the screen and require horizontal scrolling, or else short, choppy lines. Many text editors have a "paragraph reformat" command to bring paragraphs into line with your preferred convention.
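A "paragraph reformat" of the kind mentioned above can be approximated with Python's standard `textwrap` module. This is a sketch under stated assumptions, not how any particular editor does it: the function name `reflow` is hypothetical, paragraphs are assumed to be separated by blank lines, and line breaks are assumed to be LF already.

```python
import re
import textwrap


def reflow(text, width=0):
    """Rejoin each paragraph, then hard-wrap it at width columns.

    A width of 0 (the default) leaves each paragraph as one long line.
    """
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for paragraph in paragraphs:
        # Collapse the existing line breaks and runs of spaces into single spaces.
        joined = " ".join(paragraph.split())
        if width > 0:
            joined = textwrap.fill(joined, width=width)
        result.append(joined)
    return "\n\n".join(result)


if __name__ == "__main__":
    hard_wrapped = "These lines were\nwrapped by hand.\n\nSecond\nparagraph."
    print(reflow(hard_wrapped))            # one line per paragraph
    print(reflow(hard_wrapped, width=40))  # hard breaks at 40 columns
```

A reflow this simple also flattens anything that depends on its line layout, such as lists, tables, or indented code, so it suits running prose only.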

Most operating systems include a simple text editor (e.g., Windows Notepad) that can open text files, but many other text editors exist (and computer people sometimes have "holy wars" over which one is best). Other text editors include Emacs, vi, and UltraEdit.
