ARC (compression format)

From Just Solve the File Format Problem
(Difference between revisions)
(See also)
(Format details)
(22 intermediate revisions by 3 users not shown)
Line 6: Line 6:
 
|released=1985
 
|released=1985
 
}}
 
}}
'''ARC''' is a compressed archive format. It supports a number of different compression schemes, the most common of which are based on [[LZW]].
+
'''ARC''' is a compressed archive format, mostly used in MS/PC-DOS, though a CP/M version also existed. It supports a number of different compression schemes, the most common of which are based on [[LZW]].
  
 
== Discussion ==
 
== Discussion ==
Line 19: Line 19:
  
 
== Format details ==
 
== Format details ==
An ARC file consists of a sequence of zero or more archive members, followed by an end-of-archive marker: the bytes <code>0x1a 0x00</code>. It is common for ARC files to have padding or other irrelevant data after the end-of-archive marker.
+
An ARC file consists of a sequence of zero or more archive members, followed by an end-of-archive marker: the bytes <code>0x1a 0x00</code>. It is common for ARC files to have padding or other data after the end-of-archive marker.
  
Each member begins with a 0x1a byte, then a byte indicating the compression method used for that member file. The usual compression methods are in the range 0x01 through 0x09.
+
Each member begins with a <code>0x1a</code> byte, then a byte indicating the compression method used for that member file. (For files beginning with <code>0x1b</code>, see [[ArcMac]].)
 +
 
 +
=== Compression methods ===
 +
The compression method byte identifies a member's compression method, and/or other information about the type of member. The usual compression methods are in the range 1 through 9.
 +
 
 +
Unfortunately, there are several different compression methods named "crunched" or "Crunched".
 +
 
 +
{| class="wikitable"
 +
! ID !! Name(s) !! Description and remarks
 +
|-
 +
|0 || End-of-archive marker ||
 +
|-
 +
|1 || Uncompressed || With old-style header.
 +
|-
 +
|2 || Uncompressed || With new-style header.
 +
|-
 +
|3 || Packed || [[RLE90]]
 +
|-
 +
|4 || Squeezed,<br>Packed+Squeezed || [[RLE90]] + [[Huffman coding|Huffman]]. See [[Squeeze#Compressed data section|Squeeze]] for more information.
 +
|-
 +
|5 || crunched || Hashed [[LZW]] (old hash). Derived from [[LZWCOM]]. Introduced in ARC v4.00.
 +
|-
 +
|6 || crunched,<br>Packed+crunched || [[RLE90]] + method 5. Similar to [[Crunch|CP/M Crunch]] v1.x format. Introduced in ARC v4.10.
 +
|-
 +
|7 || crunched,<br>Packed+crunched || [[RLE90]] + hashed [[LZW]] (new hash). Source code comment says "inadvertent release of a developmental copy forces us to leave [method 7] in".
 +
|-
 +
|8 || Crunched,<br>Packed+Crunched || [[RLE90]] + [[LZW]]. The LZW layer has a dynamic code size. There is a header byte giving the maximum LZW code size, but only 12 bits is generally supported. Introduced in ARC v5.00. This is probably the most common compression method.
 +
|-
 +
|9 || Squashed,<br>Deviant || [[LZW]]. Used by PKARC/PKPAK.
 +
|-
 +
|10 || Trimmed || [[RLE90]] +  {[[LZ77 with Huffman coding|LZH]] with [[adaptive Huffman coding]]}. Supported by ARC 7.x.
 +
|-
 +
|10 || Crushed ||rowspan="2"| [[PAK (ARC extension)|PAK]] extensions: Refer to [[PAK (ARC extension)#Compression methods]].
 +
|-
 +
|11 || Distilled
 +
|-
 +
|20-29 || || Used/reserved for informational items
 +
|-
 +
|20 || Archive info ||
 +
|-
 +
|21 || Extended file info ||
 +
|-
 +
|22 || OS-specific info ||
 +
|-
 +
|30-39 || || Used/reserved for "control" items
 +
|-
 +
|30 || Subdir || Nested ARC-like format. Created by the "z" option introduced in ARC v6.
 +
|-
 +
|31 || End-of-subdir marker ||
 +
|-
 +
|72 || || No known use in ARC, but see [[Hyper archive]].
 +
|-
 +
|83 || || No known use in ARC, but see [[Hyper archive]].
 +
|-
 +
|≥128 || || Refer to [[Spark]].
 +
|}
 +
 
 +
=== ARC Plus ===
 +
The ARC v7 software is named "ARC Plus" or "ARC+Plus". By default, it uses Trimmed compression, and the files it creates begin with an archive info item ("compression method" 20). Some format identification tools identify files beginning this way as "ARC+" format.
 +
 
 +
ARC v6.02 understands archive info items, though it's unclear if it ever creates them. It does not support Trimmed decompression.
  
 
=== PK-style comments ===
 
=== PK-style comments ===
Line 27: Line 87:
  
 
''Information based on reverse engineering:'' An ARC file with comments ends with an 8-byte trailer that begins with the signature {{magic|'P' 'K' 0xaa 0x55}}. This is preceded by a sequence of 32-byte ''records'', each containing a comment, except for one that has a special purpose. The last 4 bytes of the file contain the offset of the special record. The special record somehow indicates whether an archive comment and/or file comments are present. An archive comment, if present, is in the record preceding the special record. File comment records come after the special record, in the same order as the members appear in the ARC file.
 
''Information based on reverse engineering:'' An ARC file with comments ends with an 8-byte trailer that begins with the signature {{magic|'P' 'K' 0xaa 0x55}}. This is preceded by a sequence of 32-byte ''records'', each containing a comment, except for one that has a special purpose. The last 4 bytes of the file contain the offset of the special record. The special record somehow indicates whether an archive comment and/or file comments are present. An archive comment, if present, is in the record preceding the special record. File comment records come after the special record, in the same order as the members appear in the ARC file.
 +
 +
=== PAK extended records ===
 +
This is another kind of data that can appear after the end-of-archive marker. Refer to [[PAK (ARC extension)]].
  
 
== Identifiers ==
 
== Identifiers ==
Line 36: Line 99:
 
* [[PAK (ARC extension)]]
 
* [[PAK (ARC extension)]]
 
* [[Spark]]
 
* [[Spark]]
 +
* [[ArcMac]]
 
* [[RLE90]]
 
* [[RLE90]]
 +
* [[AXE (executable compression)]] - Another SEA product
 +
 +
Other formats called ARC (or something similar) are listed at [[ARC]].
  
 
== Specifications ==
 
== Specifications ==
* [http://www.fileformat.info/format/arc/index.dir Page at FileFormat.info]
+
* [https://www.fileformat.info/format/arc/corion.htm The ARC Archive File Format], from Corion.net and FileFormat.Info.
 
* [http://www.textfiles.com/programming/FORMATS/arc_fmts.txt ARC file header format (among other archive types)]
 
* [http://www.textfiles.com/programming/FORMATS/arc_fmts.txt ARC file header format (among other archive types)]
 
* [http://apple2.org.za/gswv/a2zine/GS.WorldView/Resources/The.MacShrinkIt.Project/ARCHIVES.TXT Archive format info, including ARC]
 
* [http://apple2.org.za/gswv/a2zine/GS.WorldView/Resources/The.MacShrinkIt.Project/ARCHIVES.TXT Archive format info, including ARC]
 
* {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/ARC_FILE.INF|ARC-FILE.INF}}
 
* {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/ARC_FILE.INF|ARC-FILE.INF}}
 +
* {{CdTextfiles|currentsw199407/compress/tm0402.zip|SEA Technical Memorandum #0402}}: ARC 6.02 Extended Data
  
 
== Sample files ==
 
== Sample files ==
 
* {{CdTextfilesURL|powerprogramming/PROGTOOL/}} ...
 
* {{CdTextfilesURL|powerprogramming/PROGTOOL/}} ...
* [http://www.dan.info/sampledata/CHRISTIE.ARC CHRISTIE.ARC]
+
* [https://www.dan.info/sampledata/CHRISTIE.ARC CHRISTIE.ARC]
* [http://www.dan.info/sampledata/POLIPREF.ARC POLIPREF.ARC]
+
* [https://www.dan.info/sampledata/POLIPREF.ARC POLIPREF.ARC]
 
* {{CdTextfilesURL|pcblue/}}
 
* {{CdTextfilesURL|pcblue/}}
 +
* https://telparia.com/fileFormatSamples/archive/arc/
  
 
== Programs and Utilities ==
 
== Programs and Utilities ==
 
* [http://www.svgalib.org/rus/nomarch.html nomarch] by Russell Marks, c. 2001 (Unix/GPL2) -- extract only.
 
* [http://www.svgalib.org/rus/nomarch.html nomarch] by Russell Marks, c. 2001 (Unix/GPL2) -- extract only.
 
** Packaged for Debian-based Linux distributions: <tt>apt-get install nomarch</tt>
 
** Packaged for Debian-based Linux distributions: <tt>apt-get install nomarch</tt>
* [https://sourceforge.net/projects/arc/ ARC] - Portable (Unix, etc.) software based on ARC source code
+
* [https://sourceforge.net/projects/arc/ ARC (for Unix)] - Portable software based on ARC source code
 
* ARC - DOS binaries
 
* ARC - DOS binaries
 
** {{CdTextfiles|rbbsv3n1/atnu/arc51.exe|v5.10}} (1986-01-31) (bare executable)
 
** {{CdTextfiles|rbbsv3n1/atnu/arc51.exe|v5.10}} (1986-01-31) (bare executable)
Line 61: Line 130:
 
** {{CdTextfiles|hof91/arc/arc602.exe|v6.02}} (Dated "January of 1989", but timestamps suggest 1989-03-14.)
 
** {{CdTextfiles|hof91/arc/arc602.exe|v6.02}} (Dated "January of 1989", but timestamps suggest 1989-03-14.)
 
** {{CdTextfiles|rbbsv3n1/atnu/arc602.exe|v6.02}} - Same ARC.EXE as above, but has updated documentation (1989-04-21) and other differences.
 
** {{CdTextfiles|rbbsv3n1/atnu/arc602.exe|v6.02}} - Same ARC.EXE as above, but has updated documentation (1989-04-21) and other differences.
 +
** [http://old-dos.ru/index.php?page=files&mode=files&do=show&id=699 Various versions, at old-dos.ru]
 
* {{CdTextfiles|pcmedic/utils/compress/arc520s.zip|ARC v5.20 source code}}
 
* {{CdTextfiles|pcmedic/utils/compress/arc520s.zip|ARC v5.20 source code}}
 +
* XARC - A minimal ARC extractor from SEA (DOS binaries)
 +
** {{CdTextfiles|carousel/013C/XARC.ZIP|v4.31}} (1985-10-10)
 +
** {{CdTextfiles|microhaus/mhblackbox3/ARCHIVER/XARC500.ZIP|v5.00}} (1986-01-23)
 +
** {{CdTextfiles|microhaus/mhblackbox3/ARCHIVER/XARC71.ZIP|v7.1}} (1990-10)
 +
** {{CdTextfiles|pier/pier01/001a/xarc712.zip|v7.12}} (1990-10?)
 +
* ARCE (or ARC-E), by Wayne Chin and Vernon D. Buerg - An optimized ARC extractor for DOS
 +
** {{CdTextfiles|simtel/simtel9510/disk2/DISC2/ARCHIVER/ARCE41A.ZIP|ARCE v4.1a}} (1992-04-12)
 +
** Included with some versions of ARC.
 
* PKARC/PKPAK (DOS binaries)
 
* PKARC/PKPAK (DOS binaries)
 
** {{CdTextfiles|microhaus/mhblackbox3/ARCHIVER/PK36.EXE|PKARC 3.6}}
 
** {{CdTextfiles|microhaus/mhblackbox3/ARCHIVER/PK36.EXE|PKARC 3.6}}
 
** {{CdTextfiles|microhaus/mhblackbox3/ARCHIVER/PK361.EXE|PKPAK 3.61}}
 
** {{CdTextfiles|microhaus/mhblackbox3/ARCHIVER/PK361.EXE|PKPAK 3.61}}
 +
* PAK - See [[PAK (ARC extension)#Software]].
 
* {{CdTextfiles|simtel/simtel9510/disk2/DISC2/ARCHIVER/SQUASH.ZIP|SQUASH.ZIP}} - Public domain decompression code for one of the compression methods
 
* {{CdTextfiles|simtel/simtel9510/disk2/DISC2/ARCHIVER/SQUASH.ZIP|SQUASH.ZIP}} - Public domain decompression code for one of the compression methods
  

Revision as of 14:41, 13 June 2021

File Format
Name ARC (compression format)
Ontology
Extension(s) .arc, .ark
UTI public.archive.arc
Wikidata ID Q296496
Released 1985

ARC is a compressed archive format, mostly used in MS/PC-DOS, though a CP/M version also existed. It supports a number of different compression schemes, the most common of which are based on LZW.

Discussion

ARC was for a time (1985-89) the leading file archiving and compression format in the BBS world. It replaced the formats used by earlier utilities, which generally performed only one of the two functions (either combining multiple files into one file for convenient download, or shortening the file to reduce download time and disk space). Combining the two functions in one utility simplified the process of preparing files for download and extracting them at the other end, leading to a rapid rise in popularity for both the utility (also called ARC) and the format.

However, the ARC format suffered an equally rapid decline in popularity after the company that published the ARC utility (System Enhancement Associates, or SEA, run by Thom Henderson, who was very active in FidoNet) brought a successful trademark and copyright suit against rival Phil Katz, whose PKARC and PKXARC utilities were compatible with the ARC file format. The lawsuit was widely regarded by the BBS community as a "David vs. Goliath" case of a faceless corporation bullying a "little guy", though in fact both companies were small, home-based operations. Nevertheless, the fallout from the suit led to rapid adoption of the competing ZIP format, introduced by Katz in 1989, and ARC files are no longer commonly encountered.

The fact that archives from an early period of BBSing are often in this format encourages bad puns referring to those who trawl such old archives as "Raiders of the lost ARC."

Disambiguation

Unfortunately, several other incompatible file formats have used an "ARC" designation or file extension over the years, so a data set that is purportedly of type "ARC" may not actually be in this format. Others include the FreeArc format and the Internet Archive ARC format, as well as a Commodore ARC that is similar in concept but not compatible with any of the other ARCs.

Format details

An ARC file consists of a sequence of zero or more archive members, followed by an end-of-archive marker: the bytes 0x1a 0x00. It is common for ARC files to have padding or other data after the end-of-archive marker.

Each member begins with a 0x1a byte, then a byte indicating the compression method used for that member file. (For files beginning with 0x1b, see ArcMac.)
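
As a minimal sketch of the layout described above, the first two bytes of a file are enough for a rough classification. The function name `classify_arc_start` and its return strings are illustrative assumptions, not part of any ARC tool, and a real identifier would need further checks, since a 0x1a byte followed by a small number is a weak signature.

```python
def classify_arc_start(data: bytes) -> str:
    """Roughly classify a file by its first two bytes (hypothetical helper)."""
    if len(data) < 2:
        return "too short"
    marker, method = data[0], data[1]
    if marker == 0x1B:
        # Files beginning with 0x1b are covered under ArcMac, not here.
        return "possibly ArcMac"
    if marker != 0x1A:
        return "not ARC"
    if method == 0x00:
        # 0x1a 0x00 is the end-of-archive marker, so there are zero members.
        return "empty ARC archive"
    return f"ARC member, method {method}"
```

For example, `classify_arc_start(b"\x1a\x08...")` reports a member using method 8, while a file consisting only of the end-of-archive marker is reported as an empty archive.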

Compression methods

The compression method byte identifies a member's compression method, and/or other information about the type of member. The usual compression methods are in the range 1 through 9.

Unfortunately, there are several different compression methods named "crunched" or "Crunched".

{| class="wikitable"
! ID !! Name(s) !! Description and remarks
|-
|0 || End-of-archive marker ||
|-
|1 || Uncompressed || With old-style header.
|-
|2 || Uncompressed || With new-style header.
|-
|3 || Packed || [[RLE90]]
|-
|4 || Squeezed,<br>Packed+Squeezed || [[RLE90]] + [[Huffman coding|Huffman]]. See [[Squeeze#Compressed data section|Squeeze]] for more information.
|-
|5 || crunched || Hashed [[LZW]] (old hash). Derived from [[LZWCOM]]. Introduced in ARC v4.00.
|-
|6 || crunched,<br>Packed+crunched || [[RLE90]] + method 5. Similar to [[Crunch|CP/M Crunch]] v1.x format. Introduced in ARC v4.10.
|-
|7 || crunched,<br>Packed+crunched || [[RLE90]] + hashed [[LZW]] (new hash). Source code comment says "inadvertent release of a developmental copy forces us to leave [method 7] in".
|-
|8 || Crunched,<br>Packed+Crunched || [[RLE90]] + [[LZW]]. The LZW layer has a dynamic code size. There is a header byte giving the maximum LZW code size, but only 12 bits is generally supported. Introduced in ARC v5.00. This is probably the most common compression method.
|-
|9 || Squashed,<br>Deviant || [[LZW]]. Used by PKARC/PKPAK.
|-
|10 || Trimmed || [[RLE90]] + {[[LZ77 with Huffman coding|LZH]] with [[adaptive Huffman coding]]}. Supported by ARC 7.x.
|-
|10 || Crushed ||rowspan="2"| [[PAK (ARC extension)|PAK]] extensions: Refer to [[PAK (ARC extension)#Compression methods]].
|-
|11 || Distilled
|-
|20-29 || || Used/reserved for informational items
|-
|20 || Archive info ||
|-
|21 || Extended file info ||
|-
|22 || OS-specific info ||
|-
|30-39 || || Used/reserved for "control" items
|-
|30 || Subdir || Nested ARC-like format. Created by the "z" option introduced in ARC v6.
|-
|31 || End-of-subdir marker ||
|-
|72 || || No known use in ARC, but see [[Hyper archive]].
|-
|83 || || No known use in ARC, but see [[Hyper archive]].
|-
|≥128 || || Refer to [[Spark]].
|}
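
The table above can be expressed as a small lookup, which may help when writing a tool that lists archive members. This is only a sketch: the names `METHOD_NAMES` and `describe_method` are hypothetical, the labels are shortened from the table, and ID 10 is reported as ambiguous because the method byte alone does not say whether ARC 7.x or PAK wrote the member.

```python
# Labels condensed from the compression method table; not an official list.
METHOD_NAMES = {
    0: "End-of-archive marker",
    1: "Uncompressed (old-style header)",
    2: "Uncompressed (new-style header)",
    3: "Packed",
    4: "Squeezed",
    5: "crunched (method 5)",
    6: "crunched (method 6)",
    7: "crunched (method 7)",
    8: "Crunched (method 8)",
    9: "Squashed",
    10: "Trimmed (ARC 7.x) or Crushed (PAK)",
    11: "Distilled (PAK)",
    20: "Archive info",
    21: "Extended file info",
    22: "OS-specific info",
    30: "Subdir",
    31: "End-of-subdir marker",
}


def describe_method(method_id: int) -> str:
    """Return a human-readable label for a method byte (hypothetical helper)."""
    if method_id in METHOD_NAMES:
        return METHOD_NAMES[method_id]
    if 20 <= method_id <= 29:
        return "Informational item (reserved range)"
    if 30 <= method_id <= 39:
        return "Control item (reserved range)"
    if method_id in (72, 83):
        return "No known use in ARC (see Hyper archive)"
    if method_id >= 128:
        return "See Spark"
    return "Unknown"
```

Exact IDs are checked before the reserved ranges so that, for example, 20 is reported as "Archive info" rather than as a generic informational item.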

ARC Plus

The ARC v7 software is named "ARC Plus" or "ARC+Plus". By default, it uses Trimmed compression, and the files it creates begin with an archive info item ("compression method" 20). Some format identification tools identify files beginning this way as "ARC+" format.

ARC v6.02 understands archive info items, though it's unclear if it ever creates them. It does not support Trimmed decompression.
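
A format identifier could apply the "ARC+" heuristic described above roughly as follows. This is a sketch under the assumption that checking the first two bytes is all such tools do; the function name is hypothetical, and a match does not prove that ARC v7 created the file.

```python
ARCHIVE_INFO_METHOD = 20  # "compression method" used for archive info items


def starts_with_archive_info(data: bytes) -> bool:
    """True if the file begins with an archive info item (hypothetical check)."""
    return data[:2] == bytes([0x1A, ARCHIVE_INFO_METHOD])
```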

PK-style comments

The PKARC/PKPAK software supports comments, apparently using a custom ARC format extension that appears after the end-of-archive marker.

Information based on reverse engineering: An ARC file with comments ends with an 8-byte trailer that begins with the signature 'P' 'K' 0xaa 0x55. This is preceded by a sequence of 32-byte records, each containing a comment, except for one that has a special purpose. The last 4 bytes of the file contain the offset of the special record. The special record somehow indicates whether an archive comment and/or file comments are present. An archive comment, if present, is in the record preceding the special record. File comment records come after the special record, in the same order as the members appear in the ARC file.
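
The trailer layout described above can be checked with a few lines of code. The following is a sketch only: it assumes the 4-byte offset is an unsigned little-endian integer (plausible for DOS software, but not stated in the description), the function name is hypothetical, and it makes no attempt to interpret the special record itself.

```python
import struct

PK_SIGNATURE = b"PK\xaa\x55"  # first 4 bytes of the 8-byte trailer
RECORD_SIZE = 32              # each comment record is 32 bytes


def read_pk_comment_trailer(data: bytes):
    """Return the offset of the special record, or None if no trailer is found."""
    if len(data) < 8 or data[-8:-4] != PK_SIGNATURE:
        return None
    # Assumption: the offset is stored as a little-endian 32-bit integer.
    (offset,) = struct.unpack("<I", data[-4:])
    trailer_start = len(data) - 8
    # Sanity check: the special record must fit before the trailer.
    if offset + RECORD_SIZE > trailer_start:
        return None
    return offset
```

If an offset is returned, the 32 bytes starting there are the special record; any archive comment would be in the 32 bytes before it, and file comments in the records after it.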

PAK extended records

This is another kind of data that can appear after the end-of-archive marker. Refer to PAK (ARC extension).

Identifiers

  • File extension: .ARC (or conventionally .ARK on CP/M)
  • MIME type (Internet media type): Has no specific registered type; generic binary application/octet-stream is generally used, or perhaps unregistered custom types with an x- prefix
  • Uniform Type Identifier (Apple): public.archive.arc

See also

Other formats called ARC (or something similar) are listed at ARC.

Specifications

Sample files

Programs and Utilities

  • nomarch by Russell Marks, c. 2001 (Unix/GPL2) -- extract only.
    • Packaged for Debian-based Linux distributions: apt-get install nomarch
  • ARC (for Unix) - Portable software based on ARC source code
  • ARC - DOS binaries
    • v5.10 (1986-01-31) (bare executable)
    • v5.12 (1986-02-05) (For self-extraction to work, must be renamed to "ARC51.COM".)
    • v5.20 (1986-10-24)
    • v6.01 (Dated "January of 1989", but timestamps suggest 1989-02-23.)
    • v6.02 (Dated "January of 1989", but timestamps suggest 1989-03-14.)
    • v6.02 - Same ARC.EXE as above, but has updated documentation (1989-04-21) and other differences.
    • Various versions, at old-dos.ru
  • ARC v5.20 source code
  • XARC - A minimal ARC extractor from SEA (DOS binaries)
  • ARCE (or ARC-E), by Wayne Chin and Vernon D. Buerg - An optimized ARC extractor for DOS
    • ARCE v4.1a (1992-04-12)
    • Included with some versions of ARC.
  • PKARC/PKPAK (DOS binaries)
  • PAK - See PAK (ARC extension)#Software.
  • SQUASH.ZIP - Public domain decompression code for one of the compression methods

References
