LHA

From Just Solve the File Format Problem
(Difference between revisions)
(27 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
{{FormatInfo
 
{{FormatInfo
 
|subcat=Archiving
 
|subcat=Archiving
|extensions={{ext|lha}}, {{ext|lzh}}
+
|extensions={{ext|lha}}, {{ext|lzh}}, {{ext|lzs}}
 
|mimetypes={{mimetype|application/x-lzh-compressed}}
 
|mimetypes={{mimetype|application/x-lzh-compressed}}
 
|pronom={{PRONOM|fmt/626}}
 
|pronom={{PRONOM|fmt/626}}
 +
|wikidata={{wikidata|Q368782}}
 
|kaitai struct=lzh
 
|kaitai struct=lzh
 
|released=1988
 
|released=1988
 
}}
 
}}
'''LHA''' is an archiving program and file format created by Haruyasu Yoshizaki in 1988. It was originally called LHArc, then was briefly LH before settling on LHA. In the 1990s it was the most popular archiving format on the Amiga platform, and also got some use on the PC platform including in the installers for id Software games such as Doom and Quake, because [[ZIP]] compression was inferior until the release of PKZIP 2.0, which brought the formats to parity. At present, it is mostly used in Japan.
+
'''LHA''' is an archiving program and file format created by Haruyasu Yoshizaki (a.k.a. Yoshi) in 1988. It was originally called LHarc, then was briefly LH before settling on LHA. In the 1990s, it was the most popular archiving format on the Amiga platform. It also got some use on the PC platform including in the installers for id Software games such as Doom and Quake, because [[ZIP]] compression was inferior until the release of PKZIP 2.0, which brought the formats to parity.
  
The file format is also known as '''LZH'''.
+
It was particularly popular in Japan. Most of the best information about it is in Japanese.
 +
 
 +
It supports a number of different compression schemes, most of which use [[LZ77]] combined with [[Huffman coding]].
 +
 
 +
The file format is also known as '''LZH'''. See the [[LZH|LZH disambiguation page]] for other "LZH" formats.
  
 
== Format details ==
 
== Format details ==
LHA consists of a sequence of elements, each representing a member file or directory. There is no global archive-level header.
+
An LHA file consists of a sequence of elements, each representing a member file or directory. There is no global archive-level header.
  
There are at least four different formats that an element can have. (Note that this is independent of compression schemes.) In LHA jargon, the formats are known as "header levels". The formats are sometimes called "header level 0", "1", "2", and "3".
+
There are at least four different formats that an element can have. (Note that this is independent of compression schemes.) In LHA jargon, the formats are known as "header levels", and are usually called "header level 0", "... 1", "... 2", and "... 3".
  
 
The format of an element is determined by the byte at offset 20 from the beginning of that element. It is possible for different formats to be used in the same LHA file.
 
The format of an element is determined by the byte at offset 20 from the beginning of that element. It is possible for different formats to be used in the same LHA file.
  
The formats are similar, but infuriatingly different. They don't even follow the same principles with respect to parsing logic.
+
The formats are similar, but irritatingly different. They don't even follow the same principles with respect to how they must be parsed.
 +
 
 +
=== Compression schemes ===
 +
The compression scheme of an element is identified by the alphanumeric bytes of its ''compression method'' field. Known compression schemes:
 +
 
 +
{| class="wikitable"
 +
! ID !! Category !! Description and remarks
 +
|-
 +
|<code>lh0</code> || || Uncompressed
 +
|-
 +
|<code>lh1</code> || || LZ77+Huffman, 4k window, dynamic Huffman for ''codes'' (a code can be a literal or a length, depending on its value), offsets use a pre-defined Huffman tree.
 +
|-
 +
|<code>lh2</code> || || LZ77+Huffman, 8k window, dynamic Huffman for codes and offsets. Considered obsolete.
 +
|-
 +
|<code>lh3</code> || || LZ77+Huffman, 8k window, static Huffman for codes, offsets can use static Huffman or a pre-defined Huffman tree. Considered obsolete.
 +
|-
 +
|<code>lh4</code> || || Like lh5, but 4k window
 +
|-
 +
|<code>lh5</code> || || LZ77+Huffman, 8k window, static Huffman for codes and offsets
 +
|-
 +
|<code>lh6</code> || || Like lh5, but 32k window
 +
|-
 +
|<code>lh7</code> || || Like lh5, but 64k window
 +
|-
 +
|<code>lh7</code> || LHARK extension || Refer to [[LHARK]].
 +
|-
 +
|<code>lh8</code> ||rowspan="5"| Joe Jared extensions || Like lh5, but 64k window. (Same as lh7.)
 +
|-
 +
|<code>lh9</code> || Like lh5, but 128k window. Probably never used.
 +
|-
 +
|<code>lha</code> || Like lh5, but 256k window. Probably never used.
 +
|-
 +
|<code>lhb</code> || Like lh5, but 512k window. Probably never used.
 +
|-
 +
|<code>lhc</code> || Like lh5, but 1M window. Probably never used.
 +
|-
 +
|<code>lhd</code> || Special || Not a compression scheme. Indicates that the element represents a subdirectory.
 +
|-
 +
|<code>lhe</code> || Joe Jared extensions || Like lh5, but 2M window. Probably never used.
 +
|-
 +
|<code>lhx</code> || UNLHA32 extension ||
 +
|-
 +
|<code>lz2</code> ||rowspan="7"| LArc methods ||
 +
|-
 +
|<code>lz3</code> ||
 +
|-
 +
|<code>lz4</code> || Uncompressed
 +
|-
 +
|<code>lz5</code> || LZ77/[[LZSS]], 4k window. Almost identical to "SZDD" used in [[MS-DOS installation compression]].
 +
|-
 +
|<code>lz7</code> ||
 +
|-
 +
|<code>lz8</code> ||
 +
|-
 +
|<code>lzs</code> || LZ77/[[LZSS]], 2k window
 +
|-
 +
|<code>lZ0</code> ||rowspan="3"| PUT/GET variants ||rowspan="3"| Refer to [[PUT]].
 +
|-
 +
|<code>lZ1</code>
 +
|-
 +
|<code>lZ5</code>
 +
|-
 +
|<code>pc1</code> ||rowspan="5"| PMarc extensions ||rowspan="5"| Refer to [[PMA]].
 +
|-
 +
|<code>pm0</code>
 +
|-
 +
|<code>pm1</code>
 +
|-
 +
|<code>pm2</code>
 +
|-
 +
|<code>pms</code>
 +
|}
 +
 
 +
The Wikipedia article has [[Wikipedia:LHA (file format)#Compression methods|more information]] about some of the schemes.
 +
 
 +
=== Extended headers ===
 +
For header levels 1 and higher, each member file has an associated list of "extended headers", similar to [[ZIP#Extensible data fields|ZIP's extensible data fields]]. Each extended header is tagged with a single byte indicating its type. Extended headers are used to store platform-specific metadata, and to extend the format in other ways.
 +
 
 +
* [https://web.archive.org/web/20110912035449/http://homepage1.nifty.com:80/dangan/en/Content/Program/Java/jLHA/Notes/ExtHeaderList.html List of extended headers] (from archive.org)
 +
* [https://github.com/libarchive/libarchive/blob/master/libarchive/archive_read_support_format_lha.c libarchive: archive_read_support_format_lha.c] (look for "EXT_HEADER_CRC")
 +
 
 +
Header level 0 supports extended data in a more limited way. It allows for just one set of extended header fields (called the "extended area"), the content of which is determined by the initial one-byte "OS type" field.
 +
 
 +
* [https://web.archive.org/web/20110909114523/http://homepage1.nifty.com:80/dangan/en/Content/Program/Java/jLHA/Notes/ExtendArea.html Extended area] (from archive.org)
  
 
== Identification ==
 
== Identification ==
 
Bytes {{magic|'-' 'l' ?? ?? '-'}} appear at offset 2. This is not a global file signature, but represents the compression scheme of the first member file of the archive.
 
Bytes {{magic|'-' 'l' ?? ?? '-'}} appear at offset 2. This is not a global file signature, but represents the compression scheme of the first member file of the archive.
 +
 +
If you consider [[PMA]] to be a form of LHA, then the second of these bytes can also be <code>'p'</code>.
 +
 +
== See also ==
 +
* [[PMA]]
 +
* [[LHARK]]
 +
* [[LHice]]
 +
* [[PUT]]
  
 
== Format documentation ==
 
== Format documentation ==
Line 29: Line 125:
 
* [http://apple2.org.za/gswv/a2zine/GS.WorldView/Resources/The.MacShrinkIt.Project/ARCHIVES.TXT Archive format info]
 
* [http://apple2.org.za/gswv/a2zine/GS.WorldView/Resources/The.MacShrinkIt.Project/ARCHIVES.TXT Archive format info]
 
* [http://www.textfiles.com/programming/FORMATS/arc_fmts.txt LZH file header format (among other archive types)]
 
* [http://www.textfiles.com/programming/FORMATS/arc_fmts.txt LZH file header format (among other archive types)]
 +
* [http://www33146ue.sakura.ne.jp/staff/iz/formats/lzh.html LZH format]
 +
* [https://hwiegman.home.xs4all.nl/fileformats/lzh/lzhformat.html LZH format] (Aeco Systems)
 
* [https://github.com/libarchive/libarchive/blob/master/libarchive/archive_read_support_format_lha.c libarchive: archive_read_support_format_lha.c] - Has comments with information about the header formats
 
* [https://github.com/libarchive/libarchive/blob/master/libarchive/archive_read_support_format_lha.c libarchive: archive_read_support_format_lha.c] - Has comments with information about the header formats
  
Line 36: Line 134:
 
* [http://www.ponsoftware.com/en/ Explzh for Windows]
 
* [http://www.ponsoftware.com/en/ Explzh for Windows]
 
* [https://web.archive.org/web/20130906133859/http://homepage1.nifty.com/dangan/en/Content/Program/Java/jLHA/jLHA.html Java library] (from archive.org)
 
* [https://web.archive.org/web/20130906133859/http://homepage1.nifty.com/dangan/en/Content/Program/Java/jLHA/jLHA.html Java library] (from archive.org)
* [https://www.libarchive.org/ libarchive]
+
* [https://www.libarchive.org libarchive]
 +
* [http://lha.osdn.jp/ LHa for Unix] · [https://github.com/jca02266/lha GitHub project]
 +
** [https://web.archive.org/web/20200301124852/http://www2m.biglobe.ne.jp/~dolphin/lha/lha.htm LHa for Unix (Tsukao Okamoto)] (from archive.org)
 +
* [https://micco.mars.jp/mysoft/unlha32.htm UNLHA32.DLL] and [https://micco.mars.jp/mysoft/lhmelt.htm LHMelt]
 +
* LHarc/LHA
 +
** For DOS
 +
*** {{CdTextfiles|hof91/ARC/LH113C.EXE|LHarc v1.13c}} (1989-05-31) - English
 +
*** {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/LHA213.EXE|LHA v2.13}} (1991-07-20) - English
 +
*** {{CdTextfiles|simtel/simtel9703/disk2/DISC2/ARCERS/LHA255E.EXE|LHA v2.55 English translation}} (1992-11-15) - (unofficial?)
 +
*** {{CdTextfiles|pdos9606/ARCHIVER/TOOLS/LHA255B.EXE|LHA v2.55b}} (1992-11-24) - Japanese (LHA.EXE) and English (LHA_E.EXE)
 +
*** [http://info.elf.stuba.sk/packages/pub/pc/pack/lha266.exe LHA v2.66 test version] (1994-12-30) - Japanese
 +
** For Windows console
 +
*** [http://info.elf.stuba.sk/packages/pub/pc/pack/lha267.exe LHA v2.67 test version] (1995-10-07) - Japanese
 +
** Source code
 +
*** {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/LHARCSRC.ZIP|v1.13b}}
 +
*** {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/LHA211SR.ZIP|v2.11}}
 +
* [http://www.manmrk.net/tutorials/compress/downloads/larc333.exe LArc v3.33] (May 19, 1989) (DOS software)
 +
* {{CdTextfiles|sourcecode/msdos/arc_lbr/ar002.zip|ar version "002" by Haruhiko Okumura}} (1990) (DOS binary + source code)
 +
* {{Deark}} (might be useful for analysis; doesn't decompress the format)
  
 
== Sample files ==
 
== Sample files ==
 +
* [https://github.com/fragglet/lhasa/tree/master/test/archives lhasa test files]
 
* [https://github.com/libarchive/libarchive/tree/master/libarchive/test libarchive test files] → test_read_format_lha_*.lzh.uu
 
* [https://github.com/libarchive/libarchive/tree/master/libarchive/test libarchive test files] → test_read_format_lha_*.lzh.uu
 +
* [http://aminet.net/ aminet]
 +
* {{CdTextfiles|hof91/}} ...
 +
* https://telparia.com/fileFormatSamples/archive/lha/hexify.lha
  
 
== Other links ==
 
== Other links ==
 
* [[Wikipedia:LHA (file format)|Wikipedia article]]
 
* [[Wikipedia:LHA (file format)|Wikipedia article]]

Revision as of 11:16, 9 August 2020

File Format
Name LHA
Ontology
Extension(s) .lha, .lzh, .lzs
MIME Type(s) application/x-lzh-compressed
PRONOM fmt/626
Wikidata ID Q368782
Kaitai Struct Spec lzh.ksy
Released 1988

LHA is an archiving program and file format created by Haruyasu Yoshizaki (a.k.a. Yoshi) in 1988. It was originally called LHarc, then briefly LH, before settling on LHA. In the 1990s, it was the most popular archiving format on the Amiga platform. It also saw some use on the PC platform, including in the installers for id Software games such as Doom and Quake, because ZIP compression was inferior until the release of PKZIP 2.0, which brought the formats to parity.

It was particularly popular in Japan. Most of the best information about it is in Japanese.

It supports a number of different compression schemes, most of which use LZ77 combined with Huffman coding.

The file format is also known as LZH. See the LZH disambiguation page for other "LZH" formats.

Format details

An LHA file consists of a sequence of elements, each representing a member file or directory. There is no global archive-level header.

There are at least four different formats that an element can have, independent of the compression scheme used. In LHA jargon, the formats are known as "header levels", and are usually called "header level 0", "header level 1", "header level 2", and "header level 3".

The format of an element is determined by the byte at offset 20 from the beginning of that element. It is possible for different formats to be used in the same LHA file.
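
As a minimal sketch of the rule above, the following Python reads the header-level byte of one element. The function names and the `parse_level_N` labels are hypothetical, chosen only for illustration; the single fact taken from the text is that the byte sits at offset 20 from the start of the element.

```python
def header_level(element: bytes) -> int:
    """Return the header-level byte found at offset 20 of one element."""
    if len(element) <= 20:
        raise ValueError("element is too short to hold a header-level byte")
    return element[20]


def parse_strategy(level: int) -> str:
    """Pick a (hypothetical) parser name for a header level."""
    if level in (0, 1, 2, 3):
        return f"parse_level_{level}"
    # The text only says "at least four" formats, so other values may exist.
    return "unsupported"
```

Because different levels can be mixed in one file, a reader would need to repeat this check for every element rather than once per archive.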

The formats are similar, but irritatingly different. They don't even follow the same principles with respect to how they must be parsed.

Compression schemes

The compression scheme of an element is identified by the alphanumeric bytes of its compression method field. Known compression schemes:

{| class="wikitable"
! ID !! Category !! Description and remarks
|-
| lh0 || || Uncompressed
|-
| lh1 || || LZ77+Huffman, 4k window, dynamic Huffman for codes (a code can be a literal or a length, depending on its value), offsets use a pre-defined Huffman tree.
|-
| lh2 || || LZ77+Huffman, 8k window, dynamic Huffman for codes and offsets. Considered obsolete.
|-
| lh3 || || LZ77+Huffman, 8k window, static Huffman for codes, offsets can use static Huffman or a pre-defined Huffman tree. Considered obsolete.
|-
| lh4 || || Like lh5, but 4k window
|-
| lh5 || || LZ77+Huffman, 8k window, static Huffman for codes and offsets
|-
| lh6 || || Like lh5, but 32k window
|-
| lh7 || || Like lh5, but 64k window
|-
| lh7 || LHARK extension || Refer to LHARK.
|-
| lh8 ||rowspan="5"| Joe Jared extensions || Like lh5, but 64k window. (Same as lh7.)
|-
| lh9 || Like lh5, but 128k window. Probably never used.
|-
| lha || Like lh5, but 256k window. Probably never used.
|-
| lhb || Like lh5, but 512k window. Probably never used.
|-
| lhc || Like lh5, but 1M window. Probably never used.
|-
| lhd || Special || Not a compression scheme. Indicates that the element represents a subdirectory.
|-
| lhe || Joe Jared extensions || Like lh5, but 2M window. Probably never used.
|-
| lhx || UNLHA32 extension ||
|-
| lz2 ||rowspan="7"| LArc methods ||
|-
| lz3 ||
|-
| lz4 || Uncompressed
|-
| lz5 || LZ77/LZSS, 4k window. Almost identical to "SZDD" used in MS-DOS installation compression.
|-
| lz7 ||
|-
| lz8 ||
|-
| lzs || LZ77/LZSS, 2k window
|-
| lZ0 ||rowspan="3"| PUT/GET variants ||rowspan="3"| Refer to PUT.
|-
| lZ1
|-
| lZ5
|-
| pc1 ||rowspan="5"| PMarc extensions ||rowspan="5"| Refer to PMA.
|-
| pm0
|-
| pm1
|-
| pm2
|-
| pms
|}

The Wikipedia article has more information about some of the schemes.
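
The table lends itself to a simple lookup. The sketch below, in Python, maps a method field to a short description using only the window sizes listed above. The names `WINDOW_SIZES`, `method_id`, and `describe` are hypothetical, and the assumption that the field is exactly five bytes of the form `-xxx-` is inferred from the byte pattern given under Identification.

```python
KIB = 1024

# Window sizes copied from the table. "lh7" appears twice there (the standard
# 64k method and the LHARK extension); only the standard entry is used here.
WINDOW_SIZES = {
    "lh1": 4 * KIB, "lh2": 8 * KIB, "lh3": 8 * KIB,
    "lh4": 4 * KIB, "lh5": 8 * KIB, "lh6": 32 * KIB, "lh7": 64 * KIB,
    "lh8": 64 * KIB, "lh9": 128 * KIB, "lha": 256 * KIB,
    "lhb": 512 * KIB, "lhc": 1024 * KIB, "lhe": 2048 * KIB,
    "lz5": 4 * KIB, "lzs": 2 * KIB,
}
UNCOMPRESSED = {"lh0", "lz4"}
DIRECTORY = "lhd"


def method_id(field: bytes) -> str:
    """Extract the ID from a 5-byte method field such as b'-lh5-'."""
    if len(field) != 5 or field[:1] != b"-" or field[4:] != b"-":
        raise ValueError("not a '-xxx-' method field")
    return field[1:4].decode("ascii")


def describe(field: bytes) -> str:
    """Summarize a method field using only what the table states."""
    mid = method_id(field)  # IDs are case-sensitive: "lZ5" is not "lz5"
    if mid == DIRECTORY:
        return "directory entry (no compressed data)"
    if mid in UNCOMPRESSED:
        return "stored without compression"
    if mid in WINDOW_SIZES:
        return f"compressed, {WINDOW_SIZES[mid] // KIB}k window"
    return "listed elsewhere or unknown"
```

For example, `describe(b"-lh5-")` yields `"compressed, 8k window"`, while an ID such as `lZ5` falls through to the last branch because the table defers its details to the PUT page.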

Extended headers

For header levels 1 and higher, each member file has an associated list of "extended headers", similar to ZIP's extensible data fields. Each extended header is tagged with a single byte indicating its type. Extended headers are used to store platform-specific metadata, and to extend the format in other ways.

Header level 0 supports extended data in a more limited way. It allows for just one set of extended header fields (called the "extended area"), the content of which is determined by the initial one-byte "OS type" field.

Identification

Bytes '-' 'l' ?? ?? '-' appear at offset 2. This is not a global file signature, but represents the compression scheme of the first member file of the archive.

If you consider PMA to be a form of LHA, then the second of these bytes can also be 'p'.
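
The identification rule can be sketched as a short Python check. The function name `looks_like_lha` and the `allow_pma` flag are hypothetical. Since the pattern is not a global signature, a match only suggests that the first element looks plausible, not that the whole file is valid.

```python
def looks_like_lha(data: bytes, allow_pma: bool = False) -> bool:
    """Check for the '-' 'l' ?? ?? '-' pattern at offset 2."""
    if len(data) < 7:
        return False
    field = data[2:7]
    if field[0:1] != b"-" or field[4:5] != b"-":
        return False
    second = field[1:2]
    # 'p' is accepted only when PMA is treated as a form of LHA.
    return second == b"l" or (allow_pma and second == b"p")
```

A caller might pass the first few bytes of a file, for instance `looks_like_lha(open(path, "rb").read(7))`, and fall back to other detection methods when it returns `False`.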

See also

Format documentation

Software

Sample files

Other links
