LHA

From Just Solve the File Format Problem
 
{{FormatInfo
|subcat=Archiving
|extensions={{ext|lzh}}, {{ext|lha}}
|mimetypes={{mimetype|application/x-lzh-compressed}}
|pronom={{PRONOM|fmt/626}}
|wikidata={{wikidata|Q368782}}
|kaitai struct=lzh
|released=≤1989
}}
'''LHA''' is a family of archiving programs, and their associated file format, created by Haruyasu Yoshizaki (a.k.a. Yoshi). The software was originally called '''LHarc''', then was briefly '''LH''' (v2.02–2.04), then '''LHa''' (v2.05–2.06), before settling on '''LHA''' (v2.10+). The file format is also known as '''LZH'''. See the [[LZH|LZH disambiguation page]] for other "LZH" formats.

== Discussion ==
In the 1990s, LHA was the most popular archiving format on the Amiga platform. It also got some use on the PC platform, including in the installers for id Software games such as Doom and Quake, because [[ZIP]] compression was inferior until the release of PKZIP 2.0, which brought the formats to parity.

It was particularly popular in Japan. Most of the best information about it is in Japanese.

It supports a number of different compression schemes, most of which use [[LZ77 with Huffman coding|LZ77 combined with Huffman coding]].

This article covers the format used by LHarc/LHA, as well as the "generalized" LHA format: the same file format, but with other compression schemes. The generalized format was possibly designed by Kazuhiko Miki for [[LArc]], but confirmation of this is needed. If so, it was soon borrowed by LHarc, with new compression schemes.

The format may have been released in 1988, but conclusive evidence is lacking. The earliest confirmed release dates are 1989-02 for the [[LArc]] compression schemes (LArc v3.33), and 1989-03 for LHarc (v1.00).
 
== Format details ==

=== File structure ===
An LHA file consists mainly of a sequence of elements, each representing a member file or directory. The sequence is usually terminated by an end-of-archive marker consisting of a single 0x00 byte (but take care, as level 2 headers could start with 0x00). There is no global archive-level header.

=== Member format ===
There are at least four different formats that an element can have. (Note that this is independent of compression schemes.) In LHA jargon, the formats are known as "header levels", and are usually called "header level 0", "... 1", "... 2", and "... 3". The header level is determined by the byte at offset 20 from the beginning of that element.

The header levels are similar, but irritatingly different. They don't even follow the same principles with respect to how they must be parsed.

=== LZH compression overview ===
From a decompression perspective, the LZ77+Huffman schemes work roughly as follows. (This is oversimplified.) There is a ''codes'' Huffman tree, and a separate ''offsets'' tree. A symbol is read using the codes tree which, depending on its value, represents either a literal byte value, or a ''length''. If it is a length, then an additional symbol is read using the offsets tree. Based on the offset and length, a run of recently-decompressed bytes is repeated.
 
=== Compression schemes ===
Each member file has a 5-byte ''compression method'' field, composed of ASCII characters. The first and last characters are virtually always dashes ("<code>-</code>"), and might be left off when discussing LHA compression schemes. Known schemes:
 
{| class="wikitable"
! ID !! Category !! Description and remarks
|-
|<code>-lh0-</code> || || Uncompressed
|-
|<code>-lh1-</code> || || [[LZ77 with Huffman coding|LZ77+Huffman]], 4k window, [[Adaptive Huffman coding|adaptive Huffman]] for codes, offsets use a pre-defined Huffman tree. See also [[LZHUF]].
|-
|<code>-lh2-</code> || || LZ77+Huffman, 8k window, adaptive Huffman. Considered experimental/obsolete.
|-
|<code>-lh3-</code> || || LZ77+Huffman, 8k window, segmented, static Huffman for codes, offsets can use static Huffman or a pre-defined Huffman tree. Considered experimental/obsolete.
|-
|<code>-lh4-</code> || || Like lh5, but 4k window. Rare.
|-
|<code>-lh5-</code> || || [[LZ77 with Huffman coding|LZ77+Huffman]], 8k window, segmented, static Huffman. See also [[ar (Haruhiko Okumura)]].
|-
|<code>-lh6-</code> || || Like lh5, but 32k window
|-
|<code>-lh7-</code> || || Like lh5, but 64k window
|-
|<code>-lh7-</code> || LHARK extension || Refer to [[LHARK]].
|-
|<code>-lh8-</code> ||rowspan="5"| Joe Jared extensions || Like lh5, but 64k window. (Same as lh7.)
|-
|<code>-lh9-</code> || Like lh5, but 128k window. Probably never used.
|-
|<code>-lha-</code> || Like lh5, but 256k window. Probably never used.
|-
|<code>-lhb-</code> || Like lh5, but 512k window. Probably never used.
|-
|<code>-lhc-</code> || Like lh5, but 1M window. Probably never used.
|-
|<code>-lhd-</code> || Special || Not a compression scheme. Indicates that the element represents a subdirectory.
|-
|<code>-lhe-</code> || Joe Jared extensions || Like lh5, but 2M window. Probably never used.
|-
|<code>-lhx-</code> ||rowspan="2"| UNLHA32 extensions ||
|-
|<code>-lx1-</code> ||
|-
|<code>-lz2-</code> ||rowspan="7"| LArc methods ||rowspan="7"| Refer to [[LArc]].
|-
|<code>-lz3-</code>
|-
|<code>-lz4-</code>
|-
|<code>-lz5-</code>
|-
|<code>-lz7-</code>
|-
|<code>-lz8-</code>
|-
|<code>-lzs-</code>
|-
|<code>-pm0-</code> ||rowspan="3"| PMarc extensions ||rowspan="3"| Refer to [[PMA]].
|-
|<code>-pm1-</code>
|-
|<code>-pm2-</code>
|-
|<code>-ah0-</code> ||rowspan="3"| MAR extensions ||rowspan="3"| Refer to [[Micrognosis Compression Archiver]].
|-
|<code>-ari-</code>
|-
|<code>-hf0-</code>
|-
|<code>-lZ0-</code> ||rowspan="3"| PUT/GET variants ||rowspan="3"| Refer to [[PUT]].
|-
|<code>-lZ1-</code>
|-
|<code>-lZ5-</code>
|-
|<code>␠LH0␠</code> ||rowspan="2"| SAR variants ||rowspan="2"| Refer to [[SAR (Streamline Design)]]. The compression IDs begin and end with a space (0x20).
|-
|<code>␠LH5␠</code>
|}
 
The Wikipedia article has [[Wikipedia:LHA (file format)#Compression methods|more information]] about some of the schemes.
 
For reference, here are some other LHA-like identifiers:

{| class="wikitable"
! ID !! References and remarks
|-
|<code>-afx-</code> || Refer to [[AFX (Atari ST)]].
|-
|<code>-arn-</code> ||rowspan="2"| Possibly used by [[Micrognosis Compression Archiver]].
|-
|<code>-lzw-</code>
|-
|<code>-LD6-</code> ||rowspan="2"| Refer to [[LDArc and LDIFF]].
|-
|<code>-lz6-</code>
|-
|<code>-ll0-</code> ||rowspan="2"| Refer to [[PAKLEO]].
|-
|<code>-ll1-</code>
|-
|<code>-pc1-</code> || Used by [[PopCom!]].
|-
|<code>-pms-</code> || Used by [[PMsfx]] and [[PMexe]].
|-
|<code>-sqx-</code> || Refer to [[SQX]].
|-
|<code>-sw0-</code> ||rowspan="2"| Refer to [[SWG]].
|-
|<code>-sw1-</code>
|-
|<code>-TK1-</code> || Unknown. (Recognized by [[IDArc]].)
|}
 
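=== Extended headers ===
For header levels 1 and higher, each member file has an associated list of "extended headers", similar to ZIP's extensible data fields. Each extended header is tagged with a single byte indicating its type. Extended headers are used to store platform-specific metadata, and to extend the format in other ways.

Header level 0 supports extended data in a more limited way. It allows for just one set of extended header fields (called the "extended area"), the content of which is determined by the initial one-byte "OS type" field.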
 
== Identification ==
LHA can be identified with high accuracy, but doing so can be laborious, due to the lack of a signature, and other complicating factors.

Identification logic could be based on the header of the first member file. Check that the compression method (offset 2–6) and header level (offset 20) fields have valid values. When suitable and possible, also validate the header checksum field; how to do this depends on the header level.

See the "[[#See also]]" section for some formats that could masquerade as LHA.
 
== See also ==
* [[LHA/LHarc self-extracting archive]]
* [[LArc]]
* [[PMA]]
* [[LHARK]]
* [[LHice]]
* [[PUT]]
* [[Micrognosis Compression Archiver]]
* [[SAR (Streamline Design)]]
* [[LZHUF]]
* [[ar (Haruhiko Okumura)]]

Other LHA-like formats to be aware of:
* [[AFX (Atari ST)]]
* [[ARX]]
* [[CAR (MylesHi!)]]
* [[LDArc and LDIFF]]
* [[SWG]]
 
== Format documentation ==
* [https://hwiegman.home.xs4all.nl/fileformats/lzh/lzhformat.html LZH format] (Aeco Systems)
* [https://github.com/libarchive/libarchive/blob/master/libarchive/archive_read_support_format_lha.c libarchive: archive_read_support_format_lha.c] - Has comments with information about the header formats
* [https://web.archive.org/web/20021005080911/http://www.osirusoft.com/joejared/lzhformat.html Joe Jared's LHA specification] (from archive.org)
 
== Software ==
* [https://www.libarchive.org libarchive]
* [http://lha.osdn.jp/ LHa for Unix] · [https://github.com/jca02266/lha GitHub project]
** [https://web.archive.org/web/20200301124852/http://www2m.biglobe.ne.jp/~dolphin/lha/lha.htm LHa for Unix (Tsukao Okamoto)] (from archive.org)
* [https://micco.mars.jp/mysoft/unlha32.htm UNLHA32.DLL] and [https://micco.mars.jp/mysoft/lhmelt.htm LHMelt]
* LHarc/LHA
** For DOS
*** LHarc v1.00 - English (1989-03-04): [https://archive.org/details/RbbsInABoxVol1No2_640 RBBS in a Box, vol 1 no 2] → 014r/lharc10e.com
*** {{CdTextfiles|carousel344/013/LHARC12.ZIP|LHarc v1.12 Test version - English}} (1989-04-23)
*** LHarc v1.12b - English (1989-04-29): [https://archive.org/details/RbbsInABoxVol1No2_640 RBBS in a Box, vol 1 no 2] → add2/lharc12b.exe
*** {{CdTextfiles|bbox4/archiver/lharc113.exe|LHarc v1.13 Test version - English}} (1989-05-04)
*** {{CdTextfiles|hof91/ARC/LH113C.EXE|LHarc v1.13c - English}} (1989-05-31)
*** {{CdTextfiles|garbo/PC/GOLDIES/LH113DE.COM|LHarc v1.13d - English}} (1989-12-22)
*** LHarc v1.13d - Japanese: [https://archive.org/download/FMTownsFreeSoftwareCollection3 FM Towns Free Software Collection 3] → FREEWARE.{BIN,CUE} → ms_dos/lharc/*
*** {{CdTextfiles|californiacollect/his008/lha205e.exe|LHa v2.05 test version - English}} (1991-01-27)
*** {{CdTextfiles|hof91/ARC/LH205.EXE|LHa v2.05 test version - Japanese}}
*** {{CdTextfiles|hof91/ARC/LHA206E.EXE|LHa v2.06 - English}} (1991-02-14)
*** {{CdTextfiles|hof91/COMP/LHA210.EXE|LHA v2.10 - English}} (1991-02-24)
*** {{CdTextfiles|californiacollect/his008/lha211.exe|LHA v2.11 - English}} (1991-03-03)
*** {{CdTextfiles|californiacollect/his008/lha212.exe|LHA v2.12 - English}} (1991-03-21)
*** {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/LHA213.EXE|LHA v2.13 - English}} (1991-07-20)
*** LHA v2.13 - Japanese: [https://archive.org/details/Nova_Win50GameVol7_Japan Win 50 Game+ Vol. 7 (Japan)] → Win 50 Game+ Vol. 7 (Japan).7z → Win 50 Game+ Vol. 7 (Japan).{bin,cue} → lha_file/lha/lha213.exe
*** {{CdTextfiles|pier02/002a/lha252.exe|LHA v2.52 - Japanese}} (1992-09-07)
*** LHA v2.54 - Japanese (1992-10-04): [https://archive.org/details/cg-network-4 CG Network 4] → pc/program/lha/lha.exe
*** LHA v2.55 - Japanese (1992-11-15): [https://archive.org/details/2014.03.ftp.eri.u-tokyo.ac.jp] → ftp.eri.u-tokyo.ac.jp/pub/DOS/tools/lha255.exe
*** {{CdTextfiles|pdos9606/ARCHIVER/TOOLS/LHA255B.EXE|LHA v2.55b}} (1992-11-24) - Japanese (LHA.EXE) and English (LHA_E.EXE)
*** [http://info.elf.stuba.sk/packages/pub/pc/pack/lha266.exe LHA v2.66 test version - Japanese] (1994-12-30)
**** [{{SACFTPURL|pack|lha266e.exe}} lha266e.exe] - Official(?) patch to convert error messages to English
*** Various versions at old-dos.ru: [http://old-dos.ru/index.php?page=files&mode=files&do=show&id=3432 LHarc], [http://old-dos.ru/index.php?page=files&mode=files&do=show&id=713 LHA]
** For Windows console
*** [http://info.elf.stuba.sk/packages/pub/pc/pack/lha267.exe LHA32 v2.67.00 test version - Japanese] (1995-10-07)
**** [{{SACFTPURL|pack|lha267e.exe}} lha267e.exe] - Official(?) patch to convert error messages to English
** Source code
*** {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/LHARCSRC.ZIP|v1.13b}}
*** {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/LHA211SR.ZIP|v2.11}}
* [http://old-dos.ru/index.php?page=files&mode=files&do=show&id=2836 Lha32] - by "Take"
* [[LZHUF]] - Source code related to "lh1" compression
* [[ar (Haruhiko Okumura)]] - Implementation of "lh5" compression
* [https://github.com/PascalVault/Lazarus_Unpacker Open-source library in Free Pascal]
* [https://github.com/temisu/ancient Ancient] - Has modern C++ code for decompressing most LHA schemes, but as of this writing there's no easy way to use it.
* {{Deark}} (e.g. with <code>-zip</code> option)
* {{XAD}}
* Huffman Compression Engine II, a.k.a. LH7, by Joe Jared
** v0.21q for DOS: [https://encode.su/threads/1563-DOS-Archiver-Benchmark?p=31913&viewfull=1#post31913] → [https://encode.su/attachment.php?attachmentid=2155&d=1357879541 DLH7021Q.ZIP]
** v0.21q for Windows: [https://encode.su/threads/1658-LH7-archiver?p=31932&viewfull=1#post31932] → [https://encode.su/attachment.php?attachmentid=2158&d=1358062452 WLH7021Q.7z]
** v0.21q for Linux: [https://web.archive.org/web/20010901000000*/http://www.osirusoft.com/llh7021q.zip llh7021q.zip] (from archive.org)
 
=== Software oddities ===
There are many customized versions of LHarc/LHA floating around. Some of them are listed here, either because they are notable, or because they are potentially misleading. (For DOS, unless otherwise indicated.)

Worth noting is that LHA 2.x has a tamper-detection feature, invoked by running "LHA t LHA.EXE" (or "LHA_E t LHA_E.EXE"). Most (but not all) modified files fail the test, and print "No file found" or "Broken archive".

* "LHarc v1.13" (1989-05-14): {{CdTextfiles|simtel0595/DISC1/CITADEL/K2NE608A.ZIP|K2NE608A.ZIP}} → LHARC.EXE - Suspect this is the v1.13 test version, edited to make it look like a full release.
* "LHarc v1.131c" by Steve Hoglund: [https://archive.org/details/bbs-1 BBS# 1] → DOCUMENT/TURBOBAS.LZH → LHARC.COM
* [[LHice]] - A hack of v1.13c.
* {{CdTextfiles|hof91/COMP/LHA114A.COM|"LHarc v1.14a"}} - A hack of v1.13c and/or LHice.
* {{CdTextfiles|hof91/ARC/LH114B.EXE|"LHARC v1.14β"}} - A hack of v1.13c and/or LHice.
* {{CdTextfiles|animfestival/SBPRO/LHARC.EXE|"LHarc v2.01a"}} - Apparently a hack of v1.13c.
* {{CdTextfiles|simtel/simtel9703/disk2/DISC2/ARCERS/LHA255E.EXE|"LHA v2.55E"}} (1992-11-15/1996-01-10) - English translation of v2.55, by Hitoshi Ozawa
 
== Sample files ==
* [https://github.com/fragglet/lhasa/tree/master/test/archives lhasa test files]
* [https://github.com/libarchive/libarchive/tree/master/libarchive/test libarchive test files] → test_read_format_lha_*.lzh.uu
* [http://aminet.net/ aminet]
* {{CdTextfiles|hof91/}} ...
* {{DexvertSamples|archive/lha}}
 
== Other links ==
* [[Wikipedia:LHA (file format)|Wikipedia article]]

Latest revision as of 15:35, 3 February 2024

File Format
Name LHA
Ontology
Extension(s) .lzh, .lha
MIME Type(s) application/x-lzh-compressed
PRONOM fmt/626
Wikidata ID Q368782
Kaitai Struct Spec lzh.ksy
Released ≤1989

LHA is a family of archiving programs, and their associated file format, created by Haruyasu Yoshizaki (a.k.a. Yoshi). The software was originally called LHarc, then was briefly LH (v2.02–2.04), then LHa (v2.05–2.06), before settling on LHA (v2.10+). The file format is also known as LZH. See the LZH disambiguation page for other "LZH" formats.

Discussion

In the 1990s, LHA was the most popular archiving format on the Amiga platform. It also got some use on the PC platform, including in the installers for id Software games such as Doom and Quake, because ZIP compression was inferior until the release of PKZIP 2.0, which brought the formats to parity.

It was particularly popular in Japan. Most of the best information about it is in Japanese.

It supports a number of different compression schemes, most of which use LZ77 combined with Huffman coding.

This article covers the format used by LHarc/LHA, as well as the "generalized" LHA format: the same file format, but with other compression schemes. The generalized format was possibly designed by Kazuhiko Miki for LArc, but confirmation of this is needed. If so, it was soon borrowed by LHarc, with new compression schemes.

The format may have been released in 1988, but conclusive evidence is lacking. The earliest confirmed release dates are 1989-02 for the LArc compression schemes (LArc v3.33), and 1989-03 for LHarc (v1.00).

Format details

File structure

An LHA file consists mainly of a sequence of elements, each representing a member file or directory. The sequence is usually terminated by an end-of-archive marker consisting of a single 0x00 byte (but take care, as level 2 headers could start with 0x00). There is no global archive-level header.
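A minimal Python sketch of the end-of-archive check, with the level 2 caveat handled by a heuristic. It assumes the whole archive is in memory as `buf`, and that a level 2 header begins with a 16-bit little-endian header size (so it starts with 0x00 whenever that size is a multiple of 256). The function name `is_end_of_archive` and the heuristic itself (treat 0x00 as a header if a dash-delimited method field sits at offset 2 and the byte at offset 20 is 2) are illustrative assumptions, not taken from any particular LHA implementation.

```python
def is_end_of_archive(buf: bytes, pos: int) -> bool:
    """Guess whether `pos` is at the end of the member sequence.

    Hypothetical heuristic: a 0x00 byte normally marks the end, but a level 2
    header also starts with 0x00 when the low byte of its (assumed) 16-bit
    header size is zero. If a plausible level 2 header follows, keep going.
    """
    if pos >= len(buf):
        return True  # no more data; the marker is only "usually" present
    if buf[pos] != 0x00:
        return False  # non-zero byte: another member header starts here
    if pos + 21 <= len(buf):
        method = buf[pos + 2:pos + 7]
        dashed = method[:1] == b"-" and method[4:5] == b"-"
        if dashed and buf[pos + 20] == 2:
            return False  # looks like a level 2 header whose size is a multiple of 256
    return True
```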

Member format

There are at least four different formats that an element can have. (Note that this is independent of compression schemes.) In LHA jargon, the formats are known as "header levels", and are usually called "header level 0", "... 1", "... 2", and "... 3". The header level is determined by the byte at offset 20 from the beginning of that element.
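Because the header level sits at offset 20 in every level, as does the 5-byte compression method field at offsets 2–6 (see Identification), a parser can read both before it knows how to interpret anything else. A Python sketch; `read_member_prefix` and `MemberPrefix` are hypothetical names, and rejecting levels above 3 is an assumption based on the four levels named above.

```python
from typing import NamedTuple


class MemberPrefix(NamedTuple):
    method: bytes  # raw 5-byte compression method field, e.g. b"-lh5-"
    level: int     # header level, 0-3


def read_member_prefix(buf: bytes, pos: int = 0) -> MemberPrefix:
    """Read the two fields that share the same offsets in every header level."""
    if pos + 21 > len(buf):
        raise ValueError("truncated member header")
    method = bytes(buf[pos + 2:pos + 7])  # offsets 2-6
    level = buf[pos + 20]                 # offset 20
    if level > 3:
        raise ValueError(f"unknown header level {level}")
    return MemberPrefix(method, level)
```

A caller would then dispatch on `level` to a level-specific parser.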

The header levels are similar, but irritatingly different. They don't even follow the same principles with respect to how they must be parsed.
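One concrete example of the inconsistency is how each level records the header's length. The Python sketch below assumes the layouts commonly documented elsewhere (for example in the libarchive comments listed under Format documentation), not anything stated in this article: levels 0 and 1 store a 1-byte size at offset 0 that excludes bytes 0–1, level 2 stores a 16-bit little-endian total at offset 0, and level 3 stores a 32-bit little-endian total at offset 24. `header_length` is a hypothetical name; verify the offsets against a specification before relying on them.

```python
def header_length(buf: bytes, pos: int = 0) -> int:
    """Return the length in bytes of the member header starting at `pos`.

    Assumed layouts:
      level 0/1: 1-byte size at offset 0, not counting bytes 0-1
      level 2:   16-bit little-endian total size at offset 0
      level 3:   32-bit little-endian total size at offset 24
    For level 1 the result covers only the base header; the extended headers
    that follow are assumed to be counted in the compressed-size field instead.
    """
    level = buf[pos + 20]
    if level in (0, 1):
        return buf[pos] + 2
    if level == 2:
        return int.from_bytes(buf[pos:pos + 2], "little")
    if level == 3:
        return int.from_bytes(buf[pos + 24:pos + 28], "little")
    raise ValueError(f"unsupported header level {level}")
```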

LZH compression overview

From a decompression perspective, the LZ77+Huffman schemes work roughly as follows. (This is oversimplified.) There is a codes Huffman tree, and a separate offsets tree. A symbol is read using the codes tree which, depending on its value, represents either a literal byte value, or a length. If it is a length, then an additional symbol is read using the offsets tree. Based on the offset and length, a run of recently-decompressed bytes is repeated.
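The loop described above, minus the Huffman decoding, can be sketched in Python. The token format, the `expand` name and the constants are assumptions modeled on the lh5 family as implemented in Haruhiko Okumura's "ar": codes 0–255 are literals, a code c ≥ 256 is a match of length c - 253, and the copy starts offset + 1 bytes back. In lh5 the offsets-tree symbol is followed by extra bits that complete the offset; the sketch assumes that step is already done. Other schemes, such as lh1, map codes differently.

```python
from typing import Iterable, Optional, Tuple

# Assumed lh5-family parameters (modeled on Okumura's "ar"):
# codes 0-255 are literal bytes; a code c >= 256 is a match of length c - 253.
LITERAL_LIMIT = 256
MIN_MATCH = 3


def expand(tokens: Iterable[Tuple[int, Optional[int]]]) -> bytes:
    """Rebuild output from (code, offset) pairs that were already Huffman-decoded.

    `code` is the symbol read with the codes tree; `offset` is the fully
    decoded offset that follows a length, or None after a literal.
    """
    out = bytearray()
    for code, offset in tokens:
        if code < LITERAL_LIMIT:
            out.append(code)  # literal byte value
            continue
        if offset is None:
            raise ValueError("length code without an offset")
        length = code - LITERAL_LIMIT + MIN_MATCH
        # Assumed convention: the run starts offset + 1 bytes back.
        start = len(out) - offset - 1
        if start < 0:
            raise ValueError("match reaches before the start of the output")
        for i in range(length):
            # Copy one byte at a time so a run may overlap the bytes it produces.
            out.append(out[start + i])
    return bytes(out)
```

For example, the literals "a", "b", "c" followed by code 256 with offset 2 expand to "abcabc".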

Compression schemes

Each member file has a 5-byte compression method field, composed of ASCII characters. The first and last characters are virtually always dashes ("-"), and might be left off when discussing LHA compression schemes. Known schemes:
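Normalizing the field into a short scheme name can be sketched in Python as below. `scheme_name` is a hypothetical helper, and keeping IDs that lack the dashes (such as SAR's ␠LH5␠) verbatim is a design choice, not an established convention. Comparisons should stay case-sensitive: -lZ5- (PUT/GET) and -lz5- (LArc) are different schemes.

```python
def scheme_name(field: bytes) -> str:
    """Turn a 5-byte compression method field into a short scheme name.

    Dash-delimited IDs lose their dashes (b"-lh5-" -> "lh5"); anything else
    is returned unchanged so unusual variants stay distinguishable.
    """
    if len(field) != 5:
        raise ValueError("compression method field must be 5 bytes")
    text = field.decode("ascii")
    if text[0] == "-" and text[4] == "-":
        return text[1:4]
    return text
```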

ID Category Description and remarks
-lh0- Uncompressed
-lh1- LZ77+Huffman, 4k window, adaptive Huffman for codes, offsets use a pre-defined Huffman tree. See also LZHUF.
-lh2- LZ77+Huffman, 8k window, adaptive Huffman. Considered experimental/obsolete.
-lh3- LZ77+Huffman, 8k window, segmented, static Huffman for codes, offsets can use static Huffman or a pre-defined Huffman tree. Considered experimental/obsolete.
-lh4- Like lh5, but 4k window. Rare.
-lh5- LZ77+Huffman, 8k window, segmented, static Huffman. See also ar (Haruhiko Okumura).
-lh6- Like lh5, but 32k window
-lh7- Like lh5, but 64k window
-lh7- LHARK extension Refer to LHARK.
-lh8- Joe Jared extensions Like lh5, but 64k window. (Same as lh7.)
-lh9- Like lh5, but 128k window. Probably never used.
-lha- Like lh5, but 256k window. Probably never used.
-lhb- Like lh5, but 512k window. Probably never used.
-lhc- Like lh5, but 1M window. Probably never used.
-lhd- Special Not a compression scheme. Indicates that the element represents a subdirectory.
-lhe- Joe Jared extensions Like lh5, but 2M window. Probably never used.
-lhx- UNLHA32 extensions
-lx1-
-lz2- LArc methods Refer to LArc.
-lz3-
-lz4-
-lz5-
-lz7-
-lz8-
-lzs-
-pm0- PMarc extensions Refer to PMA.
-pm1-
-pm2-
-ah0- MAR extensions Refer to Micrognosis Compression Archiver.
-ari-
-hf0-
-lZ0- PUT/GET variants Refer to PUT.
-lZ1-
-lZ5-
␠LH0␠ SAR variants Refer to SAR (Streamline Design). The compression IDs begin and end with a space (0x20).
␠LH5␠

The Wikipedia article has more information about some of the schemes.
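For a decoder, the window sizes in the lh rows of the table above can be kept as a simple lookup. A Python sketch, assuming "k" and "M" in the table mean 1024 and 1024×1024 bytes; `WINDOW_SIZES` and `window_bits` are hypothetical names. Since -lh7- is also used by LHARK with a different meaning, the lookup alone cannot tell those archives apart.

```python
KIB = 1024

# Window (dictionary) sizes in bytes, transcribed from the table above.
# lh1-lh3 use different Huffman arrangements from lh4 and up;
# lhd is a directory marker, not a compression scheme, so it is absent.
WINDOW_SIZES = {
    "lh1": 4 * KIB,
    "lh2": 8 * KIB,
    "lh3": 8 * KIB,
    "lh4": 4 * KIB,
    "lh5": 8 * KIB,
    "lh6": 32 * KIB,
    "lh7": 64 * KIB,
    "lh8": 64 * KIB,
    "lh9": 128 * KIB,
    "lha": 256 * KIB,
    "lhb": 512 * KIB,
    "lhc": 1024 * KIB,
    "lhe": 2048 * KIB,
}


def window_bits(scheme: str) -> int:
    """Return log2 of the window size, e.g. 13 for lh5's 8k window."""
    size = WINDOW_SIZES[scheme]  # KeyError for schemes not in the table
    return size.bit_length() - 1  # exact, since every listed size is a power of two
```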

For reference, here are some other LHA-like identifiers:

ID References and remarks
-afx- Refer to AFX (Atari ST).
-arn- Possibly used by Micrognosis Compression Archiver.
-lzw-
-LD6- Refer to LDArc and LDIFF.
-lz6-
-ll0- Refer to PAKLEO.
-ll1-
-pc1- Used by PopCom!.
-pms- Used by PMsfx and PMexe.
-sqx- Refer to SQX.
-sw0- Refer to SWG.
-sw1-
-TK1- Unknown. (Recognized by IDArc.)

Extended headers

For header levels 1 and higher, each member file has an associated list of "extended headers", similar to ZIP's extensible data fields. Each extended header is tagged with a single byte indicating its type. Extended headers are used to store platform-specific metadata, and to extend the format in other ways.
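A Python sketch of walking the extended-header list for levels 1 and 2. The layout is an assumption based on commonly documented descriptions (such as the libarchive comments listed under Format documentation), not something stated in this article: `pos` points at a 16-bit little-endian size field that counts itself, the 1-byte type and the data, and a size of 0 ends the list. Level 3 is assumed to use 32-bit size fields instead, which this sketch does not handle. `iter_extended_headers` is a hypothetical name. (In commonly documented layouts, type 0x01 carries the file name and 0x02 the directory name; treat this as an assumption too.)

```python
from typing import Iterator, Tuple


def iter_extended_headers(buf: bytes, pos: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, data) for each extended header in a level 1 or 2 chain.

    Assumed layout of each entry: [size:2][type:1][data:size-3], where the
    next entry's size field follows immediately. A size of 0 ends the chain.
    """
    while True:
        if pos + 2 > len(buf):
            raise ValueError("truncated extended header chain")
        size = int.from_bytes(buf[pos:pos + 2], "little")
        if size == 0:
            return
        if size < 3 or pos + size > len(buf):
            raise ValueError(f"bad extended header size {size}")
        yield buf[pos + 2], bytes(buf[pos + 3:pos + size])
        pos += size
```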

Header level 0 supports extended data in a more limited way. It allows for just one set of extended header fields (called the "extended area"), the content of which is determined by the initial one-byte "OS type" field.

Identification

LHA can be identified with high accuracy, but doing so can be laborious, due to the lack of a signature, and other complicating factors.

Identification logic could be based on the header of the first member file. Check that the compression method (offset 2–6) and header level (offset 20) fields have valid values. When suitable and possible, also validate the header checksum field; how to do this depends on the header level.
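A Python sketch of this check for the first member. It assumes the level 0/1 layout commonly documented elsewhere, where byte 0 is the header size (excluding bytes 0–1) and byte 1 is a checksum equal to the sum, modulo 256, of the remaining header bytes; levels 2 and 3 have no such checksum byte and are not validated further here. `looks_like_lha` is a hypothetical name. It only accepts the dash-delimited method form, so variants such as SAR's ␠LH5␠ IDs would be rejected, and self-extracting archives (where the archive does not start at offset 0) are not handled.

```python
def looks_like_lha(buf: bytes) -> bool:
    """Heuristic check of the first member header (a sketch, not exhaustive)."""
    if len(buf) < 22:
        return False
    method = buf[2:7]
    # Method field assumed to look like "-xxx-" with alphanumeric middle bytes.
    if not (method[0] == ord("-") and method[4] == ord("-")
            and bytes(method[1:4]).isalnum()):
        return False
    level = buf[20]
    if level > 3:
        return False
    if level in (0, 1):
        # Assumed: byte 1 is the sum, modulo 256, of header bytes 2..end,
        # where the base header is buf[0] + 2 bytes long.
        end = buf[0] + 2
        if end > len(buf) or sum(buf[2:end]) & 0xFF != buf[1]:
            return False
    return True
```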

See the "See also" section for some formats that could masquerade as LHA.

See also

Other LHA-like formats to be aware of:

Format documentation

Software

Software oddities

There are many customized versions of LHarc/LHA floating around. Some of them are listed here, either because they are notable, or because they are potentially misleading. (For DOS, unless otherwise indicated.)

Worth noting is that LHA 2.x has a tamper-detection feature, invoked by running "LHA t LHA.EXE" (or "LHA_E t LHA_E.EXE"). Most (but not all) modified files fail the test, and print "No file found" or "Broken archive".

  • "LHarc v1.13" (1989-05-14): K2NE608A.ZIP → LHARC.EXE - Suspect this is the v1.13 test version, edited to make it look like a full release.
  • "LHarc v1.131c" by Steve Hoglund: BBS# 1 → DOCUMENT/TURBOBAS.LZH → LHARC.COM
  • LHice - A hack of v1.13c.
  • "LHarc v1.14a" - A hack of v1.13c and/or LHice.
  • "LHARC v1.14β" - A hack of v1.13c and/or LHice.
  • "LHarc v2.01a" - Apparently a hack of v1.13c.
  • "LHA v2.55E" (1992-11-15/1996-01-10) - English translation of v2.55, by Hitoshi Ozawa

Sample files

Other links
