LHA

LHA is an archiving program and file format created by Haruyasu Yoshizaki (a.k.a. Yoshi) in 1988. It was originally called LHarc, then was briefly LH (v2.02–2.04), then LHa (v2.05–2.06), before settling on LHA (v2.10+). In the 1990s, it was the most popular archiving format on the Amiga platform. It also saw some use on the PC platform, including in the installers for id Software games such as Doom and Quake, because ZIP compression was inferior until PKZIP 2.0 brought the formats to parity.

It was particularly popular in Japan. Most of the best information about it is in Japanese.

It supports a number of different compression schemes, most of which use LZ77 combined with Huffman coding.

The file format is also known as LZH. See the LZH disambiguation page for other "LZH" formats.

This article covers the format used by LHarc/LHA, as well as the "generalized" LHA format: the same file format, but with other compression schemes. The generalized format was possibly designed by Kazuhiko Miki in 1988 for LArc, but confirmation of this is needed. If so, it was soon borrowed by LHarc, with new compression schemes.

File structure
An LHA file consists mainly of a sequence of elements, each representing a member file or directory. The sequence is usually terminated by an end-of-archive marker consisting of a single 0x00 byte (but take care, as level 2 headers could start with 0x00). There is no global archive-level header.

Member format
There are at least four different formats that an element can have. (Note that this is independent of compression schemes.) In LHA jargon, the formats are known as "header levels", and are usually called "header level 0", "... 1", "... 2", and "... 3". The header level is determined by the byte at offset 20 from the beginning of that element.

The header levels are similar, but irritatingly different. They don't even follow the same principles with respect to how they must be parsed.

LZH compression overview
From a decompression perspective, the LZ77+Huffman schemes work roughly as follows. (This is oversimplified.) There is a Huffman tree for codes, and a separate Huffman tree for offsets. A symbol is read using the codes tree. Depending on its value, it represents either a literal byte value or a length. If it is a length, then an additional symbol is read using the offsets tree. Based on the offset and length, a run of recently-decompressed bytes is repeated.
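The copy step described above can be sketched in Python. This is only an illustration, not the decoder of any particular LHA scheme: the Huffman decoding is skipped entirely (the function takes already-decoded symbols), and the function name, the mapping of code symbols 256 and up to lengths starting at 3, and the "offset symbol plus one" distance are all assumptions made for the sketch. Real implementations also differ in details such as how the history window is initialized.

```python
from typing import Iterable

# Assumption for this sketch: the shortest encodable match is 3 bytes.
MIN_MATCH = 3


def expand(code_symbols: Iterable[int], offset_symbols: Iterable[int]) -> bytes:
    """Expand already-Huffman-decoded symbols into output bytes.

    code_symbols:   values 0-255 are literal bytes; values >= 256 encode a length.
    offset_symbols: one value is consumed for every length symbol.
    """
    offsets = iter(offset_symbols)
    out = bytearray()
    for sym in code_symbols:
        if sym < 256:
            # Literal byte: emit it unchanged.
            out.append(sym)
            continue
        # Length symbol: read the matching offset, then repeat earlier output.
        length = sym - 256 + MIN_MATCH
        distance = next(offsets) + 1  # assumption: offset 0 means "previous byte"
        if distance > len(out):
            raise ValueError("offset points before the start of the output")
        start = len(out) - distance
        # Copy byte by byte so that overlapping runs (distance < length) work.
        for i in range(length):
            out.append(out[start + i])
    return bytes(out)
```

For example, `expand([97, 98, 99, 256], [2])` emits the literals "abc" and then repeats the three bytes found three positions back.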

Compression schemes
Each member file has a 5-byte compression method field, composed of ASCII characters. The first and last characters are virtually always dashes ("-"), and might be left off when discussing LHA compression schemes. Known schemes:

The Wikipedia article has more information about some of the schemes.
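As a small illustration of the 5-byte method field, the following Python sketch reads it from the start of a member header and strips the surrounding dashes. It assumes the field sits at offsets 2–6 of the member, as stated in the Identification section; the function names are made up for this example.

```python
def read_method_field(member: bytes) -> str:
    """Return the raw 5-character compression method field, e.g. "-lh5-"."""
    field = member[2:7]  # offsets 2..6 inclusive
    if len(field) != 5:
        raise ValueError("data too short to contain a method field")
    # The field is ASCII; decoding raises UnicodeDecodeError otherwise.
    return field.decode("ascii")


def short_method_name(method: str) -> str:
    """Drop the leading and trailing dashes, if both are present."""
    if len(method) == 5 and method[0] == "-" and method[-1] == "-":
        return method[1:-1]
    return method
```

With a header whose bytes 2–6 are `-lh5-`, `read_method_field` returns `"-lh5-"` and `short_method_name` reduces that to `"lh5"`.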

For reference, here are some other LHA-like identifiers:

Extended headers
For header levels 1 and higher, each member file has an associated list of "extended headers", similar to ZIP's extensible data fields. Each extended header is tagged with a single byte indicating its type. Extended headers are used to store platform-specific metadata, and to extend the format in other ways.


 * List of extended headers (from archive.org)
 * libarchive: archive_read_support_format_lha.c (look for "EXT_HEADER_CRC")

Header level 0 supports extended data in a more limited way. It allows for just one set of extended header fields (called the "extended area"), the content of which is determined by the initial one-byte "OS type" field.


 * Extended area (from archive.org)

Identification
LHA can be identified with high accuracy, but doing so can be laborious, due to the lack of a signature, and other complicating factors.

Identification logic could be based on the header of the first member file. Check that the compression method (offset 2–6) and header level (offset 20) fields have valid values. When suitable and possible, validate the header checksum field -- this depends on the header level.
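A rough first-pass check along these lines might look like the following Python sketch. It is a heuristic, not a complete identifier: the function name and the regular expression for a "plausible" method field are assumptions of this example (the pattern only tests the general dash-letters-dash shape, not a list of known schemes), and the header checksum is not validated because that step depends on the header level.

```python
import re

# Assumed shape of a method field: dash, lowercase letter,
# two lowercase letters or digits, dash (for example "-lh5-").
METHOD_SHAPE = re.compile(rb"-[a-z][a-z0-9]{2}-")

VALID_HEADER_LEVELS = (0, 1, 2, 3)


def looks_like_lha(data: bytes) -> bool:
    """Heuristically test whether data starts with an LHA member header."""
    if len(data) < 21:
        # Need at least offsets 0..20 to inspect both fields.
        return False
    if not METHOD_SHAPE.fullmatch(data[2:7]):
        return False
    # Header level byte at offset 20.
    return data[20] in VALID_HEADER_LEVELS
```

A positive result only means the file is worth parsing further; a stricter identifier would go on to verify the header checksum for the detected header level.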

See also the "Compression schemes" section, for some formats that could masquerade as LHA.

Format documentation

 * jLHA software: LHA Notes
 * Japanese
 * English (translation?) (from archive.org)
 * Archive format info
 * LZH file header format (among other archive types)
 * LZH format
 * LZH format (Aeco Systems)
 * libarchive: archive_read_support_format_lha.c - Has comments with information about the header formats

Software

 * lhasa
 * 7-Zip
 * Explzh for Windows
 * Java library (from archive.org)
 * libarchive
 * LHa for Unix · GitHub project
 * LHa for Unix (Tsukao Okamoto) (from archive.org)
 * UNLHA32.DLL and LHMelt
 * LHarc/LHA
 * For DOS
 * LHarc v1.00 - English (1989-03-04): RBBS in a Box, vol 1 no 2 → 014r/lharc10e.com (or )
 * (1989-04-23)
 * LHarc v1.12b - English (1989-04-29): RBBS in a Box, vol 1 no 2 → add2/lharc12b.exe (or )
 * (1989-05-04)
 * (1989-05-31)
 * (1989-12-22)
 * LHarc v1.13d - Japanese: FM Towns Free Software Collection 3 → FREEWARE.{BIN,CUE} → ms_dos/lharc/* (or )
 * (1991-01-27)
 * (1991-02-14)
 * (1991-02-24)
 * (1991-03-03)
 * (1991-03-21)
 * (1991-07-20)
 * LHA v2.13 - Japanese: Win 50 Game+ Vol. 7 (Japan) → Win 50 Game+ Vol. 7 (Japan).7z → Win 50 Game+ Vol. 7 (Japan).{bin,cue} → lha_file/lha/lha213.exe
 * (1992-09-07)
 * LHA v2.54 - Japanese (1992-10-04): CG Network 4 → pc/program/lha/lha.exe
 * LHA v2.55 - Japanese (1992-11-15): → ftp.eri.u-tokyo.ac.jp/pub/DOS/tools/lha255.exe
 * (1992-11-24) - Japanese (LHA.EXE) and English (LHA_E.EXE)
 * LHA v2.66 test version - Japanese (1994-12-30)
 * lha266e.exe - Official(?) patch to convert error messages to English
 * Various versions at old-dos.ru: LHarc, LHA
 * For Windows console
 * LHA32 v2.67.00 test version - Japanese (1995-10-07)
 * lha267e.exe - Official(?) patch to convert error messages to English
 * Source code
 * Lha32 - by "Take"
 * LZHUF - Source code related to "lh1" compression
 * ar (Haruhiko Okumura) - Implementation of "lh5" compression
 * Ancient - Has modern C++ code for decompressing most LHA schemes, but as of this writing there's no easy way to use it.
 * (e.g. with  option)

Software oddities
There are many customized versions of LHarc/LHA floating around. Some of them are listed here, either because they are notable, or because they are potentially misleading. (For DOS, unless otherwise indicated.)

Note that LHA 2.x has a tamper-detection feature, invoked by running "LHA t LHA.EXE" (or "LHA_E t LHA_E.EXE"). Most (but not all) modified files fail the test, printing "No file found" or "Broken archive".


 * "LHarc v1.13" (1989-05-14): → LHARC.EXE - Suspected to be the v1.13 test version, edited to make it look like a full release.
 * "LHarc v1.131c" by Steve Hoglund: BBS# 1 → DOCUMENT/TURBOBAS.LZH → LHARC.COM
 * LHice - A hack of v1.13c.
 * - A hack of v1.13c and/or LHice.
 * - A hack of v1.13c and/or LHice.
 * - Apparently a hack of v1.13c.
 * (1992-11-15/1996-01-10) - English translation of v2.55, by Hitoshi Ozawa

Sample files

 * lhasa test files
 * libarchive test files → test_read_format_lha_*.lzh.uu
 * aminet
 * https://telparia.com/fileFormatSamples/archive/lha/hexify.lha

Other links

 * Wikipedia article