LHA

From Just Solve the File Format Problem
(Difference between revisions)
(adding my memory of why we used LZH instead of ZIP in the early 1990s.)
(Software)
(17 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
{{FormatInfo
 
{{FormatInfo
 
|subcat=Archiving
 
|subcat=Archiving
|extensions={{ext|lha}}, {{ext|lzh}}
+
|extensions={{ext|lha}}, {{ext|lzh}}, {{ext|lzs}}
 
|mimetypes={{mimetype|application/x-lzh-compressed}}
 
|mimetypes={{mimetype|application/x-lzh-compressed}}
 
|pronom={{PRONOM|fmt/626}}
 
|pronom={{PRONOM|fmt/626}}
 +
|kaitai struct=lzh
 
|released=1988
 
|released=1988
 
}}
 
}}
'''LHA''' is an archiving program and file format created by Haruyasu Yoshizaki in 1988. It was originally called LHArc, then was briefly LH before settling on LHA. In the 1990s it was the most popular archiving format on the Amiga platform, and also got some use on the PC platform including in the installers for id Software games such as Doom and Quake, because [[ZIP]] compression was inferior until the release of PKZip 2.04 (I think 2.04g), which brought the formats to parity (I think by adding DEFLATE compression? can anyone verify this?). At present, it is mostly used in Japan.
+
'''LHA''' is an archiving program and file format created by Haruyasu Yoshizaki in 1988. It was originally called LHarc, then was briefly LH before settling on LHA. In the 1990s, it was the most popular archiving format on the Amiga platform. It also got some use on the PC platform including in the installers for id Software games such as Doom and Quake, because [[ZIP]] compression was inferior until the release of PKZIP 2.0, which brought the formats to parity. At present, it is mostly used in Japan.
  
The file format is also known as '''LZH'''.
+
The file format is also known as '''LZH''' (not to be confused with [[CrLZH]], which is also sometimes called this).
 +
 
 +
== Format details ==
 +
An LHA file consists of a sequence of elements, each representing a member file or directory. There is no global archive-level header.
 +
 
 +
There are at least four different formats that an element can have. (Note that this is independent of compression schemes.) In LHA jargon, the formats are known as "header levels", and are usually called "header level 0", "... 1", "... 2", and "... 3".
 +
 
 +
The format of an element is determined by the byte at offset 20 from the beginning of that element. It is possible for different formats to be used in the same LHA file.
 +
 
 +
The formats are similar, but irritatingly different. They don't even follow the same principles with respect to how they must be parsed.
 +
 
 +
=== Compression schemes ===
 +
Wikipedia has a [[Wikipedia:LHA (file format)#Compression methods|list of LHA compression methods]], as identified by the alphanumeric bytes of the ''compression method'' field. It includes the following:
 +
lh0, lh1, lh2, lh4, lh5, lh6, lh7, lh8, lh9,
 +
lha, lhb, lhc, lhd, lhe, lhx,
 +
lz2, lz3, lz4, lz5, lz7, lz8, lzs,
 +
pc1, pm0, pm1, pm2, pms
 +
 
 +
<code>lhd</code> is not actually a compression scheme, but indicates that the element represents a subdirectory.
 +
 
 +
=== Extended headers ===
 +
For header levels 1 and higher, each member file has an associated list of "extended headers", similar to [[ZIP#Extensible data fields|ZIP's extensible data fields]]. Each extended header is tagged with a single byte indicating its type. Extended headers are used to store platform-specific metadata, and to extend the format in other ways.
 +
 
 +
* [https://web.archive.org/web/20110912035449/http://homepage1.nifty.com:80/dangan/en/Content/Program/Java/jLHA/Notes/ExtHeaderList.html List of extended headers] (from archive.org)
 +
* [https://github.com/libarchive/libarchive/blob/master/libarchive/archive_read_support_format_lha.c libarchive: archive_read_support_format_lha.c] (look for "EXT_HEADER_CRC")
 +
 
 +
Header level 0 supports extended data in a more limited way. It allows for just one set of extended header fields (called the "extended area"), the content of which is determined by the initial one-byte "OS type" field.
 +
 
 +
* [https://web.archive.org/web/20110909114523/http://homepage1.nifty.com:80/dangan/en/Content/Program/Java/jLHA/Notes/ExtendArea.html Extended area] (from archive.org)
 +
 
 +
== Identification ==
 +
Bytes {{magic|'-' 'l' ?? ?? '-'}} appear at offset 2. This is not a global file signature, but represents the compression scheme of the first member file of the archive.
 +
 
 +
If you consider [[PMA]] to be a form of LHA, then the second of these bytes can also be <code>'p'</code>.
 +
 
 +
== See also ==
 +
* [[PMA]]
 +
* [[LHice]]
  
 
== Format documentation ==
 
== Format documentation ==
* [http://homepage1.nifty.com/dangan/en/Content/Program/Java/jLHA/Notes/Notes.html Notes on header format]
+
* [http://dangan.g.dgdg.jp/ jLHA software]: LHA Notes
 +
** [http://dangan.g.dgdg.jp/Content/Program/Java/jLHA/Notes/Notes.html Japanese]
 +
** [https://web.archive.org/web/20120211104049/http://homepage1.nifty.com/dangan/en/Content/Program/Java/jLHA/Notes/Notes.html English (translation?)] (from archive.org)
 
* [http://apple2.org.za/gswv/a2zine/GS.WorldView/Resources/The.MacShrinkIt.Project/ARCHIVES.TXT Archive format info]
 
* [http://apple2.org.za/gswv/a2zine/GS.WorldView/Resources/The.MacShrinkIt.Project/ARCHIVES.TXT Archive format info]
 
* [http://www.textfiles.com/programming/FORMATS/arc_fmts.txt LZH file header format (among other archive types)]
 
* [http://www.textfiles.com/programming/FORMATS/arc_fmts.txt LZH file header format (among other archive types)]
 +
* [https://github.com/libarchive/libarchive/blob/master/libarchive/archive_read_support_format_lha.c libarchive: archive_read_support_format_lha.c] - Has comments with information about the header formats
  
 
== Software ==
 
== Software ==
* [http://homepage1.nifty.com/dangan/en/Content/Program/Java/jLHA/jLHA.html Java library]
+
* [https://github.com/fragglet/lhasa lhasa]
* [http://www.ponsoftware.com/en/ Explzh for Windows]
+
 
* [[7-Zip]]
 
* [[7-Zip]]
 +
* [http://www.ponsoftware.com/en/ Explzh for Windows]
 +
* [https://web.archive.org/web/20130906133859/http://homepage1.nifty.com/dangan/en/Content/Program/Java/jLHA/jLHA.html Java library] (from archive.org)
 +
* [https://www.libarchive.org libarchive]
 +
* LHarc/LHA (DOS software)
 +
** {{CdTextfiles|hof91/ARC/LH113C.EXE|LHarc v1.13c}} (1989-05-31) - English
 +
** {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/LHA213.EXE|LHA v2.13}} (1991-07-20) - English
 +
** {{CdTextfiles|pdos9606/ARCHIVER/TOOLS/LHA255B.EXE|LHA v2.55b}} (1992-11-24) - Japanese (LHA.EXE) and English (LHA_E.EXE)
 +
* LHarc/LHA source code: {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/LHARCSRC.ZIP|v1.13b}} · {{CdTextfiles|simtel/simtel20/MSDOS/ARCHIVRS/LHA211SR.ZIP|v2.11}}
 +
* [http://www.manmrk.net/tutorials/compress/downloads/larc333.exe LArc v3.33] (May 19, 1989) (DOS software)
 +
* {{Deark}} (might be useful for analysis; doesn't decompress the format)
 +
 +
Note: There is an LHA version "2.55e" ({{CdTextfiles|simtel/simtel9703/disk2/DISC2/ARCERS/LHA255E.EXE|LHA255E.EXE}}), but it is an (unauthorized?) English translation of v2.55, and v2.55 is older than v2.55b.
 +
 +
== Sample files ==
 +
* [https://github.com/libarchive/libarchive/tree/master/libarchive/test libarchive test files] → test_read_format_lha_*.lzh.uu
 +
* [http://aminet.net/ aminet]
  
 
== Other links ==
 
== Other links ==
 
* [[Wikipedia:LHA (file format)|Wikipedia article]]
 
* [[Wikipedia:LHA (file format)|Wikipedia article]]

Revision as of 23:03, 7 January 2020

File Format
Name LHA
Ontology
Extension(s) .lha, .lzh, .lzs
MIME Type(s) application/x-lzh-compressed
PRONOM fmt/626
Kaitai Struct Spec lzh.ksy
Released 1988

LHA is an archiving program and file format created by Haruyasu Yoshizaki in 1988. It was originally called LHarc, then briefly LH, before settling on LHA. In the 1990s, it was the most popular archiving format on the Amiga platform. It also saw some use on the PC platform, including in the installers for id Software games such as Doom and Quake, because ZIP compression was inferior until the release of PKZIP 2.0, which brought the formats to parity. At present, it is mostly used in Japan.

The file format is also known as LZH (not to be confused with CrLZH, which is also sometimes called this).

Format details

An LHA file consists of a sequence of elements, each representing a member file or directory. There is no global archive-level header.

There are at least four different formats that an element can have. (Note that this is independent of compression schemes.) In LHA jargon, the formats are known as "header levels", and are usually called "header level 0", "... 1", "... 2", and "... 3".

The format of an element is determined by the byte at offset 20 from the beginning of that element. It is possible for different formats to be used in the same LHA file.
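
As a rough illustration of the rule above, the following Python sketch reads the header-level byte of a single element. The names `header_level` and `element_start` are hypothetical, and the sketch assumes the caller already knows where the element begins. Locating the next element depends on the header level and is not attempted here.

```python
HEADER_LEVEL_OFFSET = 20  # offset within an element, as described above


def header_level(data: bytes, element_start: int = 0) -> int:
    """Return the header-level byte of the element starting at element_start.

    Assumption: element_start really is the first byte of an element.
    The value is returned as-is; known levels are 0, 1, 2, and 3.
    """
    pos = element_start + HEADER_LEVEL_OFFSET
    if element_start < 0 or pos >= len(data):
        raise ValueError("data too short to contain a header-level byte")
    return data[pos]
```

Because different header levels can appear in the same file, a reader would need to repeat this check for every element rather than once per archive.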

The formats are similar, but irritatingly different. They don't even follow the same principles with respect to how they must be parsed.

Compression schemes

Wikipedia has a list of LHA compression methods, as identified by the alphanumeric bytes of the compression method field. It includes the following:

lh0, lh1, lh2, lh4, lh5, lh6, lh7, lh8, lh9,
lha, lhb, lhc, lhd, lhe, lhx,
lz2, lz3, lz4, lz5, lz7, lz8, lzs,
pc1, pm0, pm1, pm2, pms

lhd is not actually a compression scheme, but indicates that the element represents a subdirectory.
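
A minimal sketch of how a reader might pull out the method identifier and spot directory entries. It assumes the compression method field is the five-byte pattern at offset 2 of each element (a dash, three alphanumeric bytes, a dash); the article states this position only for the first member, so applying it to later elements is an assumption. The function names are hypothetical.

```python
METHOD_FIELD_OFFSET = 2   # assumed position of the field within an element
METHOD_FIELD_LENGTH = 5   # '-' + three alphanumeric bytes + '-'


def method_id(data: bytes, element_start: int = 0) -> str:
    """Return the three-character method name, e.g. 'lh5' or 'lhd'."""
    start = element_start + METHOD_FIELD_OFFSET
    field = data[start:start + METHOD_FIELD_LENGTH]
    name = field[1:4]
    if (len(field) != METHOD_FIELD_LENGTH
            or field[0:1] != b"-"
            or field[4:5] != b"-"
            or not name.isalnum()):
        raise ValueError("no compression method field at the expected offset")
    return name.decode("ascii")


def is_directory(data: bytes, element_start: int = 0) -> bool:
    """True if the element uses 'lhd', which marks a subdirectory."""
    return method_id(data, element_start) == "lhd"
```

Treating `lhd` separately from the real compression schemes keeps a decoder from trying to decompress an entry that carries no file data.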

Extended headers

For header levels 1 and higher, each member file has an associated list of "extended headers", similar to ZIP's extensible data fields. Each extended header is tagged with a single byte indicating its type. Extended headers are used to store platform-specific metadata, and to extend the format in other ways.

Header level 0 supports extended data in a more limited way. It allows for just one set of extended header fields (called the "extended area"), the content of which is determined by the initial one-byte "OS type" field.

Identification

Bytes '-' 'l' ?? ?? '-' appear at offset 2. This is not a global file signature, but represents the compression scheme of the first member file of the archive.

If you consider PMA to be a form of LHA, then the second of these bytes can also be 'p'.
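
The check described above can be sketched in Python as follows. This is only a heuristic under the stated pattern, not a full validator: the two wildcard bytes are left unchecked, and the function name `looks_like_lha` and the `allow_pma` flag are hypothetical.

```python
def looks_like_lha(data: bytes, allow_pma: bool = False) -> bool:
    """Heuristically test for the '-' 'l' ?? ?? '-' pattern at offset 2.

    With allow_pma=True, 'p' is also accepted as the second byte.
    """
    if len(data) < 7:
        return False
    second_byte_choices = b"lp" if allow_pma else b"l"
    return (data[2] == ord("-")
            and data[3] in second_byte_choices
            and data[6] == ord("-"))
```

Since the pattern belongs to the first member rather than to the archive as a whole, a match says only that the file starts with something shaped like an LHA element.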

See also

Format documentation

Software

Note: There is an LHA version "2.55e" (LHA255E.EXE), but it is an (unauthorized?) English translation of v2.55, and v2.55 is older than v2.55b.

Sample files

Other links
