Tape Archive

From Just Solve the File Format Problem
Revision as of 05:19, 28 December 2023 by Dexvertbot (Talk | contribs)

File Format
Name Tape Archive
Ontology
Extension(s) .tar, .tgz, .tbz, .txz, .tlz, .tsz, .taz, .tz
MIME Type(s) application/x-tar
LoCFDD fdd000531
PRONOM x-fmt/265
Wikidata ID Q283579
Released 1979
This article is about the electronic archive format. For physical tape archives, see Magnetic tape or Punched tape.

Tape Archive (tar) is a traditional UNIX archive format, defined in POSIX.1-1988 and later POSIX.1-2001. Its original purpose was to archive files on backup tapes. Archived data in the tar format is sometimes referred to as a "tarball".

Compression

While tar itself does not offer any compression, it is frequently combined with a stream compression format such as gzip, bzip2, or XZ to provide file archiving plus compression. Most modern tar implementations on UNIX and Linux systems support this combined operation directly through an option letter such as z (gzip) or j (bzip2). When extracting files, the compression format can sometimes be detected and handled automatically.
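The layering described above can be sketched with Python's standard tarfile and gzip modules. This is only an illustration: the helper names make_tar_gz and list_names and the member names are made up for the example. It builds a gzip-compressed archive in memory, shows that stripping the gzip layer leaves an ordinary uncompressed tar stream, and relies on the "r:*" mode to detect the compression when reading.

```python
import gzip
import io
import tarfile


def make_tar_gz(members: dict) -> bytes:
    """Build a gzip-compressed tar archive in memory from {name: content}."""
    buf = io.BytesIO()
    # "w:gz" wraps the tar stream in gzip, comparable to `tar czf`.
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_names(archive: bytes) -> list:
    """List member names; "r:*" lets tarfile detect the compression itself."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        return tar.getnames()


# Hypothetical sample content, used only for this demonstration.
blob = make_tar_gz({"a.txt": b"hello", "b.txt": b"world"})

# Removing the gzip layer leaves a plain tar stream made of 512-byte blocks.
inner = gzip.decompress(blob)
```

Both blob and inner can be passed to list_names, since the same reader handles the compressed and the uncompressed form.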

Files compressed this way should have a dual file extension such as .tar.gz or .tar.bz2. Sometimes the .tgz extension is used in place of .tar.gz. Rarely, other shortened extensions are used:

  • .tbz and .tbz2 instead of .tar.bz2 (bzip2)
  • .txz instead of .tar.xz (XZ)
  • .tlz instead of .tar.lz (Lzip) or .tar.lzma (LZMA_Alone)
  • .tsz instead of .tar.sz (Sunzip)
  • .taz instead of .tar.Z (compress) (or possibly some other compressed format)
  • .tz instead of .tar.Z (compress) (or possibly some other compressed format)
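The shortened extensions listed above can be collected into a small lookup table. The sketch below is illustrative only; the names SHORT_EXTENSIONS and expand_extension are invented here, and the table simply picks the first expansion given in the list where several are possible.

```python
# Mapping of shortened extensions to their dual-extension form,
# taken from the list above. Ambiguous cases are noted in comments.
SHORT_EXTENSIONS = {
    ".tgz": ".tar.gz",
    ".tbz": ".tar.bz2",
    ".tbz2": ".tar.bz2",
    ".txz": ".tar.xz",
    ".tlz": ".tar.lz",   # may also stand for .tar.lzma
    ".tsz": ".tar.sz",
    ".taz": ".tar.Z",    # or possibly some other compressed format
    ".tz": ".tar.Z",     # or possibly some other compressed format
}


def expand_extension(filename: str) -> str:
    """Rewrite a shortened tar extension to its dual-extension form."""
    lower = filename.lower()
    for short, full in SHORT_EXTENSIONS.items():
        if lower.endswith(short):
            return filename[: -len(short)] + full
    # Unknown or already-expanded names are returned unchanged.
    return filename
```

An extension is only a hint, so the actual compression format is better confirmed from the file contents.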

Variants

Several variants of the tar archive format exist. The original tar format limited the types of files an archive could contain and the length of filenames. The USTAR format was developed to ease these limits and was standardized by POSIX (IEEE P1003.1). Jörg Schilling has collected some information about the different implementations; see the references section. There is also an old version (often referred to as "non-ANSI Tar" or simply "old Tar") which both GNU Tar and STar can read and write.

Pax is a system of extensions to USTAR format.
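One practical difference between the variants is how long a member name may be. The sketch below uses Python's tarfile module, which can write USTAR and pax archives. The helper try_add and the 300-character name are made up for the example, and the behaviour shown is that of Python's implementation rather than a statement about every tar tool.

```python
import io
import tarfile


def try_add(fmt: int, name: str) -> bool:
    """Return True if an empty member called `name` can be written in format `fmt`."""
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
            info = tarfile.TarInfo(name=name)
            info.size = 0
            tar.addfile(info)
        return True
    except ValueError:
        # tarfile raises ValueError when the name does not fit the header fields.
        return False


# A single path component longer than the USTAR name field allows.
long_name = "x" * 300

ustar_ok = try_add(tarfile.USTAR_FORMAT, long_name)  # rejected: name is too long
pax_ok = try_add(tarfile.PAX_FORMAT, long_name)      # accepted via a pax extended header
```

In the pax case the full name is stored in an extended header record that precedes the member's ordinary USTAR-style header.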

Identification

Most modern tar files use a format based on either GNU Tar or POSIX/USTAR. GNU Tar files have the signature "ustar " (with a trailing space) at offset 257, and POSIX/USTAR files have "ustar\0" at offset 257.

Some other variants have the signature "tar\0" at offset 508.
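The signature tests above can be sketched as a small classifier over the first 512 bytes of a file. The function name tar_signature, its return labels, and the helper sample_header are invented for this example; the offsets and byte strings are the ones given in the text.

```python
import io
import tarfile


def tar_signature(header: bytes) -> str:
    """Classify the first 512-byte block of a suspected tar file by its signature."""
    magic = header[257:263]
    if magic == b"ustar ":
        return "gnu"            # "ustar" followed by a space
    if magic == b"ustar\0":
        return "posix-ustar"    # "ustar" followed by a NUL byte
    if header[508:512] == b"tar\0":
        return "tar-at-508"
    return "none"


def sample_header(fmt: int) -> bytes:
    """Hypothetical helper: first block of a one-member archive written by tarfile."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name="a.txt")
        info.size = 0
        tar.addfile(info)
    return buf.getvalue()[:512]


gnu_kind = tar_signature(sample_header(tarfile.GNU_FORMAT))
ustar_kind = tar_signature(sample_header(tarfile.USTAR_FORMAT))
```

A result of "none" does not rule out tar, since older files carry no signature at all.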

Most older tar files, however, have no signature and must be identified by other means. One option is to check that the checksum field at offset 148 is well-formed and matches the header contents.
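A minimal sketch of such a checksum test follows. It assumes the conventional layout: a 512-byte header whose 8-byte checksum field at offset 148 holds octal ASCII digits, with the sum taken over the whole header while that field is treated as spaces. Accepting a signed-byte sum as well is an extra allowance for some historical implementations. The names checksum_ok and sample_header are invented for the example.

```python
import io
import tarfile


def checksum_ok(header: bytes) -> bool:
    """Check that the checksum field of a 512-byte tar header is well-formed and accurate."""
    if len(header) < 512:
        return False
    # The field holds octal digits, padded or terminated by spaces and NULs.
    field = header[148:156].strip(b" \0")
    if not field or any(c not in b"01234567" for c in field):
        return False
    stored = int(field.decode("ascii"), 8)
    # The sum is computed with the checksum field itself replaced by 8 spaces.
    block = header[:148] + b" " * 8 + header[156:512]
    unsigned_sum = sum(block)
    signed_sum = sum(b - 256 if b > 127 else b for b in block)
    return stored in (unsigned_sum, signed_sum)


def sample_header() -> bytes:
    """Hypothetical helper: first block of a one-member archive written by tarfile."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name="a.txt")
        info.size = 0
        tar.addfile(info)
    return buf.getvalue()[:512]


header = sample_header()
valid = checksum_ok(header)
```

An all-zero block fails the test because its checksum field is empty, which also keeps end-of-archive padding from being mistaken for a header.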

Note that most tar files have no global file header, so the tests suggested here are actually looking at the first member file in the archive.

See also

  • AR - comparable format
  • cpio - comparable format
  • Disk Archiver (DAR) was intended by its authors as a replacement for TAR, supporting file compression among other features.
  • Pax - extension

Examples

To list the contents of a .tar.gz archive:

tar tvzf example.tar.gz

To extract a .tar.gz archive to the current directory:

tar xvzf example.tar.gz

With some versions of tar, the "z" flag can be omitted when extracting or listing ("tar xvf ...").

To compress two files into a .tar.gz archive:

tar cvzf example2.tar.gz inputfile1 inputfile2
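For comparison, the same three operations can be approximated with Python's standard tarfile module. This is a sketch rather than an exact equivalent of the commands above: the function names are invented, no verbose listing is printed, and extraction uses the "data" filter when the running Python version provides it.

```python
import tarfile


def create_tar_gz(archive_path, input_paths):
    """Roughly `tar czf archive_path input_paths...`."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in input_paths:
            tar.add(path)


def list_tar(archive_path):
    """Roughly `tar tf archive_path`; "r:*" detects the compression automatically."""
    with tarfile.open(archive_path, "r:*") as tar:
        return tar.getnames()


def extract_tar(archive_path, dest="."):
    """Roughly `tar xf archive_path`, extracting into `dest`."""
    with tarfile.open(archive_path, "r:*") as tar:
        # Newer Python versions offer extraction filters; use the safer
        # "data" filter when it is available, and fall back otherwise.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
```

For example, create_tar_gz("example2.tar.gz", ["inputfile1", "inputfile2"]) followed by list_tar("example2.tar.gz") returns the two member names.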

Specifications

Software

Sample files

References
