- This article is about the electronic archive format. For physical tape archives, see Magnetic tape or Punched tape.
Tape Archive (tar) is a traditional UNIX archive format, defined in POSIX.1-1988 and later POSIX.1-2001. Its original purpose was to archive files on backup tapes. Archived data in the tar format is sometimes referred to as a "tarball".
While tar itself does not offer any compression, it's frequently used together with a stream compression format such as gzip, bzip2, or XZ to provide file archiving plus compression. Most modern implementations of tar, present in UNIX/Linux systems, offer built-in support for this combined operation by using a modifier such as z (gzip) or j (bzip2). When extracting files, the compression format can sometimes be detected and handled automatically.
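One way such automatic detection can work is by inspecting the first few bytes of the file, since gzip, bzip2, and XZ streams each begin with a fixed signature. The following is a minimal sketch of that idea, not a description of how any particular tar implementation does it; the function name `sniff_compression` is hypothetical.

```python
from typing import Optional

# Leading signature bytes of some common stream compression formats.
MAGIC = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
)


def sniff_compression(head: bytes) -> Optional[str]:
    """Return the compression format suggested by the leading bytes, or None."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None
```

A result of `None` only means that none of the listed signatures matched; the file may be an uncompressed tar archive or use a format not covered here.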
Files compressed this way should have a dual file extension such as .tar.gz or .tar.bz2. Sometimes the .tgz extension is used in place of .tar.gz. Rarely, other shortened extensions are used:
- .tbz instead of .tar.bz2 (bzip2)
- .txz instead of .tar.xz (XZ)
- .tlz instead of .tar.lz (Lzip) or .tar.lzma (LZMA_Alone)
- .taz instead of .tar.Z (compress) (or possibly some other compressed format)
- .tz instead of .tar.Z (compress) (or possibly some other compressed format)
Several variants of the TAR archive format exist. The original TAR format had limitations on the types of files it could contain and on the length of filenames. The USTAR format was developed to address these limitations and was standardized in POSIX.1-1988 (IEEE Std 1003.1). Jörg Schilling has collected some information about the different implementations; see the references section. There is also an old version (often referred to as "non-ANSI Tar" or simply "old Tar") which both GNU Tar and STar can read and write.
Pax is a system of extensions to the USTAR format.
Most modern tar files use a format based on either GNU Tar or POSIX/USTAR. GNU Tar files have the signature "ustar " (with a trailing space) at offset 257, and POSIX/USTAR files have "ustar\0" at offset 257.
Some other variants have the signature "tar\0" at offset 508.
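The signature tests described above can be sketched as follows. This is an illustrative example only: the function name `tar_signature` and its return labels are hypothetical, and the input is assumed to be the first 512 bytes of the file.

```python
def tar_signature(header: bytes) -> str:
    """Classify a tar header block by the signature bytes it contains."""
    if header[257:263] == b"ustar ":    # "ustar" followed by a space
        return "gnu"
    if header[257:263] == b"ustar\x00":  # "ustar" followed by a NUL byte
        return "posix-ustar"
    if header[508:512] == b"tar\x00":    # signature used by some other variants
        return "other-variant"
    return "unknown"
```

A result of "unknown" does not rule out a tar file, since older archives carry no signature at all.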
Most older tar files, however, have no signature and must be identified by other means. One possibility is to validate that the checksum field at offset 148 is well-formed and accurate.
Note that most tar files have no global file header, so the tests suggested here are actually looking at the first member file in the archive.
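A checksum test on the first member's header might look like the sketch below. It assumes the usual convention that the 8-byte field at offset 148 holds an octal number in ASCII, and that the checksum is the sum of all 512 header bytes with the checksum field itself counted as eight spaces. Some historical implementations summed the bytes as signed values, so the sketch accepts either result. The function name `header_checksum_ok` is hypothetical.

```python
def header_checksum_ok(header: bytes) -> bool:
    """Check the checksum field of one 512-byte tar header block."""
    if len(header) < 512:
        return False
    block = header[:512]

    # Well-formed: octal digits, optionally padded with spaces or NUL bytes.
    field = block[148:156].strip(b" \x00")
    if not field or any(c not in b"01234567" for c in field):
        return False
    stored = int(field, 8)

    # Accurate: sum every byte, treating the checksum field as eight spaces.
    rest = block[:148] + block[156:]
    unsigned_sum = sum(rest) + 8 * 0x20
    # Assumption: tolerate writers that summed bytes as signed chars.
    signed_sum = sum(b - 256 if b > 127 else b for b in rest) + 8 * 0x20
    return stored in (unsigned_sum, signed_sum)
```

An all-zero block, such as the padding at the end of an archive, fails this test because its checksum field is empty.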
- AR - comparable format
- cpio - comparable format
- Disk Archiver (DAR) was intended by its authors as a replacement for TAR, supporting file compression among other features.
- Pax - extension
To list the contents of a .tar.gz archive:
tar tvzf example.tar.gz
To extract a .tar.gz archive to the current directory:
tar xvzf example.tar.gz
With some versions of tar, the "z" flag can be omitted when extracting or listing ("tar xvf ...").
To compress two files into a .tar.gz archive:
tar cvzf example2.tar.gz inputfile1 inputfile2
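For comparison, creating and listing a .tar.gz archive can also be done programmatically. The sketch below uses the `tarfile` module from the Python standard library; the helper names `make_tar_gz` and `list_members` are hypothetical. The read mode "r:*" asks the module to detect the compression format itself, much like omitting the "z" flag.

```python
import os
import tarfile


def make_tar_gz(archive_path, input_paths):
    """Roughly `tar czf archive_path input_paths...`."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in input_paths:
            # Store only the base name; the tar command would instead keep
            # the relative path as given on the command line.
            tar.add(path, arcname=os.path.basename(path))


def list_members(archive_path):
    """Roughly `tar tf archive_path`, with compression auto-detected."""
    with tarfile.open(archive_path, "r:*") as tar:
        return tar.getnames()
```

Extraction is left out of this sketch, since unpacking archives from untrusted sources needs extra care about where member paths point.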
- Linux man page for tar
- man page for bsdtar
- GNU Tar: Tar Internals
- star man page by Jörg Schilling
- Chart of TAR format