Tape Archive

From Just Solve the File Format Problem
(Difference between revisions)
(Compression by the (fairly obscure) Sunzip format)
(19 intermediate revisions by 4 users not shown)
Line 1: Line 1:
:''This article is about the electronic archive format. For physical tape archives, see [[Magnetic tape]] or [[Punched tape]].''
 
 
{{FormatInfo
 
{{FormatInfo
 
|formattype=electronic
 
|formattype=electronic
 
|subcat=Archiving
 
|subcat=Archiving
|extensions={{ext|tar}}, {{ext|tgz}}
+
|extensions={{ext|tar}}, {{ext|tgz}}, {{ext|tbz}}, {{ext|txz}}, {{ext|tlz}}, {{ext|tsz}}, {{ext|taz}}, {{ext|tz}}
 
|mimetypes={{mimetype|application/x-tar}}
 
|mimetypes={{mimetype|application/x-tar}}
 
|pronom={{PRONOM|x-fmt/265}}
 
|pronom={{PRONOM|x-fmt/265}}
 +
|released=1979
 
}}
 
}}
'''Tape Archive''' ('''tar''') is a traditional UNIX archive format, defined in POSIX.1-1988 and later POSIX.1-2001. Its original purpose was to archive files on backup tapes. While tar itself does not offer any compression, it's frequently used together with an stream compression format such as [[gzip]], [[bzip2]] and sometimes [[XZ]] to provide file archiving plus compression. Most modern implementations of tar, present in UNIX/Linux systems, offer built-in support for this combined operation by using a modifier such as z (GZip) or j (BZip2). Files compressed this way should have a dual file extension such a .tar.gz or tar.bz2 (but sometimes the .tgz extension is used in place of .tar.gz). Archived data in the tar format is sometimes referred to as a "tarball".
+
:''This article is about the electronic archive format. For physical tape archives, see [[Magnetic tape]] or [[Punched tape]].''
 +
 
 +
'''Tape Archive''' ('''tar''') is a traditional UNIX archive format, defined in POSIX.1-1988 and later POSIX.1-2001. Its original purpose was to archive files on backup tapes. Archived data in the tar format is sometimes referred to as a "tarball".
 +
 
 +
== Compression ==
 +
While tar itself does not offer any compression, it's frequently used together with a stream compression format such as [[gzip]], [[bzip2]], or [[XZ]] to provide file archiving plus compression. Most modern implementations of tar, present in UNIX/Linux systems, offer built-in support for this combined operation by using a modifier such as '''z''' (gzip) or '''j''' (bzip2). When extracting files, the compression format can sometimes be detected and handled automatically.
 +
 
 +
Files compressed this way should have a dual file extension such as .tar.gz or .tar.bz2. Sometimes the .tgz extension is used in place of .tar.gz. Rarely, other shortened extensions are used:
 +
* .tbz instead of .tar.bz2 ([[bzip2]])
 +
* .txz instead of .tar.xz ([[XZ]])
 +
* .tlz instead of .tar.lz ([[Lzip]]) or .tar.lzma ([[LZMA Alone|LZMA_Alone]])
 +
* .tsz instead of .tar.sz ([[Sunzip]])
 +
* .taz instead of .tar.Z ([[compress]]) (or possibly some other compressed format)
 +
* .tz instead of .tar.Z ([[compress]]) (or possibly some other compressed format)
  
 
== Variants ==
 
== Variants ==
  
There exist actually some variants to the TAR archive. The original POSIX.1-1988 TAR format had limitations on the type of files it could contain and the length of filenames. That's why the USTAR format was later developed and standardized as POSIX IEEE P1003.1. Jörg Schilling has collected some information about the different implementations, see the references section. There's also an old version (often referred to as "non-ANSI Tar" or simply "old Tar") which both GNU Tar and STar can read and write
+
Several variants of the tar archive format exist. The original, pre-POSIX tar format had limitations on the types of files it could contain and on the length of filenames. The USTAR format was developed to address these limitations and was standardized in POSIX.1-1988 (IEEE Std 1003.1-1988). Jörg Schilling has collected some information about the different implementations; see the references section. There's also an old version (often referred to as "non-ANSI Tar" or simply "old Tar") which both GNU Tar and STar can read and write.
  
[[Disk Archiver]] (DAR) was intended by its authors as a replacement for TAR, supporting file compression among other features.
+
[[Pax]] is a system of extensions to the USTAR format.
 +
 
 +
== Identification ==
 +
Most modern tar files use a format based on either GNU Tar or POSIX/USTAR. GNU Tar files have the signature "{{magic|ustar }}" (with a trailing space) at offset 257, and POSIX/USTAR files have "{{magic|ustar\0}}" at offset 257.
 +
 
 +
Some other variants have the signature "{{magic|tar\0}}" at offset 508.
 +
 
 +
But most older tar files have no signature, and must be identified by other means. Validating that the checksum field at offset 148 is well-formed and accurate is a possibility.
 +
 
 +
Note that most tar files have no global file header, so the tests suggested here are actually looking at the first member file in the archive.
 +
 
 +
== See also ==
 +
* [[AR]] - comparable format
 +
* [[cpio]] - comparable format
 +
* [[Disk Archiver]] (DAR) was intended by its authors as a replacement for TAR, supporting file compression among other features.
 +
* [[Pax]] - extension
  
 
== Examples ==
 
== Examples ==
 +
To list the contents of a .tar.gz archive:
  
Compressing two files into a .tar.gz archive
+
tar tvzf example.tar.gz
  
tar cvf output.tar.gz inputfile1 inputfile2
+
To extract a .tar.gz archive to the current directory:
  
Extracting a .tar.gz archive to the current directory.
+
tar xvzf example.tar.gz
  
tar xvf output.tar.gz
+
With some versions of tar, the "<code>z</code>" flag can be omitted when extracting or listing ("<code>tar xvf ...</code>").
  
== References ==
+
To compress two files into a .tar.gz archive:
  
 +
tar cvzf example2.tar.gz inputfile1 inputfile2
 +
 +
== Specifications ==
 
* [http://linux.die.net/man/1/tar Linux man page for tar]
 
* [http://linux.die.net/man/1/tar Linux man page for tar]
* [http://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5&manpath=FreeBSD+8-current FreeBSD man page giving format info]
+
* [https://github.com/libarchive/libarchive/wiki/ManPageTar5 man page for bsdtar]
* [http://en.wikipedia.org/wiki/Tar_%28file_format%29 tar (file format) (Wikipedia)]
+
** [http://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5&manpath=FreeBSD+8-current FreeBSD man page]
* [http://cdrecord.berlios.de/private/man/star/star.4.html star man page] by Jörg Schilling
+
* [https://www.gnu.org/software/tar/manual/html_node/Tar-Internals.html GNU Tar: Tar Internals]
* [http://xkcd.com/1168/ XKCD comic]
+
* [http://cdrtools.sourceforge.net/private/man/star/star.4.html star man page] by Jörg Schilling
 +
* [https://twitter.com/angealbertini/status/532263011952513024/photo/1 Chart of TAR format]
 +
 
 +
== Software ==
 +
* [https://www.gnu.org/software/tar/ GNU Tar] · [https://www.gnu.org/software/tar/manual/ Documentation] · [https://savannah.gnu.org/projects/tar Development]
 +
* [https://www.libarchive.org libarchive / bsdtar]
 +
* [[7-Zip]]
 +
* {{Deark}}
 +
 
 +
== Sample files ==
 +
* https://github.com/mgorny/tar-test-inputs
 +
* https://telparia.com/fileFormatSamples/archive/tar/sm.tar
 +
 
 +
== References ==
 +
 
 +
* [[Wikipedia: tar (computing)]]
 +
* [https://xkcd.com/1168/ XKCD comic]
 
* [http://superuser.com/questions/234649/how-to-extract-a-tar-file-tgz-in-windows Discussion on extracting a .tgz file]
 
* [http://superuser.com/questions/234649/how-to-extract-a-tar-file-tgz-in-windows Discussion on extracting a .tgz file]
 +
* [https://dev.gentoo.org/~mgorny/articles/portability-of-tar-features.html Portability of tar features]
 +
 +
[[Category:Backup]]

Revision as of 09:05, 7 September 2020

File Format
Name Tape Archive
Ontology
Extension(s) .tar, .tgz, .tbz, .txz, .tlz, .tsz, .taz, .tz
MIME Type(s) application/x-tar
PRONOM x-fmt/265
Released 1979
This article is about the electronic archive format. For physical tape archives, see Magnetic tape or Punched tape.

Tape Archive (tar) is a traditional UNIX archive format, defined in POSIX.1-1988 and later POSIX.1-2001. Its original purpose was to archive files on backup tapes. Archived data in the tar format is sometimes referred to as a "tarball".

Compression

While tar itself does not offer any compression, it's frequently used together with a stream compression format such as gzip, bzip2, or XZ to provide file archiving plus compression. Most modern implementations of tar, present in UNIX/Linux systems, offer built-in support for this combined operation by using a modifier such as z (gzip) or j (bzip2). When extracting files, the compression format can sometimes be detected and handled automatically.

Files compressed this way should have a dual file extension such as .tar.gz or .tar.bz2. Sometimes the .tgz extension is used in place of .tar.gz. Rarely, other shortened extensions are used:

  • .tbz instead of .tar.bz2 (bzip2)
  • .txz instead of .tar.xz (XZ)
  • .tlz instead of .tar.lz (Lzip) or .tar.lzma (LZMA_Alone)
  • .tsz instead of .tar.sz (Sunzip)
  • .taz instead of .tar.Z (compress) (or possibly some other compressed format)
  • .tz instead of .tar.Z (compress) (or possibly some other compressed format)
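
The shortened extensions listed above can be captured in a small lookup table. The following Python sketch is illustrative only: the names SHORT_EXTENSIONS and expand_extension are hypothetical, and the ambiguous entries (.tlz, .taz, .tz) are mapped to just one of the possibilities mentioned in the list.

```python
import os

# Shortened extension -> conventional dual extension, per the list above.
SHORT_EXTENSIONS = {
    ".tgz": ".tar.gz",
    ".tbz": ".tar.bz2",
    ".txz": ".tar.xz",
    ".tlz": ".tar.lz",  # assumption: could also stand for .tar.lzma
    ".tsz": ".tar.sz",
    ".taz": ".tar.Z",   # assumption: possibly some other compressed format
    ".tz": ".tar.Z",    # assumption: possibly some other compressed format
}


def expand_extension(filename):
    """Return filename with a shortened tar extension expanded, if any."""
    root, ext = os.path.splitext(filename)
    long_form = SHORT_EXTENSIONS.get(ext.lower())
    return root + long_form if long_form else filename
```

For example, expand_extension("backup.tgz") yields "backup.tar.gz", while a name with an unrelated extension is returned unchanged. The extension is only a hint; the actual compression format is better confirmed from the file contents.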

Variants

Several variants of the tar archive format exist. The original, pre-POSIX tar format had limitations on the types of files it could contain and on the length of filenames. The USTAR format was developed to address these limitations and was standardized in POSIX.1-1988 (IEEE Std 1003.1-1988). Jörg Schilling has collected some information about the different implementations; see the references section. There's also an old version (often referred to as "non-ANSI Tar" or simply "old Tar") which both GNU Tar and STar can read and write.

Pax is a system of extensions to the USTAR format.

Identification

Most modern tar files use a format based on either GNU Tar or POSIX/USTAR. GNU Tar files have the signature "ustar " (with a trailing space) at offset 257, and POSIX/USTAR files have "ustar\0" at offset 257.

Some other variants have the signature "tar\0" at offset 508.

Most older tar files, however, have no signature and must be identified by other means. One option is to validate that the checksum field at offset 148 is well-formed and accurate.

Note that most tar files have no global file header, so the tests suggested here are actually looking at the first member file in the archive.
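
The tests described in this section can be sketched in Python as follows. This is a minimal sketch rather than a definitive detector: the function names and the returned labels are hypothetical, and it assumes the usual 512-byte header block, an 8-byte octal checksum field at offset 148, and a checksum computed as the sum of all header bytes with the checksum field itself counted as spaces.

```python
BLOCK_SIZE = 512  # assumption: one header occupies a single 512-byte block


def checksum_ok(header):
    """Check that the checksum field at offset 148 is well-formed and accurate."""
    field = header[148:156].strip(b" \0")
    try:
        stored = int(field, 8)  # the field holds an octal number in ASCII
    except ValueError:
        return False
    # Recompute the sum with the checksum field replaced by eight spaces.
    data = header[:148] + b" " * 8 + header[156:BLOCK_SIZE]
    unsigned_sum = sum(data)
    # Some historical implementations summed the bytes as signed values.
    signed_sum = sum(b - 256 if b > 127 else b for b in data)
    return stored in (unsigned_sum, signed_sum)


def identify_tar(header):
    """Classify the first header block of a candidate tar file."""
    if len(header) < BLOCK_SIZE or not checksum_ok(header):
        return "not tar"
    if header[257:263] == b"ustar ":
        return "gnu"
    if header[257:263] == b"ustar\0":
        return "ustar"
    if header[508:512] == b"tar\0":
        return "tar signature at 508"
    return "no signature (checksum only)"
```

A caller would read the first 512 bytes of the file and pass them to identify_tar. As noted above, this inspects only the first member's header, so an empty archive or a compressed one (for example .tar.gz) will not be recognized without further handling.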

See also

  • AR - comparable format
  • cpio - comparable format
  • Disk Archiver (DAR) - intended by its authors as a replacement for TAR, supporting file compression among other features
  • Pax - extension

Examples

To list the contents of a .tar.gz archive:

tar tvzf example.tar.gz

To extract a .tar.gz archive to the current directory:

tar xvzf example.tar.gz

With some versions of tar, the "z" flag can be omitted when extracting or listing ("tar xvf ...").

To compress two files into a .tar.gz archive:

tar cvzf example2.tar.gz inputfile1 inputfile2
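
The same three operations can also be performed programmatically. The sketch below uses the tarfile module from the Python standard library; the helper names are hypothetical, and the code is a rough equivalent of the commands above rather than an exact match (for instance, it prints nothing, unlike the "v" flag). Opening with mode "r:*" lets tarfile detect the compression itself, similar to omitting the "z" flag.

```python
import tarfile


def create_tar_gz(archive_path, input_paths):
    """Rough equivalent of: tar czf archive_path input_paths..."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in input_paths:
            archive.add(path)


def list_tar(archive_path):
    """Rough equivalent of: tar tf archive_path (member names only)."""
    with tarfile.open(archive_path, "r:*") as archive:
        return archive.getnames()


def extract_tar(archive_path, destination="."):
    """Rough equivalent of: tar xf archive_path, run inside destination."""
    with tarfile.open(archive_path, "r:*") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer extraction filters that reject
            # unsafe members such as paths escaping the destination.
            archive.extractall(destination, filter="data")
        else:
            # Older versions: only extract archives from trusted sources.
            archive.extractall(destination)
```

For example, create_tar_gz("example2.tar.gz", ["inputfile1", "inputfile2"]) followed by list_tar("example2.tar.gz") returns the two member names.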

Specifications

Software

Sample files

References
