WARC

From Just Solve the File Format Problem

Revision as of 01:16, 7 July 2014

File Format
Name: WARC
Ontology
Extension(s): .warc, .warc.gz
PRONOM: fmt/289

WARC (Web ARChive) is the successor to the ARC (Internet Archive) format. It is standardized as ISO 28500:2009, Information and documentation - WARC file format, and was developed under the auspices of the International Internet Preservation Consortium (IIPC). WARC was designed as an extension of ARC, in part to better support the long-term management of Web archives by capturing more metadata about the circumstances of archiving.
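To make the point about archiving metadata concrete, here is a minimal sketch of how one WARC/1.0 record is laid out: a version line, named header fields (WARC-Type, WARC-Record-ID, WARC-Date and Content-Length are the mandatory ones), a blank line, the content block, and two CRLFs closing the record. The helper `make_warc_record` and the example URI are hypothetical and for illustration only; the sketch omits digests, `warcinfo` records and HTTP payload handling that real WARC writers provide.

```python
import uuid
from datetime import datetime, timezone


def make_warc_record(warc_type: str, target_uri: str, block: bytes,
                     content_type: str = "application/octet-stream") -> bytes:
    """Serialize one WARC/1.0 record (hypothetical helper for illustration)."""
    fields = [
        ("WARC-Type", warc_type),                          # mandatory
        ("WARC-Target-URI", target_uri),                   # what was archived
        # Mandatory capture timestamp in UTC; this sketch simply uses "now".
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),  # mandatory, unique per record
        ("Content-Type", content_type),
        ("Content-Length", str(len(block))),               # mandatory: block size in bytes
    ]
    header = "WARC/1.0\r\n"
    header += "".join(f"{name}: {value}\r\n" for name, value in fields)
    header += "\r\n"  # a blank line separates the header fields from the block
    # The record is closed by two CRLFs after the content block.
    return header.encode("utf-8") + block + b"\r\n\r\n"


if __name__ == "__main__":
    record = make_warc_record("resource", "http://example.com/hello.txt",
                              b"hello", "text/plain")
    print(record.decode("utf-8"))
```

Because Content-Length states the exact size of the block, a reader never has to scan the payload for a delimiter, which is what lets arbitrary binary content sit inside a record.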

WARC files are often compressed using gzip, resulting in a .warc.gz extension.
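The specification recommends record-at-a-time compression, so a .warc.gz file is normally a concatenation of independent gzip members, one per record; in principle a reader can seek to a record's offset and decompress just that member. Python's standard `gzip` module reads concatenated members as one continuous stream, so a sequential reader can treat the file as plain WARC. The sketch below assumes that layout; `gzip_per_record` and `iter_warc_records` are hypothetical helper names, and the parser does no validation.

```python
from __future__ import annotations

import gzip
import io
from typing import BinaryIO, Iterator


def gzip_per_record(records: list[bytes]) -> bytes:
    """Compress each serialized WARC record as its own gzip member and
    concatenate the members into one .warc.gz byte string."""
    return b"".join(gzip.compress(record) for record in records)


def iter_warc_records(stream: BinaryIO) -> Iterator[tuple[bytes, dict[str, str], bytes]]:
    """Yield (version line, header fields, content block) for each record in an
    uncompressed WARC byte stream. No validation or error handling."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines (CRLF CRLF) that close each record
        version = line.strip()  # e.g. b"WARC/1.0"
        headers: dict[str, str] = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # a blank line ends the header fields
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length gives the exact size of the content block in bytes.
        block = stream.read(int(headers["Content-Length"]))
        yield version, headers, block


if __name__ == "__main__":
    # Hypothetical sample record for illustration only.
    sample = (
        b"WARC/1.0\r\n"
        b"WARC-Type: resource\r\n"
        b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n"
        b"WARC-Date: 2014-01-01T00:00:00Z\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: 5\r\n"
        b"\r\n"
        b"hello\r\n\r\n"
    )
    data = gzip_per_record([sample, sample])
    # GzipFile decompresses the concatenated members as one stream; for a file
    # on disk, gzip.open(path, "rb") behaves the same way.
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
        for version, headers, block in iter_warc_records(f):
            print(version.decode(), headers["WARC-Type"], block)
```

Compressing per record rather than compressing the whole file at once costs some compression ratio, but it keeps each record independently addressable, which is why index files for web archives can point at byte offsets inside a .warc.gz.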

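Sample files

* Test WARC Files (http://archive.org/details/testWARCfiles), a warc.gz file from the Internet Archive.

Tools

* WARC Tools, in Python (https://pypi.python.org/pypi/warctools/)
** Some history on the Python tools is available on the COPTR wiki (http://coptr.digipres.org/Warctools).
* warcat: tool and library for handling Web ARChive (WARC) files (https://github.com/chfoo/warcat)

References

* Working drafts for the WARC specification (http://archive-access.sourceforge.net/warc/)
* The WARC File Format (ISO 28500): Information, Maintenance, Drafts (http://bibnum.bnf.fr/WARC/)
* Slide show on WARC (http://www.hanzoarchives.com/learning/warc_files)
* The WARC Ecosystem, Archive Team (http://archiveteam.org/index.php?title=The_WARC_Ecosystem)

Category: Internet Archive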
