WARC
From Just Solve the File Format Problem
Revision as of 05:19, 10 November 2013
WARC (Web ARChive) is the successor to the ARC (Internet Archive) format. It is standardized as ISO 28500:2009, Information and documentation -- WARC file format, and was developed under the auspices of the International Internet Preservation Consortium. WARC was designed as an extension to ARC, in part to provide better capabilities for managing Web archives over the long term by allowing the capture of more metadata about the circumstances of archiving.
WARC files are often compressed using gzip, resulting in a .warc.gz extension.
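As a rough illustration of what reading such a file involves, the sketch below opens a .warc or .warc.gz file using only the Python standard library and walks through its records. It assumes the usual record layout (a "WARC/x.y" version line, named header fields, a blank line, then a block of Content-Length bytes) and does not handle folded header lines or other edge cases. The function names open_warc and iter_warc_records and the file name example.warc.gz are hypothetical, chosen only for this example; this is not the API of any existing WARC library.

```python
import gzip
import io


def open_warc(path):
    """Open a WARC file for binary reading, decompressing if it ends in .gz."""
    if path.endswith(".gz"):
        # gzip.open reads concatenated gzip members as one continuous stream.
        return gzip.open(path, "rb")
    return open(path, "rb")


def iter_warc_records(stream):
    """Yield (version, headers, block) for each record in a binary stream.

    Simplified sketch: assumes one "Name: value" field per header line.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line.strip() == b"":
            continue  # blank lines separating records
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        length = int(headers.get("Content-Length", "0"))
        block = stream.read(length)
        yield version, headers, block


if __name__ == "__main__":
    # "example.warc.gz" is a placeholder path, not a real sample file.
    with open_warc("example.warc.gz") as f:
        for version, headers, block in iter_warc_records(f):
            print(version, headers.get("WARC-Type"), len(block))
```

For real work, an established library such as the WARC Tools listed under References is likely a better choice than a hand-rolled parser like this one.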
Sample files
- Test WARC Files: a warc.gz file from the Internet Archive.
References
- Draft of ISO-DIS 28500, as circulated for ISO ballot and approval.
- WARC, Web ARChive file format, from the Library of Congress resource on Sustainability of Digital Formats
- Working drafts for WARC specification
- The WARC File Format (ISO 28500) - Information, Maintenance, Drafts
- WARC Tools (in Python)
- Slide show on WARC
- The WARC Ecosystem (Archive Team)