Parity Volume Set

From Just Solve the File Format Problem
(Difference between revisions)
m (Discussion: Modified bracket usage for reference link to Par3 specification to avoid causing display issues in References section.)
 
(7 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
{{FormatInfo
 
{{FormatInfo
 
|formattype=electronic
 
|formattype=electronic
|thiscat=Error detection and correction
+
|subcat=Error detection and correction
 
|extensions={{ext|par}}, {{ext|pxx}}, {{ext|par2}}, {{ext|pa3}}
 
|extensions={{ext|par}}, {{ext|pxx}}, {{ext|par2}}, {{ext|pa3}}
 +
|released=2001<ref>[https://sourceforge.net/projects/parchive/files/OldFiles/ parchive Files - SourceForge.net]</ref>
 
}}
 
}}
  
'''Parity Volume Set''' (also known as '''parity archive''' or '''parchive''') is a file format for storing redundant data for one or more input files. These data can be used to repair the input files if they get damaged. The error correction is based on the [[Reed-Solomon_error_correction|Reed-Solomon algorithm]]. Three versions of the format exist: ''Par1'', ''Par2'' and ''Par3''. The ''Par3'' format never made it beyond the proposal stage, but it is used by the MultiPar tool.  
+
'''Parity Volume Set''' (also known as '''parity archive''' or '''parchive''') is a file format for storing redundant data for one or more input files. These data can be used to repair the input files if they get damaged. The error correction is based on the [[Reed-Solomon_error_correction|Reed-Solomon algorithm]]. Three versions of the format exist: ''Par1'', ''Par2'' and ''Par3''. The ''Par3'' format is in "near-final form"<ref>[https://github.com/Parchive/par3cmdline/commit/4c1b780ebe9f083eccf8a668435613c348eae313 Commit 4c1b780 - 2022-01-29 - par3cmdline - GitHub]</ref> and is used by an old version of the MultiPar tool<ref>[https://github.com/Yutaka-Sawada/MultiPar/issues/46#issuecomment-948179335 Par3 support? #46 - MultiPar - GitHub]</ref> as well as by <code>par3cmdline</code>.<ref>[https://github.com/Parchive/par3cmdline par3cmdline - GitHub]</ref>
 +
 
 +
== Discussion ==
 +
Historically, these were multi-part archives that were distributed on Usenet (a.k.a. "network news"), but they can still be used to guard against complete data loss during transit or storage. Parchive is like RAID for files instead of a whole file system.
 +
 
 +
The technology is based on a Reed-Solomon code implementation that allows any X data blocks to be recovered as long as X parity blocks are present. (A data block is either a whole file or a much smaller virtual slice of a file.)<ref>[https://parchive.sourceforge.net/ Parchive: Parity Archive Tool - SourceForge.net]</ref>
 +
 
 +
Modern <code>Par2</code> software can use a GPU to speed up recovery file creation.<ref>[https://github.com/Yutaka-Sawada/MultiPar/issues/40 GPU Acceleration via par2j64.exe??? Is it possible? How do I do it? #40 - MultiPar - GitHub]</ref><ref>[https://github.com/Parchive/par2cmdline/pull/176 Added support for GPU acceleration (CUDA) on recovery file creation. #176 - par2cmdline - GitHub]</ref>
 +
 
 +
While <code>Par3</code> has yet to be finalized as of this writing in 2025, the "2022-01-28 ALPHA DRAFT" specification addresses notable flaws that have existed since the format's inception:
 +
  Major differences from Parchive 2.0 are:
 +
  ...(redacted for brevity)
 +
  * replace MD5 hash (It is both slow and less secure.)
 +
  ...(redacted for brevity)
 +
 
 +
  Part of "support any linear code" is to fix the major bug in Parchive 2.0. Parchive 2.0 did not do Reed-Solomon encoding as it promised. There was a major mistake in the paper that Parchive 2.0 relied on.
 +
  The problem manifested as a bug in Parchive 1.0 and, while Parchive 2.0 reduced its occurrence, it did not fix the problem. Parchive 2.0 did not use an always invertible matrix; it essentially used a random
 +
  matrix, which (luckily) is invertible with high probability. Parchive 3.0 fixes that bug.
 +
 
 +
  The other part of "support any linear code" is supporting codes beside Reed-Solomon. Reed-Solomon has excellent data protection, but is slow to compute. LDPC and sparse random matrices will speed things
 +
  up dramatically, with a slight increase in errors that cannot be recovered from.
 +
<ref>[https://parchive.github.io/doc/Parity_Volume_Set_Specification_v3.0.html#design-goals Parity Volume Set Specification 3.0 (2022-01-28 ALPHA DRAFT) - GitHub]</ref>
  
 
== Identification ==
 
== Identification ==
Line 11: Line 33:
 
A '''Par1''' file starts with the following byte sequence:
 
A '''Par1''' file starts with the following byte sequence:
  
<code>50 41 52 00 00 00 00 00</code>
+
{{magic|50 41 52 00 00 00 00 00}}
  
 
This corresponds to the ASCII text string <code>PAR</code>, followed by 5 null bytes.  
 
This corresponds to the ASCII text string <code>PAR</code>, followed by 5 null bytes.  
Line 17: Line 39:
 
A '''Par2''' file starts with the bytes:
 
A '''Par2''' file starts with the bytes:
  
<code>50 41 52 32 00 50 4B 54</code>
+
{{magic|50 41 52 32 00 50 4B 54}}
  
 
This corresponds to ASCII text string <code>PAR2</code>, followed by a null byte and the text string <code>PKT</code>.  
 
This corresponds to ASCII text string <code>PAR2</code>, followed by a null byte and the text string <code>PKT</code>.  
Line 23: Line 45:
 
Finally, a '''Par3''' file can be identified by the following 4-byte sequence:
 
Finally, a '''Par3''' file can be identified by the following 4-byte sequence:
  
<code>50 41 33 00</code>
+
{{magic|50 41 33 00}}
  
 
This corresponds to the text string <code>PA3</code>, followed by a null byte.  
 
This corresponds to the text string <code>PA3</code>, followed by a null byte.  
  
 
== Specifications ==
 
== Specifications ==
* [http://parchive.sourceforge.net/docs/specifications/parity-volume-spec-1.0/article-spec.html Parity Volume Set Specification v1.0]
+
{|
* [http://parchive.sourceforge.net/docs/specifications/parity-volume-spec/article-spec.html Parity Volume Set Specification 2.0]
+
|'''Specification version'''
* [https://web.archive.org/web/20100911002706/http://hp.vector.co.jp/authors/VA021385/par3_spec_prop.htm proposal for Parchive Specification 3.0]
+
|'''SourceForge/Internet Archive link'''
 +
|'''GitHub link'''
 +
|-
 +
| Parity Volume Set Specification v1.0
 +
| [http://parchive.sourceforge.net/docs/specifications/parity-volume-spec-1.0/article-spec.html SourceForge]
 +
| [https://parchive.github.io/doc/Parity%20Volume%20Set%20Specification%20v1.0.html GitHub]
 +
|-
 +
| Parity Volume Set Specification 2.0
 +
| [http://parchive.sourceforge.net/docs/specifications/parity-volume-spec/article-spec.html SourceForge]
 +
| [https://parchive.github.io/doc/Parity%20Volume%20Set%20Specification%20v2.0.html GitHub]
 +
|-
 +
| proposal for Parchive Specification 3.0
 +
| [https://web.archive.org/web/20100911002706/http://hp.vector.co.jp/authors/VA021385/par3_spec_prop.htm hp.vector.co.jp IA mirror]
 +
| [https://parchive.github.io/doc/Parity_Volume_Set_Specification_v3.0.html GitHub]
 +
|}
 +
 
 +
== par2 Examples ==
 +
Create recovery files of uniform size with 100% redundancy for example.dwarfs:
 +
  par2 create -u -r100 example.dwarfs
 +
''This makes it more like Par1''<ref>[https://github.com/parchive/par2cmdline?tab=readme-ov-file#why-is-par-20-better-than-par-10 Why is PAR 2.0 better than PAR 1.0? - par2cmdline - GitHub]</ref>
  
 
== Software ==
 
== Software ==
 
+
* Windows
* [https://github.com/Parchive/par2cmdline par2cmdline] (Linux par2 tool)
+
** [http://www.quickpar.org.uk/index.htm QuickPar] (par2, GUI)
* [http://hp.vector.co.jp/authors/VA021385/ MultiPar] (Windows, GUI)
+
** [https://github.com/Yutaka-Sawada/MultiPar MultiPar] (GUI)
 +
* Mac
 +
** [https://gp.home.xs4all.nl/Site/MacPAR_deLuxe.html MacPAR deLuxe]
 +
* Linux
 +
** [https://pypar2.fingelrest.net/ PyPar2]
 +
** [https://github.com/Parchive/par2cmdline par2cmdline]
 +
** [https://github.com/Parchive/par3cmdline par3cmdline]
  
 
== Sample files ==
 
== Sample files ==
 +
=== Par1 sample files ===
 +
See [https://discmaster.textfiles.com/search?q=%22par%22&mode=deep&extension=par&format=parityArchiveVolumeSet Search results with par extensions - Discmaster.textfiles.com] for sample <code>Par1</code> files.
 +
 +
<code>Par1</code> files are usually distributed as a set consisting of <code><name>.par</code> and <code>.p<num></code> files, where <code><name></code> is typically the name of the file being protected and <code><num></code> is a number that starts at <code>01</code> and increments for each additional <code>Par1</code> archive in the set.<ref>[https://web.archive.org/web/20250612232413/https://www.techsono.com/files/par2 Par2 Files Explained in Plain English - Internet Archive copy]</ref>
 +
 +
See Also:
 +
* [https://discmaster.textfiles.com/search?mode=deep&extension=p01&format=parityArchiveVolumeSet Search results with p01 extensions - Discmaster.textfiles.com]
 +
* [https://discmaster.textfiles.com/search?mode=deep&extension=p02&format=parityArchiveVolumeSet Search results with p02 extensions - Discmaster.textfiles.com]
 +
 +
=== Par2 sample files ===
 +
See [https://discmaster.textfiles.com/search?mode=deep&extension=par2&format=parityArchiveVolumeSet&format=unknown&sortBy=name.keyword Search results with par2 extensions and are likely parity archive - Discmaster.textfiles.com] for samples.
 +
 +
These files are usually distributed as a set consisting of <code><name>.par2</code> and <code><name>.vol<numA>+<numB>.par2</code>, where <code><name></code> is typically the name of the file being protected, and <code><numA></code> and <code><numB></code> are numbers; <code><numA></code> increases across the set and often starts at <code>0</code>.<ref>[https://web.archive.org/web/20250612232413/https://www.techsono.com/files/par2 Par2 Files Explained in Plain English - Internet Archive copy]</ref>
 +
 +
Additionally, <code>Par2</code> files carry the <code>.par2</code> extension, which makes them easier and less ambiguous to identify than <code>Par1</code> files: the <code>.par</code> extension used by <code>Par1</code> can be confused with other extensions that also begin with <code>.par</code>.
  
 
== Links ==
 
== Links ==
 
* [[Wikipedia:Parchive]]
 
* [[Wikipedia:Parchive]]
* [http://www.techsono.com/usenet/files/par2 Par2 Files Explained in Plain English]
+
* [http://www.techsono.com/usenet/files/par2 Par2 Files Explained in Plain English] (Broken link) <sup>[https://web.archive.org/web/20250612232413/https://www.techsono.com/files/par2 [Internet Archive copy]]</sup>
 +
* [https://parchive.github.io/ Parchive project page on GitHub]
 +
* [https://parchive.sourceforge.net/ Parchive project page on SourceForge.net (Legacy)]
 +
 
 +
== References ==
 +
<references/>

Latest revision as of 09:16, 9 August 2025

File Format
Name Parity Volume Set
Ontology
Extension(s) .par, .pxx, .par2, .pa3
Released 2001[1]

Parity Volume Set (also known as parity archive or parchive) is a file format for storing redundant data for one or more input files. These data can be used to repair the input files if they get damaged. The error correction is based on the Reed-Solomon algorithm. Three versions of the format exist: Par1, Par2 and Par3. The Par3 format is in "near-final form"[2] and is used by an old version of the MultiPar tool[3] as well as by par3cmdline.[4]


== Discussion ==

Historically, these were multi-part archives that were distributed on Usenet (a.k.a. "network news"), but they can still be used to guard against complete data loss during transit or storage. Parchive is like RAID for files instead of a whole file system.

The technology is based on a Reed-Solomon code implementation that allows any X data blocks to be recovered as long as X parity blocks are present. (A data block is either a whole file or a much smaller virtual slice of a file.)[5]
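
As a rough illustration of the "X parity blocks recover X data blocks" idea, the sketch below uses plain XOR parity, which covers only the special case X = 1. It is not Reed-Solomon and not how Par1 or Par2 compute recovery data; the function names are hypothetical, and all blocks are assumed to have the same length.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks: List[bytes]) -> bytes:
    """Compute one parity block over all data blocks."""
    return xor_blocks(data_blocks)


def recover_missing(blocks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild the single block marked None using the parity block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    repaired = list(blocks)
    # XOR of all surviving blocks and the parity yields the lost block.
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired
```

Recovering more than one lost block requires several independent parity blocks, which is where a code such as Reed-Solomon comes in.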

Modern Par2 software can use a GPU to speed up recovery file creation.[6][7]

While Par3 has yet to be finalized as of this writing in 2025, the "2022-01-28 ALPHA DRAFT" specification addresses notable flaws that have existed since the format's inception:

 Major differences from Parchive 2.0 are:
 ...(redacted for brevity)
 * replace MD5 hash (It is both slow and less secure.)
 ...(redacted for brevity)
 
 Part of "support any linear code" is to fix the major bug in Parchive 2.0. Parchive 2.0 did not do Reed-Solomon encoding as it promised. There was a major mistake in the paper that Parchive 2.0 relied on. 
 The problem manifested as a bug in Parchive 1.0 and, while Parchive 2.0 reduced its occurrence, it did not fix the problem. Parchive 2.0 did not use an always invertible matrix; it essentially used a random
 matrix, which (luckily) is invertible with high probability. Parchive 3.0 fixes that bug.
 
 The other part of "support any linear code" is supporting codes beside Reed-Solomon. Reed-Solomon has excellent data protection, but is slow to compute. LDPC and sparse random matrices will speed things 
 up dramatically, with a slight increase in errors that cannot be recovered from.

[8]

== Identification ==

A Par1 file starts with the following byte sequence:

50 41 52 00 00 00 00 00

This corresponds to the ASCII text string PAR, followed by 5 null bytes.

A Par2 file starts with the bytes:

50 41 52 32 00 50 4B 54

This corresponds to ASCII text string PAR2, followed by a null byte and the text string PKT.

Finally, a Par3 file can be identified by the following 4-byte sequence:

50 41 33 00

This corresponds to the text string PA3, followed by a null byte.
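
The byte signatures above can be checked with a few lines of code. The following is a minimal sketch, not part of any Parchive tool: the function names are hypothetical, and it assumes each signature sits at the very start of the file (the text states this for Par1 and Par2 but not explicitly for Par3).

```python
from typing import Optional

# Leading byte signatures listed in the Identification section.
MAGICS = (
    ("Par2", bytes.fromhex("50 41 52 32 00 50 4B 54")),  # "PAR2", NUL, "PKT"
    ("Par1", bytes.fromhex("50 41 52 00 00 00 00 00")),  # "PAR" + 5 NUL bytes
    ("Par3", bytes.fromhex("50 41 33 00")),              # "PA3" + NUL
)


def identify_parity_version(header: bytes) -> Optional[str]:
    """Return 'Par1', 'Par2', 'Par3', or None for the given leading bytes."""
    for name, magic in MAGICS:
        if header.startswith(magic):
            return name
    return None


def identify_parity_file(path: str) -> Optional[str]:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return identify_parity_version(fh.read(8))
```

A call such as `identify_parity_file("example.par2")` (a hypothetical file name) returns the detected version, or `None` when no signature matches.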

== Specifications ==

{|
|'''Specification version'''
|'''SourceForge/Internet Archive link'''
|'''GitHub link'''
|-
| Parity Volume Set Specification v1.0
| SourceForge
| GitHub
|-
| Parity Volume Set Specification 2.0
| SourceForge
| GitHub
|-
| proposal for Parchive Specification 3.0
| hp.vector.co.jp IA mirror
| GitHub
|}

== par2 Examples ==

Create recovery files of uniform size with 100% redundancy for example.dwarfs:

 par2 create -u -r100 example.dwarfs

This makes it more like Par1[9]
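
For scripted use, the same invocation can be assembled programmatically. This is only a sketch: the helper names are hypothetical, the flags are just the ones shown in the command above, and running it assumes a `par2` executable is available on the PATH.

```python
import subprocess
from typing import List


def par2_create_argv(target: str, redundancy_percent: int = 100,
                     uniform: bool = True) -> List[str]:
    """Build the argument list for the 'par2 create' call shown above."""
    argv = ["par2", "create"]
    if uniform:
        argv.append("-u")  # uniform recovery file sizes
    argv.append(f"-r{redundancy_percent}")  # redundancy in percent
    argv.append(target)
    return argv


def run_par2_create(target: str, **options) -> int:
    """Run par2 and return its exit status (requires par2 on PATH)."""
    return subprocess.run(par2_create_argv(target, **options)).returncode
```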

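== Software ==

* Windows
** QuickPar (par2, GUI)
** MultiPar (GUI)
* Mac
** MacPAR deLuxe
* Linux
** PyPar2
** par2cmdline
** par3cmdline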

== Sample files ==

=== Par1 sample files ===

See Search results with par extensions - Discmaster.textfiles.com for sample Par1 files.

Par1 files are usually distributed as a set consisting of <name>.par and .p<num> files, where <name> is typically the name of the file being protected and <num> is a number that starts at 01 and increments for each additional Par1 archive in the set.[10]

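See also:

* Search results with p01 extensions - Discmaster.textfiles.com
* Search results with p02 extensions - Discmaster.textfiles.com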

=== Par2 sample files ===

See Search results with par2 extensions and are likely parity archive - Discmaster.textfiles.com for samples.

These files are usually distributed as a set consisting of <name>.par2 and <name>.vol<numA>+<numB>.par2, where <name> is typically the name of the file being protected, and <numA> and <numB> are numbers; <numA> increases across the set and often starts at 0.[11]

Additionally, Par2 files carry the .par2 extension, which makes them easier and less ambiguous to identify than Par1 files: the .par extension used by Par1 can be confused with other extensions that also begin with .par.
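
The naming conventions described in this section can be turned into a simple filename classifier. The sketch below is a heuristic under stated assumptions: the labels ("par1-index", "par2-volume", and so on) and the function name are made up for illustration, Par1 volume numbers are assumed to have at least two digits, and real-world sets may deviate from these patterns. Because `.par` is ambiguous, a name match should be confirmed against the file's leading bytes.

```python
import re
from typing import Optional, Tuple

# Patterns follow the naming conventions described above (heuristic only).
PATTERNS = (
    ("par2-volume", re.compile(r"^(?P<name>.+)\.vol\d+\+\d+\.par2$", re.IGNORECASE)),
    ("par2-index", re.compile(r"^(?P<name>.+)\.par2$", re.IGNORECASE)),
    ("par1-index", re.compile(r"^(?P<name>.+)\.par$", re.IGNORECASE)),
    ("par1-volume", re.compile(r"^(?P<name>.+)\.p\d{2,}$", re.IGNORECASE)),
)


def classify(filename: str) -> Optional[Tuple[str, str]]:
    """Return (kind, base name), or None if the name matches no pattern."""
    for kind, pattern in PATTERNS:
        match = pattern.match(filename)
        if match:
            return kind, match.group("name")
    return None
```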

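== Links ==

* Wikipedia:Parchive
* Par2 Files Explained in Plain English (broken link; an Internet Archive copy exists)
* Parchive project page on GitHub
* Parchive project page on SourceForge.net (Legacy)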

== References ==

  1. parchive Files - SourceForge.net
  2. Commit 4c1b780 - 2022-01-29 - par3cmdline - GitHub
  3. Par3 support? #46 - MultiPar - GitHub
  4. par3cmdline - GitHub
  5. Parchive: Parity Archive Tool - SourceForge.net
  6. GPU Acceleration via par2j64.exe??? Is it possible? How do I do it? #40 - MultiPar - GitHub
  7. Added support for GPU acceleration (CUDA) on recovery file creation. #176 - par2cmdline - GitHub
  8. Parity Volume Set Specification 3.0 (2022-01-28 ALPHA DRAFT) - GitHub
  9. Why is PAR 2.0 better than PAR 1.0? - par2cmdline - GitHub
  10. Par2 Files Explained in Plain English - Internet Archive copy
  11. Par2 Files Explained in Plain English - Internet Archive copy