CSV
CSV - description
CSV (comma-separated values) is a text-based format typically used to store or exchange database-like records. A CSV file consists of a series of records, each of which contains a number of fields. The fields are separated by a known delimiter (canonically a comma), and the records are typically separated by whatever constitutes a newline on the system that generated the file. A quote character surrounds any field that itself contains the delimiter, the quote character or a newline, and some implementations also quote any field containing non-alphanumeric characters. The quote character is usually " (double quote), though ' (single quote) is also common.
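The structure described above maps directly onto Python's standard csv module. The following is a minimal sketch, assuming Python 3 and the module's default "excel" dialect (comma delimiter, " as quote character); the records are made-up illustrative data. It shows that only fields containing the delimiter or the quote character get quoted, and that an embedded quote is doubled.

```python
import csv
import io

# Each record is a list of fields (illustrative data).
records = [
    ["id", "name", "note"],
    ["1", "widget", "small, blue"],     # contains the delimiter
    ["2", "gadget", 'the "best" one'],  # contains the quote character
    ["3", "gizmo", "plain"],
]

buf = io.StringIO()
# QUOTE_MINIMAL (the default) quotes only fields that contain the delimiter,
# the quote character or a line break; embedded quotes are doubled.
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerows(records)
text = buf.getvalue()
print(text, end="")
# id,name,note
# 1,widget,"small, blue"
# 2,gadget,"the ""best"" one"
# 3,gizmo,plain

# Reading the text back recovers the original fields.
assert list(csv.reader(io.StringIO(text, newline=""))) == records
```

Note that csv.writer ends every record with CR NL by default regardless of platform; the lineterminator parameter changes this.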
A simplistic and quite possibly syntactically invalid BNF definition for CSV is as follows:
<CSVFile> ::= <Record>*
<Record> ::= [ <Field> (<Delimiter> <Field>)* ] <EOL>
<Field> ::= <SimpleField> | <QuotedField>
<SimpleField> ::= AlphaNum* ; any sequence of alphanumeric characters
<QuotedField> ::= <QuoteChar> <AnyChar>* <QuoteChar> ; see below for quite how flexible <AnyChar> is
<QuoteChar> ::= " | ' ; but note that the opening and closing quotes generally must match
<Delimiter> ::= , ; canonically a comma
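The grammar can be turned into a small hand-written parser. This is a minimal sketch rather than a reference implementation: parse_csv is a hypothetical name, and because the grammar leaves several points open, the sketch assumes NL or CR NL as <EOL>, a comma delimiter, " or ' as the quote character (the closing quote must match the opening one), a doubled quote inside a quoted field standing for one literal quote, and any characters (not only alphanumerics) in a simple field.

```python
def parse_csv(text, delimiter=",", quote_chars="\"'"):
    """Split text into records, each a list of field strings.

    Assumptions beyond the grammar: <EOL> is NL or CR NL; a quoted field
    closes with the quote character it opened with; a doubled quote inside
    a quoted field is one literal quote; a simple field is any run of
    characters up to a delimiter or <EOL> (looser than AlphaNum*).
    """
    records, record, field = [], [], []
    quote = None      # quote character of the field being read, or None
    pending = False   # True once the current record has any content
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if quote is not None:                  # inside a quoted field
            if c != quote:
                field.append(c)                # delimiters and EOLs are literal here
            elif i + 1 < n and text[i + 1] == quote:
                field.append(c)                # doubled quote: one literal quote
                i += 1                         # skip the second quote
            else:
                quote = None                   # closing quote
        elif c in quote_chars and not field:
            quote = c                          # opening quote
            pending = True
        elif c == delimiter:
            record.append("".join(field))
            field = []
            pending = True
        elif c == "\r" and i + 1 < n and text[i + 1] == "\n":
            pass                               # CR of a CR NL pair; the NL ends the record
        elif c == "\n":
            if pending:
                record.append("".join(field))
            records.append(record)             # a blank line yields an empty record []
            record, field, pending = [], [], False
        else:
            field.append(c)
            pending = True
        i += 1
    if quote is not None:
        raise ValueError("unterminated quoted field")
    if pending:                                # last record had no trailing <EOL>
        record.append("".join(field))
        records.append(record)
    return records
```

For example, parse_csv('a,"b,c"\n\nx\n') returns [['a', 'b,c'], [], ['x']], with the blank line giving an empty record as the optional field list in <Record> allows; an unterminated quoted field raises ValueError rather than being guessed at.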
Implementations vary in their interpretation and generation of CSV files. The best, as ever, are strict in what they generate but generous in what they accept. Known variants of what is acceptable include:
Whether quoted fields must be quoted with " or ' or whether either is acceptable
Whether <EOL> is ASCII NL, CR, CR NL, NL CR or any combination of these. (For instance, an implementation expecting a bare NL that meets a record ending in CR NL may treat the CR as part of the final field; it may treat the CR as a record terminator in its own right, producing a spurious blank record after it; or it may correctly recognise CR NL as a variant form of <EOL>.)
Whether all records must contain the same number of fields or not
Whether special interpretation can be given to the first record, naming the fields in subsequent records (implementations that accept this will typically expect every record to contain the same number of fields as the first record)
Whether quotes may appear only around fields that themselves contain a quote character, a delimiter or a newline, or may be placed around any field
Whether quotes inside a field are doubled or escaped. E.g. if the quote character is ' and a field's value is you're, should the field appear as you\'re or as you''re?
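Several of these quoting variants correspond to format parameters of Python's csv module: quotechar selects the quote character, doublequote and escapechar choose between doubling and escaping an embedded quote, and quoting chooses between quoting only where needed (QUOTE_MINIMAL) and quoting every field (QUOTE_ALL). The sketch below writes the you're example each way; render, parse and ROW are hypothetical names, and the escaped form is paired with QUOTE_ALL so that every field is enclosed in quotes.

```python
import csv
import io

ROW = ["1", "you're", "plain"]

def render(**fmt):
    """Write ROW as one record using the given csv format parameters."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(ROW)
    return buf.getvalue()

def parse(text, **fmt):
    """Read the first record of text using the given format parameters."""
    return next(csv.reader(io.StringIO(text, newline=""), **fmt))

# ' as quote character, quotes only where needed, embedded quote doubled.
minimal_doubled = render(quotechar="'")
# ' as quote character, every field quoted, embedded quote doubled.
all_doubled = render(quotechar="'", quoting=csv.QUOTE_ALL)
# ' as quote character, every field quoted, embedded quote escaped with a backslash.
escaped = {"quotechar": "'", "doublequote": False, "escapechar": "\\"}
all_escaped = render(quoting=csv.QUOTE_ALL, **escaped)

for text in (minimal_doubled, all_doubled, all_escaped):
    print(text, end="")
# 1,'you''re',plain
# '1','you''re','plain'
# '1','you\'re','plain'

# Each form round-trips only when read with matching parameters.
assert parse(minimal_doubled, quotechar="'") == ROW
assert parse(all_doubled, quotechar="'") == ROW
assert parse(all_escaped, **escaped) == ROW
assert parse(all_escaped, quotechar="'") != ROW
```

The last line illustrates why these variants matter in practice: a reader that expects doubled quotes does not recover you're from the backslash-escaped form.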
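Python's csv module can also illustrate the line-ending, header and field-count variants. Its reader recognises both CR and NL as end-of-line characters, so records ending in CR NL, a bare NL or a bare CR all parse when the input is opened with newline="" (which passes line endings through untranslated). csv.DictReader treats the first record as field names, and its restval and restkey parameters decide what happens to short and long records. check_field_counts is a hypothetical helper sketching the stricter policy of rejecting ragged records outright.

```python
import csv
import io

# Mixed <EOL> forms: CR NL, bare NL, and a bare CR on the last record.
text = "name,qty\r\napple,3\nbanana,5\r"

rows = list(csv.reader(io.StringIO(text, newline="")))
# [['name', 'qty'], ['apple', '3'], ['banana', '5']]

# csv.DictReader takes the first record as the field names.
named = list(csv.DictReader(io.StringIO(text, newline="")))
# [{'name': 'apple', 'qty': '3'}, {'name': 'banana', 'qty': '5'}]

# Ragged records: DictReader pads short ones with restval and collects
# surplus fields in a list under restkey (default None).
ragged = "a,b\n1\n1,2,3\n"
lenient = list(csv.DictReader(io.StringIO(ragged, newline=""), restval=""))
# [{'a': '1', 'b': ''}, {'a': '1', 'b': '2', None: ['3']}]

def check_field_counts(text):
    """Hypothetical strict check: every record must match the header's width."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    for n, record in enumerate(reader, start=2):
        if len(record) != len(header):
            raise ValueError(
                f"record {n} has {len(record)} fields, header has {len(header)}"
            )
    return header
```

Here check_field_counts(text) returns ['name', 'qty'], while check_field_counts(ragged) raises ValueError at record 2. Which policy is right depends on how generous the consumer intends to be.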
Examples
Other descriptions
Code
--KevinAshley (talk) 03:28, 1 November 2012 (UTC)