CSV

From Just Solve the File Format Problem
Revision as of 03:28, 1 November 2012 by KevinAshley


CSV - description

CSV (comma-separated values) is a text-based format typically used for storing or exchanging database-like records. In essence, a CSV file consists of a series of records, each of which contains a number of fields. The fields are separated by a known delimiter, canonically a comma, and the records are typically separated by whatever constitutes a newline on the system that generated the file. A quote character surrounds fields that themselves contain the delimiter, the quote character or a newline; some implementations also use it to surround any field containing non-alphanumeric characters. The quote character is typically " but is sometimes '.
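As an illustration of the quoting convention just described, here is a minimal sketch using Python's standard `csv` module, which is only one implementation among many; the sample rows are made up. Its default `QUOTE_MINIMAL` mode quotes a field only when it contains the delimiter, the quote character or a line-ending character, and it represents an embedded quote by doubling it.

```python
import csv
import io

# Invented sample records: one field contains the delimiter, one contains quotes.
rows = [
    ["id", "name", "comment"],
    ["1", "Smith, John", 'He said "hi"'],
    ["2", "Jones", "plain"],
]

buf = io.StringIO()
# QUOTE_MINIMAL (the default) quotes only fields containing the delimiter,
# the quote character, or a line-ending character. A bare LF is used as the
# record terminator here; the module's default is CR LF.
writer = csv.writer(
    buf, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
)
writer.writerows(rows)
text = buf.getvalue()
print(text, end="")

# Reading the text back recovers the original records.
parsed = list(csv.reader(io.StringIO(text)))
```

Only the second record's last two fields end up quoted, and the embedded quotes in `He said "hi"` are written as `""`.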

A simplistic and quite possibly syntactically invalid BNF definition for CSV is as follows:

<CSVFile> ::= <Record>*

<Record> ::= [ <Field> (<Delimiter> <Field>)* ] <EOL>

<Field> ::= <SimpleField> | <QuotedField>

<SimpleField> ::= AlphaNum*  ; Any sequence of alphanumeric characters (many implementations accept any character other than the delimiter, the quote character and <EOL>)

<QuotedField> ::= <QuoteChar> <AnyChar>* <QuoteChar>  ; See below for quite how flexible <AnyChar> is

<Delimiter> ::= ","  ; canonically a comma

<QuoteChar> ::= '"' | "'"  ; but note that the opening and closing quotes generally must match
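The grammar above can be turned into a small hand-written parser. What follows is a sketch, not a reference implementation: `parse_csv` is a hypothetical helper, and it assumes that a quote character can only open a field at its very start, that a doubled quote inside a quoted field stands for one literal quote, and that LF, CR and CR LF are all accepted as <EOL>. Delimiters and line breaks inside a quoted field are kept as part of the field.

```python
def parse_csv(text: str, delimiter: str = ",", quote: str = '"') -> list[list[str]]:
    """Hypothetical minimal CSV parser (a sketch, not a library API).

    Assumptions: a doubled quote inside a quoted field means one literal
    quote, and LF, CR or CR LF all end a record.
    """
    records: list[list[str]] = []
    record: list[str] = []
    field: list[str] = []
    in_quotes = False
    dirty = False  # True while a record has been started but not yet ended
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        dirty = True
        if in_quotes:
            if c == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(c)  # delimiters and line breaks are literal here
        elif c == quote and not field:
            in_quotes = True  # opening quote at the start of a field
        elif c == delimiter:
            record.append("".join(field))
            field = []
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CR LF as a single <EOL>
            record.append("".join(field))
            records.append(record)
            record, field = [], []
            dirty = False
        else:
            field.append(c)
        i += 1
    if dirty:  # final record with no trailing <EOL>
        record.append("".join(field))
        records.append(record)
    return records


print(parse_csv('a,b\r\n"x, y","say ""hi"""\n1,2'))
print(parse_csv("'you''re',x", quote="'"))
```

With this design a blank line comes back as a record holding a single empty field; other implementations may skip it or treat it differently, which is one of the variations listed below.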

Implementations vary in their interpretation and generation of CSV files. The best, as ever, are strict in what they generate but generous in what they accept. Known variants of what is acceptable include:

Whether quoted fields must be quoted with " or ' or whether either is acceptable

Whether <EOL> is ASCII NL, CR, CR NL, NL CR or any combination of these. (For instance, some implementations expecting a bare NL but seeing a record ending in CR NL will treat the CR as part of the final field; some will see it as a record delimiter on its own, producing a blank record after it; some will correctly recognise CR NL as a variant form of <EOL>.)

Whether all records must contain the same number of fields or not

Whether special interpretation can be given to the first record, naming the fields in subsequent records (implementations that accept this will typically expect every record to contain the same number of fields as the first record)

Whether quotes only appear around fields which themselves contain either a quote character, a delimiter or a newline or whether quotes can be placed around any field

Whether quotes inside a field are doubled or escaped. E.g. if the quote character is ' and a field's value is you're, should the field appear as 'you\'re' or 'you''re'?
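Several of these variants surface as explicit options in real libraries. For example, Python's `csv` module (used here purely as an illustration) lets the caller choose between doubled and backslash-escaped quotes through its `doublequote` and `escapechar` parameters, and its `DictReader` treats the first record as the names of the fields. The sample data below is invented; a reader configured for one quoting convention will generally misread data written with the other.

```python
import csv
import io

# The value you're, encoded two ways with ' as the quote character.
doubled = ["'you''re',x"]   # embedded quote written twice
escaped = ["'you\\'re',x"]  # embedded quote preceded by a backslash

# doublequote=True (the default): '' inside a quoted field means one '.
rows_doubled = list(csv.reader(doubled, quotechar="'"))

# doublequote=False with an escapechar: \' inside a quoted field means one '.
rows_escaped = list(
    csv.reader(escaped, quotechar="'", doublequote=False, escapechar="\\")
)

# Treating the first record as a header that names the fields.
table = io.StringIO("name,age\nAnn,30\nBob,41\n")
people = list(csv.DictReader(table))

print(rows_doubled)
print(rows_escaped)
print(people)
```

Both quoted encodings decode to the same value once the reader is told which convention is in use.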



Examples

Other descriptions

Code

--KevinAshley 03:28, 1 November 2012 (UTC)
