Sources with the BigML Dashboard

3.1 Comma-Separated Values

CSV (comma-separated values) is a well-known file format that has long been used for exchanging data between applications.

Before you create a source in BigML, make sure that your CSV files conform to the following rules:

  • A CSV file uses plain text to store tabular data.

  • In a CSV file, each line of the file is a record.

  • The fields within each record are usually separated by a comma (“,”), but other separators such as the semicolon (“;”), the colon (“:”), or the pipe (“|”) can also be used.

  • Each record must contain exactly the same number of fields.

  • Fields can be quoted using double quotes (").

  • Fields that contain commas (or the corresponding separator), double quotes, or line separators must be quoted.

  • The character encoding must be UTF-8.

  • Optionally, a CSV file can use the first line as a header to provide the names of each field.

BigML automatically parses your CSV files and can handle most variants of the rules above. It also provides configuration options for adjusting how a file is parsed. (See Chapter 6.)
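The rules above can be illustrated with a minimal sketch that uses Python's standard csv module to produce a conforming file before it is uploaded. This is not part of BigML; the function names write_csv and check_field_counts and the sample data are hypothetical, chosen only for illustration. The sketch writes a header line, quotes only the fields that need it, and checks that every record has the same number of fields. Note that Python's csv module escapes an embedded double quote by doubling it, which is a common CSV convention but is assumed here rather than stated in this section.

```python
import csv
import io


def write_csv(rows, header, delimiter=","):
    """Serialize a header and rows to CSV text, quoting fields only when needed."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote fields containing the separator, quotes, or newlines
        lineterminator="\n",
    )
    writer.writerow(header)  # first line names each field
    writer.writerows(rows)
    return buffer.getvalue()


def check_field_counts(text, delimiter=","):
    """Return the field count shared by all records, or raise ValueError."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    counts = {len(record) for record in reader}
    if len(counts) != 1:
        raise ValueError(f"records have differing field counts: {sorted(counts)}")
    return counts.pop()


# Hypothetical sample data: one field holds a comma and quotes, another a line break.
header = ["name", "comment", "price"]
rows = [
    ["Widget", 'Says "hello", then exits', 9.99],
    ["Gadget", "line one\nline two", 12.5],
]

csv_text = write_csv(rows, header)
field_count = check_field_counts(csv_text)

# Encode as UTF-8 before saving or uploading the file.
csv_bytes = csv_text.encode("utf-8")
```

Running the check before uploading catches records with a missing or extra field early. A different separator, such as “;” or “|”, can be tried by passing it as the delimiter argument to both functions.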