Sources with the BigML Dashboard

6.2 Single Field or Multiple Fields

The Single Field or Multiple Fields switch lets you tell BigML whether your source is composed of only one field of type items.

6.2.1 Auto-detection of single, item-type fields

A source containing a single field of type items may be submitted without surrounding quotes, in which case the input appears to have a varying number of columns in each row. Figure 6.2 shows an excerpt of such a single-field source. BigML attempts to detect this case rather than assume a “square” CSV format with a large number of bad rows (see Figure 6.3). The criteria are as follows:

  • The proportion of rows whose column count differs from the most frequent column count is greater than 0.25.

  • None of the items is a missing value.

  • No item is longer than 64 characters.

basket
citrus fruit,semi-finished bread,margarine,ready soups
tropical fruit,yogurt,coffee
whole milk
pip fruit,yogurt,cream cheese ,meat spreads
other vegetables,whole milk,condensed milk,long life bakery product
whole milk,butter,yogurt,rice,abrasive cleaner
rolls/buns
other vegetables,UHT-milk,rolls/buns,bottled beer,liquor (appetizer)
pot plants
whole milk,cereals
tropical fruit,other vegetables,white bread,bottled water,chocolate
citrus fruit,tropical fruit,whole milk,butter,curd,yogurt,flour,bottled water,dishes
beef
frankfurter,rolls/buns,soda
chicken,tropical fruit
Figure 6.2 An example of a single-field file with an item-type field
Figure 6.3 Source with a single field of type items
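
The three detection criteria listed above can be illustrated with a short Python sketch. This is not BigML's implementation. The function name, the handling of the header row, and the choice to treat an empty string as a missing value are all assumptions made for illustration; only the 0.25 and 64 thresholds come from the text.

```python
import csv
from collections import Counter
from io import StringIO

# Thresholds taken from the criteria in the text.
MIN_IRREGULAR_ROW_RATIO = 0.25
MAX_ITEM_LENGTH = 64


def looks_like_single_items_field(text, separator=",", has_header=True):
    """Hypothetical check: does this text look like one items field?"""
    rows = [row for row in csv.reader(StringIO(text), delimiter=separator) if row]
    if has_header:
        rows = rows[1:]  # assumption: the first line is the field name
    if not rows:
        return False

    # Criterion 1: share of rows whose column count differs from the
    # most frequent column count must exceed 0.25.
    counts = Counter(len(row) for row in rows)
    most_frequent_rows = counts.most_common(1)[0][1]
    irregular_ratio = 1 - most_frequent_rows / len(rows)

    items = [item.strip() for row in rows for item in row]
    # Criterion 2 (assumption: "missing" means an empty string).
    no_missing_items = all(items)
    # Criterion 3: no item longer than 64 characters.
    no_long_items = all(len(item) <= MAX_ITEM_LENGTH for item in items)

    return (irregular_ratio > MIN_IRREGULAR_ROW_RATIO
            and no_missing_items
            and no_long_items)


basket_sample = (
    "basket\n"
    "citrus fruit,semi-finished bread,margarine,ready soups\n"
    "tropical fruit,yogurt,coffee\n"
    "whole milk\n"
    "pip fruit,yogurt,cream cheese ,meat spreads\n"
)
square_sample = "a,b\n1,2\n3,4\n"

print(looks_like_single_items_field(basket_sample))
print(looks_like_single_items_field(square_sample))
```

In the basket sample, two of the four data rows have a column count other than the most frequent one, so the first criterion is met. In the square sample every row has the same number of columns, so it is not.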

When a single-column source is detected, its separator is set to the empty string (""), because there is nothing to separate when a source has fewer than two columns. You can also indicate explicitly that a source consists of a single column by setting the separator to the empty string ("").

Conversely, you can override an erroneous single-column auto-detection by updating the source and setting an items separator that is not the empty string.
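
As a rough illustration of the two updates described above, the following Python sketch builds the update payloads only and sends nothing to BigML. The helper name is hypothetical, and the `source_parser` and `separator` key names are assumptions about how the separator is expressed in a source update. The text does not state them, and the "items separator" may be a distinct setting, so check the API documentation before relying on them.

```python
import json


def separator_update(separator):
    """Hypothetical helper: build a source-update payload for a separator."""
    if not isinstance(separator, str):
        raise TypeError("separator must be a string")
    # Assumed key names; the text only says "separator".
    return {"source_parser": {"separator": separator}}


# Mark the source as a single column: empty-string separator.
single_column_update = separator_update("")

# Override a wrong single-column detection: any non-empty separator.
multi_column_update = separator_update(",")

print(json.dumps(single_column_update))
print(json.dumps(multi_column_update))
```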