Datasets with the BigML Dashboard

2 Understanding Datasets

A dataset is a structured version of your data. BigML computes both general statistics for the dataset and individual statistics per field. This chapter describes the technicalities behind datasets.

Figure 2.1 shows how BigML lists each field along with its type and general statistics, including:

  • Count: the number of instances containing data for this field.

  • Missing: the number of instances missing a value for this field.

  • Errors: information about ill-formatted values in this field, including the total number of format errors and a sample of the ill-formatted tokens.

\includegraphics[]{images/understanding}
Figure 2.1 Dataset basic view
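To make the three statistics above concrete, the following minimal Python sketch tallies them for a single numeric field. It is an illustration only, not BigML's implementation: the function name `summarize_numeric_field`, the set of tokens treated as missing, and the choice to keep the three tallies disjoint (an ill-formatted token counts as an error, not as data or as missing) are all assumptions made for this example.

```python
# Hypothetical set of tokens treated as "missing"; an assumption for this sketch.
MISSING_TOKENS = {"", "NA", "N/A", "null", "?"}


def summarize_numeric_field(values, sample_size=3):
    """Tally count, missing, and errors for one numeric field.

    `values` is an iterable of raw string tokens, one per instance.
    Assumption: each token falls into exactly one of the three tallies.
    """
    count = 0
    missing = 0
    bad_tokens = []
    for raw in values:
        token = raw.strip()
        if token in MISSING_TOKENS:
            missing += 1
            continue
        try:
            float(token)  # a token is well formatted if it parses as a number
        except ValueError:
            bad_tokens.append(token)
            continue
        count += 1
    return {
        "count": count,
        "missing": missing,
        "errors": len(bad_tokens),
        "error_sample": bad_tokens[:sample_size],
    }


summary = summarize_numeric_field(["1.5", "", "abc", "2", "NA", "x1"])
print(summary)
```

In this toy input, two tokens parse as numbers, two are treated as missing, and two are reported as format errors together with a short sample of the offending tokens.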

The histograms show the underlying distribution of each field. Depending on the size of your dataset and the number of unique values, a histogram may be either exact or approximate. Read this blog post for more details.
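The difference between an exact and an approximate histogram can be sketched as follows. This is a simplified illustration of one common approach (keeping a fixed number of bins and merging the two closest bins when the limit is exceeded); it is not presented as BigML's actual algorithm. The function name `build_histogram` and the `max_bins` parameter are hypothetical.

```python
from collections import Counter


def build_histogram(values, max_bins=32):
    """Return (bins, exact) for a list of numeric values.

    `bins` is a list of [center, count] pairs sorted by center.
    `exact` is True when every distinct value kept its own bin.
    Assumption: bins are merged greedily, closest adjacent pair first.
    """
    # Start with one bin per distinct value: this is the exact histogram.
    bins = [[float(v), c] for v, c in sorted(Counter(values).items())]
    exact = len(bins) <= max_bins

    # Too many distinct values: merge neighbors until the limit is met.
    while len(bins) > max_bins:
        # Index of the adjacent pair with the smallest gap between centers.
        i = min(range(len(bins) - 1), key=lambda k: bins[k + 1][0] - bins[k][0])
        (p1, c1), (p2, c2) = bins[i], bins[i + 1]
        # The merged bin sits at the count-weighted mean of the two centers.
        merged = [(p1 * c1 + p2 * c2) / (c1 + c2), c1 + c2]
        bins[i:i + 2] = [merged]
    return bins, exact


exact_bins, is_exact = build_histogram([1, 1, 2, 5], max_bins=3)
approx_bins, is_approx_exact = build_histogram([1, 1, 2, 5, 10], max_bins=3)
print(exact_bins, is_exact)
print(approx_bins, is_approx_exact)
```

With three distinct values and a limit of three bins, every value keeps its own bin and the histogram is exact. Adding a fourth distinct value forces the two closest bins to merge, so the result only approximates the original distribution while preserving the total count.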

The following subsections describe how BigML processes the data and computes the statistics differently for each field type.