Datasets with the BigML Dashboard

1 Introduction

A dataset is a structured version of your data. BigML computes basic summary statistics for each field in a dataset. The main goal of datasets is to enable effective data wrangling, so you can build the right BigML model for your problem. This is a key step toward achieving the best results in your Machine Learning tasks.
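To make the idea of per-field statistics concrete, the following is a minimal sketch in plain Python. It is not BigML's implementation (Chapter 2 describes what BigML actually computes), and it only distinguishes numeric from categorical fields. The function name `summarize_fields`, the field names, and the sample rows are hypothetical and used for illustration only.

```python
from collections import Counter
from statistics import mean, median


def summarize_fields(rows):
    """Return a dict mapping each field name to basic summary statistics."""
    # Collect field names in order of first appearance.
    field_names = []
    for row in rows:
        for name in row:
            if name not in field_names:
                field_names.append(name)

    summary = {}
    for name in field_names:
        values = [row.get(name) for row in rows]
        # Treat None and empty strings as missing values (an assumption).
        present = [v for v in values if v is not None and v != ""]
        stats = {"count": len(present), "missing": len(values) - len(present)}
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        if is_numeric:
            stats.update(
                type="numeric",
                minimum=min(present),
                maximum=max(present),
                mean=mean(present),
                median=median(present),
            )
        else:
            # For non-numeric fields, report the most frequent values.
            stats.update(
                type="categorical",
                top_values=Counter(present).most_common(3),
            )
        summary[name] = stats
    return summary


# Hypothetical sample data, for illustration only.
rows = [
    {"plan": "basic", "monthly_minutes": 120},
    {"plan": "premium", "monthly_minutes": 340},
    {"plan": "basic", "monthly_minutes": None},
    {"plan": "", "monthly_minutes": 200},
]

for field, stats in summarize_fields(rows).items():
    print(field, stats)
```

Each field gets a count of present and missing values, plus either a numeric summary or its most frequent categories. This is the kind of overview that helps you spot missing data or unexpected values before you build a model.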

In this chapter we assume you understand what a Source is, including the formats BigML accepts, the field types allowed in a source, the types of sources BigML supports, and the size limits. If you would like to learn more about sources, we recommend that you read the Sources with the BigML Dashboard document [22].

BigML also provides you with a large variety of datasets, available in the BigML Gallery, which you can clone and reuse. We explain how to get them in section 13.1.

This chapter contains a comprehensive description of BigML datasets, including how they can be created with just 1-click (see Chapter 3) and all the configuration options available (see Chapter 4). Chapter 2 explains the technicalities behind datasets and how BigML computes statistics for each field. Chapter 5 helps you understand how BigML represents datasets in the Dashboard and the options available to configure your dataset to best fit your needs.

In addition, BigML presents the dynamic scatterplot visualization, a way to analyze your data to get better features for your Machine Learning models (see Chapter 6 for more details). You can also find other options, such as filtering and sampling your dataset (see Chapter 7) and transforming your data by creating new fields, aggregating instances, and joining or merging different datasets (see Chapter 8). Transforming your dataset is a fundamental step toward creating an effective Machine Learning solution. Moreover, you can add descriptive information to your dataset (Chapter 11), export it to several formats and download it to your machine (see section 9.2 and section 9.1), move it to another project (Chapter 14), and delete it permanently from your account (Chapter 16).

In BigML, the second tab of the main menu of your Dashboard lists all of your available datasets (Figure 1.1). For each dataset, this list view shows the Source Details, Name, Age (time since the dataset was created), Size, and the number of Models, Ensembles, Logistic Regressions, Clusters, Anomalies, and Associations created from it. The search option in the top right corner of the dataset list view allows you to search your datasets by name. This is very handy when you have a large number of datasets and cannot list them all on the same page.

\includegraphics[]{images/dataset-listing}
Figure 1.1 Datasets list view

By default, every time you start a new project, your list of datasets will be empty (see Figure 1.2).

\includegraphics[]{images/empty-dataset}
Figure 1.2 Empty Dashboard dataset view

Finally, the icon in Figure 1.3 represents a dataset.

\includegraphics[width=2cm]{images/dataset-icon}
Figure 1.3 Dataset Icon