Datasets with the BigML Dashboard

17 Takeaways

This document explains Datasets in detail. We finish it with a list of key points:

A dataset is a structured version of your data. BigML computes general statistics for the dataset as a whole and specific statistics for each of its fields.
A dataset is created by processing your source. BigML computes basic statistics for each field.
You can create datasets from sources that have previously been uploaded to BigML.
Datasets are the input to create models, ensembles, logistic regressions, evaluations, clusters, anomalies, and associations. (See Figure 17.1.)
A model, a cluster, an anomaly, an association, a batch prediction (using models, ensembles, or logistic regressions), a batch centroid, or a batch anomaly score can produce a dataset as an output. (See Figure 17.2.)
A dataset does not need to be loaded entirely into memory to be processed.
Often the transformations required for a dataset to optimally solve a given problem can be long, complex, and easy to get lost in. With BigML datasets, you do not risk losing track of the sequence of transformations you apply to your data.
You can easily update field types after the dataset is created. To do so, configure the source of your dataset and apply the changes.
You can create a dataset with just 1-click or select the size and the fields you want to include.
You can transform your original dataset and create a new one by splitting it into two subsets, sampling it, filtering it, or adding new fields to it. (See Figure 17.3.)
The Non-preferred fields and the Objective Field are inherited when you split your dataset into two subsets, sample it, filter it, or add new fields to it, and also when you clone it from the BigML Gallery.
You can use the Flatline editor to perform powerful transformations with your dataset.
You can export and download your dataset to CSV format to use it in your local environment.
You can export and download your dataset to TDE format to use it in the Tableau platform.
You can programmatically create, list, and delete your datasets, use them to create models, and later make predictions with those models, all through the BigML API and the BigML bindings.
You can furnish your dataset with descriptive information (name, description, tags, and category), and do the same for each individual field (name, label, and description).
There are three levels of privacy for BigML datasets: private, shared and public.
You can clone an existing dataset from the BigML Gallery.
You can share your dataset in the BigML Gallery, either for free or with earnings.
You can assign a dataset to only one project at a time.
You can move a dataset between projects.
You can stop the dataset creation.
You can permanently delete a dataset.
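
Two of the points above, per-field statistics and processing without loading the whole dataset into memory, can be illustrated with a small single-pass computation. The sketch below is not BigML's implementation; it is a minimal illustration, assuming numeric CSV fields and using Welford's running-variance algorithm. The names `RunningStats`, `summarize`, and the sample data are hypothetical.

```python
import csv
import io
import math


class RunningStats:
    """Single-pass count, mean, and sample standard deviation (Welford)."""

    def __init__(self):
        self.count = 0
        self.missing = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, raw):
        # Empty or absent cells are counted as missing, not as numbers.
        if raw is None or raw.strip() == "":
            self.missing += 1
            return
        x = float(raw)
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def stddev(self):
        # Undefined for fewer than two values.
        if self.count < 2:
            return float("nan")
        return math.sqrt(self._m2 / (self.count - 1))


def summarize(rows, numeric_fields):
    """Consume an iterable of row dicts once; memory grows with fields, not rows."""
    stats = {name: RunningStats() for name in numeric_fields}
    total = 0
    for row in rows:
        total += 1
        for name in numeric_fields:
            stats[name].add(row.get(name))
    return total, stats


# Hypothetical sample; a real file would be read with open(path, newline="").
SAMPLE = "age,income\n30,1000\n40,\n50,3000\n"
total, stats = summarize(csv.DictReader(io.StringIO(SAMPLE)), ["age", "income"])
```

Because `csv.DictReader` yields one row at a time, only the running totals for each field are kept in memory, regardless of how many rows the file contains.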

\includegraphics[]{images/datasets-workflow-input} — Figure 17.1 Using a dataset as the input to create your resources

\includegraphics[]{images/datasets-workflow-output} — Figure 17.2 Resources that produce a dataset as output

\includegraphics[]{images/datasets-workflow-new} — Figure 17.3 Creating new datasets from your original dataset
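
The split in Figure 17.3 can also be requested programmatically. The sketch below only builds the argument dictionaries for two complementary samples; it assumes the `sample_rate`, `seed`, and `out_of_bag` dataset arguments of the BigML API, and the helper name `split_args` and the seed string are hypothetical. The commented lines show how the dictionaries might be passed to the BigML Python bindings, which are not imported here because they require credentials.

```python
def split_args(seed, train_rate=0.8):
    """Return (train_args, test_args) describing two complementary samples."""
    if not 0.0 < train_rate < 1.0:
        raise ValueError("train_rate must be strictly between 0 and 1")
    # The same rate and seed select the same rows in both requests;
    # out_of_bag=True then asks for the rows that were NOT selected.
    common = {"sample_rate": train_rate, "seed": seed}
    train_args = dict(common, out_of_bag=False)
    test_args = dict(common, out_of_bag=True)
    return train_args, test_args


train_args, test_args = split_args("takeaways-demo")

# Assumed usage with the BigML Python bindings (not executed here):
#   from bigml.api import BigML
#   api = BigML()
#   train = api.create_dataset(origin_dataset_id, train_args)
#   test = api.create_dataset(origin_dataset_id, test_args)
```

Reusing one seed for both requests is what keeps the two subsets disjoint: the training subset takes the sampled rows and the test subset takes the remainder.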