Datasets with the BigML Dashboard

5.1 Dataset Layout

Figure 5.1 shows the general overview of a dataset. Along the top, from left to right, you first find the navigation and status menu. This menu shows the privacy status of your dataset, lets you view the source used to create it, lists the resources created from this dataset (which you can open quickly through the counters), and shows the creation status of any other resources you are creating from this dataset. Next to this menu is the dataset name, followed by the actions and information menu at the top right. This menu gives you access to the configure options, the 1-click actions, the 1-click scripts, and the More info panels. Both menus are explained in subsection 5.1.1.

Below the navigation and status menu is the dynamic scatterplot icon, which opens the dynamic scatterplot view, a different way to visualize your dataset (explained in Chapter 6). On the right side, below the actions and information menu, a search box lets you quickly find the fields that contain the word you type.

The middle part of Figure 5.1 shows a table in which each row is a field and the six columns show, for each field: the field name, the field type, the count (instances with valid values), missing (instances with missing values), errors (instances with errors), and a histogram. Count, missing, and errors are the general statistics computed by BigML; the histogram displays the statistics for each field, as explained in Chapter 2.
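To make the difference between the count, missing, and errors columns concrete, the following is a minimal sketch that classifies the raw cell values of a single numeric field. It is not BigML's implementation: the function name `field_stats`, the list of missing tokens, and the rule that a non-missing value which fails to parse counts as an error are all assumptions chosen for illustration.

```python
def field_stats(values, parse=float, missing_tokens=("", "NA", "N/A")):
    """Hypothetical sketch: summarize the raw cells of one field.

    count   -> cells that parse as the field's type (valid values)
    missing -> empty cells or cells matching an assumed missing token
    errors  -> non-missing cells that fail to parse as the field's type
    """
    count = missing = errors = 0
    for raw in values:
        cell = raw.strip()
        if cell in missing_tokens:
            missing += 1
            continue
        try:
            parse(cell)  # e.g. float for a numeric field
        except ValueError:
            errors += 1
        else:
            count += 1
    return {"count": count, "missing": missing, "errors": errors}


ages = ["34", "", "51", "n/a?", "27", "NA"]
print(field_stats(ages))  # {'count': 3, 'missing': 2, 'errors': 1}
```

Keeping missing values separate from errors mirrors the separate columns in the dataset view: an empty cell is simply absent data, while a malformed value suggests a problem in the source.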

The red exclamation mark next to the “state” field means that BigML has discarded this field as an input field for training models, because it may have very similar values for all the instances or very different values for each instance.
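The two situations described above can be illustrated with a simple heuristic for categorical (string) fields. This is a hypothetical sketch, not the rule BigML actually applies: the function name `flag_categorical_field` and the `max_unique_ratio` threshold are assumptions. It is meant only for categorical values, because a numeric field with many distinct values is normal and should not be flagged this way.

```python
def flag_categorical_field(values, max_unique_ratio=0.95):
    """Hypothetical heuristic, not BigML's actual rule.

    Returns a short reason if a categorical field looks unhelpful as a
    model input, otherwise None.
    """
    present = [v for v in values if v != ""]  # ignore missing cells
    if not present:
        return "no values"
    distinct = len(set(present))
    if distinct == 1:
        return "constant"  # the same value for every instance
    if distinct / len(present) >= max_unique_ratio:
        return "nearly unique"  # e.g. an ID: a different value per instance
    return None


print(flag_categorical_field(["CA"] * 5))                 # constant
print(flag_categorical_field(["a", "b", "c", "d"]))       # nearly unique
print(flag_categorical_field(["CA", "NY", "CA", "TX"]))   # None
```

Both extremes carry little predictive signal: a constant field cannot separate instances, and a field with a unique value per instance cannot generalize to new data.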

At the bottom left corner of Figure 5.1, a dropdown lets you select how many fields to show per page: 10, 25, 50, or 100. In the center, you can see the number of fields shown on the current page and the total number of fields in your dataset. Finally, if your dataset has more fields than fit on a single page, you can select which page to display at the bottom right corner.
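The relationship between the page size, the field range shown in the center, and the number of pages is simple arithmetic. The sketch below assumes 1-based page numbers; the function name `page_info` is hypothetical, and only the four page sizes come from the dropdown described above.

```python
import math

PAGE_SIZES = (10, 25, 50, 100)  # options offered by the dropdown


def page_info(total_fields, page_size, page):
    """Hypothetical helper: return (first, last, pages) for a 1-based page."""
    if page_size not in PAGE_SIZES:
        raise ValueError(f"page_size must be one of {PAGE_SIZES}")
    pages = max(1, math.ceil(total_fields / page_size))
    if not 1 <= page <= pages:
        raise ValueError(f"page must be between 1 and {pages}")
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_fields)  # the last page may be shorter
    return first, last, pages


print(page_info(237, 25, 10))  # (226, 237, 10)
```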

Figure 5.1 Dataset layout overview

5.1.1 Dataset Top Menus

  • Navigation and Status:

    At the top left corner of the dataset view you can see the menu options shown in Figure 5.2.

    Figure 5.2 Navigation and status menu options of the dataset view
    • The privacy menu option indicates whether the dataset open in the dataset view is public in the BigML Gallery or private, meaning that only you can view it unless you decide to share it with others. This process is explained in section 13.2.

    • The view source menu option lets you see the source used to create the dataset open in the dataset view. If you deleted the source after creating the dataset, this option is no longer a link, since the source is no longer available. The dataset list view also indicates this by showing a red cross on the source icon (see Figure 5.3).

      Figure 5.3 Dataset list view shows a source that has been deleted
    • The counters menu option shows the resources created from the dataset open in the dataset view and lets you access them quickly.

    • The resource statuses menu option indicates when the dataset is being used to create a resource. While no resources have been requested, this option shows completed. When you request a task, the status changes through waiting, queued, started, in-progress, summarized, and completed; a task may also report unknown or error found. Resources often move through some of these statuses so quickly that you will not see them on the Dashboard. The statuses you will see most often are in-progress and completed.

      In BigML, tasks are asynchronous: the request to create a resource returns immediately, without waiting for the resource to be completed. You can request several resources in a very short period of time, and they will either run in parallel or be queued in the order they were requested. Some tasks may take a few minutes to process, depending on the size of your dataset and on your subscription plan, which determines how many tasks you can run in parallel at a given time.


  • Actions and Information:

    At the top right corner of the dataset view you can see the menu options shown in Figure 5.4.

    Figure 5.4 Actions and information menu options of the dataset view
    • The configure option gives you access to the configuration panels for models, ensembles, logistic regressions, clusters, anomalies, and associations. It also gives you access to the configuration panels to split, sample, and filter your dataset, and to add new fields to it. These last four options are explained in section 7.1, section 7.2, section 7.3, and section 8.1, respectively.

    • The 1-click actions let you create models, ensembles, logistic regressions, clusters, anomalies, or associations with a single click, using default values. This menu also lets you automatically split your dataset, export it in CSV or Tableau format, move it to another project, and delete it. These last five options are explained in subsection 7.1.1, section 9.1, section 9.2, Chapter 14, and Chapter 16, respectively.

    • The 1-click scripts option lets you add your own Machine Learning scripts so you can run them at any time with a single click, from any view of the BigML Dashboard.

    • The More info option leads to three panels with information about your dataset. The details panel shows the size of your dataset and the number of fields and instances it contains. The info panel lets you update the name of your dataset, add a description and tags, and assign a category (see Chapter 11). The privacy panel shows the privacy details of your dataset (see Chapter 12).
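The asynchronous task creation described under the resource statuses menu option means that a client learns about progress by checking the status repeatedly. The following is a minimal, self-contained sketch of that polling pattern. It does not call BigML: `FakeTask` is a hypothetical stand-in for a remote resource, the status names are taken from the list above, and treating completed and error found as the only final statuses is an assumption. The official BigML bindings provide their own helpers for waiting on resources.

```python
import time
from enum import Enum


class Status(Enum):
    # Status names shown by the resource statuses menu option
    WAITING = "waiting"
    QUEUED = "queued"
    STARTED = "started"
    IN_PROGRESS = "in-progress"
    SUMMARIZED = "summarized"
    COMPLETED = "completed"
    UNKNOWN = "unknown"        # assumed non-final: keep polling
    ERROR = "error found"


FINAL = {Status.COMPLETED, Status.ERROR}  # assumption: polling stops here


class FakeTask:
    """Hypothetical stand-in for a remote task; each poll advances one step."""

    def __init__(self, steps):
        self._steps = list(steps)

    def status(self):
        # Return the next status, staying on the last one once reached
        return self._steps.pop(0) if len(self._steps) > 1 else self._steps[0]


def wait_for(task, poll_seconds=0.0, max_polls=100):
    """Poll until the task reaches a final status; return every status seen."""
    seen = []
    for _ in range(max_polls):
        current = task.status()
        seen.append(current)
        if current in FINAL:
            return seen
        time.sleep(poll_seconds)  # a real client would wait between polls
    raise TimeoutError("task did not reach a final status")


# The creation request has already returned; progress is observed by polling.
task = FakeTask([Status.QUEUED, Status.IN_PROGRESS, Status.COMPLETED])
print([s.value for s in wait_for(task)])  # ['queued', 'in-progress', 'completed']
```

Bounding the number of polls keeps the sketch from waiting forever if a task never reaches a final status, for example when it remains queued behind other tasks.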