Cluster Analysis with the BigML Dashboard

7.2 Predicting Centroids

As shown in Figure 7.6, BigML provides two options to create centroids from your cluster:

  • centroid: to predict single instances.

  • batch centroid: to predict multiple instances in batch.

\includegraphics[]{images/cluster-predictions/cluster-predict-pop-up}
Figure 7.6 Centroid option from cluster pop up menu

7.2.1 Centroid

BigML allows you to quickly find the nearest centroid for single data instances by providing a form containing the fields used by the cluster, so you can easily set the input data and get an immediate response. This option is only available from the BigML Dashboard for clusters with fewer than 100 fields. If you want to make single instance predictions for clusters with more fields, you can use the BigML API.
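
For clusters that must go through the API, the request essentially pairs a cluster identifier with the input values. The following is a minimal sketch of how such a request body could be assembled. The key names `cluster` and `input_data`, the identifier pattern, and the field names are assumptions made for illustration, so check them against the BigML API documentation before relying on them. The helper `build_centroid_request` is hypothetical and sends nothing over the network.

```python
import json
import re


def build_centroid_request(cluster_id, input_data):
    """Serialize a JSON body for a single-centroid request.

    Assumption: the body pairs a "cluster" id with an "input_data" map.
    Verify both key names against the BigML API documentation.
    """
    # Assumed id shape: "cluster/" followed by 24 hexadecimal characters.
    if not re.fullmatch(r"cluster/[0-9a-f]{24}", cluster_id):
        raise ValueError(f"unexpected cluster id: {cluster_id!r}")
    return json.dumps(
        {"cluster": cluster_id, "input_data": input_data}, sort_keys=True
    )


body = build_centroid_request(
    "cluster/0123456789abcdef01234567",  # placeholder id, not a real resource
    {"petal length": 4.5, "petal width": 1.0},  # hypothetical field names
)
print(body)
```

Keeping the body construction separate from the HTTP call makes it easy to inspect the input values before any request is sent.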

Follow the steps detailed below to create a single prediction:

  1. Choose the centroid option under the cluster 1-click menu (see Figure 7.7).

    \includegraphics[]{images/cluster-predictions/cluster-predict-1-click}
    Figure 7.7 Predict option from cluster 1-click menu

    Alternatively, click the centroid option in the pop-up menu in the list view as shown in Figure 7.6.

  2. You will be redirected to the prediction form where you will find all the fields used by the cluster.

    \includegraphics[]{images/cluster-predictions/centroid-fields}
    Figure 7.8 Single predictions form
  3. Set input values for the cluster fields. Depending on the field type, you will need to input the values differently:

    • Numeric fields: move the slider or type a specific value in the edit box.

    • Categorical fields: select one class from the selector.

    • Text fields: write one or several terms in the free text box.

    • Date-time fields: select the appropriate values from the selector.

    • Items fields: when you write the first three characters of an item name, several items matching those characters will appear, so you can select the right one. You can input more than one item for a field.

  4. The nearest centroid and its distance are displayed at the top of the form. BigML predictions are synchronous, i.e., when you send the input data you get an immediate response. Moreover, single centroids are calculated locally, so as you configure the input data you can see how the predictions change immediately. (Read more about local predictions in Local Predictions.)

    \includegraphics[]{images/cluster-predictions/single-pred}
    Figure 7.9 Single predictions view
  5. Optionally, Save the prediction to get a view of the single cluster prediction (see subsection 7.4.1) and to find it later in the prediction list view.

    \includegraphics[]{images/cluster-predictions/predictions-section}
    Figure 7.10 Save single cluster predictions

Local Predictions

BigML provides local predictions from the BigML Dashboard for single instance predictions. Local predictions allow you to get a real-time prediction without consuming any credits or requiring an internet connection. This is possible because your cluster is saved in the browser’s memory, so when the input values change, BigML immediately calculates the nearest centroid in a matter of microseconds.
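
To illustrate why a local prediction needs no server round trip, here is a simplified sketch of a nearest-centroid lookup over an in-memory cluster. It assumes numeric fields only and plain Euclidean distance. BigML's actual distance computation (field scaling, categorical, text and items fields) is not reproduced here, and the centroid values and field names are hypothetical.

```python
import math


def nearest_centroid(instance, centroids):
    """Return (name, distance) of the centroid closest to `instance`.

    Simplifying assumption: every field is numeric and the distance is
    the unscaled Euclidean distance.
    """
    best_name, best_distance = None, math.inf
    for name, center in centroids.items():
        distance = math.sqrt(
            sum((instance[field] - center[field]) ** 2 for field in center)
        )
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance


# Hypothetical centroids, standing in for a cluster held in memory.
centroids = {
    "Cluster 0": {"petal length": 1.5, "petal width": 0.25},
    "Cluster 1": {"petal length": 4.5, "petal width": 1.5},
}

name, distance = nearest_centroid(
    {"petal length": 4.5, "petal width": 1.0}, centroids
)
print(name, distance)
```

Because the whole computation is a loop over a handful of centroids, it can be repeated every time an input value changes.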

Centroid with Images

BigML clusters can be trained from images using extracted image features (section 2.4). Because image features are automatically generated numeric fields, creating centroids with images works the same way as creating any other centroid. The only difference is how values are supplied for image input fields.

Note: When the input fields contain images, BigML automatically extracts image features matching those used in the dataset that trained the cluster in order to create the centroid.

\includegraphics[]{images/cluster-predictions/cluster-predict-image-select-single}
Figure 7.11 Select a single image source in the image input field

The cluster in Figure 7.11, “butterfly-grape-horse”, was created from a dataset containing image features extracted with a pre-trained CNN, ResNet-18. Creating a centroid from this cluster takes you to the prediction form, which presents all the input fields used by the cluster, including the image field. Because a centroid is a single prediction, the image is supplied as a single image source. Click the input field box to see the available single image sources in the dropdown list; a search box helps you locate a specific one.

\includegraphics[]{images/cluster-predictions/cluster-predict-image-list-components}
Figure 7.12 List the components of a composite source

Single image sources are often used to create a composite source, in which case they become component sources of that composite. An image may also have been uploaded as part of an archive file (zip/tar), which creates a composite source. In both cases, the composite source is shown in the dropdown list along with a “List components” icon. In the example in Figure 7.12, predict-images.zip is a composite source; click the icon to show its component sources.

\includegraphics[]{images/cluster-predictions/cluster-pred-image-select-components}
Figure 7.13 Select a component of a composite source

After the component sources of the composite are listed, scroll the dropdown list to find the desired one, then click to select it, as shown in Figure 7.13. There is also a search box to locate specific component sources.

\includegraphics[]{images/cluster-predictions/cluster-pred-image-centroid-created}
Figure 7.14 A centroid with images

After a new centroid is created, as shown in Figure 7.14, the predicted cluster appears at the top of the form along with its distance. The centroid interface is the same as for centroids created from non-image clusters, and everything described earlier in this section (subsection 7.2.1) applies.

7.2.2 Batch Centroid

BigML batch centroids allow you to make predictions for multiple instances simultaneously. All you need is the cluster you want to use to make predictions and a dataset containing the instances for which you want to calculate the nearest centroids. BigML will create a prediction for each instance in the dataset. Follow the steps detailed below to create a batch centroid:

  1. Select the batch centroid option under the cluster 1-click menu (see Figure 7.15) or the create batch centroid option from the pop-up menu of the list view (see Figure 7.16).

    \includegraphics[]{images/cluster-predictions/batchpred-clusters-1-click}
    Figure 7.15 Batch centroid option from cluster 1-click menu
    \includegraphics[]{images/cluster-predictions/batchpred-clusters-pop-up}
    Figure 7.16 Batch centroid option from cluster pop up menu
  2. Select the dataset containing all the instances you want to predict. The instances should contain the input values for all the fields used by the cluster. Remember that BigML batch centroids can handle missing data in your prediction dataset only for categorical, text and items fields, not for numeric fields (see section 4.4). Instances with missing data for numeric fields will be ignored.

  3. Optionally, select the cluster you want to use for the prediction. BigML pre-selects the cluster you created the batch centroid from in step 1, but you can change it at any time in the batch centroid view by selecting another cluster from the cluster selector displayed in the right pane.

    \includegraphics[]{images/cluster-predictions/batchpred-clusters-1}
    Figure 7.17 Select dataset for batch predictions
  4. Once the cluster and the dataset are selected, the batch centroid configuration options will appear along with a preview of the prediction output (a CSV file). The default format includes all your cluster fields and adds a last column with the cluster predictions.

    Note: By default, BigML does not include the calculated distance from the centroid, so you will have to configure your output file to include that information. You can find a detailed explanation of all configuration options in section 7.3.

    \includegraphics[]{images/cluster-predictions/batchpred-clusters-2}
    Figure 7.18 Configuration options displayed and output preview
  5. By default, BigML generates an output Dataset with your batch centroids that you can later find in your datasets section in the BigML Dashboard. This dataset can be helpful to analyze your results afterwards. You can deactivate this option by clicking on the icon shown in Figure 7.19.

    \includegraphics[]{images/cluster-predictions/batchpred-clusters-3}
    Figure 7.19 Create dataset from batch predictions
  6. Once you have your batch centroid configured, click the green Centroid button to generate your batch prediction.

    \includegraphics[]{images/cluster-predictions/batchpred-clusters-3-5}
    Figure 7.20 Predicting batch centroids
  7. When the batch centroid is created, you will be able to download the CSV file containing all your dataset instances along with a prediction for each one of them (see Figure 7.21).

    \includegraphics[]{images/cluster-predictions/batchpred-clusters-4}
    Figure 7.21 Download batch centroid output CSV file
  8. If you didn’t disable the option to create a dataset, explained in step 5, you will also be able to access the output dataset from the batch centroid view.

    \includegraphics[]{images/cluster-predictions/batchpred-clusters-5}
    Figure 7.22 View batch predictions output dataset
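
The behaviour described in the steps above can be sketched in a few lines: each instance receives a final column with its assigned cluster, the distance is added only when requested, and instances with missing numeric values are ignored. This is a simplified illustration, not BigML's implementation. It assumes numeric fields only and unscaled Euclidean distance, and the function name, column names, and data are hypothetical.

```python
import csv
import io
import math


def batch_centroid(rows, centroids, numeric_fields, include_distance=False):
    """Assign each row to its nearest centroid (simplified sketch)."""
    output = []
    for row in rows:
        try:
            values = {field: float(row[field]) for field in numeric_fields}
        except (KeyError, TypeError, ValueError):
            # Missing or unparsable numeric value: ignore the instance.
            continue
        distances = {
            name: math.sqrt(
                sum((values[f] - center[f]) ** 2 for f in numeric_fields)
            )
            for name, center in centroids.items()
        }
        best = min(distances, key=distances.get)
        result = dict(row)
        result["cluster"] = best  # prediction goes in a final column
        if include_distance:
            result["distance"] = distances[best]  # opt-in, off by default
        output.append(result)
    return output


# Hypothetical centroids and input data; the second row lacks a numeric value.
centroids = {
    "Cluster 0": {"petal length": 1.5, "petal width": 0.25},
    "Cluster 1": {"petal length": 4.5, "petal width": 1.5},
}
data = "petal length,petal width\n1.4,0.2\n,1.3\n4.7,1.4\n"
rows = list(csv.DictReader(io.StringIO(data)))

results = batch_centroid(
    rows, centroids, ["petal length", "petal width"], include_distance=True
)
for r in results:
    print(r)
```

Only two of the three input rows appear in the output, because the row with an empty numeric field is skipped.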

Batch Centroid with Images

BigML clusters can be trained from images using extracted image features (section 2.4). A batch centroid takes a dataset as input, so when you create a batch centroid with images, that dataset must contain the same image features as the dataset used to train the cluster.

\includegraphics[]{images/cluster-predictions/cluster-batchpred-images}
Figure 7.23 Batch centroid using an image dataset

As shown in Figure 7.23, the input selected for the batch centroid is predict-images resnet18, a dataset of six images that contains image features extracted with a pre-trained CNN, ResNet-18.

Image features are configured at the source level. For more information about image features and how to configure them, refer to the Image Analysis section of the Sources with the BigML Dashboard document [22].

For everything else about batch centroids with images, including configuration options and output datasets, the information given earlier in this section (subsection 7.2.2) applies.