Cluster Analysis with the BigML Dashboard

7.4 Visualizing Cluster Predictions

The centroid visualization depends on whether you are predicting the centroid for a single instance (see subsection 7.4.1) or for multiple instances using the batch centroid option (see subsection 7.4.2).

7.4.1 Single Centroid

For single-instance predictions, the nearest centroid appears at the top of the form along with its distance from the input data (see Figure 7.26). You can change the value of any displayed input field at any time, and the prediction is recalculated in real time.

\includegraphics[]{images/cluster-predictions/single-pred}
Figure 7.26 Single predictions view
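
To make the idea of a nearest centroid and its distance concrete, here is a minimal Python sketch. It assumes plain Euclidean distance over numeric fields that have already been scaled; the function name `nearest_centroid`, the centroid values, and the input values are hypothetical, and the sketch is not meant to reproduce how BigML computes distances internally.

```python
import math

def nearest_centroid(instance, centroids):
    """Return (name, distance) of the centroid closest to `instance`.

    `instance` and every centroid are dicts mapping field name -> number.
    Plain Euclidean distance is an assumption made for this sketch.
    """
    best_name, best_distance = None, math.inf
    for name, center in centroids.items():
        distance = math.sqrt(sum((instance[f] - center[f]) ** 2 for f in center))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance

# Hypothetical, already-scaled centroids and input values.
centroids = {
    "Cluster 1": {"Glucose": 0.2, "BMI": 0.3},
    "Cluster 2": {"Glucose": 0.8, "BMI": 0.6},
}
name, distance = nearest_centroid({"Glucose": 0.7, "BMI": 0.6}, centroids)
print(name, round(distance, 5))
```

Changing one of the input values and calling the function again mirrors what the form does when it recalculates the prediction.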

7.4.2 Batch Predictions

For batch centroids, you always get a CSV file and, optionally, an output dataset.

Output CSV File

From the batch centroid view, you can download the CSV file containing your predictions. The centroid assigned to each of your dataset instances appears in the last column (see Figure 7.27).

\includegraphics[]{images/cluster-predictions/batchpred-clusters-4}
Figure 7.27 Download batch centroid output CSV file

You can configure several options to customize your CSV file, including the column separator, the names of the centroid and distance columns, the dataset fields you want to include, and whether to include a header row with the field names. You can find a detailed explanation of those options in subsection 7.3.2.

Note: by default, BigML does not include the centroid distance in your output file. Click the relevant option from the output settings panel if you want to include it.
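
The effect of these output options can be illustrated with a small Python sketch. Everything in it is hypothetical: the function `render_output`, its parameter names, and the sample row are assumptions chosen to mirror the options described above, not BigML's implementation or API.

```python
import csv
import io

def render_output(rows, fields, separator=",", centroid_name="cluster",
                  distance_name="distance", include_distance=False, header=True):
    """Build CSV text from rows shaped like
    {"inputs": {...}, "centroid": str, "distance": float} (hypothetical layout)."""
    columns = list(fields) + [centroid_name]
    if include_distance:
        columns.append(distance_name)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    if header:
        writer.writerow(columns)
    for row in rows:
        record = [row["inputs"][f] for f in fields] + [row["centroid"]]
        if include_distance:
            record.append(row["distance"])
        writer.writerow(record)
    return buffer.getvalue()

rows = [{"inputs": {"Glucose": 78, "Age": 26},
         "centroid": "Cluster 3", "distance": 0.40608}]

# Defaults: comma separator, header row, no distance column.
default_csv = render_output(rows, ["Glucose", "Age"])

# Custom separator, renamed centroid column, distance included.
custom_csv = render_output(rows, ["Age"], separator=";",
                           centroid_name="group", include_distance=True)
print(default_csv)
print(custom_csv)
```

As in the dashboard, the distance column in this sketch is omitted unless it is explicitly requested.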

Figure 7.28 shows an example output CSV file in which the last two columns contain the cluster and the distance for each instance.

Pregnancies,Glucose,Blood pressure,BMI,Age,cluster,distance
3,78,50,32,26,Cluster 3,0.40608
7,100,0,0,30,32,Cluster 6,0.35249
1,103,30,38,33,Cluster 3,0.53011
1,97,66,15,17,22,Cluster 5,0.24655
5,117,92,0,38,Cluster 2,0.25536
10,122,78,31,45,Cluster 1,0.36629
11,138,76,0,35,Cluster 2,0.37024
3,180,64,25,26,Cluster 3,0.52466
7,133,84,0,37,Cluster 2,0.36563
Figure 7.28 An example of a batch centroid CSV file
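
Once downloaded, a file like this can be processed with any CSV reader. The following Python sketch counts the instances assigned to each cluster and records the largest distance within each one. It assumes the default column names `cluster` and `distance` and that the distance column was included. The function name `summarize` is hypothetical, and the sample text reuses a few rows from the Figure 7.28 example in place of a real downloaded file.

```python
import csv
import io
from collections import Counter

# A few rows taken from the Figure 7.28 example; in practice, read the
# downloaded file instead of this in-memory sample.
SAMPLE = """Pregnancies,Glucose,Blood pressure,BMI,Age,cluster,distance
3,78,50,32,26,Cluster 3,0.40608
1,103,30,38,33,Cluster 3,0.53011
5,117,92,0,38,Cluster 2,0.25536
10,122,78,31,45,Cluster 1,0.36629
"""

def summarize(csv_text, centroid_col="cluster", distance_col="distance"):
    """Count instances per cluster and track the largest distance in each."""
    counts, farthest = Counter(), {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cluster = row[centroid_col]
        counts[cluster] += 1
        farthest[cluster] = max(farthest.get(cluster, 0.0),
                                float(row[distance_col]))
    return counts, farthest

counts, farthest = summarize(SAMPLE)
for cluster in sorted(counts):
    print(cluster, counts[cluster], farthest[cluster])
```

If you renamed the centroid or distance column in the output settings, pass the new names through `centroid_col` and `distance_col`.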

Output Dataset

By default, BigML automatically creates a dataset from your batch centroid (see subsection 7.3.2). You can access your output dataset from the batch centroid view as shown in Figure 7.29.

\includegraphics[]{images/cluster-predictions/batchpred-clusters-5}
Figure 7.29 View batch centroid output dataset

In the output dataset, you can find an additional field (named “cluster” by default) containing the nearest centroid for each of your instances. If you configured your batch centroid to include the distance, you will find it in the last field of your output dataset, as shown in Figure 7.30.

\includegraphics[]{images/cluster-predictions/batchpred-clusters-dataset2}
Figure 7.30 Batch centroid output dataset

Batch Centroid 1-Click Actions

From the batch centroid view, you can perform the following actions (see Figure 7.31):

  • batch centroid again: this option redirects you to the batch centroid creation view with the same cluster and prediction dataset already selected, so you can quickly create a new batch centroid using a different configuration.

  • batch centroid with another dataset: this option is an easy way to create a batch centroid using the same cluster and a different dataset.

  • batch centroid using another cluster: this option is an easy way to create a batch centroid using the same dataset and a different cluster.

  • new batch centroid: this option redirects you to the batch centroid creation view, where you can select both a prediction dataset and a cluster to create your batch centroid.

\includegraphics[]{images/cluster-predictions/batchpred-1-click}
Figure 7.31 Batch centroid 1-click actions