Cluster Analysis with the BigML Dashboard

5.1 Cluster Visualization

The BigML cluster visualization is available for all clusters (see Figure 5.1).

\includegraphics[]{images/clusters/clusters-view}
Figure 5.1 Clusters visualization

It conveys a wealth of information through a natural representation of cluster groups:

  • Each cluster is represented by a circle, a standard representation for clusters.

    Each cluster group is drawn in its own color.

    The distances between clusters in the visualization are related to the actual distances, but they are not strictly to scale, so that all clusters fit on the screen. (See Figure 5.1.)

    If you click a cluster, all the clusters are rearranged so that the one you selected is in the middle. This is useful for showing the relative distances between the selected cluster and the rest of the clusters.


  • When you mouse over a cluster, additional information will be displayed in a tooltip and in a data panel on the right side.

    The tooltip includes the cluster name, ID, and the number of instances that it comprises.

    The data panel shows the distance histogram (see subsection 5.1.1) for the data points comprising that cluster, as well as the centroid information (see subsection 5.1.2). You can freeze the panel by pressing Shift; press Escape to release the view.

If you click on the “Summary View” tab on the top right, the summary information of all the clusters will be listed in the panel on the right side. (See Figure 5.2.)

\includegraphics[]{images/clusters/summary-view}
Figure 5.2 Summary view for clusters

The summary information includes the cluster name, its distance histogram, and its number of instances. There may be more clusters than fit in the right panel; if you mouse over a cluster in the diagram on the left, the panel slides to show the corresponding cluster summary.

You can mouse over the cluster name bar to show its resource ID, or click on the pencil icon to edit the cluster name. In addition, you can mouse over a histogram to inspect its bins (ranges and instances), and mouse over the stats icon (sigma) to see the statistical summary of the instances in the cluster.

5.1.1 Distance Histogram

The distance histogram represents the distribution of distances from the cluster’s center to each of the points that fall into the cluster’s neighborhood.

\includegraphics[]{images/clusters/dist-histogram}
Figure 5.3 Distance histogram for clusters
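
To make the idea behind the distance histogram concrete, the following is a minimal sketch of how such a histogram could be computed outside the Dashboard. It assumes plain Euclidean distance over numeric fields and equal-width bins; the function name `distance_histogram` and the `bins` parameter are hypothetical, and BigML's own distance computation and binning may differ.

```python
import math


def distance_histogram(points, centroid, bins=4):
    """Bin the distances from `centroid` to every point in `points`.

    Assumption: Euclidean distance and equal-width bins. This is an
    illustration only, not BigML's implementation.
    """
    distances = [math.dist(p, centroid) for p in points]
    lo, hi = min(distances), max(distances)
    # Fall back to a width of 1.0 when every distance is identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for d in distances:
        # Clamp so the largest distance lands in the last bin.
        idx = min(int((d - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


points = [(0, 0), (3, 4), (6, 8), (0, 1)]
edges, counts = distance_histogram(points, centroid=(0, 0), bins=2)
```

Each entry in `counts` corresponds to one bar of the histogram, and consecutive values in `edges` give the range of distances that bar covers.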

5.1.2 Centroid

The cluster centroid is the center of the cluster. It is computed by using the mean for each numeric field and the mode for categorical ones.

\includegraphics[]{images/clusters/centroid-data}
Figure 5.4 Centroid data inspector
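
The rule described above (mean for numeric fields, mode for categorical fields) can be sketched as follows. The function name `centroid`, the field names, and the row layout as dictionaries are assumptions made for illustration; missing values and field scaling are not handled here.

```python
from statistics import mean, mode


def centroid(rows, numeric_fields, categorical_fields):
    """Return a dict holding the centroid value for each listed field.

    Numeric fields use the mean, categorical fields use the mode.
    Illustrative sketch only; `rows` is assumed to be a list of dicts
    with no missing values.
    """
    result = {f: mean([r[f] for r in rows]) for f in numeric_fields}
    result.update({f: mode([r[f] for r in rows]) for f in categorical_fields})
    return result


rows = [
    {"age": 30, "plan": "basic"},
    {"age": 40, "plan": "pro"},
    {"age": 50, "plan": "pro"},
]
center = centroid(rows, numeric_fields=["age"], categorical_fields=["plan"])
```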

For text and items fields, you will get a tag cloud where you can see the terms or items that minimize the average cosine distance between the centroid and the points in its neighborhood.

\includegraphics[]{images/clusters/centroid-text}
Figure 5.5 Centroid tag cloud for text and items fields
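
As a rough illustration of the quantity mentioned above, the sketch below computes the cosine distance between two term vectors and its average over a set of points. Representing each text or items value as a dictionary of term weights is an assumption, as are the function names; the sketch does not reproduce how BigML selects the terms shown in the tag cloud.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two sparse term vectors.

    `a` and `b` are assumed to be dicts mapping a term to its weight.
    """
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an empty vector is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def average_cosine_distance(candidate, points):
    """Average cosine distance from `candidate` to each vector in `points`."""
    return sum(cosine_distance(candidate, p) for p in points) / len(points)


docs = [{"cluster": 1.0}, {"centroid": 1.0}]
avg = average_cosine_distance({"cluster": 1.0}, docs)
```

A candidate set of terms with a lower average distance is, in this sense, more representative of the points in the neighborhood.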

5.1.3 Cluster Visualization with Images

When clusters have images, their visualization is the same as described in the previous sections of this chapter. Additionally, there are image previews in the data panel.

\includegraphics[]{images/clusters/cluster-image-cluster-view}
Figure 5.6 Cluster view with images

As shown in Figure 5.6, when you mouse over a cluster in the cluster view, its information is displayed in a tooltip as well as in the data panel on the right side. In addition to the distance histogram and the centroid information, the data panel presents a list of thumbnail images. The thumbnails serve as previews of the images and can be changed by using the reload icon next to them. Clicking on a thumbnail brings up a close-up view of the image.

\includegraphics[]{images/clusters/cluster-image-summary-view}
Figure 5.7 Cluster summary view with images

In the summary view, as shown in Figure 5.7, the data panel includes the cluster name, its distance histogram, its number of instances, and a list of thumbnail images as previews.