Cluster Analysis with the BigML Dashboard
6.1 Cluster Summary
The cluster summary gives you an overview of your cluster, covering the following: data distribution, cluster metrics, centroids, and intercentroid distance. (See Figure 6.2.)
Data distribution: for each cluster, the percentage of data instances that belong to that cluster. The “global” cluster always includes all of the data instances, i.e., it accounts for 100% of them.
Cluster metrics: a summary of the distances between the data instances, expressed in terms of several aggregate measures:
total_ss: the total sum of squares of the distances between each data instance and the global centroid;
within_ss: the total sum of squares of the distances between each data instance and the centroid of the cluster it belongs to;
between_ss: the total sum of squares of the distances between each centroid and the global centroid;
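ratio_ss: the ratio of between_ss to total_ss. This is a measure of how well your data instances can be grouped into clusters: the higher the ratio, the larger the share of the overall spread that comes from separation between clusters rather than from spread within them.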
Centroids: general statistics for each of the identified clusters, including the global one.
Intercentroid distance: distribution of distances between centroids.
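
To make the definitions above concrete, the following is a minimal Python sketch that computes the same kinds of figures for a small, made-up data set. It is an illustration only, not BigML's implementation: the helper name cluster_summary, the sample points, and the labels are hypothetical; it assumes purely numeric fields and plain Euclidean distance; and it assumes that each centroid's contribution to between_ss is weighted by the number of instances in its cluster, which the definition above does not state explicitly.

```python
import itertools
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(column) / len(points) for column in zip(*points)]


def cluster_summary(points, labels):
    """Summarize an existing cluster assignment (hypothetical helper)."""
    # Group the instances by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    global_centroid = mean_point(points)
    centroids = {label: mean_point(members) for label, members in clusters.items()}

    # Data distribution: percentage of instances in each cluster.
    distribution = {
        label: 100.0 * len(members) / len(points)
        for label, members in clusters.items()
    }

    # total_ss: every instance against the global centroid.
    total_ss = sum(squared_distance(p, global_centroid) for p in points)

    # within_ss: every instance against the centroid of its own cluster.
    within_ss = sum(
        squared_distance(p, centroids[label]) for p, label in zip(points, labels)
    )

    # between_ss: every centroid against the global centroid.
    # ASSUMPTION: each term is weighted by the cluster size.
    between_ss = sum(
        len(members) * squared_distance(centroids[label], global_centroid)
        for label, members in clusters.items()
    )

    # ratio_ss: guard against a data set where all instances coincide.
    ratio_ss = between_ss / total_ss if total_ss else 0.0

    # Intercentroid distances: one Euclidean distance per pair of centroids.
    intercentroid = {
        (a, b): math.sqrt(squared_distance(centroids[a], centroids[b]))
        for a, b in itertools.combinations(sorted(centroids), 2)
    }

    return {
        "distribution": distribution,
        "total_ss": total_ss,
        "within_ss": within_ss,
        "between_ss": between_ss,
        "ratio_ss": ratio_ss,
        "intercentroid": intercentroid,
    }


# Made-up sample: three tight, well-separated groups of two instances each.
sample_points = [[0, 0], [0, 2], [10, 0], [10, 2], [5, 9], [5, 11]]
sample_labels = [0, 0, 1, 1, 2, 2]

summary = cluster_summary(sample_points, sample_labels)
for name, value in summary.items():
    print(name, value)
```

For this sample, within_ss is small compared with total_ss, so ratio_ss comes out close to 1, which matches the intuition that the three groups are well separated. Under the size-weighting assumption, and because each centroid here is the mean of its cluster, within_ss and between_ss add up to total_ss.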