Cluster Analysis with the BigML Dashboard

15 Takeaways

This document covered BigML clusters in detail. We conclude with a list of key points:

  • BigML Clusters can learn how your data instances group together based on their similarity.

  • Each cluster group is represented by its center, called a centroid.

  • To build a cluster, you only need a dataset (see Figure 15.1).

  • A cluster can be an input to a centroid, to a batch centroid, to a dataset, or to a BigML model (see Figure 15.1).

  • Create centroids or batch centroids from a cluster to find out which cluster group previously unseen data instances belong to.

  • You can also create clusters using the BigML REST API or the BigML bindings for your language of choice.

  • Create a BigML model or a dataset from a cluster to further analyze the instances that belong to a given cluster group. For example, a BigML model may help you identify which fields are most relevant in determining whether a data instance should be considered a member of a cluster group.

  • Numeric fields are automatically scaled so that fields with large magnitudes do not dominate the distance calculation.

  • BigML provides two clustering algorithms: K-means and G-means. Use G-means when you do not know in advance how many cluster groups to expect.

  • When you create a BigML Cluster from a dataset, you can define a number of options, such as the number K of clusters (K-means) or the critical value (G-means), field scaling and weighting, and sampling.

  • BigML visualizes clusters as circles of different colors, each representing one of the centroids found. Each circle is sized according to the number of instances that belong to the corresponding cluster group.

  • You can use BigML Clusters to find the nearest centroid for a single data instance or for a batch of instances.

  • You can download clusters in several formats, including Python, JSON PML, and Node.js, to use for local computation.

  • At any time you can update a cluster’s descriptive information, move a cluster to a different project, rename it, or delete it permanently.
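The K-means option can be made concrete with a minimal sketch of the textbook Lloyd's algorithm in plain Python. It is an illustration, not BigML's implementation: every function name here is hypothetical, and the deterministic farthest-point seeding is a simplification chosen so that the result is reproducible. BigML's own seeding, scaling, and G-means logic are not shown.

```python
# Hypothetical sketch of Lloyd's K-means; not BigML's implementation.
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding (a simplification): start at the first point, then
    repeatedly add the point farthest from its nearest already-chosen centroid."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        candidate = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(tuple(candidate))
    return centroids


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's K-means. Returns (centroids, group index of each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_point_init(points, k)
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every point joins the group of its nearest centroid.
        new_assignment = [nearest(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # no point changed group, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty group keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Two well-separated groups of 2-D points (made-up sample data).
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(data, k=2)
sizes = Counter(labels)  # instances per group, the quantity a circle's size reflects
```

On this sample data, `labels` is `[0, 0, 0, 1, 1, 1]` and `sizes` counts three instances in each group. Note that K must be fixed in advance, which is the situation in which the takeaways recommend G-means instead.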

Figure 15.1: Cluster workflows
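The scaling, weighting, and nearest-centroid ideas from the takeaways can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BigML's algorithm: it assumes z-score scaling (subtract the field mean, divide by the population standard deviation) and treats weights as hypothetical per-field multipliers on squared differences. BigML's actual scaling and weighting may differ, and all function and field names are made up for the example.

```python
# Hypothetical helpers; z-score scaling and per-field weights are assumptions,
# not necessarily what BigML does internally.
import math
from statistics import mean, pstdev
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

Row = Mapping[str, float]
Stats = Dict[str, Tuple[float, float]]


def fit_scaler(rows: Sequence[Row], fields: Sequence[str]) -> Stats:
    """Per-field (mean, population standard deviation) for z-score scaling."""
    stats: Stats = {}
    for f in fields:
        values = [r[f] for r in rows]
        sd = pstdev(values)
        stats[f] = (mean(values), sd if sd > 0 else 1.0)  # constant field: avoid dividing by zero
    return stats


def scale(row: Row, stats: Stats) -> Dict[str, float]:
    """Express each numeric field in standard deviations from its mean."""
    return {f: (row[f] - mu) / sd for f, (mu, sd) in stats.items()}


def nearest_centroid(row: Row, centroids: Mapping[str, Row], stats: Stats,
                     weights: Optional[Mapping[str, float]] = None) -> str:
    """Name of the centroid closest to `row` under a scaled, optionally
    weighted, Euclidean distance. Fields missing from `weights` get weight 1."""
    weights = weights or {}
    point = scale(row, stats)

    def distance(center: Row) -> float:
        c = scale(center, stats)
        return math.sqrt(sum(weights.get(f, 1.0) * (point[f] - c[f]) ** 2 for f in stats))

    return min(centroids, key=lambda name: distance(centroids[name]))


def batch_nearest_centroid(rows: Sequence[Row], centroids: Mapping[str, Row], stats: Stats,
                           weights: Optional[Mapping[str, float]] = None) -> List[str]:
    """Apply nearest_centroid to many rows, like a batch of new instances."""
    return [nearest_centroid(r, centroids, stats, weights) for r in rows]


# Made-up sample data: "age" in years, "income" in dollars.
fields = ["age", "income"]
training = [
    {"age": 25, "income": 30000},
    {"age": 30, "income": 32000},
    {"age": 60, "income": 31000},
    {"age": 65, "income": 33000},
]
centroids = {  # means of the first two and last two training rows
    "young": {"age": 27.5, "income": 31000},
    "old": {"age": 62.5, "income": 32000},
}
stats = fit_scaler(training, fields)
no_scaling: Stats = {f: (0.0, 1.0) for f in fields}  # identity transform, for comparison
query = {"age": 64, "income": 31000}
```

With this data, the unscaled distance is dominated by `income`, so `query` (a 64-year-old) lands in the `young` group; after scaling it lands in `old`, and setting the `age` weight to 0 sends it back to `young`. Applying `batch_nearest_centroid` to the four training rows yields `["young", "young", "old", "old"]`.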