Cluster Analysis with the BigML Dashboard

4.1 Clustering algorithms

BigML provides two clustering algorithms, K-means and G-means [32]. Choose one of them to create your cluster (see Figure 4.3).

Figure 4.3. Cluster options: clustering algorithm

4.1.1 K-means algorithm

The K-means algorithm requires you to specify the number of clusters (the K in K-means) up front, so it works best when you already know how many groups are present in your dataset. If you do not know this number, an inappropriate choice of K may yield poor results.

The maximum number of clusters that you can specify is 300.
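To illustrate why the choice of K matters, here is a minimal sketch of textbook K-means (Lloyd's algorithm) in plain Python. It is not BigML's implementation. The function names `kmeans`, `sq_dist` and `inertia`, the toy dataset, and the fixed iteration count are all assumptions made for this example; only the limit of 300 comes from the text above.

```python
import random

MAX_K = 300  # upper limit on K stated in this section


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Textbook Lloyd's K-means (hypothetical helper, not BigML's code)."""
    if not 1 <= k <= MAX_K:
        raise ValueError(f"k must be between 1 and {MAX_K}")
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct starting points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        for i, group in enumerate(groups):
            if group:
                centroids[i] = tuple(sum(c) / len(group) for c in zip(*group))
    return centroids


def inertia(points, centroids):
    """Total squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Toy dataset with two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

for k in (1, 2):
    print(k, inertia(points, kmeans(points, k)))
```

With K=1 the single centroid sits between the two groups and fits neither, so the total distance is far larger than with K=2, which matches the true structure of the data.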

4.1.2 G-means algorithm

If you do not know the optimal number of clusters in your dataset, BigML can discover it for you with G-means. G-means addresses the problem of finding the number of clusters by iteratively taking each existing cluster and testing whether the data in that cluster's neighborhood appears Gaussian in its distribution (based on Anderson-Darling tests); clusters that fail the test are split further. See this blog post for more information.

G-means will yield a maximum of 128 clusters.
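The Gaussianity check at the heart of G-means can be sketched as follows. This is a one-dimensional simplification in plain Python, not BigML's implementation. The function names, the sample data, the small-sample correction, and the threshold of 0.752 (the commonly tabulated 5% critical value when mean and variance are estimated from the sample) are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Assumed threshold: tabulated 5% critical value for the corrected statistic.
CRITICAL_5_PERCENT = 0.752


def anderson_darling(values):
    """Size-corrected Anderson-Darling statistic for normality (1-D data)."""
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    std_normal = NormalDist()
    # Normal CDF of the standardized, sorted sample, clipped so log() is finite.
    cdf = [min(max(std_normal.cdf((v - mu) / sigma), 1e-12), 1 - 1e-12)
           for v in sorted(values)]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    # Small-sample correction for the case of estimated mean and variance.
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)


def looks_gaussian(values, critical=CRITICAL_5_PERCENT):
    """True if the sample does not fail the normality test (keep as one cluster)."""
    return anderson_darling(values) < critical


rng = random.Random(42)
one_group = [rng.gauss(0, 1) for _ in range(200)]
two_groups = ([rng.gauss(0, 1) for _ in range(100)]
              + [rng.gauss(10, 1) for _ in range(100)])

print("one group:", anderson_darling(one_group))
print("two groups:", anderson_darling(two_groups))
print("split two_groups?", not looks_gaussian(two_groups))
```

A sample drawn from two separated groups produces a much larger statistic than a single bell-shaped group, which is the signal G-means uses to decide that a cluster should be split. On multi-dimensional data, the points would first need to be projected onto a single axis before running a test of this kind.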