Cluster Analysis with the BigML Dashboard

2.3 On the repeatability of Clusters

As mentioned, the results of standard K-means can vary strongly from run to run because the initial centroids are selected at random. BigML, however, aims to make cluster results repeatable to some extent. In particular, when applying K-means||, BigML ensures that the same initial centroid selection is made every time for a given dataset.

As a result, if you choose the K-means algorithm, BigML Clusters are repeatable as long as you create them from the same dataset. Conversely, if you create different datasets from the same data source, there is no guarantee that the results will be repeatable.
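
The effect of tying the initial selection to the dataset can be illustrated with a minimal sketch. It is not BigML's implementation: it assumes the random seed is derived from a dataset identifier, it uses plain random sampling of initial centroids instead of K-means||, and the names seed_for, kmeans, and the dataset ids are hypothetical.

```python
import hashlib
import random


def seed_for(dataset_id):
    """Map a (hypothetical) dataset identifier to a stable integer seed."""
    digest = hashlib.sha256(dataset_id.encode("utf-8")).hexdigest()
    return int(digest[:16], 16)


def kmeans(rows, k, seed, iterations=20):
    """Plain K-means on tuples of numbers; the seed fixes the initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # the only random step in the algorithm
    for _ in range(iterations):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        groups = [[] for _ in range(k)]
        for row in rows:
            distances = [
                sum((a - b) ** 2 for a, b in zip(row, centroid))
                for centroid in centroids
            ]
            groups[distances.index(min(distances))].append(row)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(column) / len(group) for column in zip(*group))
            if group else centroids[index]
            for index, group in enumerate(groups)
        ]
    return centroids


rows = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (4.8, 5.3), (9.0, 1.0)]

# Same dataset id: same seed, same initial centroids, same final centroids.
first = kmeans(rows, 2, seed_for("dataset/A"))
second = kmeans(rows, 2, seed_for("dataset/A"))
print(first == second)

# A second dataset built from the same source gets its own id and its own seed,
# so its initial centroids, and possibly its final clusters, may differ.
other = kmeans(rows, 2, seed_for("dataset/B"))
```

In this sketch, repeatability follows from the seed alone: once the initial centroids are fixed, every later step of K-means is deterministic.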

The same considerations apply to BigML G-means repeatability.