Cluster Analysis with the BigML Dashboard

4.3 Critical value

When you select the G-means algorithm, you can choose a critical value. (See Figure 4.5.)

\includegraphics[]{images/clusters/cluster-critical}
Figure 4.5 Critical value options

The critical value determines how “strict” the G-means algorithm is when identifying clusters. When you select G-means, BigML iteratively tests new clusters, checking whether the data in each cluster's neighborhood looks Gaussian. If a new cluster does not pass the test, it is split into two new clusters. The critical value sets how strict this statistical test is when deciding whether the underlying data looks Gaussian. A critical value of 1 means the data must look very Gaussian to pass the test, so more clusters fail and are split, which can lead to more clusters being detected. Higher critical values loosen the Gaussian constraint and lead to fewer clusters.
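The effect of the threshold can be illustrated with a minimal sketch. It assumes the Gaussian check is an Anderson-Darling test, as in the original G-means formulation, and that a cluster is split when the test statistic exceeds the critical value. It works on one-dimensional toy data, and both helper functions are hypothetical names introduced for illustration. This is not BigML's implementation.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling(values):
    """Corrected Anderson-Darling statistic of a 1-D sample.

    Larger values mean the sample looks less Gaussian.
    """
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    std_normal = NormalDist()
    eps = 1e-12  # clamp CDF values so that log() never receives 0
    cdf = [
        min(max(std_normal.cdf((v - mu) / sigma), eps), 1 - eps)
        for v in sorted(values)
    ]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample correction used when mean and variance are estimated.
    return a2 * (1 + 4 / n - 25 / n ** 2)


def should_split(values, critical_value):
    """Assumed decision rule: split when the sample fails the Gaussian test."""
    return anderson_darling(values) > critical_value


# Deterministic toy data: one Gaussian-shaped blob and two well-separated blobs.
quantiles = [NormalDist().inv_cdf((i + 0.5) / 500) for i in range(500)]
one_blob = quantiles
two_blobs = [q - 10 for q in quantiles] + [q + 10 for q in quantiles]

for critical_value in (1, 5, 10):
    print(
        critical_value,
        should_split(one_blob, critical_value),
        should_split(two_blobs, critical_value),
    )
```

In this sketch the single blob is kept and the two-blob sample is split at every critical value shown, because their statistics lie far from the threshold range. A sample whose statistic falls between two critical values would be split under the lower one and kept under the higher one, which is why lower critical values tend to produce more clusters.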

By default, BigML uses a value of 5, which seems to work well in most cases. Values between 1 and 10 are acceptable.