Cluster Analysis with the BigML Dashboard

4.6 Scale fields & auto-scaled fields

Datasets often contain fields with very different magnitudes, for example age and salary. Because clustering relies on the Euclidean distance between numeric values, the field with the larger range (salary in this case) will dominate the clustering unless the fields are re-scaled.
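The effect can be illustrated with a minimal Python sketch. The three customers and their values are hypothetical, and the `euclidean` helper is written here for illustration only; it is not part of BigML.

```python
import math

# Hypothetical customers as (age in years, salary in dollars).
a = (25, 50_000)
b = (60, 50_500)  # very different age, similar salary
c = (26, 52_000)  # similar age, moderately different salary


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


d_ab = euclidean(a, b)
d_ac = euclidean(a, c)

# On the raw values, a is "closer" to b than to c, even though a and b
# are 35 years apart: the salary difference swamps the age difference.
print(f"a-b: {d_ab:.1f}  a-c: {d_ac:.1f}")
```

Without re-scaling, the 35-year age gap between `a` and `b` contributes far less to the distance than the salary gap between `a` and `c`.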

BigML provides two options to re-scale your fields (Figure 4.9):

  • Scale fields: set a specific scale for each field using an integer multiplier. The multiplier increases the influence of the field on the cluster computation by that factor. If the auto-scaled option is also enabled, auto-scaling is applied first, so the multiplier controls how much weight a particular field has relative to the others. For example, if you want age to be twice as influential as salary, set auto-scaled to true and assign a multiplier of 2 to the age field.

  • Auto-scaling: when the auto-scaled option is enabled, all the numeric fields will be scaled so that their standard deviations are 1. This ensures each field will have equal influence.
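The two options can be sketched in Python as follows. This is an illustration under stated assumptions, not BigML's internal implementation: the sample values are hypothetical, `auto_scale` and `apply_scale` are made-up helper names, the population standard deviation is used, and the multiplier is assumed to act directly on the auto-scaled values.

```python
import statistics

# Hypothetical sample values for the two fields.
ages = [25, 60, 26, 40]
salaries = [50_000, 50_500, 52_000, 90_000]


def auto_scale(values):
    """Divide by the standard deviation so the result has deviation 1.

    Assumption: population standard deviation. Centering is omitted
    because it does not change the distances between points.
    """
    sd = statistics.pstdev(values)
    return [v / sd for v in values]


def apply_scale(values, multiplier):
    """Apply an integer scale (assumed to multiply the scaled values)."""
    return [v * multiplier for v in values]


# Auto-scaling first, then a multiplier of 2 on age and 1 on salary.
scaled_ages = apply_scale(auto_scale(ages), 2)
scaled_salaries = apply_scale(auto_scale(salaries), 1)

print(statistics.pstdev(scaled_ages))      # spread of age after scaling
print(statistics.pstdev(scaled_salaries))  # spread of salary after scaling
```

Under these assumptions, both fields start on an equal footing after auto-scaling, and the multiplier then stretches age to twice the spread of salary.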

Figure 4.9 Cluster options: field scaling