Cluster Analysis with the BigML Dashboard

4.7 Weights

This option allows you to assign individual weights to each instance by choosing a special weight field. This is useful when you have an unbalanced dataset, where data instances of a given kind, e.g., those indicating a fraudulent transaction, are scarce in comparison to other ones. In such a case, you may want to assign more weight to the scarce instances so that they carry as much weight as the abundant ones.
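One way to choose such weights is to scale each class by how rare it is. The following is a minimal sketch in plain Python, not part of BigML; the function name `balancing_weights` and the labels are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def balancing_weights(labels):
    """Return one weight per class so that every class has the same total weight.

    The most frequent class gets weight 1.0; rarer classes get
    proportionally larger weights.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {label: largest / count for label, count in counts.items()}


# Hypothetical unbalanced dataset: 8 legitimate and 2 fraudulent instances.
labels = ["legit"] * 8 + ["fraud"] * 2
weights = balancing_weights(labels)

# Total weight per class after weighting.
totals = {label: weights[label] * labels.count(label) for label in weights}
print(weights)
print(totals)
```

With these weights, both classes contribute the same total weight. You may still prefer a different factor, depending on how much emphasis the scarce instances should receive.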

\includegraphics[]{images/clusters/cluster-weight-field}
Figure 4.10 Cluster options: weight field

The selected field must be numeric, and it must not contain any missing values. The field selected for weighting purposes will not be taken into account as an input to calculate cluster distances. You can select an existing field in your dataset, or you can create a new one to assign customized weights.
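Before uploading a dataset, you may want to check that a candidate weight field meets both requirements. The sketch below is an illustrative helper in plain Python, not a BigML function. It assumes the column has already been loaded as a list of Python values, with missing entries represented as `None`, an empty string, or NaN.

```python
import math


def validate_weight_field(values):
    """Return a list of problems found in a candidate weight column.

    An empty list means every value is numeric and none is missing.
    """
    problems = []
    for i, value in enumerate(values):
        if value is None or value == "":
            problems.append(f"row {i}: missing value")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # Booleans are rejected explicitly because bool is a subclass of int.
            problems.append(f"row {i}: not numeric ({value!r})")
        elif math.isnan(value):
            problems.append(f"row {i}: missing value (NaN)")
    return problems


print(validate_weight_field([1, 1, 10, 1]))        # no problems
print(validate_weight_field([1, None, "high", 10]))  # two problems
```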

For example, the transactional dataset below includes a field called “Weight”. This field indicates that fraudulent instances weigh 10 times more than non-fraudulent ones. The BigML Flatline editor is a powerful tool for adding new fields to your dataset, such as a weight field. Another field that could be used as the weight in this example is the transaction “Amount”, so that transactions with higher amounts have higher weights in the cluster.

Trans. ID   Products   Online   Amount $   Fraud   Weight
---------   --------   ------   --------   -----   ------
xxxxxx098   XYZGH      yes         3,218   FALSE        1
xxxxxx345   VBHGF      no          1,200   FALSE        1
xxxxxx123   UYFHJ      yes         5,000   FALSE        1
xxxxxx567   HSNKI      no            390   FALSE        1
xxxxxx789   SHSYA      yes           500   TRUE        10
xxxxxx093   DFSTU      yes           423   FALSE        1
xxxxxx012   TYISJ      yes        60,000   FALSE        1
xxxxxx342   SJSOP      no            789   FALSE        1
xxxxxx908   IOPKJ      no          9,450   FALSE        1
xxxxxx334   HIOPN      yes        50,678   TRUE        10

Table 4.1 Weight Field example for transactional dataset
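The “Weight” column in Table 4.1 follows a simple rule: 10 for fraudulent transactions and 1 otherwise. As a minimal sketch, assuming you prepare the field yourself before uploading the data, the same column can be derived in plain Python. The helper `weight_for` and the tuple layout are hypothetical; the rows are copied from the table.

```python
# (transaction id, amount in $, fraud flag), copied from Table 4.1.
rows = [
    ("xxxxxx098", 3218, False),
    ("xxxxxx345", 1200, False),
    ("xxxxxx123", 5000, False),
    ("xxxxxx567", 390, False),
    ("xxxxxx789", 500, True),
    ("xxxxxx093", 423, False),
    ("xxxxxx012", 60000, False),
    ("xxxxxx342", 789, False),
    ("xxxxxx908", 9450, False),
    ("xxxxxx334", 50678, True),
]


def weight_for(is_fraud, fraud_weight=10):
    """Return the weight of one instance: fraud_weight if fraudulent, else 1."""
    return fraud_weight if is_fraud else 1


weights = [weight_for(is_fraud) for _, _, is_fraud in rows]

# Compare how much total weight each group carries.
fraud_total = sum(w for w, (_, _, is_fraud) in zip(weights, rows) if is_fraud)
legit_total = sum(w for w, (_, _, is_fraud) in zip(weights, rows) if not is_fraud)

print(weights)                   # [1, 1, 1, 1, 10, 1, 1, 1, 1, 10]
print(fraud_total, legit_total)  # 20 8
```

With this rule, the two fraudulent instances carry a total weight of 20, compared with 8 for the eight non-fraudulent ones. To weight by transaction size instead, you would use the amount column directly as the weight.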