Anomaly Detection with the BigML Dashboard

4.3 Constraints

The Constraints parameter is an experimental option that makes Isolation Forest trees more sensitive to anomalous data. When enabled, constraints add extra predicates to each node split while the trees of the Isolation Forest are built, so an instance is isolated earlier (at a shallower depth) and its anomaly score is higher.

For example, in the normal case, with constraints disabled, each tree split yields two branches with one predicate each:

  • Monthly-salary > $2,000

  • Monthly-salary <= $2,000

By contrast, if constraints are enabled, each branch will have extra predicates picked randomly:

  • Monthly-salary > $2,000 AND Occupation=employed

  • Monthly-salary <= $2,000 AND (Occupation=student OR Occupation=employed)

Consider an instance with Monthly-salary=$4,000 and Occupation=student. In the first case, it meets the rule Monthly-salary > $2,000, so at least one more split is needed to isolate it. In the second case, it meets neither branch rule (the first fails on Occupation, the second on Monthly-salary), so it is isolated sooner than in the first case and its anomaly score is higher.
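The salary example can be traced with a small Python sketch. It is only an illustration of the branch logic described here, not BigML's implementation: the names `matches`, `route`, `plain_split`, and `constrained_split` are hypothetical, and each branch is modeled as a list of predicates that must all hold.

```python
from typing import Callable, Dict, List, Optional

Instance = Dict[str, object]
Predicate = Callable[[Instance], bool]


def matches(branch: List[Predicate], instance: Instance) -> bool:
    """A branch matches only if every one of its predicates holds (logical AND)."""
    return all(predicate(instance) for predicate in branch)


def route(branches: List[List[Predicate]], instance: Instance) -> Optional[int]:
    """Return the index of the first branch the instance satisfies, or None if none match."""
    for index, branch in enumerate(branches):
        if matches(branch, instance):
            return index
    return None


# The instance from the example: high salary, but a student.
instance = {"monthly_salary": 4000, "occupation": "student"}

# Constraints disabled: one predicate per branch.
plain_split = [
    [lambda x: x["monthly_salary"] > 2000],
    [lambda x: x["monthly_salary"] <= 2000],
]

# Constraints enabled: each branch carries an extra (here hand-picked) predicate.
constrained_split = [
    [lambda x: x["monthly_salary"] > 2000,
     lambda x: x["occupation"] == "employed"],
    [lambda x: x["monthly_salary"] <= 2000,
     lambda x: x["occupation"] in {"student", "employed"}],
]

plain_result = route(plain_split, instance)              # follows the first branch
constrained_result = route(constrained_split, instance)  # matches no branch

print("without constraints:", plain_result)
print("with constraints:", constrained_result)
```

Without constraints the instance follows the first branch and needs further splits; with constraints it matches no branch, which corresponds to being isolated at this node.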

This option tends to inflate anomaly scores and is computationally more expensive, but it can also make the trees more effective at flagging anomalous data, especially when the data is categorical.
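To see why earlier isolation translates into a higher score, the sketch below uses the score formula from the original Isolation Forest formulation, s = 2^(-E[h] / c(n)), where E[h] is the mean isolation depth across trees and c(n) is a normalizing average path length. This section does not state which formula BigML applies, so treat the formula, the function names, and the sample size of 256 as assumptions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """c(n): normalizing average path length for a sample of n instances."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_depth: float, sample_size: int) -> float:
    """Standard Isolation Forest score: shallower mean depth gives a score closer to 1."""
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


# Hypothetical depths: the same instance isolated later vs. earlier.
score_late = anomaly_score(mean_depth=6.0, sample_size=256)
score_early = anomaly_score(mean_depth=3.0, sample_size=256)

print("isolated at depth 6:", round(score_late, 3))
print("isolated at depth 3:", round(score_early, 3))
```

Under this assumption, halving the isolation depth always raises the score, which is consistent with the inflation of scores described above.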

Figure 4.4 Anomalies constraints