Anomaly Detection with the BigML Dashboard

2 Understanding Anomalies

This chapter describes how BigML anomalies work internally, providing the foundations needed to understand the configuration options available when creating an anomaly detector.

Anomaly detection tasks try to find the data points in a dataset whose patterns differ significantly from those of the rest of the instances. To achieve this, BigML uses a state-of-the-art algorithm called Isolation Forest, explained in the following section (section 2.1). An anomaly score is calculated for each of the anomalous instances, along with an indicator of how much each input field contributes to that score, known as field importance.
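To make the idea concrete, the following is a minimal sketch of the textbook Isolation Forest formulation, not BigML's actual implementation. It assumes numeric fields only, and all function names, parameters (n_trees, sample_size, seed), and the toy data are hypothetical choices made for illustration. Each tree isolates points with random splits; points that are isolated after few splits get a score closer to 1. Field importance and the handling of categorical or missing values are not covered here.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # used to approximate harmonic numbers


def average_path_length(n):
    """Expected path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random field at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    field = rng.randrange(len(rows[0]))
    lo = min(r[field] for r in rows)
    hi = max(r[field] for r in rows)
    if lo == hi:  # nothing to split on for this field
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[field] < split]
    right = [r for r in rows if r[field] >= split]
    return {
        "field": field,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, row, depth=0):
    """Number of splits needed to reach a leaf, adjusted for unsplit leaf points."""
    if "size" in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if row[tree["field"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def build_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each from a random subsample (needs >= 2 rows)."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    forest = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
              for _ in range(n_trees)]
    return forest, size


def anomaly_score(forest, sample_size, row):
    """Score in (0, 1]: shorter average paths give scores closer to 1."""
    mean_path = sum(path_length(t, row) for t in forest) / len(forest)
    return 2.0 ** (-mean_path / average_path_length(sample_size))


# Toy data (made up for illustration): a cluster around the origin plus one far point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data.append((10.0, 10.0))

forest, size = build_forest(data)
outlier_score = anomaly_score(forest, size, (10.0, 10.0))
inlier_score = anomaly_score(forest, size, (0.0, 0.0))
```

In this sketch the far point (10.0, 10.0) should receive a higher score than a point at the center of the cluster, because random splits separate it from the rest of the data after only a few steps.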

An advantage of this method is that BigML anomalies support categorical and numeric fields, as well as missing values, as input data (explained in section 2.2).

At the end of this chapter, you can find an example illustrating how to interpret anomalies in BigML (section 2.3).