Anomaly Detection with the BigML Dashboard
5 Visualizing Anomalies
The BigML anomaly view is composed of two main blocks: TOP ANOMALIES (on the left) and DATA INSPECTOR (on the right). (See Figure 5.1.)
TOP ANOMALIES is a list of the most anomalous instances found in the dataset, ranked by their anomaly scores. By default the list contains the top 10 anomalous instances, unless you configured a different number of anomalies in advance (see section 4.1). For each anomalous instance you get the following information:
Anomaly score: it is always a number between 0% and 100%. Higher values indicate more anomalous instances. Usually a score of 60% or higher is a solid basis for considering a given instance anomalous. Learn more about anomaly scores in section 2.1.
Note: the 60% threshold is no longer valid if the parameter Constraints is enabled since scores tend to be inflated. (See section 4.3 .)
Field importances: you can see a histogram indicating the contribution of the input fields to the anomaly score. Each field importance can range from 0% to 100%. Learn more about field importances in section 2.1.
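The ranking and the 60% rule of thumb described above can be sketched in plain Python. This is only an illustration, not BigML's implementation or API: the `instances` list, its `score` key, and the `top_anomalies` helper are hypothetical names, and scores are written as fractions between 0 and 1 rather than percentages.

```python
def top_anomalies(instances, top_n=10, threshold=0.6):
    """Rank instances by anomaly score and flag those at or above threshold.

    `instances` is a hypothetical list of dicts with a "score" key in [0, 1].
    """
    # Highest scores first, mirroring the order of the TOP ANOMALIES list.
    ranked = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    # Keep only the first top_n entries and mark each against the threshold.
    return [
        {**inst, "flagged": inst["score"] >= threshold}
        for inst in ranked[:top_n]
    ]


# Hypothetical scored instances, for illustration only.
scored = [
    {"id": "a", "score": 0.42},
    {"id": "b", "score": 0.71},
    {"id": "c", "score": 0.60},
]
result = top_anomalies(scored, top_n=2)
```

The default `threshold` of 0.6 reflects the guideline above. As the note says, that guideline does not hold when Constraints is enabled, so a different cutoff would have to be chosen in that case.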
When you mouse over an instance in the TOP ANOMALIES list, you can see its value for each field in the DATA INSPECTOR on the right. The fields in the DATA INSPECTOR are ordered by importance, so fields contributing more to the anomaly score for that instance appear at the top. At the end of the list, you will find the fields selected as ID fields and the text and items fields, which are not used to compute the anomaly score (see section 4.4). Apart from the instance values for each field, you can also see the field histogram and statistics. (You can find an explanation of field statistics in the section Understanding Datasets of the Datasets document.)
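The field ordering described above can be mimicked with a short sketch. It assumes a hypothetical `importances` mapping (field name to importance for one instance) and a hypothetical `excluded` list holding the ID, text, and items fields. None of these names come from BigML.

```python
def order_fields(importances, excluded):
    """Order fields as described for the DATA INSPECTOR.

    Scored fields come first, from most to least important; fields that do
    not take part in the anomaly score (ID, text, items) are appended last.
    """
    scored = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    ordered = [name for name, _ in scored if name not in excluded]
    return ordered + list(excluded)


# Hypothetical per-instance importances, for illustration only.
fields = order_fields(
    {"age": 0.1, "amount": 0.7, "country": 0.2},
    excluded=["customer_id", "notes"],
)
```

Here `fields` starts with `amount`, the largest contributor, and ends with the two fields that were excluded from scoring.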
By clicking on the icon in the top left of the DATA INSPECTOR, you can also see and copy the instance values in CSV and JSON formats. (See Figure 5.4.)
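To show what the two formats look like for a single instance, the sketch below serializes a hypothetical instance with Python's standard `csv` and `json` modules. The exact layout the Dashboard produces is not specified here, so the header-plus-one-row CSV and the flat JSON object are assumptions.

```python
import csv
import io
import json


def instance_to_json(instance):
    """Serialize one instance (a dict of field name -> value) as JSON."""
    return json.dumps(instance)


def instance_to_csv(instance):
    """Serialize one instance as CSV: a header row followed by one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(instance))
    writer.writeheader()
    writer.writerow(instance)
    return buffer.getvalue()


# Hypothetical instance values, for illustration only.
instance = {"amount": 950, "country": "ES"}
as_json = instance_to_json(instance)
as_csv = instance_to_csv(instance)
```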